Sage-N Research, Inc. (Sage-N) defines the next-generation SEQUEST standard specifically for the throughput and sensitivity required in translational proteomics research, especially research involving phosphorylation and other protein post-translational modifications (PTMs) important to cancer and stem cell research. The new SEQUEST 3G standard is defined in close collaboration with Dr. John R. Yates, III of the Scripps Research Institute, the primary co-inventor of the original SEQUEST search engine (SE). It defines a single common standard for similarity scores, search statistics, and file formats to provide a robust foundation that meets the needs of translational research, including support for high-accuracy mass spectrometers and dissociation technologies such as electron-transfer dissociation (ETD).
Translational proteomics technology has evolved specifically to accurately characterize low-abundance proteins and protein PTMs in complex cell lysates, in contrast to first-generation proteomics, which was optimized for simpler protein mixtures.
Therefore, a translational proteomics search engine must:
(1) be optimized for high-throughput searching of large data sets with 100K+ spectra,
(2) include digital signal processing (DSP) to improve sensitivity for noisy spectra,
(3) provide robust PTM search, including support for ETD, and
(4) be extensible to support multiple similarity scores to improve specificity.
The SEQUEST 3G proteomics SE is the first commercially robust version to meet all of these requirements and more. It is the latest generation of Sage-N Research’s re-implementation of the SEQUEST algorithm, called SORCERER-SEQUEST, which uses a proprietary, patent-pending indexed search optimized for high throughput and on-the-fly PTM searching. Comprehensive PTM searches against species-specific protein sequences can now exceed 100,000 spectra per hour even on a low-end SORCERER system, which is several orders of magnitude faster than common PC software covering the same search space.
“The new SEQUEST 3G standard is an important step forward in the evolution of search engines,” notes Dr. Yates, “by incorporating standards to work with the latest technologies in mass spectrometry.”
SEQUEST 3G improves on previous SEQUEST versions by using the DSP-based cross-correlation score (XCorr) as the primary score, while preserving the “preliminary score” (Sp) value for the top 500 XCorr hits. This feature improves search sensitivity for phosphorylated peptides while maintaining backward compatibility with existing SEQUEST-based workflows.
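As a rough illustration of XCorr-first ranking, the sketch below uses the commonly described form of the cross-correlation score: the dot product of two binned spectra at zero offset, minus the average dot product over nearby non-zero offsets. It is not the SEQUEST 3G implementation. The function names, the candidate dictionary layout, the `sp_score` callable, and the offset window of 75 bins are all assumptions made for this example.

```python
def xcorr(observed, theoretical, max_offset=75):
    """Zero-offset correlation minus the mean correlation over
    non-zero offsets in [-max_offset, +max_offset] (illustrative form)."""
    n = len(observed)
    if len(theoretical) != n:
        raise ValueError("spectra must be binned to the same length")

    def dot_at(offset):
        # Correlate observed[i] with theoretical[i + offset]; skip bins out of range.
        total = 0.0
        for i in range(n):
            j = i + offset
            if 0 <= j < n:
                total += observed[i] * theoretical[j]
        return total

    background = sum(dot_at(t) for t in range(-max_offset, max_offset + 1) if t != 0)
    return dot_at(0) - background / (2 * max_offset)


def rank_hits(observed, candidates, sp_score, keep=500):
    """Rank every candidate by XCorr, then compute Sp only for the top `keep` hits.

    `candidates` is a hypothetical list of {"peptide": str, "bins": list[float]};
    `sp_score` is a hypothetical callable standing in for the preliminary score.
    """
    scored = [(xcorr(observed, c["bins"]), c["peptide"], c["bins"]) for c in candidates]
    scored.sort(key=lambda hit: hit[0], reverse=True)
    return [
        {"peptide": pep, "xcorr": score, "sp": sp_score(observed, bins)}
        for score, pep, bins in scored[:keep]
    ]
```

In this arrangement XCorr alone decides which candidates survive, and Sp is reported alongside it for downstream tools that still expect that column.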
In addition, SEQUEST 3G includes the calculation of the “E-value” statistical parameter for each spectrum. This parameter, popularized by genomics’ BLAST algorithm, measures the likelihood of the high XCorr score being derived by chance alone.
More importantly, the E-value allows a more statistically rigorous estimation of the significance of the top XCorr hit, and can replace the simple but problematic “delta-Cn” (dCn) parameter for searching small protein databases using high mass accuracy data.
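The contrast between dCn and an E-value can be sketched as follows. The dCn formula is the usual normalized gap between the two best XCorr values. The E-value estimate is only one generic approach, assumed here for illustration: treat the non-top scores for a spectrum as a null distribution, fit a straight line to the log of the survival counts, and extrapolate to the top score. How SEQUEST 3G actually computes its E-value is not described here, and both function names are hypothetical.

```python
import math


def delta_cn(scores):
    """Normalized gap between the best and second-best XCorr."""
    best, second = sorted(scores, reverse=True)[:2]
    return (best - second) / best


def e_value(scores):
    """Estimate the expected number of chance hits scoring >= the top score.

    Assumption for this sketch: all scores except the top one are random
    matches, and log10(number of null scores >= s) is roughly linear in s.
    """
    ranked = sorted(scores, reverse=True)
    top, null = ranked[0], ranked[1:]
    n = len(null)
    if n < 2:
        raise ValueError("need at least two null scores to fit a line")

    # The k-th best null score has k null scores at or above it (ties approximated).
    xs = null
    ys = [math.log10(k) for k in range(1, n + 1)]

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("null scores must not all be identical")

    # Ordinary least-squares fit of log10(count) against score.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x

    # Extrapolate the fitted line to the top score.
    return 10 ** (slope * top + intercept)
```

dCn depends only on the two best hits, so it says little when a small database yields few plausible candidates. A distribution-based estimate uses every candidate score for the spectrum, although a fit like the one above still needs enough null scores to be meaningful.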
These important enhancements update the venerable SEQUEST standard for translational proteomics research using modern mass spectrometers, just as proteomics practitioners are finally moving from simple protein identification to profiling molecular pathways.
Leading proteomics labs routinely use different search engines in parallel to achieve higher sensitivity and specificity. While this is commonly done with multiple software copies running on separate servers, the same goal can be met most efficiently with a highly sensitive first-pass filter that reports, for example, the top 50 candidates, which are then rescored in a second pass using more sophisticated, computing-intensive scoring modules.
This multiple-pass search engine architecture, where the first pass is the SEQUEST 3G engine, is the basis of the SORCERER Search Engine Architecture available on the SORCERER platform starting with v4.0. The second-pass rescore modules can implement different similarity scores without the costly need to search the protein sequences again, and they allow more sophisticated fragmentation models, normally prohibitive with one-pass search engines, for improved sensitivity and specificity. Specialized rescore modules, including open source modules, will be available for different mass spectrometers and technologies including ETD and PTM site localization.
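The two-pass idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SORCERER architecture itself: `two_pass_search`, `first_pass_score`, and the `rescorers` mapping are hypothetical names, and the scoring functions are supplied by the caller.

```python
import heapq


def two_pass_search(spectrum, peptides, first_pass_score, rescorers, keep=50):
    """Shortlist candidates with a fast score, then rescore only the shortlist.

    `first_pass_score(spectrum, peptide)` is a fast, sensitive score.
    `rescorers` maps a module name to a slower `fn(spectrum, peptide)`.
    """
    # Pass 1: scan the whole candidate list once and keep the `keep` best.
    shortlist = heapq.nlargest(
        keep, peptides, key=lambda p: first_pass_score(spectrum, p)
    )

    # Pass 2: the expensive modules see only the shortlist, not the database.
    results = []
    for peptide in shortlist:
        scores = {name: fn(spectrum, peptide) for name, fn in rescorers.items()}
        results.append({"peptide": peptide, "scores": scores})
    return results
```

Because each rescoring module is called at most `keep` times per spectrum, its cost is independent of the size of the protein sequence database, which is what makes computing-intensive fragmentation models affordable in the second pass.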
SEQUEST 3G is developed and maintained exclusively by Sage-N Research and is available for licensing within third-party bioinformatics software suites for a variety of mass spectrometers and technologies.