Detection of human unannotated microproteins by mass spectrometry-based proteomics: a community assessment
Authors
- A. Wacholder
- E.W. Deutsch
- L.W. Kok
- J.T. van Dinter
- J. Lee
- J.C. Wright
- S. Leblanc
- A.H. Jayatissa
- K. Jiang
- I. Arefiev
- K. Cao
- F. Bourassa
- F.A. Trifiro
- M. Bassani-Sternberg
- P.V. Baranov
- A. Bogaert
- S. Chothani
- I. Fierro-Monti
- D. Fijalkowska
- K. Gevaert
- N. Hubner
- J.M. Mudge
- J. Ruiz-Orera
- J. Schulz
- J.A. Vizcaino
- J.R. Prensner
- M.A. Brunet
- T.F. Martinez
- S.A. Slavoff
- X. Roucou
- J.S. Chaudhary
- S. van Heesch
- R.L. Moritz
- A.R. Carvunis
Journal
- bioRxiv
Citation
- bioRxiv
Abstract
Thousands of short open reading frames (sORFs) are translated outside of annotated coding sequences. Recent studies have pioneered searching for sORF-encoded microproteins in mass spectrometry (MS)-based proteomics and peptidomics datasets. Here, we assessed literature-reported MS-based identifications of unannotated human proteins. We find that studies vary by three orders of magnitude in the number of unannotated proteins they report. Of nearly 10,000 reported sORF-encoded peptides, 96% were unique to a single study, and 12% mapped to annotated proteins or proteoforms. Manual evaluation of a benchmark dataset of 406 spectra from 204 sORF-encoded proteins revealed large variation in peptide-spectrum match (PSM) quality between studies, with immunopeptidomics studies generally reporting higher-quality PSMs than studies using conventional enzymatic digests of whole cell lysates. We estimate that 65% of predicted sORF-encoded protein detections in immunopeptidomics studies were supported by high-quality PSMs, versus 7.8% in non-immunopeptidomics datasets. Our work stresses the need for standardized protocols and analysis workflows to guide future advancements in microprotein detection by MS towards uncovering how many human microproteins exist.