Community benchmarking and evaluation of human unannotated microprotein detection by mass spectrometry based proteomics

Authors

  • Aaron Wacholder
  • Eric W. Deutsch
  • Leron W. Kok
  • Jip T. van Dinter
  • Jiwon Lee
  • James C. Wright
  • Sebastien Leblanc
  • Ayodya H. Jayatissa
  • Kevin Jiang
  • Ihor Arefiev
  • Kevin Cao
  • Francis Bourassa
  • Felix-Antoine Trifiro
  • Michal Bassani-Sternberg
  • Pavel V. Baranov
  • Annelies Bogaert
  • Sonia Chothani
  • Ivo Fierro-Monti
  • Daria Fijalkowska
  • Kris Gevaert
  • Norbert Hubner
  • Jonathan M. Mudge
  • Jorge Ruiz-Orera
  • Jana Schulz
  • Juan Antonio Vizcaíno
  • John R. Prensner
  • Marie A. Brunet
  • Thomas F. Martinez
  • Sarah A. Slavoff
  • Xavier Roucou
  • Jyoti S. Choudhary
  • Sebastiaan van Heesch
  • Robert L. Moritz
  • Anne-Ruxandra Carvunis

Journal

  • Nature Communications

Citation

  • Nat Commun

Abstract

  • Thousands of short open reading frames (sORFs) are translated outside of annotated coding sequences. Recent studies have pioneered searching for sORF-encoded microproteins in mass spectrometry (MS)-based proteomics and peptidomics datasets. Here, we assessed literature-reported MS-based identifications of unannotated human proteins. We find that studies vary by three orders of magnitude in the number of unannotated proteins they report. Of nearly 10,000 reported sORF-encoded peptides, 96% were unique to a single study, and 12% mapped to annotated proteins or proteoforms. Manual curation of a benchmark dataset of 406 manually evaluated spectra from 204 sORF-encoded proteins revealed large variation in peptide-spectrum match (PSM) quality between studies, with immunopeptidomics studies generally reporting higher quality PSMs than conventional enzymatic digests of whole cell lysates. We estimate that 65% of predicted sORF-encoded protein detections in immunopeptidomics studies were supported by high-quality PSMs versus 7.8% in nonimmunopeptidomics datasets. Our work stresses the need for standardized protocols and analysis workflows to guide future advancements in microprotein detection by MS towards uncovering how many human microproteins exist.


DOI

doi:10.1038/s41467-025-68002-x