Expanding the human proteome with microproteins and peptideins

Authors

  • Eric W. Deutsch
  • Leron W. Kok
  • Jonathan M. Mudge
  • Cristian F. Valls
  • Irwin Jungreis
  • Jorge Ruiz-Orera
  • Zhi Sun
  • Ulrike Kusebauch
  • Ivo Fierro-Monti
  • Jennifer G. Abelin
  • M. Mar Alba
  • Julie L. Aspden
  • Sreejan Bandyopadhyay
  • Kaushik Banerjee
  • Pavel V. Baranov
  • Ariel A. Bazzini
  • Francis Bourassa
  • Elspeth A. Bruford
  • Lorenzo Calviello
  • Steven A. Carr
  • Anne-Ruxandra Carvunis
  • Sonia Chothani
  • Jim Clauwaert
  • Kellie Dean
  • Pouya Faridi
  • Adam Frankish
  • Amy Goodale
  • Thomas Green
  • Norbert Hubner
  • Nicholas T. Ingolia
  • Manolis Kellis
  • Michele Magrane
  • Maria Jesus Martin
  • Thomas F. Martinez
  • Gerben Menschaert
  • Uwe Ohler
  • Sandra Orchard
  • Alisa Potter
  • Owen J.L. Rackham
  • Matthew G. Rees
  • David E. Root
  • Jennifer A. Roth
  • Xavier Roucou
  • Fernando J. Sialana
  • Sarah A. Slavoff
  • Michał I. Świrski
  • Jack A.S. Tierney
  • Félix-Antoine Trifiro
  • Eivind Valen
  • Valeriia Vasylieva
  • Aaron Wacholder
  • Shengbo Wang
  • Li Wang
  • Jonathan S. Weissman
  • Wei Wu
  • Zhi Xie
  • Jyoti S. Choudhary
  • Michal Bassani-Sternberg
  • Juan Antonio Vizcaíno
  • Nicola Ternette
  • Marie A. Brunet
  • Robert L. Moritz
  • John R. Prensner
  • Sebastiaan van Heesch

Journal

  • Nature

Abstract

  • A major scientific drive is to characterize the protein-coding genome, which is a primary basis for studying human health. But the fundamental question remains of what has been missed in previous analyses. Over the past decade, the translation of non-canonical open reading frames (ncORFs) has been observed across human cell types and disease states, with major implications for biomedical science. However, a key gap in knowledge has been which ncORFs produce small microproteins or alternative protein molecules that contribute to the human proteome. Here we report the collaborative efforts of the TransCODE Consortium to produce a consensus landscape of protein-level evidence for ncORFs. We show that about 25% of a set of 7,264 ncORFs gives rise to detectable peptides in a large-scale analysis of 95,520 proteomics experiments. We develop an annotation framework for ncORF-encoded microproteins as human proteins and codify the new conceptual model of ‘peptideins’ as microproteins that have indeterminate potential as functional proteins. To probe the biological implications of peptideins, we create an evolutionary analysis approach, termed ORF relative branch length (ORBL), and determine that evolutionary constraint is common and associates with observation of ncORF-derived peptides. We then characterize a pan-essential cellular phenotype for one peptidein from the OLMALINC long non-coding RNA. Overall, we generate public research tools supported by GENCODE and PeptideAtlas and advance biomedical discovery for understudied components of the human proteome.


DOI

doi:10.1038/s41586-026-10459-x