proGenomes3: approaching one million accurately and consistently annotated high-quality prokaryotic genomes


  • A. Fullam
  • I. Letunic
  • T.S.B. Schmidt
  • Q.R. Ducarmon
  • N. Karcher
  • S. Khedkar
  • M. Kuhn
  • M. Larralde
  • O.M. Maistrenko
  • L. Malfertheiner
  • A. Milanese
  • J.F.M. Rodrigues
  • C. Sanchis-López
  • C. Schudoma
  • D. Szklarczyk
  • S. Sunagawa
  • G. Zeller
  • J. Huerta-Cepas
  • C. von Mering
  • P. Bork
  • D.R. Mende


  • Nucleic Acids Research


  • Nucleic Acids Res 51 (D1): D760-D766


  • The interpretation of genomic, transcriptomic and other microbial 'omics data is highly dependent on the availability of well-annotated genomes. As the number of publicly available microbial genomes continues to increase exponentially, the need for quality control and consistent annotation is becoming critical. We present proGenomes3, a database of 907 388 high-quality genomes (containing 4 billion genes) that passed stringent quality criteria and have been consistently annotated using multiple functional and taxonomic databases, including annotations of mobile genetic elements and biosynthetic gene clusters. proGenomes3 encompasses 41 171 species-level clusters, defined based on universal single-copy marker genes, for which pan-genomes and contextual habitat annotations are provided. The database is available at
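The abstract states that species-level clusters are defined from universal single-copy marker genes. The sketch below shows one simple way to group genomes into such clusters. It assumes pairwise identities, averaged over shared marker genes, have already been computed. It links any two genomes at or above a threshold (single-linkage, via union-find). The function name, input format and the 0.965 default threshold are hypothetical placeholders rather than values or code taken from proGenomes3. The database's actual clustering procedure, including its linkage criterion, may differ.

```python
from itertools import combinations


def species_clusters(genomes, identity, threshold=0.965):
    """Group genomes by single-linkage over pairwise marker-gene identity.

    Hypothetical sketch: `identity` maps frozenset({a, b}) to the fraction
    identity averaged over shared marker genes; missing pairs count as 0.
    `threshold` is a placeholder, not a value documented for proGenomes3.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Walk to the root representative, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Merge every pair of genomes whose identity meets the threshold.
    for a, b in combinations(genomes, 2):
        if identity.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)

    # Collect members under their root and return a deterministic ordering.
    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    ids = {
        frozenset(("g1", "g2")): 0.99,
        frozenset(("g2", "g3")): 0.97,
        frozenset(("g3", "g4")): 0.80,
    }
    print(species_clusters(["g1", "g2", "g3", "g4"], ids))
    # [['g1', 'g2', 'g3'], ['g4']]
```

Single-linkage is used here only because it is short to write. It can chain dissimilar genomes together through intermediates. Average- or complete-linkage would be more conservative alternatives.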