Identification of individuals by trait prediction using whole-genome sequencing data


  • C. Lippert
  • R. Sabatini
  • M.C. Maher
  • E.Y. Kang
  • S. Lee
  • O. Arikan
  • A. Harley
  • A. Bernal
  • P. Garst
  • V. Lavrenko
  • K. Yocum
  • T. Wong
  • M. Zhu
  • W.Y. Yang
  • C. Chang
  • T. Lu
  • C.W.H. Lee
  • B. Hicks
  • S. Ramakrishnan
  • H. Tang
  • C. Xie
  • J. Piper
  • S. Brewerton
  • Y. Turpaz
  • A. Telenti
  • R.K. Roby
  • F.J. Och
  • J.C. Venter


  • Proceedings of the National Academy of Sciences of the United States of America


  • Proc Natl Acad Sci U S A 114 (38): 10166-10171


  • Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.
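
The matching task described in the abstract, deciding which phenotype record belongs to which genomic sample, can be illustrated with a toy sketch. This is not the paper's maximum entropy algorithm. It is a simplified stand-in that assumes each trait prediction has independent Gaussian error, sums the per-trait log-likelihoods, and picks the best-scoring record for each genome. The function names, the two traits (height, age), the error scales, and all numbers are invented for illustration.

```python
import math


def log_likelihood(predicted, observed, sigmas):
    """Sum of per-trait Gaussian log-densities (traits assumed independent)."""
    total = 0.0
    for p, o, s in zip(predicted, observed, sigmas):
        total += -0.5 * ((o - p) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
    return total


def select_best(predicted_traits, observed_pool, sigmas):
    """For each genome-derived prediction, return the index of the
    phenotype record with the highest combined log-likelihood."""
    picks = []
    for predicted in predicted_traits:
        scores = [log_likelihood(predicted, obs, sigmas) for obs in observed_pool]
        picks.append(max(range(len(scores)), key=scores.__getitem__))
    return picks


# Hypothetical toy data: (height in cm, age in years).
predicted = [(170.0, 30.0), (182.0, 55.0), (160.0, 42.0)]  # from genomes
observed = [(181.0, 52.0), (161.5, 40.0), (171.0, 33.0)]   # measured records
sigmas = (5.0, 8.0)  # assumed prediction error per trait

picks = select_best(predicted, observed, sigmas)
print(picks)  # index of the chosen phenotype record for each genome
```

The sketch shows why combining traits helps: no single weakly predictive trait separates the candidates, but summing evidence across traits can. Each genome is scored independently here, so two genomes may select the same record. A one-to-one matching would need an assignment step on top of the same score matrix.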