folder

Finding sparse features in strongly confounded
medical binary data

Authors

  • S. Mandt
  • F. Wenzel
  • S. Nakajima
  • J. Cunningham
  • C. Lippert
  • M. Kloft

Journal

  • NIPS workshop on Machine Learning For Healthcare

Abstract

  • A typical task in statistical genetics is to find a sparse linear relation between genotypes with phenotypes, but often the data are confounded by age, ethnicity or population structure. We generalize the linear mixed model (LMM) Lasso approach for feature selection under confounding to the case of binary labels. This case is much more involved, as arginalization over the correlated noise leads to an intractable integral. We can overcome this problem with approximate inference techniques. We demonstrate on synthetic and real-world data that the sparse features that our method finds are less correlated with the top confounders.