Finding sparse features in strongly confounded
medical binary data


  • S. Mandt
  • F. Wenzel
  • S. Nakajima
  • J. Cunningham
  • C. Lippert
  • M. Kloft


  • NIPS workshop on Machine Learning For Healthcare


  • A typical task in statistical genetics is to find a sparse linear relation between genotypes and phenotypes, but the data are often confounded by age, ethnicity, or population structure. We generalize the linear mixed model (LMM) Lasso approach for feature selection under confounding to the case of binary labels. This case is much more involved, as marginalization over the correlated noise leads to an intractable integral. We overcome this problem with approximate inference techniques. We demonstrate on synthetic and real-world data that the sparse features that our method finds are less correlated with the top confounders.
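
To make the source of the intractability concrete, the following is a minimal sketch of how the binary case differs from the linear one. The notation is an assumption for illustration and is not taken from the abstract: X is the genotype matrix with rows x_i, w is the sparse weight vector, K is a similarity matrix encoding the confounding structure, and labels are assumed to arise by thresholding a latent linear model (a probit-style construction). The paper's actual formulation may differ.

```latex
% Assumed notation (hypothetical, for illustration only):
% X: n x d genotype matrix, w: sparse weights, K: confounder similarity matrix.
\begin{align*}
\text{linear LMM:}\quad
  & y = Xw + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \Sigma),
    \qquad \Sigma = \sigma_e^2 I + \sigma_g^2 K, \\
  & p(y \mid X, w) = \mathcal{N}(y;\, Xw,\, \Sigma)
    \quad \text{(closed form)}, \\[4pt]
\text{binary labels:}\quad
  & y_i = \operatorname{sign}\!\left(x_i^\top w + \epsilon_i\right),
    \qquad y_i \in \{-1, +1\}, \\
  & p(y \mid X, w) = \int \prod_{i=1}^{n}
      \mathbb{1}\!\left[\, y_i \left(x_i^\top w + \epsilon_i\right) > 0 \,\right]
      \mathcal{N}(\epsilon;\, 0,\, \Sigma)\, d\epsilon, \\[4pt]
\text{sparse estimate:}\quad
  & \hat{w} = \arg\min_{w}\; -\log p(y \mid X, w) + \lambda \lVert w \rVert_1 .
\end{align*}
```

Under these assumptions, a diagonal Σ would let the integral factorize into a product of one-dimensional Gaussian tail probabilities. With correlated noise (non-diagonal Σ) it becomes an n-dimensional Gaussian orthant probability, which has no closed form in general. This is where approximate inference comes in.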