Better together: data harmonization and cross-study analysis of abdominal MRI data from UK Biobank and the German National Cohort


  • S. Gatidis
  • T. Kart
  • M. Fischer
  • S. Winzeck
  • B. Glocker
  • W. Bai
  • R. Bülow
  • C. Emmel
  • L. Friedrich
  • H.U. Kauczor
  • T. Keil
  • T. Kröncke
  • P. Mayer
  • T. Niendorf
  • A. Peters
  • T. Pischon
  • B.M. Schaarschmidt
  • B. Schmidt
  • M.B. Schulze
  • L. Umutlu
  • H. Völzke
  • T. Küstner
  • F. Bamberg
  • B. Schölkopf
  • D. Rueckert


  • Investigative Radiology


  • Invest Radiol 58 (5): 346-354


  • OBJECTIVES: The UK Biobank (UKBB) and German National Cohort (NAKO) are among the largest cohort studies, capturing a wide range of health-related data from the general population, including comprehensive magnetic resonance imaging (MRI) examinations. The purpose of this study was to demonstrate how MRI data from these large-scale studies can be jointly analyzed and to derive comprehensive quantitative image-based phenotypes across the general adult population.
MATERIALS AND METHODS: Image-derived features of abdominal organs (volumes of liver, spleen, kidneys, and pancreas; volumes of kidney hilum adipose tissue; and fat fractions of liver and pancreas) were extracted from T1-weighted Dixon MRI data of 17,996 participants of UKBB and NAKO based on quality-controlled, deep learning-generated organ segmentations. To enable valid cross-study analysis, we first analyzed the data-generating process using methods of causal discovery. We subsequently harmonized data from UKBB and NAKO using the ComBat approach for batch effect correction. Finally, we performed quantile regression on the harmonized data across studies, providing quantitative models for the variation of image-derived features stratified by sex and dependent on age, height, and weight.
RESULTS: Data from 8791 UKBB participants (49.9% female; age, 63 ± 7.5 years) and 9205 NAKO participants (49.1% female; age, 51.8 ± 11.4 years) were analyzed. Analysis of the data-generating process revealed direct effects of age, sex, height, weight, and the data source (UKBB vs NAKO) on image-derived features. Correction of data source-related effects resulted in markedly improved alignment of image-derived features between UKBB and NAKO. Cross-study analysis on harmonized data revealed comprehensive quantitative models for the phenotypic variation of abdominal organs across the general adult population.
CONCLUSIONS: Cross-study analysis of MRI data from UKBB and NAKO as proposed in this work can be helpful for future joint data analyses across cohorts linking genetic, environmental, and behavioral risk factors to MRI-derived phenotypes and provide reference values for clinical diagnostics.
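
The harmonization step named in the methods can be illustrated with a minimal sketch. It is not the authors' pipeline and it is not full ComBat: it leaves out the empirical Bayes shrinkage across features and keeps only the location-scale idea, that is, removing a per-study offset and rescaling per-study residual spread while preserving covariate effects. The function names (`harmonize_location_scale`, `batch_intercept_spread`) and all numbers in the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def _design(covariates, batch):
    """Design matrix: one intercept column per batch, then the shared covariates."""
    labels = np.unique(batch)
    indicators = (batch[:, None] == labels[None, :]).astype(float)
    return labels, indicators, np.hstack([indicators, covariates])


def batch_intercept_spread(y, covariates, batch):
    """Largest gap between per-batch intercepts after adjusting for covariates."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(covariates, dtype=float)
    labels, _, design = _design(X, np.asarray(batch))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercepts = coef[: len(labels)]
    return float(intercepts.max() - intercepts.min())


def harmonize_location_scale(y, covariates, batch):
    """Simplified location-scale batch correction (no empirical Bayes step).

    Assumes every batch has non-constant residuals (non-zero spread).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(covariates, dtype=float)
    batch = np.asarray(batch)
    labels, indicators, design = _design(X, batch)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    beta = coef[len(labels):]
    # Pooled intercept: batch intercepts weighted by batch sample share.
    alpha = float(indicators.mean(axis=0) @ coef[: len(labels)])
    residual = y - design @ coef
    pooled_sd = residual.std()
    adjusted = np.empty_like(y)
    for label in labels:
        mask = batch == label
        # Rescale this batch's residuals to the pooled spread.
        scale = pooled_sd / residual[mask].std()
        adjusted[mask] = alpha + X[mask] @ beta + residual[mask] * scale
    return adjusted


# Synthetic example: two hypothetical sources with different offset and noise.
rng = np.random.default_rng(0)
n = 2000
age = np.concatenate([rng.normal(63.0, 7.5, n), rng.normal(52.0, 11.4, n)])
study = np.array(["UKBB"] * n + ["NAKO"] * n)
offset = np.where(study == "UKBB", 0.0, 200.0)      # invented source effect
noise_sd = np.where(study == "UKBB", 50.0, 80.0)    # invented source spread
volume = 1600.0 - 4.0 * age + offset + rng.normal(0.0, 1.0, 2 * n) * noise_sd
covs = age[:, None]

harmonized = harmonize_location_scale(volume, covs, study)
print("intercept gap before:", batch_intercept_spread(volume, covs, study))
print("intercept gap after: ", batch_intercept_spread(harmonized, covs, study))
```

In this sketch the study label enters the model only as a batch indicator, so the age effect is estimated jointly and retained in the harmonized values. A sex-stratified quantile regression on age, height, and weight, as described in the abstract, would then be fitted on the harmonized feature.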