Data Sciences and Artificial Intelligence

Crosscutting Focus Area at the MDC


Data Science, the extraction of relevant insights from data, has firmly positioned itself at the forefront of health and life science research. Machine Learning and Deep Learning are expected to become key disruptive technologies within Data Science, particularly in translational research and personalized medicine.

The crosscutting focus area currently comprises 15 research groups spanning the MDC's research areas, with an emphasis on:

  1. omics and precision medicine, including single-cell technologies
  2. data integration and disease modelling
  3. biomedical image analysis and complex phenotyping
  4. epidemiology and health data integration




Omics and precision medicine


Research in omics and precision medicine develops models of gene regulatory networks to understand how these networks are encoded in the epigenome and in non-coding regulatory sequences. This work is complemented by machine learning approaches that model cancer evolution and that use gene regulation and expression data to characterize cancer subtypes at the molecular level.

Aided by high-resolution single-cell molecular data, we anticipate that our AI efforts will make it possible to interpret patient genomes in different contexts. These contexts include cancer cohorts, patient-derived organoids, and in vitro differentiation systems that recapitulate the molecular disease phenotypes of specific patient variants.

Data integration and disease modelling


Data integration and disease modelling incorporates genomic, proteomic and phenotyping data into mechanistic, multiscale mathematical models in order to elucidate and describe disease mechanisms. The particular strength of this approach is that it integrates heterogeneous data to establish the link to cell and organ function.

Biomedical image analysis and complex phenotyping


Biomedical image analysis and complex phenotyping platforms, which provide phenotypic readouts across multiple scales, form a key component of Data Science at the MDC. Our groups develop new algorithms to process, visualize, and analyze large-volume datasets. These algorithms enable integrative analyses and data-driven classification, supporting more reliable diagnosis and personalized treatment. Our platforms range from in vivo live imaging of whole model organisms down to single-molecule resolution.

Epidemiology and health data integration


Research in epidemiology and health data integration connects the approaches above with large-scale data from clinical studies or insurance records, which provide a wide array of phenotypic data on healthy and affected individuals. Harmonizing clinical data across study centers and patients will allow findings to be projected in both directions between healthy cohorts and patient groups with severe diseases. It will also enable the study of treatment effects.


Data Science at the MDC coordinates efforts to provide reproducible, scalable pipelines for

  1. standardized data analysis and deep phenotyping,
  2. biosample data collection and
  3. microbiome analysis

with the overall aim of revealing new insights into disease progression and treatment.

To this end, Data Science at the MDC develops data science applications across its research areas to:

  • model biological processes from the cellular to the organismal level,
  • detect patterns of health/disease trajectories and
  • identify early warning biomarkers for drug development or repurposing.




BeLOVE

The Berlin Long-term Observation of Cardiovascular Events (BeLOVE) is conducted jointly with Charité researchers within the Berlin Institute of Health (BIH). It follows approximately 10,000 subjects with primary cardiovascular disease or its key precursor, type 2 diabetes. BeLOVE allows direct observation of disease comorbidities and the study of disease mechanisms, differential risk factors and determinants of treatment efficacy.

BeLOVE project page


German National Cohort (NaKO)

The MDC hosts a study center for the German National Cohort (NaKO), which tracks health trajectories at the population level over long time scales.


LifeTime

LifeTime, a new pan-European consortium of more than 90 leading research institutions supported by over 70 companies, aims to revolutionise healthcare by mapping, understanding, and targeting human cells during disease. An entire work package is dedicated to “Data Science, Artificial Intelligence and Machine Learning”. The initiative is jointly coordinated by Nikolaus Rajewsky of the MDC and Geneviève Almouzni of the Institut Curie.

LifeTime website

Berlin Center for Machine Learning (BZML)

The Berlin Center for Machine Learning (BZML, Berliner Zentrum für Maschinelles Lernen) aims to expand interdisciplinary machine learning research systematically and sustainably. It pursues this goal both within established research collaborations and in new, highly topical scientific areas that have not yet been investigated jointly.

BZML website


de.NBI

The German Network for Bioinformatics Infrastructure (de.NBI) is a national, academic, non-profit infrastructure supported by the Federal Ministry of Education and Research. It provides bioinformatics services to users in life sciences research and biomedicine in Germany and Europe. The partners organize training events, courses and summer schools on the tools, standards and compute services provided by de.NBI, helping researchers exploit their data more effectively.

de.NBI website


Pan-Cancer Analysis of Whole Genomes (PCAWG)

The Pan-Cancer Analysis of Whole Genomes (PCAWG) study is an international collaboration to identify common patterns of mutation in more than 2,800 cancer whole genomes from the International Cancer Genome Consortium. The Schwarz Lab is part of PCAWG Working Group 3 (Interaction of Genome and Transcriptome). It is responsible for conducting allele-specific expression analyses to understand the impact of somatic genetic variation on gene expression in these tumours.

PCAWG website


Doctoral Education

MDC faculty contribute to the following Data Science doctoral education programs, either as coordinators (HEIBRiDS, Regulatory Genome) or as partners (CompCancer):



HEIBRiDS

The MDC is one of the six Helmholtz Centers that have joined forces with the Einstein Center Digital Future to create a new PhD program in data science. Established in 2018, the Helmholtz Einstein International Berlin Research School in Data Science (HEIBRiDS) is an interdisciplinary school that trains young scientists in Data Science applications across a broad range of natural science domains. These domains include Earth & Environment, Astronomy, Space & Planetary Research, Geosciences, Materials & Energy, and Molecular Medicine.

HEIBRiDS website



CompCancer

CompCancer is a PhD programme (a DFG-funded research training group) that focuses on computational aspects of cancer research. Its goal is to develop computational methods and apply them to relevant questions in current cancer research, thereby training the next generation of computational oncologists.

CompCancer website


Regulatory Genome

In an alliance between Berlin institutions (led by Humboldt University) and Duke University, the DFG-funded international research training group Dissecting and Reengineering the Regulatory Genome aims to teach the next generation of researchers a quantitative understanding of genome function and gene regulation within the context of biological systems.

Regulatory Genome website


MDC-funded PhD positions on Data Science

In addition to the PhD programs above, Data Science group leaders participate in the MDC Graduate Program, which runs PhD recruitment rounds twice a year.

Call Spring 2020


MSc Education

MDC faculty contribute to the following MSc programs of partner universities:


Master Program Data Science

The Master Program Data Science is a new program offered by the Department of Mathematics and Computer Science of the Free University of Berlin. It is aimed at students who wish to specialize in the processing and analysis of large amounts of data.

MSc Data Science website


Master Program in Bioinformatics

Through training in the various sub-disciplines, this program gives students the knowledge required to evaluate mathematical methods and models and to recognize relevant biological questions. It also prepares them to correctly interpret the results of those models in a biological context.

MSc Bioinformatics website

Master Program in Biophysics

The Master Program in Biophysics of the Humboldt University in Berlin offers research-based teaching in the interdisciplinary field of experimental and theoretical biophysics.

MSc Biophysics website (in German)


Further focus areas:

•    Biomedical Imaging
•    Single-Cell Technologies
•    Humanized Disease Models
•    Translational Research