2D map of gene expression in the metastatic lymph node

Beyond ChatGPT: Foundation models for research

Foundation models are complex AI applications with enormous potential. Trained on vast datasets from medicine, climate research, or materials science, they can uncover new relationships and make predictions. Helmholtz is supporting this pioneering work, with €5.55 million going to the Max Delbrück Center.

The term “foundation model” is making the rounds in the scientific community, carrying a promise comparable to the global enthusiasm surrounding ChatGPT. That application, used millions of times a day, is itself built on a foundation model. But the Helmholtz Association is interested in more than just language. Foundation models are significantly more powerful and flexible than traditional AI models, which makes them well suited to research.

Trained on extensive datasets and using generative artificial intelligence (AI), they can capture complex relationships from learned patterns, generate new connections, and make predictions. “We are convinced that we can push the boundaries of science with foundation models. Helmholtz brings together not only outstanding talent and comprehensive datasets from a variety of research fields but also a unique computing infrastructure,” says Professor Otmar Wiestler, President of the Helmholtz Association.

“Together, we can really move the needle”

Since 2024, the Helmholtz Association has funded seven pilot projects and supported the necessary infrastructure through the Helmholtz Foundation Model Initiative (HFMI), with a total of around €28 million over three years. Of this, €5.55 million will go to the Max Delbrück Center, which plays a significant role in three of the pilot projects. The Max Delbrück Center also leads a “Synergy Unit” that investigates overarching questions and supports knowledge dissemination.

The high-performance computing infrastructure Helmholtz AI Compute Resources (HAICORE) is also being expanded for the initiative to provide low-threshold access to the necessary computing capacity. Alongside the Karlsruhe Institute of Technology (KIT) and the Jülich Research Center, the Max Delbrück Center will join as a new host site.

The funded pilot projects will offer clear value not only to science but also to the public, because the results will be made available as open source – from the code to the training data to the trained models. “The whole thing is based on a grassroots initiative by data scientists,” says Professor Dagmar Kainmüller, head of the “Integrative Imaging Data Sciences” lab at the Max Delbrück Center and a member of the four-person HFMI coordination committee. She is one of the effort’s initiators – and her enthusiasm is palpable. “We have the necessary quantities of data from very different research areas, we have the know-how, and we have the computing power. Together, we can really move the needle. We see it as our mission to make optimal use of the data collected within the Helmholtz Association.”

Synergy Unit: Benchmarking for new AI methods

Training a foundation model takes place in several phases. First, the model is fed vast quantities of carefully prepared scientific data, without being given a specific task. The goal is for the system to independently build up an extremely powerful knowledge base (the foundation). In the next phase, it can then be adapted relatively easily to specific downstream tasks.

The Synergy Unit, led by Kainmüller and coordinated by Dr. Eirini Kouskoumvekaki from the Max Delbrück Center, focuses on overarching questions relevant to all participating projects, such as scalability and training on the various datasets. The key questions are how research into foundation models can be advanced as quickly as possible across disciplines, and how to ensure the initiative's long-term impact for the common good.

“We see unique potential in using large and extremely diverse datasets to compare new AI methods, identify what works particularly well under which conditions, and roll out foundation models across the Helmholtz Association,” says Kainmüller.

Research centers participating in the Synergy Unit: German Cancer Research Center, Helmholtz Munich, the Jülich Research Center, and the Max Delbrück Center.

Pilot project “VirtualCell”: Predicting complex cellular processes


The vision of digitally recreating a cell, like a digital twin, has been around for a long time. Such a model would not only provide insights into complex cellular processes but also help make predictions. How do changes during the course of a disease affect cells? What effect does a drug have? The “VirtualCell” project aims to take on this challenge. It builds on recent advances in high-throughput genome sequencing and imaging, combining them with generative AI.

“Understanding the spatial relationships among cells in diseased tissues is crucial for deciphering the complex interactions that drive disease progression,” says Professor Nikolaus Rajewsky, director of the Berlin Institute for Medical Systems Biology of the Max Delbrück Center (MDC-BIMSB). His lab has developed a platform called Open-ST, which analyzes patient tissue samples with subcellular precision and creates virtual 3D tissue blocks. “Since the method is cost-effective and scalable, we can use it to include a wide range of tissues and contribute to the VirtualCell project,” says Rajewsky.

“VirtualCell” is being trained on extensive single-cell multi-omics and spatial data, so that the foundation model builds up a comprehensive representation of cellular states and interactions. Researchers will then apply it to novel clinical tasks by adapting it to samples collected by partners such as Charité – Universitätsmedizin Berlin. “VirtualCell” will advance cellular modeling to enable breakthroughs in clinical pathology, drug development, and patient stratification, with the potential to significantly improve biomedical research and healthcare outcomes.

Centers participating in “VirtualCell”: the Jülich Research Center, Helmholtz Munich, and the Max Delbrück Center.

Pilot project “Human Radiome Project”: Paving the way to a new era in radiology with AI

Healthy and diseased bones: On the left, the image shows the skull of a healthy control mouse (A, C), and on the right, that of a mouse with multiple myeloma (B, D). Such preclinical data are crucial for developing an AI model for the “Human Radiome Project.”

Another example of potential applications is the “Human Radiome Project.” Its aim is to develop a model that will fundamentally change medical imaging. With a dataset of 4.8 million 3D images – more than 500 terabytes of CT and MRI scans from Germany and England – scientists are training an AI to analyze medical images much faster and with greater precision than ever before. “This project could become as significant for radiology as the Human Genome Project was for genetics,” says Dr. Arnd Heuser of the Max Delbrück Center.

The model will also pave the way for a major advance in precision medicine: diagnoses can be made faster and more accurately, from early tumor detection to the analysis of brain diseases. Personalized risk assessment will improve too, since the model can help better assess the cancer risk of individual patients based on their data. Medical research will also benefit. The focus at the Max Delbrück Center is on adapting the AI model for preclinical applications. “Applying the model to animal experiments opens up new perspectives,” explains Heuser. “If we can significantly optimize the evaluation of complex preclinical image data, not only will research become more efficient, but it will also accelerate the translation of scientific findings into the clinic.” The potential of such AI models goes far beyond diagnostics; they could fundamentally change how we research and treat diseases.

Helmholtz Centers participating in the “Human Radiome Project”: the German Cancer Research Center, the German Center for Neurodegenerative Diseases, and the Max Delbrück Center.

Pilot project “AqQua”: How plankton affects the climate

Every day, researchers observe life in our oceans using a variety of devices.

Aquatic life plays a critical role in the Earth's climate. Plankton in particular sequester large amounts of carbon from the atmosphere. Climate change is damaging plankton ecosystems, affecting carbon export and marine food resources. “It’s alarming how little we know about the distribution and abundance of individual plankton species, and how uncertain the estimates of how much carbon they can absorb are,” says Dagmar Kainmüller, co-spokesperson of the “AqQua” project.

Every day, researchers around the globe use a variety of devices to capture millions of plankton images in the ocean. This distributed pelagic imaging enables comprehensive observation of aquatic life all the way down to the deep-sea floor. “AqQua” will combine billions of such images to develop the first foundation model for pelagic imaging. In addition to the Helmholtz centers, 40 partner institutions worldwide are involved in the project. “It offers a unique opportunity to push for operational monitoring of plankton biodiversity, ecosystem health and carbon flux globally,” says Kainmüller. “I am very excited about the concerted effort we can drive forward with this fantastic interdisciplinary consortium.” “AqQua” will support decision-making in times of global change, especially in connection with new technologies for removing carbon dioxide from the atmosphere.

Centers participating in “AqQua”: the Jülich Research Center, GEOMAR, Hereon, and the Max Delbrück Center.
