AI and science unite at first HFMI workshop
As part of the Helmholtz Foundation Model Initiative (HFMI), 150 leading minds at the intersection of artificial intelligence and scientific research met in Berlin to advance the transformative potential of AI across multiple scientific disciplines. The workshop was jointly organized by HFMI, which has earmarked €28 million over three years for developing foundation models, and the European Laboratory for Learning and Intelligent Systems (ELLIS), a pan-European AI network, with support from additional partners.
In an interview, Professor Dagmar Kainmüller, head of the Integrative Imaging Data Sciences lab at the Max Delbrück Center, discusses how the workshop furthers the goals of HFMI and explains the nature of her work. Kainmüller co-leads the HFMI Synergy Unit, which coordinates efforts to build long-term and synergistic foundation model expertise, and organized the workshop. The Max Delbrück Center is participating in the HFMI and received €5.5 million of the HFMI funding.
The workshop featured a star lineup of speakers, who included experts from Meta, Microsoft, ETH Zurich, Simons Foundation, ELLIS, and other prestigious institutions. Professor Stefan Bauer, data scientist at Helmholtz AI and a member of the Synergy Unit, played a key role in organizing the speakers.
Dagmar Kainmüller and Uwe Ohler (left) during the HFMI workshop.
Professor Kainmüller, what was the goal of the workshop?
Dagmar Kainmüller: We wanted to bring together the global community working on foundation models for science, to strengthen it, to grow it, and to brainstorm the next steps, and perhaps also to come up with some more outlandish future directions.
The workshop presenters came from diverse fields that included gaming and astrophysics. What were the takeaways for you personally?
We deliberately made the meeting interdisciplinary, because there are often synergies you only discover by being exposed to other disciplines. Anna Scaife from the University of Manchester, for example, gave a talk on foundational representations for astrophysics. She is facing the same challenges we face in one of my current projects, which looks at plankton data. Her group's image data calls for modifications to the AI training paradigm, and we found exactly what they found. We exchanged ideas, and maybe we will collaborate. Such cross-domain exchange is crucial to further developing these models.
What were some of the highlights from the workshop?
What I particularly loved about this meeting is that people were solution-oriented toward advancing science, as opposed to outperforming a previous model on some benchmark that means nothing to the scientists. At the usual AI conferences, you get a lot of the latter. Lucas Beyer from OpenAI, for example, showed that a considerable chunk of the AI community has been horse-racing toward a questionable objective: one very specific, very narrowly scoped benchmark goal of the computer vision community. It was really refreshing to have this workshop with so many people who oppose this, even though it's hard. We heard that from Beyer, and we heard the same thing from Dr. Lena Maier-Hein from the German Cancer Research Center (Deutsches Krebsforschungszentrum) and the University of Heidelberg. She had a paper receive a negative peer review because it highlighted cheating in the field.
AI scientists expand their knowledge at the HFMI workshop.
Have you gotten any feedback from participants?
I think we hit the sweet spot with the number of participants. We structured the workshop so there would be active discussion with the audience, and I was very happy to see that this emerged: we had many interactive discussions, which is not the standard at large meetings. Many participants were comfortable enough to join in, and that was exactly what we wanted to achieve.
The idea of foundation models can feel very abstract to a non-specialist. Can you give us some background? What is a foundation model as compared to, say, the Generative Pre-trained Transformer, more commonly known as GPT?
GPT is a foundation model because it can do many different so-called downstream tasks. For example, it can chat with you or write code for you, or do your math homework.
What is an example of a foundation model used by Max Delbrück researchers that is trained to do specific tasks?
Uwe Ohler, head of the Computational Genomics lab at the Max Delbrück Center, for example, investigates foundation models that analyze genomic data to predict which elements of a sequence might be regulatory. That is one example of a task such a model can do, and the same foundation model can also perform other tasks on that same data.
Do you use the same foundation model in your work?
No, we call them domain-specific foundation models. There is no one big model for all of molecular cell biology, at least not yet. For imaging data, we have a variety of foundation models because there are many different domains. There are foundation models for pathology data, for example, or for multiplexed imaging data. We want these models to be of immediate benefit for the domain researchers and enable them to pursue new kinds of studies.
What are some of the challenges to building domain-specific foundation models?
Getting the necessary data is a big one. Even when collaborators are eager to share their data, it is still very hard to get it all. In some cases, there are data privacy issues. But even publicly available data often cannot simply be downloaded, because such massive traffic would overwhelm the service. And then the data needs to be curated, which means faulty data must be weeded out before it is used to train models. These are the very down-to-earth bottlenecks we are facing right now. But once you have a well-curated dataset, everybody would agree that we are pretty good at solving the AI part.
A talk at the HFMI workshop.
Can you cite another challenge?
Some data science experts aim for objectives that are not aligned with scientists' needs because their goals become detached from the domain community, the community that is supposed to actually use the model. They spend a lot of resources optimizing their models, but often this is not meaningful for the domain scientist. Defining objectives that are of maximum relevance to the domain scientist is a critical step.
That issue came up during the workshop, correct?
Yes, Sarath Chandar from Mila and Polytechnique Montréal said that he is skeptical of papers on foundation models in biomedical science if the author list does not include both a deep learning expert and a biology expert. I agree, because the domain specialists are the ones who know best what they want to do.
In the future, what are you most excited about?
I feel privileged to work with such amazing colleagues in the HFMI Synergy Unit, in our Pilot Project AqQua, and in the HFMI community as a whole. An extra special shout-out to Stefan Bauer (HFMI Synergy Unit and Helmholtz Munich), who assembled the stellar lineup of speakers we had in our workshop. Everyone I have the pleasure of working and connecting with in HFMI is in it for the cause, striving to advance scientific discovery, which makes our collaborative work both highly effective and highly enjoyable. I am learning so much in HFMI — exciting new methodology, fantastic applications, cool engineering tricks. I very much look forward to continuing!
Interview: Gunjan Sinha
Further information
- HFMI workshop agenda and list of speakers
- HFMI
- Beyond ChatGPT: Foundation models for research
- The Helmholtz-ELLIS Workshop Explored the Role of Foundation Models Across Diverse Research Fields
- Helmholtz-ELLIS Workshop: Trends in AI Research