Alina Bazarova: Multi-modal Integration for Biological Tasks: Perks, Caveats and Applications
Lecture Series:
SysBio Lecture Series: AI for Systems Medicine
Abstract
In this talk, Alina will present OneProt, a versatile artificial intelligence framework for protein analysis that leverages multi-modal integration across structural, sequence, textual, and binding-site data.
To align these heterogeneous modalities, OneProt adopts an ImageBind-inspired training strategy, enabling efficient cross-modal representation learning without requiring fully paired data. By combining graph neural networks and transformer-based architectures, OneProt achieves strong performance on tasks such as enzyme function prediction and binding-site analysis. Alina will highlight two key features of the framework: its ability to incorporate custom modalities during pre-training, and a lightweight fine-tuning strategy that relies only on a simple multi-layer perceptron projection. Through empirical results, she will show how multi-modal integration can reduce reliance on large task-specific datasets while maintaining competitive downstream performance. Alongside these benefits, Alina will discuss the practical challenges and caveats of adding new modalities, including alignment noise, modality imbalance, and training stability. Finally, she will present preliminary results from a follow-up project, OneProtGPT, which combines OneProt with scientific large language models to enable cross-modal retrieval and to link protein representations with natural language.
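For readers unfamiliar with ImageBind-style training, the general idea is that every additional modality is contrastively aligned with a single shared anchor modality, so two non-anchor modalities (say, structure and text) never need to appear together in the same training example. The sketch below is a minimal, pure-Python illustration of that idea using a symmetric InfoNCE loss, as described for ImageBind. It is not OneProt's implementation: which modality serves as the anchor in OneProt, the temperature value, and the helper names `l2_normalize`, `info_nce` and `imagebind_style_loss` are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchor, other, temperature=0.07):
    """Symmetric InfoNCE loss between two equally sized batches of embeddings.

    Row i of `anchor` and row i of `other` are assumed to describe the same
    protein (a positive pair); all other rows in the batch act as negatives.
    The temperature of 0.07 is an illustrative assumption.
    """
    a = [l2_normalize(v) for v in anchor]
    b = [l2_normalize(v) for v in other]
    n = len(a)
    # logits[i][j] = cosine(anchor_i, other_j) / temperature
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(rows):
        # Mean of -log softmax(row)[i]: the matching pair should score highest.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]
    # Average the anchor->other and other->anchor directions.
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(columns))


def imagebind_style_loss(batches, temperature=0.07):
    """Sum of anchor-vs-modality losses (hypothetical helper).

    `batches` maps a modality name to a pair (anchor_embeddings, modality_embeddings);
    each batch only needs proteins that have both the anchor and that one modality.
    """
    return sum(info_nce(anc, mod, temperature) for anc, mod in batches.values())


anchor = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(anchor, aligned) < info_nce(anchor, swapped))
```

Because each loss term compares only one modality with the anchor, a dataset can contribute, for example, sequence–structure pairs in one batch and sequence–text pairs in another; this is the sense in which fully paired data is not required.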
Biography
After completing her PhD in Probability Theory and Mathematical Statistics at Graz University of Technology, Alina Bazarova transitioned into biological applications, first developing Bayesian inference algorithms to study DNA replication at the University of Warwick and later tackling clinical prediction problems at the University of Birmingham. Alina then worked at the University of Cologne on phylogenetic inference for influenza and COVID-19 transmission, contributing to analyses used to inform vaccine-strain decisions by the WHO.
Currently, at the Jülich Supercomputing Centre, Forschungszentrum Jülich, Alina applies high-performance computing and machine learning to major biological challenges. Her research focuses on advancing AI systems for protein analysis and design, and she leads the development of multi-modal models and their downstream and translational applications.
Venue
MDC (BIMSB)
Hannoversche Straße 28
Room 0.61 & online via Zoom
10115 Berlin
Germany
Time
Organizers
Melissa Birol
Markus Mittnenzweig
Dagmar Kainmüller
Uwe Ohler
Jana Wolf
Lisa Buchauer
Grégoire Montavon
Christoph Lippert