Dr. Laleh Haghverdi finds the intersection of mathematics and biology wildly fascinating. “I’m in the middle,” says Haghverdi. “This exchange makes everything more exciting and lively.”
She is especially excited to join the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), where she can collaborate with other researchers from both sides of the equation. Her new group will be part of a single-cell cluster at the Berlin Institute for Medical Systems Biology (BIMSB) and will focus on improving the integration and analysis of single-cell data from different sources or of different types.
Seeing data differently
Computer algorithms are needed to analyze the large datasets produced by single-cell sequencing, but Haghverdi doesn’t start her computations there. Rather, her background in math and physics enables her to find mathematical ideas and adapt them to the biological questions at hand.
For example, as a mathematics PhD student at the Technical University of Munich, she was looking at RNA sequences from single cells and thinking about how stem cells move through the differentiation process to become, say, a muscle cell or a heart cell. The differentiation process, with its key branching events, reminded her of “random walks” and “diffusions,” mathematical concepts that are defined by equations. She realized that the equations for “diffusion maps” could be adapted to single-cell transcriptomics data – instead of measuring the distance between points in space, they could measure the pseudo-time between phases of cell differentiation. The result: colorful visualizations that show cells at different points in the differentiation process.
She compares the approach to drawing a street map by observing where the cars go. Then, by assessing the popular routes, she can use the map to predict where a future car – or in this case, a cell – will start and finish. However, much like driving, there can be more than one route. The diffusion map evaluates all possible routes between the differentiation phases. “This makes it very robust to noise, and this is very important in single-cell data where there are a lot of zeros, a lot of noise,” Haghverdi says.
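To make the street-map analogy concrete, the sketch below computes the diffusion distance, the quantity a diffusion map is built to preserve, from a chosen root cell to every other cell. It is a simplified illustration under stated assumptions, not Haghverdi's published implementation: the Gaussian kernel, the width `sigma`, the step count `t` and the toy one-dimensional “trajectory” are all hypothetical choices, and the published diffusion-map and pseudo-time methods add normalizations and details omitted here. The key point is that the matrix power P^t adds up the probability of every t-step route between two cells, which is why one noisy measurement does not dominate the result.

```python
import math


def matmul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def diffusion_distances_from_root(points, root, sigma=1.0, t=3):
    """Diffusion distance after t random-walk steps from a root cell to every cell.

    points: list of expression vectors (lists of floats), one per cell.
    sigma:  Gaussian kernel width (hypothetical default; would need tuning on real data).
    t:      number of random-walk steps.
    """
    n = len(points)
    # Gaussian affinities: cells with similar expression are strongly connected.
    w = [[math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) for q in points] for p in points]
    degrees = [sum(row) for row in w]
    # Row-normalize into random-walk transition probabilities ("where the cars go").
    p_mat = [[wij / d for wij in row] for row, d in zip(w, degrees)]
    # P^t sums the probability over every t-step route between two cells.
    p_t = p_mat
    for _ in range(t - 1):
        p_t = matmul(p_t, p_mat)
    # Stationary distribution of this walk, used to weight the comparison.
    total = sum(degrees)
    pi = [d / total for d in degrees]
    # Compare where walks starting at the root and at cell i tend to end up.
    return [
        math.sqrt(sum((p_t[root][z] - p_t[i][z]) ** 2 / pi[z] for z in range(n)))
        for i in range(n)
    ]


# Hypothetical toy "trajectory": six cells along one expression axis, root at the start.
cells = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5]]
dist = diffusion_distances_from_root(cells, root=0, sigma=0.5, t=3)
print([round(d, 2) for d in dist])
```

Cells close to the root along the trajectory get small distances and cells further along get larger ones, which is the ordering a pseudo-time aims to capture.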
Since she published this work, which was awarded the Erwin Schrödinger Prize for interdisciplinary research in 2017, the amount of sequencing data has grown exponentially. Whole organisms, such as zebrafish, have been sequenced throughout the development process at the single-cell level. With her new MDC group, Haghverdi plans to expand the diffusion map concept to handle this more complex landscape, helping to improve the resolution of “cell lineage trees” that trace a cell’s history back to its founding stem cell group.
She also adapted another mathematical tool, “mutual nearest neighbors,” to integrate data from different datasets. With mutual nearest neighbors, cell types are identified by the cells they group with: muscle cells are most likely near other muscle cells, and brain cells are grouped with other brain cells. She then takes multiple layers of data and lines them up on parallel hyperplanes – the parts that match are lined up, one above the other. This allows users to combine datasets from multiple experiments into one calibrated set. They can also use a dataset with pre-labeled cell types to transfer those labels to a new dataset.
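The pairing idea at the heart of mutual nearest neighbors can be sketched in a few lines. The example below is a deliberately simplified illustration, not the published method or its R package: it finds pairs of cells in two batches that are each among the other's k nearest neighbors, shifts the second batch by the average offset between those pairs, and copies labels across the pairs. The toy coordinates, the `k=1` setting and the function names are hypothetical, and the published approach is more refined, for example computing cell-specific corrections rather than one global shift.

```python
import math


def k_nearest(query, reference, k):
    """Indices of the k points in `reference` closest to `query` (Euclidean)."""
    order = sorted(range(len(reference)), key=lambda j: math.dist(query, reference[j]))
    return set(order[:k])


def mutual_nearest_neighbors(batch_a, batch_b, k):
    """Pairs (i, j) where b[j] is among a[i]'s k nearest in B and a[i] among b[j]'s in A."""
    a_to_b = [k_nearest(p, batch_b, k) for p in batch_a]
    b_to_a = [k_nearest(q, batch_a, k) for q in batch_b]
    return [(i, j) for i in range(len(batch_a)) for j in sorted(a_to_b[i]) if i in b_to_a[j]]


def correct_batch(batch_a, batch_b, pairs):
    """Shift all of batch B by the mean (A - B) offset over the MNN pairs (simplified)."""
    if not pairs:
        raise ValueError("no mutual nearest neighbors found; try a larger k")
    dim = len(batch_a[0])
    shift = [sum(batch_a[i][d] - batch_b[j][d] for i, j in pairs) / len(pairs) for d in range(dim)]
    return [[x + s for x, s in zip(q, shift)] for q in batch_b]


def transfer_labels(pairs, labels_a):
    """Copy the label of each cell in A onto its mutual-nearest partner in B."""
    return {j: labels_a[i] for i, j in pairs}


# Hypothetical data: two experiments measuring the same cell types,
# with batch B offset by +3 along the first axis (a "batch effect").
batch_a = [[0, 0], [0, 1], [10, 10], [10, 11]]
batch_b = [[3, 0], [3, 1], [13, 10], [13, 11]]
labels_a = ["muscle", "muscle", "brain", "brain"]

pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
print(pairs)  # [(0, 0), (1, 1), (2, 2), (3, 3)]
print(correct_batch(batch_a, batch_b, pairs))
print(transfer_labels(pairs, labels_a))
```

Requiring the neighbor relationship to hold in both directions is what makes the pairs trustworthy: a cell type present in only one batch tends not to find a mutual partner, so it does not pull the correction off course.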
After Haghverdi and her collaborators released an R package implementing mutual nearest neighbors, it was quickly picked up and built on by other labs. “It is very encouraging, you feel that you can bring something useful to the field,” she says.
She plans to extend the concept to combine different types of data that may not be directly correlated with each other, such as transcriptomics, mutations and metabolomics, so that they can be analyzed together. This is a key part of her work in a new Junior Consortium funded by the German Federal Ministry of Education and Research (BMBF), together with collaborators she met as a postdoc at the European Molecular Biology Laboratory in Heidelberg. The consortium, which includes BIH/Charité in Berlin, the Centre for Genomic Regulation in Barcelona, and Heidelberg University Hospital, will study leukemia stem cells using multi-omics data. To better answer questions about the disease, “we want to solve this problem of how all of this data can be interpreted all together,” Haghverdi says.
First love: physics
Growing up in Karaj, Iran, a city 40 kilometers outside of Tehran, Haghverdi attended a high school specializing in science and technology. Students had to pass an exam to attend, but fees were on a sliding scale based on ability to pay. While she enjoyed all of her science courses, physics was her absolute favorite. “Physics is about connecting theory with practice,” she says. “The things you see around you, you try to get a fundamental understanding of it, that is usually connected with mathematics. It’s exactly what I’m trying to do right now.”
After studying at university in Iran, she decided to head to Europe for a career in research. She pursued a master’s degree in physics at the University of Cologne. Even then, she chose to specialize in population genetics because she was curious to learn more about evolution. Ever since, she has appreciated how much math and biology can benefit each other. “I think there is a lot of room for collaboration,” she says. “From this exchange, I think new ideas can be very efficiently built up.”
Author: Laura Petersen