Algorithms help navigate through the data thicket of biological systems

If one looks beyond individual molecules, biology quickly becomes complex. All life processes seem to affect one another. The computer scientist Prof. Uwe Ohler and his team of researchers at the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC) are using algorithms and mathematical models to map a path through this tangle of dependencies.

Life consists of myriad relationships on various levels: Ecosystems, individuals, organs, cells, and individual molecules all interact with one another. Many of these relationships cannot be described adequately without using formulas and equations, which is why the modern life sciences are closely linked to physics, computer science, and mathematics.

Uwe Ohler is studying the complex interactions between genes

Prof. Uwe Ohler. Image: David Ausserhofer/MDC

Systems biology takes this to the extreme. It aims to understand all life processes and their interrelationships in their entirety – which would be impossible without mathematics and computer science, as Prof. Uwe Ohler, a systems biologist and computer scientist at the MDC, acknowledges: “Although my team conducts laboratory experiments, two-thirds of our work involves computer-based calculations.”

Ohler specializes in precisely those genes that control all life processes: They are read from the DNA code and translated into molecules, which in turn regulate other genes whose products then serve, for example, as neurotransmitters or building material. During embryonic development they determine such things as whether a stem cell develops into a neuron or a muscle cell. In this way they regulate the structure of all organs, the entire organism, and also behavior.

Ohler and his team want to find out how and when which genes are activated and how they influence one another. The researchers are looking at several stages of gene regulation.

Genes are muted in tightly wrapped DNA

Whether a gene is active or not – that is, whether its information can be read in a process called transcription – depends on whether very small molecular copying machines ever reach the genetic material in question. If the DNA is tightly coiled, these machines – known as RNA polymerases – cannot work.

A glimpse into the nucleus of a fish cell. Here there are no clearly defined chromosomes. Lighter, active regions of the chromatin are interspersed with darker, inactive regions. Photo: T. Voekler, CC-BY-SA

Under the microscope one can see that inside the cell’s nucleus the DNA is coiled into a convoluted ball of thin threads. In the lighter regions of this structure, the chromatin, the DNA is less tightly coiled. Here there is enough room for regulatory proteins known as transcription factors. They bind to the DNA and guide the RNA polymerase to the gene. Information in the form of RNA molecules is then dispatched from these regions.

A collaboration with New York University

An article by Ohler recently published in the journal Cell Stem Cell shows the critical impact of the chromatin state on how individual genes interact. The researchers looked closely at processes in stem cells that are programmed to become neurons.

“Esteban Mazzoni, a professor at New York University, discovered several years ago that stem cells can be efficiently programmed into particular types of neurons. One only has to artificially activate three genes, and in two days the stem cell will be transformed into a motor neuron. But we did not know exactly what occurs in the cells. And that’s precisely what the collaboration sought to find out,” is how Mahmoud M. Ibrahim, a member of Ohler’s research team, describes the project.

As a PhD student in the MDC-NYU Exchange Program, Ibrahim works closely with Esteban Mazzoni’s team and has traveled to New York several times for research stays. While Mazzoni’s team was responsible for the experimental portion of the research, Ibrahim and another partner from Penn State University analyzed the data that was collected.

How to make a motor neuron

“We collected various types of data – for example, data on chromatin states, on the different regulating signals, and on when genes were activated – at several time points over the course of the programming process,” reports Mahmoud M. Ibrahim.

To make sense of this maze of data, he put the data sets into a Bayesian network model, a mathematical structure that one can use for machine learning. The computer then autonomously sorts the time-dependent data into classes of genes, thereby making visible the relationships between the genes.
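
The idea of letting a computer sort time-dependent measurements into classes can be illustrated with a much simpler stand-in. The sketch below is not the Bayesian network model used in the study. It applies plain k-means clustering to invented expression time courses, and the gene names, values, and the function `kmeans_profiles` are all hypothetical.

```python
from math import dist


def kmeans_profiles(profiles, k, iterations=20):
    """Group time-course profiles into k classes with plain k-means.

    profiles maps a gene name to a list of measurements over time.
    This is an illustrative stand-in, not the study's Bayesian model.
    """
    names = sorted(profiles)
    # Deterministic start: the first k genes (by name) are the initial centers.
    centers = [list(profiles[name]) for name in names[:k]]
    labels = {}
    for _ in range(iterations):
        # Assignment step: each gene joins the class with the nearest center.
        labels = {
            name: min(range(k), key=lambda c: dist(profiles[name], centers[c]))
            for name in names
        }
        # Update step: each center becomes the mean profile of its members.
        for c in range(k):
            members = [profiles[n] for n in names if labels[n] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented activity levels at four time points (hypothetical data).
toy_profiles = {
    "gene_a": [0.1, 0.8, 0.9, 1.0],  # switches on early
    "gene_b": [0.0, 0.1, 0.2, 0.9],  # switches on late
    "gene_c": [0.0, 0.7, 1.0, 0.9],  # switches on early
    "gene_d": [0.1, 0.0, 0.3, 1.0],  # switches on late
}

classes = kmeans_profiles(toy_profiles, k=2)
for gene, label in sorted(classes.items()):
    print(gene, "-> class", label)
```

In this toy example the two early-activated genes end up in one class and the two late-activated genes in the other. The real analysis combined several kinds of data in a probabilistic model, which a distance-based method like this one does not capture.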

Featured image: Motor neurons induced by direct programming of mouse embryonic stem cells. Image: Esteban Mazzoni, NYU. License: CC-BY-NC-ND-2.0

This approach allowed the researchers to discover that the transformation of stem cells into neurons triggers several independent processes that eventually converge. Only three transcription factors are needed to set in motion a coordinated cascade of multiple complex events. First, the chromatin was loosened in certain regions, and then specific gene programs were activated in these sites. These ultimately decided the fate of the cell.

The researchers would not have been able to make this discovery without the powerful statistical tools of machine learning. Ibrahim says, “The unique thing about our research is that we merged several data sets from different times. This provided us with a detailed overview of the time-dependent changes in the cell.”

Distinguishing meaningful events from coincidences

Integrating data also helps to differentiate between genuine cellular signals and those that are randomly generated. This topic was addressed in an article by Uwe Ohler’s working group recently published in Nature Structural and Molecular Biology. It sometimes happens that the transcription machines read random DNA sequences in loosened chromatin. The RNA molecules generated in such situations produce background noise in the cell that can interfere with the “genuine” RNA signals.

To separate the wheat from the chaff, Ohler’s team brought together data on the production, processing, breakdown, and transport of RNAs in the cell. The team of researchers again organized the information collected using algorithms, enabling them to classify the RNA molecules.

Most RNAs carry the code for a protein, though in some cases they regulate other genes. The analysis showed, however, that many of the noncoding RNA molecules do not have any function at all. A large number of these RNAs break down immediately or never leave the cell’s nucleus. This also means that although some of the DNA sequences considered a “gene” are correlated with biological processes, the transcripts themselves have no further function. These RNAs are possibly just a natural side effect of transcription, one which is neither beneficial nor harmful. After all, not everything that happens in nature has a meaning.

“Another possible interpretation of these findings is that evolution is a steadily advancing process, and that the human race is still far from reaching the end of its evolutionary development. Genes can adopt new functions or lose functions. Many things happen in our bodies that produce no significant benefit or harm but are part of the natural processes of evolution,” says Uwe Ohler.

Biology seems so complex because everything is interconnected with everything else. Organizing and understanding life’s countless relationships is the focus of Uwe Ohler’s work.

Further information

Silvia Velasco,1,7 Mahmoud M. Ibrahim,2,3,7 Akshay Kakumanu,4,7 Görkem Garipler,1 Begüm Aydin,1 Mohamed Ahmed Al-Sayegh,1,5 Antje Hirsekorn,3 Farah Abdul-Rahman,1 Rahul Satija,6 Uwe Ohler,2,3,8 Shaun Mahony,4,8 Esteban O. Mazzoni,1,8 (2016): “A Multi-step Transcriptional and Chromatin State Cascade Underlies Motor Neuron Programming from Embryonic Stem Cells.” Cell Stem Cell 20. doi:j.stem.2016.11.006

1Department of Biology, New York University, New York, USA; 2Department of Biology, Humboldt Universität zu Berlin; 3Berlin Institute for Medical Systems Biology, Max-Delbrück-Centrum für Molekulare Medizin in der Helmholtz-Gemeinschaft; 4Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, Penn State University, USA; 5Division of Science and Math, New York University, Abu-Dhabi, UAE; 6New York Genome Center, New York University, New York, USA

7first coauthors; 8corresponding coauthors.

Neelanjan Mukherjee1, Lorenzo Calviello1,2, Antje Hirsekorn1, Stefano de Pretis3, Mattia Pelizzola3 & Uwe Ohler1,2,4 (2017): “Integrative classification of human coding and noncoding genes through RNA metabolism profiles.” Nature Structural and Molecular Biology 24(1). doi: 10.1038/nsmb.3325

1Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Berlin, Germany; 2Department of Biology, Humboldt University, Berlin, Germany; 3Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, Milan, Italy; 4Department of Computer Science, Humboldt University, Berlin, Germany