Meyer Lab

RNA Structure and Transcriptome Regulation

News

Our goal is to discover new mechanisms of gene regulation that are mediated by RNA structure or by trans RNA-RNA interactions. For this, we develop dedicated computational methods with unique features.

The computational methods we devise employ machine learning techniques that are capable of detecting even subtle sequence (and other) signals in high-throughput transcriptome data. We typically employ fully probabilistic methods that enable us to assign reliability values to our predictions.

On the biological side, we are studying transcriptome regulation in a variety of exciting biological systems in vivo ranging from early human embryogenesis to neurogenesis in the fruit fly to how pathogens interact with different hosts. To this end, we closely collaborate with several experimental groups on and off campus.

News:

Check out the location of the new BIMSB building in the centre of Berlin, our new home from February 2019.

Our new manuscript on the role of miRNAs in early fly neurogenesis has just been accepted for publication in RNA Biology. This work is based on our new collaboration with Robert Zinzen's fruit fly group here at the BIMSB-MDC. See also the recent pre-print investigating the role of lncRNAs in fly neurogenesis, which is now available on bioRxiv.

 

Check out the latest bioRxiv pre-print on influenza A infection and how its species-specificity is regulated via M segment splicing. This is joint research with the proteomics group of Matthias Selbach at the MDC and the influenza A lab of Thorsten Wolff at the Robert Koch Institute in Berlin. Interestingly, human-derived and avian-derived influenza A genomes have evolved quite different RNA structure elements overlapping the decisive 3' splice site.

Our review of experimental and computational methods for probing the RNA structurome and trans RNA-RNA interactome in vivo is now freely available online. Enjoy.

Team

Overview

Bioinformatics of RNA structure and transcriptome regulation

Overview and introduction

Figure: Conserved RNA structure with corresponding multiple sequence alignment (top) overlapping the splice site of a fruit fly gene (bottom).
This structure may regulate the alternative splicing of the gene via structural changes induced by RNA editing. Red arrows highlight RNA editing sites.

RNA transcripts are the primary products of activated genes. They in turn give rise to proteins and functional RNAs, which are key players in all living organisms. Yet, how the transcriptome is regulated at the RNA level to yield these products remains surprisingly underexplored.

We are particularly interested in mechanisms of gene regulation that are mediated by RNA structure features and by trans RNA-RNA interactions. Both are difficult to probe on a transcriptome-wide scale and in vivo using experimental methods, but exciting progress has been made recently (SHAPE, PARIS, LIGR-seq protocols, all 2016). We have contributed a range of often unique computational methods and analysis pipelines that have allowed us to detect a variety of functional RNA features in silico due to their evolutionary conservation and based only on sequence information (e.g. RNA-seq data). On the computational side, this typically requires dedicated, probabilistic methods for detecting RNA structure features, trans RNA-RNA interactions and other evolutionarily conserved signals, and for testing detailed hypotheses about the underlying molecular mechanisms.

Ongoing projects and collaborations

Since starting here at the MDC in 2016, we have embarked on the following projects and started several exciting collaborations with experimental groups on and off campus:

  • Together with the experimental fruit-fly group of Robert Zinzen here at the BIMSB-MDC, we have started investigating the role of specific trans RNA-RNA interactions during early fly neurogenesis. This extends our earlier Bioinformatics research where we showed that A-to-I editing in the fruit fly induces changes in local RNA structure features around splice sites, yielding splice variants that are specific to cells of the central nervous system.
  • Together with the experimental group of Zsuzsanna Izsvak here at the MDC, we have started investigating novel biological classes of functionally important trans RNA-RNA interactions in human embryonic stem cells. This project is particularly exciting as we will look at transcriptome-wide data at sub-cellular resolution.
  • Detecting truly novel classes of trans RNA-RNA interactions based on sequence information alone remains computationally and conceptually extremely challenging, see the following recent papers by us and others for more information. We are developing new probabilistic methods for predicting trans RNA-RNA interactions de novo that overcome these challenges and that can also be applied on a transcriptome-wide scale and in a eukaryotic setting.
  • A number of exciting experimental methods for probing the RNA structurome and RNA-RNA interactome in vivo and on a transcriptome-wide scale have recently emerged. As our recent book chapter explains, these require sophisticated computational pipelines to assemble, map and interpret the raw experimental data. We continue to work on new computational methods that aim to combine the best of experimental in vivo probing with the state of the art in computational methods in order to get a transcriptome-wide view of RNA structures and trans RNA-RNA interactions in vivo.
  • By now, there is significant evidence that RNA transcripts can express different functional RNA structures in vivo, depending on the specific details of their cellular environment. We introduced this concept as alternative RNA structure expression. Right now, however, both experimental and computational methods for investigating RNA structure features in a high-throughput manner in vivo have conceptual challenges capturing this RNA structure heterogeneity, see our recent review for details. This is a challenge that we hope to address with new computational methods (as well as fresh ideas for novel experimental protocols).
  • Viruses are wizards at readily combining many, sometimes overlapping signals into their compact genomes. We continue our interest in understanding virus regulation. Together with the proteomics group of Matthias Selbach at the MDC and the virology group of Thorsten Wolff at the Robert Koch Institute here in Berlin, we have started to analyse mechanisms of transcriptome regulation that play a decisive role in determining the species-specificity of different influenza strains. Check out our pre-print and the prominent role that M segment splicing and conserved RNA structure elements play in species-specific infections.

Publications

Research

Bioinformatics of RNA structure and transcriptome regulation

Introduction and Lay Summary

When the human genome sequence was released more than a decade ago, it came as a surprise to many that the number of protein-coding genes was not radically different from the corresponding gene count of the seemingly more humble nematode Caenorhabditis elegans (C. elegans). The current gene counts (20313 for human (GRCh38.p5) versus 20447 for C. elegans (WBcel235)) are stunningly similar. The gene count itself is thus only a poor measure of the complexity of the corresponding organism.

Another surprise finding in the wake of the human genome sequencing project was the realisation that only a small fraction of the genome (<2%) actually encodes protein information. Moreover, many genes seem not to encode any protein product at all (25180 so-called RNA genes (GRCh38.p5)). In addition, even the primary transcripts of protein-coding genes contain a seemingly disproportionate fraction of non-coding nucleotides (introns and untranslated regions).

The primary products of all activated genes are transcripts (RNA sequences). The functional products of these transcripts are proteins as well as functional RNAs, which are key cellular players in any organism. How and when any of these products are generated is a fine-tuned process that depends, for example, on the tissue type and developmental trajectory of each individual cell. As the functional products define the current state of each cell (whether this is a state of disease or health), it is of key importance to understand how the different functional products of the transcriptome are made. Without this knowledge, we not only lack information on why certain products are made, but also have no means of correcting for erroneously produced products if the cell is in a state of disease. Somewhat surprisingly, however, the molecular mechanisms underlying transcriptome regulation remain largely underexplored.

We hypothesize that RNA structure features and trans RNA-RNA interactions between two different transcripts play decisive functional roles in regulating gene expression at the transcriptome level. To this end, we devise new computational methods that allow us to discover new mechanisms of transcriptome regulation based on sequence information alone (e.g. RNA-seq transcriptome data). Due to the size of today's transcriptome data sets, we can even detect subtle mechanisms of transcriptome regulation with significant statistical evidence that would be hard or impossible to detect using the best experimental methods; see our recent analysis of A-to-I RNA editing in the fruit fly as one example.

Beyond the one-dimensional view of transcripts

More often than not, figures in textbooks or on educational web pages illustrate the Central Dogma of Biology by depicting transcripts as linear or wavy sticks inside a eukaryotic cell, with transcription and splicing seemingly happening consecutively. What we know from many dedicated experiments, however, is that processes that alter the primary transcripts (e.g. splicing, RNA editing and RNA structure formation) happen co-transcriptionally, i.e. while the RNA sequence is being transcribed from the genome. Similar to protein information, information on RNA structure or potential trans RNA-RNA interaction partners can be directly encoded in the transcript itself, which makes these regulatory signals evolutionarily robust. We thus expect that RNA structural features and RNA-RNA interactions are widely used for regulating gene expression at the transcript level.

Modelling RNA structures in vivo

In order to devise computational methods for detecting the RNA structural features that are functionally relevant in vivo, it is worth acknowledging the complexity of the cellular environment and the impact this may have on the structure formation process, see our review paper. By devising the new RNA secondary structure prediction program CoFold, we showed that it is possible to capture the overall effects of the speed and directionality of transcription in vivo and also confirmed an earlier, long-standing hypothesis by Morgan and Higgs from 1996. Our method yields significantly improved predictions, especially for long transcripts (> 200 nt) such as ribosomal RNAs. We know already from one of our earlier, in silico studies that the sequences of structured RNAs not only encode information on their final RNA structure, but also on how these RNAs fold in vivo during co-transcriptional folding.

Figure 1: Arc-plot for the HDV ribozyme made using R-Chie. Each arc represents one pair of base-paired alignment columns. Arcs and the alignment at the top show the alternative structure and the active structure; those at the bottom the inhibitory alternative structure. The left legend specifies the percentage of canonical base-pairs for each arc. The right legend colour-codes the nucleotides and specifies the evolutionary evidence supporting each arc.
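The "percentage of canonical base-pairs" shown for each arc can be illustrated with a minimal sketch. This is not the R-Chie implementation; the function name `percent_canonical`, the gap-handling rule and the toy alignment are assumptions made purely for illustration.

```python
# Canonical pairs: Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {
    ("G", "C"), ("C", "G"),
    ("A", "U"), ("U", "A"),
    ("G", "U"), ("U", "G"),
}


def percent_canonical(alignment, i, j):
    """Percentage of sequences whose columns i and j (0-based) form a canonical pair.

    Sequences with a gap in either column are skipped; this is an
    assumption of the sketch, other conventions are possible.
    """
    paired = 0
    counted = 0
    for seq in alignment:
        a, b = seq[i].upper(), seq[j].upper()
        if a == "-" or b == "-":
            continue
        counted += 1
        if (a, b) in CANONICAL:
            paired += 1
    return 100.0 * paired / counted if counted else 0.0


# Hypothetical toy alignment (not real data): a short hairpin in four sequences.
toy_alignment = [
    "GGACUUCGGUCC",
    "GGGCUUCGGCCC",
    "GAACUUCGGUUC",
    "GGAC-UCGGUAC",
]

print(percent_canonical(toy_alignment, 0, 11))
print(percent_canonical(toy_alignment, 1, 10))
```

Each arc in such a plot corresponds to one column pair (i, j); a compensatory mutation such as G-C in one sequence and A-U in another still counts as canonical, which is what makes the measure informative about evolutionary conservation of the structure.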

It turns out that orthologous transcripts from related organisms also have similar co-transcriptional folding pathways and that distinct transient RNA structure features can be as conserved and functionally relevant as those of the final RNA structure, see [1], [2] and [3]. This has significant implications for many state-of-the-art methods in RNA secondary structure prediction as these typically assume that any given transcript folds into exactly one functional RNA structure. A probabilistic method called Transat developed earlier by us aims to address this problem and has allowed us to detect individual, conserved RNA secondary structure features of pseudo-knotted structures, ribo-switches and transient structures which are otherwise notoriously difficult to predict.

RNA structure features involved in splicing regulation

Figure 2:
(A) Genomic context of identified editing sites.
(B) Distribution of conversion types for four tissue types.
(C) Percentage of common editing sites between pairs of tissues.
(Bottom) Gene CG5850 is differentially expressed between head (blue) and digestive system (red) and editing and splicing may affect each other. X-axis: exons of the gene, y-axis: number of reads normalized by library size. Arrows show editing sites. The purple box is predicted to be alternatively expressed.

Viral genomes such as Hepatitis C and HIV-1 are known to encode functional RNA structure in protein-coding regions, as one major constraint for their genomes is to remain short. We contributed early on to these studies by showing that these RNA structures can be reliably predicted provided the known protein-coding context is explicitly taken into account, see [1], [2] by us and also [3]. Functional RNA structures overlapping protein-coding regions, however, are not the preserve of viral genomes, but can also regulate the alternative splicing and translation of eukaryotic protein-coding genes, e.g. in Arabidopsis thaliana, mouse and human. In order to explore the link between RNA structure and alternative splicing on a transcriptome-wide scale, we recently analysed tissue-specific high-throughput transcriptome data from the fruit fly. Using a new, probabilistic analysis pipeline that explicitly captures the ADAR requirement for double-stranded regions, we identified around 2000 novel editing sites as well as more than 200 regions where local RNA structure changes due to A-to-I RNA editing are likely to induce corresponding changes in the splicing pattern, see our paper for details.
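As background for the "conversion types" in Figure 2, inosine is read as guanosine during sequencing, so A-to-I editing shows up as A-to-G mismatches between the reference and the reads. The following is only a minimal sketch of tallying mismatch types; it is not the probabilistic pipeline described above, and the function name `conversion_types` and the toy sequences are hypothetical.

```python
from collections import Counter


def conversion_types(ref_bases, read_bases):
    """Count mismatch types such as 'A>G' between aligned reference and read bases.

    Both strings are assumed to be gap-free and already aligned
    position by position (an assumption of this sketch).
    """
    if len(ref_bases) != len(read_bases):
        raise ValueError("reference and read must have equal length")
    counts = Counter()
    for ref, obs in zip(ref_bases.upper(), read_bases.upper()):
        # Ignore matches and ambiguous bases such as N.
        if ref != obs and ref in "ACGT" and obs in "ACGT":
            counts[f"{ref}>{obs}"] += 1
    return counts


# Hypothetical toy data, not taken from the study.
toy_ref = "ACGTAACGTA"
toy_read = "GCGTAGCGTC"

print(conversion_types(toy_ref, toy_read))
```

In real data, an excess of A>G over the other eleven mismatch types is what one would expect if the mismatches reflect genuine A-to-I editing rather than sequencing errors or SNPs.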

Figure 3:
(Top) Arc-plot for the highlighted region of the Cip4 gene containing a predicted, conserved RNA secondary structure overlapping RNA editing sites (red arrows) that could influence alternative splicing via structural changes. The left legend colour-codes the nucleotides according to the evidence supporting each arc, see also Figure 1. Figure made using R-Chie. (Bottom) Gene structure of the Cip4 gene with grey box highlighting the structure-containing part at the top.

Trans RNA-RNA interactions regulating the transcriptome

RNAs not only have the potential to form RNA structure, but can also interact with other RNAs in trans. These trans interactions involve the same simple structural building blocks as RNA structure features, i.e. hydrogen bonds and stacking interactions involving pairs of complementary nucleotides ({G,C}, {A,U} and {G,U}). In terms of evolution, it is much more straightforward to evolve a specific trans RNA-RNA interaction than to come up with a (properly folded) protein that would engage in a similarly specific protein-RNA interaction. We therefore hypothesize that many novel biological classes of trans RNA-RNA interactions (beyond the already well-known classes such as miRNA-mRNA and snoRNA-rRNA) remain to be discovered. We have shown in a range of settings how the comparative, in silico approach can be harnessed to significantly improve upon existing state-of-the-art methods. We thus continue to develop new computational methods that allow us to make discoveries that would otherwise be difficult to make. To this end, we also collaborate with dedicated experimental groups that allow us to generate large-scale transcriptome data sets (which constitute the input to our methods) and that test our high-ranking predictions in dedicated follow-up experiments.
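To make the pairing rule concrete, the sketch below searches two RNA sequences for the longest contiguous antiparallel stretch built only from the complementary pairs {G,C}, {A,U} and {G,U}. It is a deliberately naive illustration and not one of our prediction methods: it ignores energies, bulges, loops and evolutionary evidence, and the names `can_pair` and `longest_duplex` are hypothetical.

```python
# Complementary nucleotide pairs: Watson-Crick plus G-U wobble.
PAIRS = {frozenset("GC"), frozenset("AU"), frozenset("GU")}


def can_pair(x, y):
    """True if nucleotides x and y form one of the complementary pairs."""
    return frozenset((x, y)) in PAIRS


def longest_duplex(rna_a, rna_b):
    """Longest contiguous antiparallel duplex between two RNAs.

    Returns (length, start_a, start_b), where both starts are 0-based
    positions counted from the 5' end of the respective sequence.
    """
    a = rna_a.upper()
    b_rev = rna_b.upper()[::-1]  # read b 3'->5' so that pairing is antiparallel
    n = len(b_rev)
    best = (0, 0, 0)
    prev = [0] * (n + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            if can_pair(a[i - 1], b_rev[j - 1]):
                # Extend the run of pairs that ended at (i-1, j-1).
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    length = cur[j]
                    # Position j in the reversed sequence maps back to n - j in rna_b.
                    best = (length, i - length, n - j)
        prev = cur
    return best


# Hypothetical toy sequences: CUGC in the first pairs with GCAG in the second.
print(longest_duplex("AAACUGCAAA", "CCCGCAGCCC"))
```

The dynamic programme runs in time proportional to the product of the two sequence lengths, which hints at why transcriptome-wide, de novo searches need far more careful modelling than this toy example.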

Jobs

We are seeking highly motivated and enthusiastic members to join our team.

 

Interested in joining the lab as a postdoc?

Please send an email to Irmtraud Meyer with the following documents:

  • Cover letter
  • CV with a list of publications
  • A short summary of your present and future research interests
  • Scan of your PhD certificate
  • Names of three referees

Potential candidates who are interested in joining us should be competitive enough to also apply for external fellowships and other sources of funding.

Interested in joining the lab as a graduate student?

Please send an email to Irmtraud Meyer with the following documents:

  • Cover letter
  • CV
  • A short summary of your present and future research interests
  • Scan of your University certificates
  • Names of two or three referees

In addition, please also consider applying via the MDC graduate school.

Our past and present graduate students have come from diverse scientific backgrounds ranging from computer science, bioinformatics, physics, mathematics and statistics to bioengineering. We are interested in enthusiastic candidates with a strong interest in computational method development (C++, Java, R) who are also keen to study exciting biological systems using high-throughput transcriptome data.

Interested in joining the lab as an intern, undergraduate or Master's student?

Please send an email to Irmtraud Meyer with the following documents:

  • Cover letter
  • CV
  • A short summary of your present and future research interests
  • Scan of your University certificates
  • Names of two or three referees

Our past and present students have come from diverse scientific backgrounds ranging from computer science, bioinformatics, physics, mathematics and statistics to bioengineering.