
Meyer Lab

RNA Structure and Transcriptome Regulation

News

Our goal is to discover new mechanisms of gene regulation that are mediated by RNA structure or by trans RNA-RNA interactions. For this, we develop dedicated computational methods with unique features.

The computational methods we devise employ machine learning techniques that are capable of detecting even subtle sequence (and other) signals in high-throughput transcriptome data. We typically employ fully probabilistic methods that enable us to assign reliability values to our predictions.

On the biological side, we are studying transcriptome regulation in a variety of exciting biological systems in vivo ranging from early human embryogenesis to neurogenesis in the fruit fly to how pathogens interact with different hosts. To this end, we closely collaborate with several experimental groups on and off campus.

News:

Welcome to our new group members joining us in March 2019: Dr. Ana Bergues Pupo, Simo Mounir and Dr. Adrian Lopez Martin

Check out the location of the new BIMSB building in the centre of Berlin, our new home from April 2019.

 

Our new manuscript on the role of miRNAs in early fly neurogenesis has just been accepted for publication in RNA Biology. This work is based on our new collaboration with Robert Zinzen's fruit fly group here at the BIMSB-MDC. See also the recent pre-print investigating the role of lncRNAs in fly neurogenesis, which is now available on bioRxiv.

 

Check out the latest bioRxiv pre-print on influenza A infection and how its species-specificity is regulated via M segment splicing. This is joint research with the proteomics group of Matthias Selbach at the MDC and the influenza A lab of Thorsten Wolff of the Robert Koch Institute in Berlin. Interestingly, human-derived and avian-derived influenza A genomes have evolved quite different RNA structure elements overlapping the decisive 3' splice site.

 

Our review of experimental and computational methods for probing the RNA structurome and trans RNA-RNA interactome in vivo is now freely available online. Enjoy.

Team

Overview

Bioinformatics of RNA structure and transcriptome regulation

Overview and introduction

Figure: Conserved RNA structure with corresponding multiple sequence alignment (top) overlapping the splice site of a fruit fly gene (bottom).
This structure may regulate the alternative splicing of the gene via structural changes induced by RNA editing. Red arrows highlight RNA editing sites.

RNA transcripts are the primary products of activated genes. They give rise to proteins and functional RNAs, which are key players in all living organisms. Yet, how the transcriptome is regulated at the RNA level to yield these products remains surprisingly underexplored.

We are particularly interested in mechanisms of gene regulation that are mediated by RNA structure features and by trans RNA-RNA interactions. Both are difficult to probe on a transcriptome-wide scale and in vivo using experimental methods, but exciting progress has been made recently (SHAPE, PARIS, LIGR-seq protocols, all 2016). We have contributed a range of often unique computational methods and analysis pipelines that have allowed us to detect a variety of functional RNA features in silico due to their evolutionary conservation and based only on sequence information (e.g. RNA-seq data). On the computational side, this typically requires dedicated probabilistic methods for detecting RNA structure features, trans RNA-RNA interactions and other evolutionarily conserved signals and for testing detailed hypotheses about the underlying molecular mechanisms.

Ongoing projects and collaborations

Since starting here at the MDC in 2016, we have embarked on the following projects and started several exciting collaborations with experimental groups on and off campus:

  • Together with the experimental fruit-fly group of Robert Zinzen here at the BIMSB-MDC, we have started investigating the role of specific trans RNA-RNA interactions during early fly neurogenesis. This extends our earlier bioinformatics research, where we showed that A-to-I editing in the fruit fly induces changes of local RNA structure features around splice sites that yield splice variants specific to cells of the central nervous system.
  • Together with the experimental group of Zsuzsanna Izsvak here at the MDC, we have started investigating novel biological classes of functionally important trans RNA-RNA interactions in human embryonic stem cells. This project is particularly exciting as we will look at transcriptome-wide data at sub-cellular resolution.
  • Detecting truly novel classes of trans RNA-RNA interactions based on sequence information alone remains computationally and conceptually extremely challenging, see the following recent papers by us and others for more information. We are developing new probabilistic methods for predicting trans RNA-RNA interactions de novo that overcome these challenges and that can also be applied on a transcriptome-wide scale and in a eukaryotic setting.
  • A number of exciting experimental methods for probing the RNA structurome and RNA-RNA interactome in vivo and on a transcriptome-wide scale have recently emerged. As our recent book chapter explains, these require sophisticated computational pipelines to assemble, map and interpret the raw experimental data. We continue to work on new computational methods that aim to combine the best of experimental in vivo probing with the state of the art in computational methods in order to get a transcriptome-wide view of RNA structures and trans RNA-RNA interactions in vivo.
  • By now, there is significant evidence that RNA transcripts can express different functional RNA structures in vivo, depending on the specific details of their cellular environment. We introduced this concept as alternative RNA structure expression. Right now, however, both experimental and computational methods for investigating RNA structure features in a high-throughput manner in vivo have conceptual challenges capturing this RNA structure heterogeneity, see our recent review for details. This is a challenge that we hope to address with new computational methods (as well as fresh ideas for novel experimental protocols).
  • Viruses are wizards at readily combining many, sometimes overlapping signals into their compact genomes. We continue our interest in understanding virus regulation. Together with the proteomics group of Matthias Selbach at the MDC and the virology group of Thorsten Wolff at the Robert Koch Institute here in Berlin, we have started to analyse mechanisms of transcriptome regulation that play a decisive role in determining the species-specificity of different influenza strains. Check out our pre-print and the prominent role that M segment splicing and conserved RNA structure elements play in species-specific infections.

Pre-prints and publications

In the listing below, members of my group are indicated in bold. Joint first authors are indicated by an asterisk *.

Pre-prints:

This research is joint work with Robert Zinzen's fruit fly group here at the BIMSB-MDC. Peter Menzel is a post-doc in my group.

We here investigate the species-specificity of different influenza A strains. This is joint work with the proteomics group of Matthias Selbach and his PhD student Boris Bogdanow at the MDC and the influenza A lab of Thorsten Wolff of the Robert Koch Institute in Berlin. We show that species-specificity is regulated via M segment splicing. My group's bioinformatics analysis reveals that human-derived and avian-derived influenza A genomes have evolved quite different RNA structure elements overlapping the decisive 3' splice site, which turns out to be a key determinant of splicing.

  • McCorkindale AL, Wahle P, Werner S, Jungreis I, Menzel P, Shukla CJ, Pereira Abreu RL, Irizarry R, Meyer IM, Kellis M, Rinn JR, Zinzen RP. A gene expression atlas of embryonic neurogenesis in Drosophila reveals complex spatiotemporal regulation of lncRNAs. doi: https://doi.org/10.1101/483461 bioRxiv
  • Bogdanow B, Eichelbaum K, Sadewasser A, Wang X, Husic I, Paki K, Hergeselle M, Vetter B, Hou J, Chen W, Wiebusch L, Meyer IM, Wolff T, Selbach M. The dynamic proteome of influenza A virus infection identifies M segment splicing as a host range determinant. doi: https://doi.org/10.1101/438176 bioRxiv

Publications:

We here provide a comprehensive in silico analysis of tissue-specific transcriptomes comprising dedicated small and total RNA-seq libraries at two distinct developmental stages during early fly neurogenesis. This enables us to investigate the potential functional roles of individual microRNAs with high spatio-temporal resolution in a genome-wide manner. Our study identifies 74 microRNAs that are significantly differentially expressed between the three cell types and the two developmental stages, predicts target genes of down-regulated microRNAs that are significantly enriched for genes related to neurogenesis, and also reveals how microRNAs regulate early fly neurogenesis by targeting transcription factors. Peter Menzel is a post-doc and Stefan Stefanov is a PhD student in my group. This is joint work with Robert Zinzen's fruit fly group here at the BIMSB-MDC.

Recent years have seen a range of promising new experimental protocols for investigating RNA structures and trans RNA-RNA interactions of entire transcriptomes in vivo. All of these experimental strategies, however, require comprehensive computational pipelines for processing and interpreting the large-scale raw data and converting it into evidence for actual RNA structures or trans RNA-RNA interactions. In this invited and peer-reviewed book chapter, my PhD student Stefan Stefanov and I introduce and compare the different strategies and propose ideas for potential future improvements.

Recent advances on the experimental and computational side have identified a range of intriguing biologically relevant RNA molecules (i.e. transcripts) that exhibit more than a single functional RNA structure throughout their cellular life. This invited and peer-reviewed review paper summarizes computational strategies for successfully identifying these RNA structures and proposes the notion of alternative RNA structure expression to denote that a single transcript can encode several RNA structures which are functionally expressed in distinct, different in vivo settings.

We analyse tissue-specific high-throughput libraries of D. melanogaster to identify sites of RNA editing. For this, we introduce a probabilistic analysis pipeline that utilises large input data and explicitly captures ADAR's requirement for double-stranded regions. Our analysis doubles the number of known RNA editing sites in the fruit fly genome. Our editing sites are 3 times more likely to occur in exons with multiple splicing acceptor/donor sites than in exons with unique splice sites (p-value < 2 × 10^-15). Furthermore, we identify 244 edited regions where RNA editing and alternative splicing are likely to influence each other. For 96 out of these 244 regions, we find evolutionary evidence for conserved RNA secondary structures near splice sites, suggesting a potential regulatory mechanism where RNA editing may alter splicing patterns via changes in local RNA structure. Our research identifies a new functional role of RNA editing as a mechanism for regulating RNA-structure mediated alternative splicing. This is likely to be of key functional importance in other biological settings such as the human brain. Alborz Mazloomian was a PhD student of mine.

There already exists an abundance of methods that aim to predict specific biological classes of trans RNA-RNA interactions, e.g. miRNA-mRNA interactions. In order to identify truly novel classes of biologically relevant interactions, we require general RNA-RNA interaction prediction methods. This is an invited and peer-reviewed review paper by my former PhD student Daniel Lai and me.

In this comprehensive manuscript, we contribute the first four RNA families comprising more than a single, functional RNA structure to the well-known RNA family database Rfam. Alice Zhu was an MSc student in my group.

We here introduce our web-server e-RNA which offers a free and open-access collection of five published RNA sequence analysis tools, each solving specific problems not readily addressed by other available tools. Daniel Lai was a PhD student in my group.

This invited and peer-reviewed review paper summarises the diverse range of experimental and theoretical evidence for co-transcriptional RNA structure formation and proposes a range of ideas for how existing, deterministic methods for RNA secondary structure prediction could potentially be improved by taking different aspects of co-transcriptional folding into account. The CoFold paper by the same authors shows that some aspects of co-transcriptional folding can be captured in dedicated models and that this significantly improves the prediction accuracy. Daniel Lai was a PhD student and Jeff Proctor an MSc student in my group.

 

We show that orthologous RNA genes from evolutionarily related organisms not only fold into the same final RNA structure, but that their co-transcriptional folding pathways also share conserved transient RNA structural features. Our study comprises the first comparative analysis of folding pathway prediction programs. Our conclusions are based on 6 data sets with known final and transient RNA structural features, the largest data set of this kind to date. Alice Zhu and Jeff Proctor were MSc students and Adi Steif was an undergraduate research student in my group.

 

We propose and implement a deterministic RNA secondary structure prediction method, called CoFold, that combines thermodynamic and kinetic considerations. For this, we modify the existing minimum free energy (MFE) method RNAfold and combine it with a model that judges the reachability of potential base-pairing partners during co-transcriptional folding. Our method is the first method of this kind. CoFold effectively depends only on a single free parameter that can be robustly trained. CoFold significantly improves the prediction accuracy of RNAfold, in particular for long sequences over 1000 nt. The method has the same memory and time complexity as RNAfold. Jeff Proctor was an MSc student in my group.
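The idea of down-weighting base pairs whose partners are far apart can be illustrated with a toy sketch. This is not CoFold: it replaces RNAfold's free-energy model with a simple Nussinov-style base-pair count, and the exponential weighting function `pair_weight` with its parameters `alpha` and `tau` is a hypothetical stand-in chosen for illustration only.

```python
import math

# Canonical and wobble pairs allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_weight(i, j, alpha=0.5, tau=20.0):
    """Hypothetical weight that decays from 1 towards (1 - alpha) as j - i grows."""
    return alpha * (math.exp(-(j - i) / tau) - 1.0) + 1.0


def best_weighted_score(seq, min_loop=3, alpha=0.5, tau=20.0):
    """Maximum total pair weight over nested structures (Nussinov-style DP)."""
    n = len(seq)
    score = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = score[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = score[i][k - 1] if k > i else 0.0
                    inner = score[k + 1][j - 1]
                    best = max(best, left + inner + pair_weight(k, j, alpha, tau))
            score[i][j] = best
    return score[0][n - 1] if n else 0.0


# With alpha = 0 every pair has weight 1, so the score is a plain pair count.
print(best_weighted_score("GGGAAACCC", alpha=0.0))
print(best_weighted_score("GGGAAACCC"))
```

Setting `alpha` to zero recovers the unweighted count, which makes it easy to see how much the distance penalty changes the score for a given sequence.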

 

The hok/sok toxin-antitoxin system of Escherichia coli plasmid R1 increases plasmid maintenance by killing plasmid-free daughter cells. The hok/sok locus specifies two RNAs: hok messenger RNA, which encodes a toxic transmembrane protein, and sok antisense RNA, which binds a complementary region in the hok mRNA and induces transcript degradation. This post-segregational killing mechanism relies upon the ability of the hok messenger RNA to adopt alternative structural configurations which affect ease of translation and the susceptibility of the molecule to degradation. We have identified several hok mRNA paralogs in the genome of E. coli and Hok protein orthologs in the genomes of Enterobacteria. Using a combination of automated search and extensive manual editing, we have compiled the first high-quality multiple sequence alignment for the hok messenger RNA that covers all three experimentally validated hok mRNA structures. Adi Steif was an undergraduate research student in my group.

 

We investigate a special tumour type of breast cancer, called triple-negative breast cancer (TNBC), that is defined by a lack of oestrogen and progesterone receptors and ERBB2 gene amplification. It represents approximately 16% of all breast cancer cases. For this, we investigate 104 individual cases of TNBC. We find that these represent a wide spectrum of genomic evolution ranging from a few coding somatic aberrations in a few pathways to hundreds of coding somatic mutations and propose ways of clustering these into individual tumour clonal genotypes. My former PhD student Rodrigo Goya contributed a major part of the transcriptome analysis (RNA-seq data). This paper is the result of a collaboration led by Sam Aparicio, UBC/BC Cancer Agency, Vancouver, Canada.

 

This invited and peer-reviewed book chapter summarizes and compares different applications of high-throughput sequencing. Rodrigo Goya was a PhD student in my group, co-supervised by Marco Marra.

 

We propose and implement an RNA secondary structure visualisation program called R-chie, which we make available via a web-server and a corresponding R-package called R4RNA. R-chie allows users to visualise structural information (which may include pseudo-knotted RNA secondary structures as well as mutually exclusive base-pairs) alongside corresponding multiple sequence alignments. Users can also visualise quantitative information on structural and alignment features such as computationally derived base-pairing probabilities or experimentally derived accessibility values. Daniel Lai was a PhD student, Jeff Proctor and Alice Zhu were MSc students in my group.

 

We investigate follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL), which constitute two of the most common non-Hodgkin lymphomas (NHLs). We investigate transcriptome and genome sequencing data from a group of 13 individual DLBCL cases and one FL case in order to identify genes with mutations in B-cell NHLs. Comparisons to data from normal cells allow us to identify 109 genes with multiple somatic mutations. These comprise several genes with roles in histone modification which are analysed in more detail. Our results suggest that disruption of chromatin biology plays a key role in lymphomagenesis. My PhD student Rodrigo Goya contributed to the analysis of transcriptome data which helped to identify key non-synonymous mutations under positive selective pressure. This research is part of a collaboration led by Marco Marra of the Michael Smith Genome Sciences Centre, Vancouver, Canada.

 

We propose and implement two computationally more efficient algorithms for Viterbi and stochastic EM training. The two new algorithms have the added advantage of being significantly easier to implement than existing algorithms. Both algorithms are also implemented in the HMMConverter software package by the same authors. Tin Yin Lam was an MSc student in my group.

 

We propose and implement a new method, called Transat, for detecting evolutionarily conserved helices. This includes helices that are transient, mutually exclusive and that would render the RNA structure pseudo-knotted. Transat is a probabilistic method that employs two probabilistic models of evolution (one to capture how base-pairs evolve over time, one to capture how un-paired nucleotides evolve). It takes as input a multiple sequence alignment and an evolutionary tree linking the sequences of the input alignment and produces as output a ranked list of helices that are assigned log-likelihood scores and p-values. We show in a comprehensive performance evaluation that our method is capable of detecting transient, mutually exclusive and pseudo-knotted features. Our method can thus be used to detect riboswitches which cannot be detected using existing methods for RNA secondary structure prediction as these predict exactly one global RNA secondary structure for a given input sequence/alignment. Nick Wiebe was an MSc student in my group.

 

We introduce a C++-based software package called HMMConverter, that allows users without programming expertise to set up complex hidden Markov models (HMMs) and pair-HMMs. The models and the algorithms that are to be used for generating predictions and for training the model's free parameters are specified via an XML-file. Compared to existing software packages, HMMConverter implements a number of unique features such as (1) the Hirschberg algorithm as memory-efficient alternative to the Viterbi algorithm for pair-HMMs, (2) taking into account prior information on the input sequences and (3) three new algorithms for parameter training (Viterbi, Baum-Welch and stochastic EM training) introduced by us. Tin Yin Lam was an MSc student in my group.
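For readers unfamiliar with the Viterbi algorithm mentioned above, the textbook version for a plain HMM can be sketched in a few lines of Python. This is a generic illustration, not HMMConverter code or its XML interface; the function name `viterbi`, the dictionary-based model layout and the two-state toy model are all assumptions made for this example.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log-probability) for a non-empty obs."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = best log-probability of any state path that ends in state s
    score = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_score, pointer = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + log(trans_p[p][s]))
            new_score[s] = (score[best_prev] + log(trans_p[best_prev][s])
                            + log(emit_p[s][symbol]))
            pointer[s] = best_prev
        score = new_score
        backpointers.append(pointer)

    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical two-state toy model: state A prefers symbol x, state B prefers y.
STATES = ("A", "B")
START = {"A": 0.5, "B": 0.5}
TRANS = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
EMIT = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}

path, log_p = viterbi("xxxyyy", STATES, START, TRANS, EMIT)
print("".join(path), log_p)
```

This version stores one back-pointer table per observation, so its memory grows linearly with the sequence length. For pair-HMMs the corresponding table grows with the product of the two sequence lengths, which is the cost that a Hirschberg-style approach is meant to reduce.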

 

We identify an interlocked feedback loop in Arabidopsis thaliana where two RNA-binding proteins (AtGRP7 and AtGRP8) autoregulate and reciprocally crossregulate their alternative splicing by coupling unproductive splicing to NMD. My group predicts conserved RNA structural features in the intronic regions that are likely to be involved in altering the splicing pattern upon binding of the two RNA-binding proteins. This post-transcriptional feedback loop regulates circadian oscillations in Arabidopsis thaliana. This is joint work with Dorothee Staiger's experimental group at the University of Bielefeld, Germany.

 

This paper presents the results of the international Malaria consortium led by Matt Berriman at the Wellcome Trust Sanger Institute, Cambridge, UK. My MSc student and I contributed the comparative annotation of the newly sequenced Malaria genome of Plasmodium knowlesi by mapping the known genes of the Malaria genome of Plasmodium falciparum. For this, we implemented a modified version of our comparative gene prediction program Projector that is capable of taking prior information on both genomic sequences into account and whose parameters were trained for these two Malaria genomes. I was a post-doc at the University of Oxford, UK, at that time and Karsten Borgwardt was my MSc student.

 

This is an invited and peer-reviewed review paper comparing existing and proposing novel computational strategies for successfully predicting novel types of trans RNA-RNA interactions.

 

In Eutheria, X inactivation is initiated by the large noncoding RNA Xist. We contribute a computational, comparative study of evolutionarily conserved RNA structural features for the Xist gene in various eutheria. This analysis is challenging due to the fact that the Xist transcript is long (~17 kb), alternatively spliced and contains tandem repeats some of which are species-specific. We identify two regions that contain rodent-specific, conserved RNA structural features that may play a functional role in Xist regulation. This is joint work with Carolyn Brown's experimental group at UBC, Canada.

We introduce and implement a theoretical framework, called SimulFold, that is capable of co-estimating a conserved RNA secondary structure (that may contain pseudo-knots), a multiple sequence alignment and an evolutionary tree. Unlike many existing comparative methods for RNA secondary structure prediction, it does not require a fixed input alignment, thereby resolving a key chicken-and-egg problem. SimulFold employs a non-deterministic Bayesian Markov Chain Monte Carlo approach rather than an SCFG and is computationally very efficient (new RNA structures can be sampled in linear rather than cubic time). Due to the probabilistic nature of the framework and the co-estimation of key features, SimulFold allows unprecedented and detailed insights into sequence and structure conservation.

 

This is an invited and peer-reviewed review paper.

 

We use custom computational RNA structure prediction methods in combination with statistical analyses to detect several conserved RNA structural features in pre-mRNAs and mRNAs that are likely to be involved in regulating translation initiation (e.g. mouse caveolin 1 gene) and alternative splicing (e.g. human CFTR gene). This is one of the first studies (1) to identify conserved, local RNA structural elements overlapping splice sites, (2) to provide a statistical link between synonymous exon mutations and changes of the splicing efficiency and (3) to show that these changes are likely to be due to changes of the RNA structure that are induced by synonymous mutations.

 

We introduce a new mathematical algorithm for Baum-Welch parameter training that is computationally more efficient than existing ones and also significantly easier to implement. This result is relevant to all applications that employ hidden Markov models (HMMs) (and their variants) in conjunction with Baum-Welch parameter training.

 

We propose and implement the first algorithm for calculating arbitrary moments of the Boltzmann distribution of RNA secondary structures. Using our new algorithm, we find that biological RNAs have a Boltzmann distribution that comprises an ensemble of structures that are close to the minimum free energy structure. This feature is likely to confer an evolutionary advantage on biological RNAs and is absent from randomly generated RNAs with the same overall sequence properties.
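To make the quantity concrete: the k-th moment of the energy under the Boltzmann distribution is the sum over structures s of E(s)^k exp(-E(s)/RT), divided by the partition function Z. The sketch below evaluates this by brute force for an explicit, hypothetical list of structure energies. It only illustrates the definition and is not the algorithm from the paper, which would not need to enumerate structures one by one. The value of `rt` (about 0.616 kcal/mol at 37 °C) and the example energies are assumptions.

```python
import math


def boltzmann_moment(energies, k, rt=0.6163):
    """k-th moment of the energy under Boltzmann weights exp(-E / RT)."""
    e_min = min(energies)
    # Shifting by the minimum energy leaves the ratio unchanged and avoids overflow.
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return sum(w * e ** k for w, e in zip(weights, energies)) / z


# Hypothetical free energies (kcal/mol) of four alternative structures.
energies = [-12.0, -11.5, -9.0, -3.0]
mean = boltzmann_moment(energies, 1)
variance = boltzmann_moment(energies, 2) - mean ** 2
print(mean, variance)
```

A small variance means that most of the probability mass sits on structures with energies close to the mean, which is one way to quantify how concentrated an ensemble is around its low-energy structures.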

 

We introduce a comparative method, called RNA-Decoder, that is capable of detecting conserved RNA secondary structures in RNAs that may be partly protein-coding (e.g. viral RNA+ genomes, pre-mRNAs, mRNAs). The method employs new evolutionary models [5] to capture different, overlapping evolutionary constraints and, unlike Pfold, also explicitly captures local rather than only global RNA secondary structure. We show that RNA-Decoder outperforms existing methods for RNA secondary structure prediction that do not explicitly capture the protein-coding context (e.g. RNAalifold, Pfold, Mfold). RNA-Decoder was, for example, used for the genome-wide structural annotation of the HIV genome (paper by Kevin Weeks' group, Nature (2009)). RNA-Decoder is still unique in the sense that it explicitly captures the known protein-coding context of RNAs.

We show that structural RNA genes not only encode information on their known, final RNA secondary structure, but also information on their co-transcriptional folding pathway. More specifically, we show that transient RNA structures that are likely to prevent the co-transcriptional formation of the final RNA structure are suppressed, whereas transient RNA structures that could help the formation of the final RNA structure are encouraged. This paper was featured as a special highlight of BMC Molecular Biology.

We propose and implement several probabilistic models of evolution that capture conserved RNA structural features overlapping known protein-coding regions. The key difficulties we address are (1) to capture the two different evolutionary constraints, (2) to propose a way of avoiding long-range correlations due to the coding context and (3) to parametrise the evolutionary models in ways that capture the key sequence and structure signals while also allowing for robust parameter training given the scarcity of our training data.

 

We propose and implement a comparative gene prediction method, called Projector, that maps known genes of one genome to orthologous regions of a related genome. Similar to Doublescan, Projector employs a pair hidden Markov model (pair-HMM) and aligns and predicts pairs of genes simultaneously. We also incorporate a heuristic algorithm that allows us to generate predictions in near-linear time. We show that Projector outperforms protein-based methods such as Genewise, especially for pairs of genes that are more distantly related.

 

We propose a new comparative method for ab-initio gene prediction, called Doublescan, that aligns and predicts pairs of orthologous genes from mouse and human simultaneously, thereby avoiding the need for a fixed high-quality input alignment that other comparative methods require. The method is special in that it can also handle pairs of orthologous genes that are related by exon-fusion and exon-splitting. These account for 16% of orthologous mouse-human gene pairs.

We here introduce a new mathematical algorithm for jet detection in high-energy particle physics. This is research done by me while being an MSc student based at CERN in Geneva, Switzerland.

  • Menzel P, McCorkindale AL, Stefanov SR, Zinzen RP, Meyer IM. Transcriptional dynamics of microRNAs and their targets during Drosophila neurogenesis. RNA Biol. 2018 Dec 24 (published online). doi:10.1080/15476286.2018.1558907. PubMed PMID: 30582411. pubmed
  • Stefanov SR, Meyer IM. Deciphering the Universe of RNA Structures and trans RNA–RNA Interactions of Transcriptomes In Vivo: From Experimental Protocols to Computational Analyses. In: Systems Biology. RNA Technologies, Edited by Rajewsky N, Jurga S, Barciszewski J, Springer, 2018. pdf file via open access by Springer
  • Meyer IM. In silico methods for co-transcriptional RNA secondary structure prediction and for investigating alternative RNA structure expression. Methods. 2017 May 1;120:3-16. PubMed PMID: 28433606. pubmed
  • Mazloomian A, Meyer IM. Genome-wide identification and characterization of tissue-specific RNA editing events in D. melanogaster and their potential role in regulating alternative splicing. RNA Biol. 2015;12(12):1391-401. doi: 10.1080/15476286.2015.1107703. PubMed PMID: 26512413. pubmed
  • Lai D, Meyer IM. A comprehensive comparison of general RNA-RNA interaction prediction methods. Nucleic Acids Res. 2016 Apr 20;44(7):e61. doi: 10.1093/nar/gkv1477. PubMed PMID: 26673718. pubmed
  • Zhu JY, Meyer IM. Four RNA families with functional transient structures. RNA Biol. 2015;12(1):5-20. doi: 10.1080/15476286.2015.1008373. PubMed PMID: 25751035. pubmed
  • Lai D, Meyer IM. e-RNA: a collection of web servers for comparative RNA structure prediction and visualisation. Nucleic Acids Res. 2014 Jul;42(Web Server issue):W373-6. doi: 10.1093/nar/gku292. PubMed PMID: 24810851. pubmed
  • Lai D*, Proctor JR*, Meyer IM. On the importance of cotranscriptional RNA structure formation. RNA. 2013 Nov;19(11):1461-73. doi: 10.1261/rna.037390.112. PubMed PMID: 24131802. pubmed
  • Zhu JY*, Steif A*, Proctor JR, Meyer IM. Transient RNA structure features are evolutionarily conserved and can be computationally predicted. Nucleic Acids Res. 2013 Jul;41(12):6273-85. doi: 10.1093/nar/gkt319. PubMed PMID: 23625966. pubmed
  • Proctor JR, Meyer IM. CoFold: an RNA secondary structure prediction method that takes co-transcriptional folding into account. Nucleic Acids Res. 2013 May;41(9):e102. doi: 10.1093/nar/gkt174. PubMed PMID: 23511969. pubmed
  • Steif A, Meyer IM. The hok mRNA family. RNA Biol. 2012 Dec;9(12):1399-404. doi: 10.4161/rna.22746. PubMed PMID: 23324554. pubmed
  • Shah SP, Roth A*, Goya R*, Oloumi A*, Ha G*, Zhao Y*, Turashvili G*, Ding J*, Tse K*, Haffari G*, Bashashati A*, Prentice LM, Khattra J, Burleigh A, Yap D, Bernard V, McPherson A, Shumansky K, Crisan A, Giuliany R, Heravi-Moussavi A, Rosner J, Lai D, Birol I, Varhol R, Tam A, Dhalla N, Zeng T, Ma K, Chan SK, Griffith M, Moradian A, Cheng SW, Morin GB, Watson P, Gelmon K, Chia S, Chin SF, Curtis C, Rueda OM, Pharoah PD, Damaraju S, Mackey J, Hoon K, Harkins T, Tadigotla V, Sigaroudinia M, Gascard P, Tlsty T, Costello JF, Meyer IM, Eaves CJ, Wasserman WW, Jones S, Huntsman D, Hirst M, Caldas C, Marra MA, Aparicio S. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature. 2012 Apr 4;486(7403):395-9. doi: 10.1038/nature10933. PubMed PMID: 22495314. pubmed
  • Goya R, Meyer IM, Marra MM. Applications of High-Throughput Sequencing. In: Bioinformatics for High Throughput Sequencing, Springer, 2012. books' web-page by Springer
  • Lai D, Proctor JR, Zhu JY, Meyer IM. R-Chie: a web server and R package for visualizing RNA secondary structures. Nucleic Acids Res. 2012 Jul;40(12):e95. doi: 10.1093/nar/gks241. PubMed PMID: 22434875. pubmed
  • Morin RD*, Mendez-Lago M*, Mungall AJ, Goya R, Mungall KL, Corbett RD, Johnson NA, Severson TM, Chiu R, Field M, Jackman S, Krzywinski M, Scott DW, Trinh DL, Tamura-Wells J, Li S, Firme MR, Rogic S, Griffith M, Chan S, Yakovenko O, Meyer IM, Zhao EY, Smailus D, Moksa M, Chittaranjan S, Rimsza L, Brooks-Wilson A, Spinelli JJ, Ben-Neriah S, Meissner B, Woolcock B, Boyle M, McDonald H, Tam A, Zhao Y, Delaney A, Zeng T, Tse K, Butterfield Y, Birol I, Holt R, Schein J, Horsman DE, Moore R, Jones SJ, Connors JM, Hirst M, Gascoyne RD, Marra MA. Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature. 2011 Jul 27;476(7360):298-303. doi: 10.1038/nature10351. PubMed PMID: 21796119. pubmed
  • Lam TY, Meyer IM. Efficient algorithms for training the parameters of hidden Markov models using stochastic expectation maximization (EM) training and Viterbi training. Algorithms Mol Biol. 2010 Dec 9;5:38. doi: 10.1186/1748-7188-5-38. PubMed PMID: 21143925. pubmed
  • Wiebe NJ, Meyer IM. Transat - method for detecting the conserved helices of functional RNA structures, including transient, pseudo-knotted and alternative structures. PLoS Comput Biol. 2010 Jun 24;6(6):e1000823. doi: 10.1371/journal.pcbi.1000823. PubMed PMID: 20589081. pubmed
  • Lam TY, Meyer IM. HMMConverter 1.0: a toolbox for hidden Markov models. Nucleic Acids Res. 2009 Nov;37(21):e139. doi: 10.1093/nar/gkp662. PubMed PMID: 19740770. pubmed
  • Schöning JC, Streitner C, Meyer IM, Gao Y, Staiger D. Reciprocal regulation of glycine-rich RNA-binding proteins via an interlocked feedback loop coupling alternative splicing to nonsense-mediated decay in Arabidopsis. Nucleic Acids Res. 2008 Dec;36(22):6977-87. doi: 10.1093/nar/gkn847. PubMed PMID: 18987006. pubmed
  • Pain A, Böhme U, Berry AE, Mungall K, Finn RD, Jackson AP, Mourier T, Mistry J, Pasini EM, Aslett MA, Balasubrammaniam S, Borgwardt K, Brooks K, Carret C, Carver TJ, Cherevach I, Chillingworth T, Clark TG, Galinski MR, Hall N, Harper D, Harris D, Hauser H, Ivens A, Janssen CS, Keane T, Larke N, Lapp S, Marti M, Moule S, Meyer IM, Ormond D, Peters N, Sanders M, Sanders S, Sargeant TJ, Simmonds M, Smith F, Squares R, Thurston S, Tivey AR, Walker D, White B, Zuiderwijk E, Churcher C, Quail MA, Cowman AF, Turner CM, Rajandream MA, Kocken CH, Thomas AW, Newbold CI, Barrell BG, Berriman M. The genome of the simian and human malaria parasite Plasmodium knowlesi. Nature. 2008 Oct 9;455(7214):799-803. doi: 10.1038/nature07306. PubMed PMID: 18843368. pubmed
  • Meyer IM. Predicting novel RNA-RNA interactions. Curr Opin Struct Biol. 2008 Jun;18(3):387-93. doi: 10.1016/j.sbi.2008.03.006. PubMed PMID: 18485695. pubmed
  • Yen ZC, Meyer IM, Karalic S, Brown CJ. A cross-species comparison of X-chromosome inactivation in Eutheria. Genomics. 2007 Oct;90(4):453-63. PubMed PMID: 17728098. pubmed
  • Meyer IM, Miklós I. SimulFold: simultaneously inferring RNA structures including pseudoknots, alignments, and trees using a Bayesian MCMC framework. PLoS Comput Biol. 2007 Aug;3(8):e149. PubMed PMID: 17696604. pubmed
  • Meyer IM. A practical guide to the art of RNA gene prediction. Brief Bioinform. 2007 Nov;8(6):396-414. PubMed PMID: 17483123. pubmed
  • Meyer IM, Miklós I. Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs. Nucleic Acids Res. 2005 Nov 7;33(19):6338-48. PubMed PMID: 16275783. pubmed
  • Miklós I, Meyer IM. A linear memory algorithm for Baum-Welch training. BMC Bioinformatics. 2005 Sep 19;6:231. PubMed PMID: 16171529. pubmed
  • Miklós I, Meyer IM, Nagy B. Moments of the Boltzmann distribution for RNA secondary structures. Bull Math Biol. 2005 Sep;67(5):1031-47. PubMed PMID: 15998494. pubmed
  • Pedersen JS*, Meyer IM*, Forsberg R, Simmonds P, Hein J. A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res. 2004 Sep 24;32(16):4925-36. PubMed PMID: 15448187. pubmed
  • Meyer IM, Miklós I. Co-transcriptional folding is encoded within RNA genes. BMC Mol Biol. 2004 Aug 6;5:10. PubMed PMID: 15298702. pubmed
  • Pedersen JS*, Forsberg R*, Meyer IM, Hein J. An evolutionary model for protein-coding regions with conserved RNA structure. Mol Biol Evol. 2004 Oct;21(10):1913-22. PubMed PMID: 15229291. pubmed
  • Meyer IM, Durbin R. Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res. 2004 Feb 4;32(2):776-83. PubMed PMID: 14764925. pubmed
  • Meyer IM, Durbin R. Comparative ab initio prediction of gene structures using pair HMMs. Bioinformatics. 2002 Oct;18(10):1309-18. PubMed PMID: 12376375. pubmed
  • Bentvelsen S, Meyer I. The Cambridge jet algorithm: features and applications, European Physics Journal C4 (1998) 74.

Research

Bioinformatics of RNA structure and transcriptome regulation

Introduction and Lay Summary

When the human genome sequence was released more than a decade ago, it came as a surprise to many that the number of protein-coding genes was not radically different from the corresponding gene count of the seemingly more humble nematode Caenorhabditis elegans (C. elegans). The current gene counts (20313 for human (GRCh38.p5) versus 20447 for C. elegans (WBcel235)) are stunningly similar. The gene count by itself is thus only a poor measure of the complexity of the corresponding organism.

Another surprising finding in the wake of the human genome sequencing project was the realisation that only a small fraction of the genome (<2%) actually encodes protein information. Moreover, many genes seem not to encode any protein product at all (25180 so-called RNA genes (GRCh38.p5)). Even the primary transcripts of protein-coding genes contain a seemingly disproportionate fraction of non-coding nucleotides (introns and untranslated regions).

The primary products of all activated genes are transcripts (RNA sequences). The functional products of these transcripts are proteins as well as functional RNAs, which constitute key cellular players in any organism. How and when any of these products are generated is a fine-tuned process that depends, for example, on the tissue type and developmental trajectory of each individual cell. As the functional products define the current state of each cell (whether this is a state of disease or health), it is of key importance to understand how the different functional products of the transcriptome are made. Without this knowledge, we not only lack information on why certain products are made, but also have no means of correcting erroneously produced products if the cell is in a state of disease. Somewhat surprisingly, however, the molecular mechanisms underlying transcriptome regulation remain largely underexplored.

We hypothesize that RNA structure features and trans RNA-RNA interactions between two different transcripts play decisive functional roles in regulating gene expression at the transcriptome level. To this end, we devise new computational methods that allow us to discover new mechanisms of transcriptome regulation based on sequence information alone (e.g. RNA-seq transcriptome data). Due to the size of today's transcriptome data sets, we can detect even subtle mechanisms of transcriptome regulation with significant statistical evidence, including mechanisms that would be hard or impossible to detect using the best experimental methods. See our recent analysis of A-to-I RNA editing in the fruit fly as one example.

Beyond the one-dimensional view of transcripts

More often than not, figures in textbooks or on educational web-pages illustrate the Central Dogma of Biology by depicting transcripts as linear or wavy sticks inside a eukaryotic cell, with transcription and splicing seemingly happening consecutively. What we know from many dedicated experiments, however, is that processes that alter the primary transcripts (e.g. splicing, RNA editing and RNA structure formation) happen co-transcriptionally, i.e. while the RNA sequence is being transcribed from the genome. Similarly to protein information, information on RNA structure or potential trans RNA-RNA interaction partners can be directly encoded in the transcript itself. This makes such regulation evolutionarily robust, as any regulatory signals are part of the sequence itself. We thus expect that RNA structural features and RNA-RNA interactions are widely used for regulating gene expression at the transcript level.

Modelling RNA structures in vivo

In order to devise computational methods for detecting the RNA structural features that are functionally relevant in vivo, it is worth acknowledging the complexity of the cellular environment and the impact this may have on the structure formation process, see our review paper. By devising the new RNA secondary structure prediction program CoFold, we showed that it is possible to capture the overall effects of the speed and directionality of transcription in vivo and also confirmed an earlier, long-standing hypothesis by Morgan and Higgs from 1996. Our method yields significantly improved predictions, especially for long transcripts (> 200 nt) such as ribosomal RNAs. We know already from one of our earlier, in silico studies that the sequences of structured RNAs not only encode information on their final RNA structure, but also on how these RNAs fold in vivo during co-transcriptional folding.

Figure 1: Arc-plot for the HDV ribozyme made using R-Chie. Each arc represents one pair of base-paired alignment columns. Arcs and the alignment at the top show the alternative structure and the active structure; those at the bottom the inhibitory alternative structure. The left legend specifies the percentage of canonical base-pairs for each arc. The right legend colour-codes the nucleotides and specifies the evolutionary evidence supporting each arc.
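
To make the left legend of Figure 1 concrete, the following minimal Python sketch computes the percentage of canonical base-pairs for one pair of alignment columns. It is an illustration only and not R-Chie's own code: the function name percent_canonical, the toy alignment and the decision to count G-U wobble pairs as canonical are all assumptions made for this example.

```python
# Illustrative sketch only; not taken from R-Chie.
# Assumption: G-U wobble pairs are counted as canonical.
CANONICAL = {"GC", "CG", "AU", "UA", "GU", "UG"}


def percent_canonical(alignment, col_i, col_j):
    """Percentage of sequences whose nucleotides in columns col_i and
    col_j (0-based) form a canonical base-pair.

    alignment: list of equal-length aligned RNA strings ('-' for gaps).
    Gapped or non-canonical combinations count as not paired.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    pairs = [(seq[col_i] + seq[col_j]).upper() for seq in alignment]
    n_canonical = sum(pair in CANONICAL for pair in pairs)
    return 100.0 * n_canonical / len(pairs)


# Hypothetical toy alignment: columns 0 and 4 are the paired columns.
toy_alignment = ["GAAAC", "GAAAU", "AAAAU", "CAAAC"]
print(percent_canonical(toy_alignment, 0, 4))
```

In the toy alignment, three of the four sequences (G-C, G-U and A-U) pair canonically and one (C-C) does not, so the arc connecting columns 0 and 4 would be assigned 75%.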

It turns out that orthologous transcripts from related organisms also have similar co-transcriptional folding pathways and that distinct transient RNA structure features can be as conserved and functionally relevant as those of the final RNA structure, see [1], [2] and [3]. This has significant implications for many state-of-the-art methods in RNA secondary structure prediction, as these typically assume that any given transcript folds into exactly one functional RNA structure. Transat, a probabilistic method that we developed earlier, aims to address this problem and has allowed us to detect individual, conserved RNA secondary structure features of pseudo-knotted structures, riboswitches and transient structures, which are otherwise notoriously difficult to predict.

RNA structure features involved in splicing regulation

Figure 2:
(A) Genomic context of identified editing sites.
(B) Distribution of conversion types for four tissue types.
(C) Percentage of common editing sites between pairs of tissues.
(Bottom) Gene CG5850 is differentially expressed between head (blue) and digestive system (red) and editing and splicing may affect each other. X-axis: exons of the gene, y-axis: number of reads normalized by library size. Arrows show editing sites. The purple box is predicted to be alternatively expressed.

Viral genomes such as Hepatitis C and HIV-1 are known to encode functional RNA structure in protein-coding regions, as one major constraint for their genomes is to remain short. We contributed early on to these studies by showing that these RNA structures can be reliably predicted provided the known protein context is explicitly taken into account, see [1], [2] by us and also [3]. Functional RNA structures overlapping protein-coding regions, however, are not the preserve of viral genomes, but can also regulate the alternative splicing and translation of eukaryotic protein-coding genes, e.g. in Arabidopsis thaliana, mouse and human. In order to explore the link between RNA structure and alternative splicing on a transcriptome-wide scale, we recently analysed tissue-specific high-throughput transcriptome data from the fruit fly. Using a new, probabilistic analysis pipeline that explicitly captures the ADAR requirement for double-stranded regions, we identified around 2000 novel editing sites as well as more than 200 regions where local RNA structure changes due to A-to-I RNA editing are likely to induce corresponding changes in the splicing pattern, see our paper for details.

Figure 3:
(Top) Arc-plot for the highlighted region of the Cip4 gene containing a predicted, conserved RNA secondary structure overlapping RNA editing sites (red arrows) that could influence alternative splicing via structural changes. The left legend colour-codes the nucleotides according to the evidence supporting each arc, see also Figure 1. Figure made using R-Chie. (Bottom) Gene structure of the Cip4 gene with grey box highlighting the structure-containing part at the top.

Trans RNA-RNA interactions regulating the transcriptome

RNAs not only have the potential to form RNA structure, but can also interact with other RNAs in trans. These trans-interactions involve the same simple structural building blocks as RNA structure features, i.e. hydrogen bonds and stacking interactions involving pairs of complementary nucleotides ({G,C}, {A,U} and {G,U}). In terms of evolution, it is much more straightforward to evolve a specific trans RNA-RNA interaction than to come up with a (properly folded) protein that would engage in a similarly specific protein-RNA interaction. We therefore hypothesize that many novel biological classes of trans RNA-RNA interactions (beyond the already well-known classes such as miRNA-mRNA and snoRNA-rRNA) remain to be discovered. We have shown in a range of settings how the comparative, in silico approach can be harnessed to significantly improve upon existing state-of-the-art methods. We thus continue to develop new computational methods that allow us to make discoveries that would otherwise be difficult to make. To this end, we also collaborate with dedicated experimental groups that allow us to generate large-scale transcriptome data sets (which constitute the input to our methods) and that test our high-ranking predictions in dedicated follow-up experiments.
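
As a toy illustration of the complementary nucleotide pairs listed above, the following Python sketch searches two RNA sequences for the longest ungapped, antiparallel stretch in which every position forms a {G,C}, {A,U} or {G,U} pair. This is a deliberately simplified assumption-laden example and not one of our prediction methods: it ignores stacking energies, loops, intramolecular structure and evolutionary evidence, and the names can_pair and longest_trans_duplex are hypothetical.

```python
# Toy sketch only: pure complementarity, no energies or evolutionary evidence.
COMPLEMENTARY = {
    ("G", "C"), ("C", "G"),
    ("A", "U"), ("U", "A"),
    ("G", "U"), ("U", "G"),
}


def can_pair(x, y):
    """True if nucleotides x and y form a {G,C}, {A,U} or {G,U} pair."""
    return (x.upper(), y.upper()) in COMPLEMENTARY


def longest_trans_duplex(rna_a, rna_b):
    """Longest ungapped antiparallel duplex between two RNAs.

    Both sequences are given 5' to 3'. Returns (length, start_a, start_b),
    where start_a and start_b are the 0-based 5' start positions of the
    paired stretch in rna_a and rna_b, respectively.
    """
    rev_b = rna_b[::-1]  # read rna_b 3' to 5' so that pairing is antiparallel
    best = (0, 0, 0)
    prev = [0] * (len(rev_b) + 1)
    for i in range(1, len(rna_a) + 1):
        curr = [0] * (len(rev_b) + 1)
        for j in range(1, len(rev_b) + 1):
            if can_pair(rna_a[i - 1], rev_b[j - 1]):
                # extend the diagonal run of consecutive base-pairs
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[0]:
                    best = (curr[j], i - curr[j], len(rna_b) - j)
        prev = curr
    return best


# Hypothetical example: CGUG in the first RNA pairs with CACG in the second.
print(longest_trans_duplex("AAACGUGAAA", "CCCACGCC"))
```

For the two hypothetical sequences above, the function reports a duplex of length 4 that starts at position 3 of the first RNA (CGUG) and at position 2 of the second (CACG). A realistic method would in addition have to weigh such a candidate against competing intramolecular structures and, in a comparative setting, against evolutionary evidence from related sequences.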

Jobs

We are seeking highly motivated and enthusiastic members to join our team.

 

Interested in joining the lab as a postdoc?

Please send an email to Irmtraud Meyer with the following documents:

  • Cover letter
  • CV with a list of publications
  • A short summary of your present and future research interests
  • Scan of your PhD certificate
  • Names of three referees

Candidates who are interested in joining us should be competitive enough to also apply for external fellowships and other sources of funding.

Interested in joining the lab as a graduate student?

Please send an email to Irmtraud Meyer with the following documents:

  • Cover letter
  • CV
  • A short summary of your present and future research interests
  • Scan of your University certificates
  • Names of two or three referees

In addition, please also consider applying via the MDC graduate school.

Our past and present graduate students have come from diverse scientific backgrounds ranging from computer science, bioinformatics, physics, mathematics and statistics to bioengineering. We are interested in enthusiastic candidates with a strong interest in computational method development (C++, Java, R) who are also keen to study exciting biological systems using high-throughput transcriptome data.

Interested in joining the lab as an intern, undergraduate or Master student?

Please send an email to Irmtraud Meyer with the following documents:

  • Cover letter
  • CV
  • A short summary of your present and future research interests
  • Scan of your University certificates
  • Names of two or three referees

Our past and present students have come from diverse scientific backgrounds ranging from computer science, bioinformatics, physics, mathematics and statistics to bioengineering.