On the relation between input and output distributions of scRNA-seq experiments


  • D. Schwabe
  • M. Falcke


  • Bioinformatics


  • Bioinformatics 38 (5): 1336-1343


  • MOTIVATION: Single-cell RNA sequencing determines RNA copy numbers per cell for a given gene. However, technical noise raises the question of how the observed distributions (output) are connected to the underlying cellular distributions (input). RESULTS: We model a single-cell RNA sequencing setup consisting of PCR amplification and sequencing, and derive probability distribution functions for the output distribution given an input distribution. We provide copy number distributions arising from single transcripts during PCR amplification, with exact expressions for mean and variance. We prove that the coefficient of variation of the output of sequencing is always larger than that of the input distribution. Experimental data reveal that the variance and mean of the input distribution obey characteristic relations, which we specifically determine for a HeLa data set. We can calculate as many moments of the input distribution as are known of the output distribution (in the limit, all of them). This, in principle, completely determines the input distribution from the output distribution.
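
The abstract does not state the model's equations, so the following is only a minimal sketch under textbook assumptions, not the authors' derivation. It assumes that each molecule is duplicated independently with probability p in every PCR cycle (a Galton-Watson branching process started from one transcript), and that sequencing captures each amplified molecule independently with probability q (binomial thinning). The names `simulate_pcr`, `thin`, `p`, and `q` are hypothetical. Under these assumptions the copy number after n cycles has mean (1+p)^n and variance (1-p)/(1+p) · (1+p)^n · ((1+p)^n - 1), and thinning increases the coefficient of variation.

```python
import random
from statistics import fmean, pvariance


def simulate_pcr(n_cycles, p, rng):
    """Copy number after n_cycles of amplification, starting from one transcript.

    Assumption (not from the abstract): every existing molecule is duplicated
    independently with probability p in each cycle.
    """
    copies = 1
    for _ in range(n_cycles):
        copies += sum(rng.random() < p for _ in range(copies))
    return copies


def branching_mean(n_cycles, p):
    """Mean of the assumed branching process: (1 + p)^n."""
    return (1 + p) ** n_cycles


def branching_variance(n_cycles, p):
    """Variance of the assumed branching process started from one molecule."""
    m = (1 + p) ** n_cycles
    return (1 - p) / (1 + p) * m * (m - 1)


def thin(copies, q, rng):
    """Assumed sequencing step: each molecule is read with probability q."""
    return sum(rng.random() < q for _ in range(copies))


def cv(values):
    """Coefficient of variation: standard deviation divided by mean."""
    return pvariance(values) ** 0.5 / fmean(values)


rng = random.Random(0)          # fixed seed for reproducibility
n_cycles, p, q = 6, 0.8, 0.1    # illustrative values, not fitted to any data

copies = [simulate_pcr(n_cycles, p, rng) for _ in range(20000)]
reads = [thin(c, q, rng) for c in copies]

print("mean     simulated / formula:", fmean(copies), branching_mean(n_cycles, p))
print("variance simulated / formula:", pvariance(copies), branching_variance(n_cycles, p))
print("CV before / after sequencing:", cv(copies), cv(reads))
```

In this toy setting the simulated moments should agree with the closed-form expressions up to sampling error, and the CV after the thinning step should exceed the CV before it, which is the qualitative statement made in the abstract.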