- T. Breitenbach
- M.J. Schmitt
- T. Dandekar
- Bioinformatics 38 (17): 4162-4171
MOTIVATION: A recent approach to genetic tracing of complex biological problems involves generating synthetic DNA probes that specifically mark cells with a phenotype of interest. These synthetic locus control regions (sLCRs), in turn, drive the expression of a reporter gene, such as a fluorescent protein. To build functional and specific sLCRs, it is critical to accurately select multiple bona fide cis-regulatory elements from the cistrome of the target cell phenotype. This selection maximizes the number and diversity of transcription factors (TFs) represented within the sLCR while keeping the size of the final sLCR limited.

RESULTS: In this work, we discuss how optimization, in particular integer programming, can be used to systematically construct a specific sLCR and to optimize pre-defined properties of that sLCR. The instance of a linear optimization problem presented here maximizes the activation potential of the sLCR, subject to two constraints: the size of the sLCR is limited to a pre-defined length, and a minimum number of the TFs deemed sufficiently characteristic for the phenotype of interest is covered. We generated an sLCR to trace the mesenchymal glioblastoma program in patients by solving the corresponding linear program with the Gurobi optimizer. Taking the binding strength of transcription factor binding sites (TFBSs) with their TFs as a proxy for activation potential, the optimized sLCR scores similarly to an sLCR experimentally validated in vivo, and it is smaller in size while having the same coverage of TFBSs.

AVAILABILITY: We provide a Python implementation of the presented framework in the Supplementary material, with which an optimal selection of cis-regulatory elements can be calculated once the target set of TFs and the binding strengths of their TFBSs are known.
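The selection problem described in RESULTS can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Gurobi: it enumerates all subsets of a toy set of candidate cis-regulatory elements, which is only feasible for very small inputs. The element names, lengths, binding strengths, and the function `select_elements` are hypothetical. The score is assumed to be the sum of binding strengths of all TFBSs in the selected elements, and a TF is assumed to count as covered if at least one selected element contains a binding site for it.

```python
from itertools import combinations

# Hypothetical toy data: each candidate cis-regulatory element has a length
# (in bp) and a binding strength per TF it contains a site for.
elements = {
    "cre_a": {"length": 150, "binding": {"TF1": 2.0, "TF2": 1.0}},
    "cre_b": {"length": 200, "binding": {"TF2": 3.0, "TF3": 1.5}},
    "cre_c": {"length": 120, "binding": {"TF4": 1.0}},
    "cre_d": {"length": 300, "binding": {"TF1": 2.5, "TF3": 2.0, "TF4": 0.5}},
}


def select_elements(elements, max_length, min_tfs_covered):
    """Return the (subset, score) with the highest summed binding strength
    whose total length is at most max_length and which covers at least
    min_tfs_covered distinct TFs. Returns (None, None) if nothing is feasible.

    Exhaustive search stands in for an integer programming solver here.
    """
    best_subset, best_score = None, None
    names = sorted(elements)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            # Length constraint on the assembled sLCR.
            total_length = sum(elements[n]["length"] for n in subset)
            if total_length > max_length:
                continue
            # Coverage constraint: count distinct TFs with at least one site.
            covered = set()
            for n in subset:
                covered.update(elements[n]["binding"])
            if len(covered) < min_tfs_covered:
                continue
            # Objective: summed binding strength as a proxy for activation.
            score = sum(sum(elements[n]["binding"].values()) for n in subset)
            if best_score is None or score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


best_subset, best_score = select_elements(elements, max_length=500, min_tfs_covered=3)
print(best_subset, best_score)  # ('cre_b', 'cre_d') 9.5
```

For realistic numbers of candidate elements, the same objective and constraints would be expressed with binary selection variables and passed to an integer programming solver, as the abstract describes.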