Heuristic discovery and design of promoters for the fine-control of metabolism in industrially relevant microbes

Gilman, James

Abstract

Predictable, robust genetic parts, including constitutive promoters, are one of the defining attributes of synthetic biology. Ideally, candidate promoters should cover a broad range of expression strengths and yield homogeneous output, whilst also being orthogonal to endogenous regulatory pathways. However, such libraries are not always readily available in non-model organisms, such as the industrially relevant genus Geobacillus. A multitude of approaches are available for the identification and de novo design of prokaryotic promoters, although it may be unclear which methodology is most practical in an industrial context. Endogenous promoters may be individually isolated from upstream of well-understood genes, or bioinformatically identified en masse. Alternatively, pre-existing promoters may be mutagenised, or mathematical abstraction can be used to model promoter strength and design de novo synthetic regulatory sequences. In this investigation, bioinformatic, mathematical and mutagenic approaches to promoter discovery were directly compared. Hundreds of previously uncharacterised putative promoters were bioinformatically identified from the core genome of four Geobacillus species, and a rational sampling method was used to select sequences for in vivo characterisation. A library of 95 promoters covered a 2-log range of expression strengths when characterised in vivo using fluorescent reporter proteins. Data derived from this experimental characterisation were used to train Artificial Neural Network, Partial Least Squares and Random Forest statistical models, which quantitatively inferred the relationship between DNA sequence and function. The resulting models showed limited predictive power but good descriptive power. In particular, the models highlighted the importance of sequences upstream of the canonical -35 and -10 motifs for determining promoter function in Geobacillus.
Additionally, two commonly used mutagenic techniques for promoter production, Saturation Mutagenesis of Flanking Regions and error-prone PCR, were applied. The resulting sequence libraries showed limited promoter activity, underlining the difficulty of deriving synthetic promoters in species where understanding of transcriptional regulation is limited. As such, bioinformatic identification and deep characterisation of endogenous promoter elements were posited as the most practical approach for the derivation of promoter libraries in non-model organisms of industrial interest.

Doctoral Theses

Doctoral College