Improved prediction of RNA secondary structure by integrating the free energy model with restraints derived from experimental probing data.
Nucleic Acids Research
Oxford University Press (OUP): Policy C - Option B
This is the final version of the article. Available from Oxford University Press via the DOI in this record.
Recently, several experimental techniques have emerged for probing RNA structures based on high-throughput sequencing. However, most secondary structure prediction tools that incorporate probing data are designed and optimized for particular types of experiments. For example, RNAstructure-Fold is optimized for SHAPE data, while SeqFold is optimized for PARS data. Here, we report a new RNA secondary structure prediction method, restrained MaxExpect (RME), which can incorporate multiple types of experimental probing data and is based on a free energy model and an MEA (maximizing expected accuracy) algorithm. We first demonstrated that RME substantially improved secondary structure prediction with perfect restraints (base pair information of known structures). Next, we collected structure-probing data from diverse experiments (e.g. SHAPE, PARS and DMS-seq) and transformed them into a unified set of pairing probabilities with a posterior probabilistic model. By using the probability scores as restraints in RME, we compared its secondary structure prediction performance with two other well-known tools, RNAstructure-Fold (based on a free energy minimization algorithm) and SeqFold (based on a sampling algorithm). For SHAPE data, RME and RNAstructure-Fold performed better than SeqFold, because they markedly altered the energy model with the experimental restraints. For high-throughput data (e.g. PARS and DMS-seq) with lower probing efficiency, the secondary structure prediction performances of the tested tools were comparable, with performance improvements for only a portion of the tested RNAs. However, when the effects of tertiary structure and protein interactions were removed, RME showed the highest prediction accuracy in the DMS-accessible regions by incorporating in vivo DMS-seq data.
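The step of transforming probing signals into pairing probabilities can be illustrated with a minimal sketch of Bayes' rule. The abstract does not specify the form of the posterior model, so everything below is an assumption for illustration only: the function names, the Gaussian likelihoods, and the parameter values are hypothetical and are not taken from RME.

```python
import math


def gaussian_log_pdf(x, mean, sd):
    """Log density of a normal distribution (assumed likelihood form)."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def posterior_pairing_probability(score,
                                  paired=(0.2, 0.3),     # hypothetical (mean, sd) for paired bases
                                  unpaired=(1.0, 0.5),   # hypothetical (mean, sd) for unpaired bases
                                  prior_paired=0.5):     # hypothetical prior P(paired)
    """P(paired | score) by Bayes' rule, computed from the log-odds."""
    log_odds = (math.log(prior_paired / (1.0 - prior_paired))
                + gaussian_log_pdf(score, *paired)
                - gaussian_log_pdf(score, *unpaired))
    # Numerically stable logistic function of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Under these assumed parameters, a low reactivity-like score favors pairing
# and a high score favors an unpaired base.
low = posterior_pairing_probability(0.0)
high = posterior_pairing_probability(1.5)
```

In a setup like this, scores from different experiments would each need their own likelihood parameters, which is one way such data could be placed on a common probability scale before being used as restraints.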
National Key Basic Research Program of China [2012CB316503]; National High-Tech Research and Development Program of China [2014AA021103]; National Natural Science Foundation of China; Tsinghua University Initiative Scientific Research Program [2014z21045]; Hong Kong Research Grants Council Early Career Scheme [419612 to K.Y.]; National Science Foundation [1339282 to D.H.M.]; Computing Platform of the National Protein Facilities (Tsinghua University). Funding for open access charge: National Natural Science Foundation of China.
Research Support, Non-U.S. Gov't
Vol. 43, Iss. 15, pp. 7247 - 7259