RNA Structure Prediction

From DrugPedia: A Wikipedia for Drug discovery

Jump to: navigation, search

The functional form of single stranded RNA molecules frequently requires a specific tertiary structure. The scaffold for this structure is provided by secondary structural elements which are hydrogen bonds within the molecule. This leads to several recognizable "domains" of secondary structure like hairpin loops, bulges and internal loops. There has been a significant amount of bioinformatics research directed at the RNA structure prediction problem.

Contents

[edit] Single sequence structure prediction

A common problem for researchers working with RNA is to determine the three-dimensional structure of the molecule given just the nucleic acid sequence. However, in the case of RNA much of the final structure is determined by the secondary structure or intra-molecular base-pairing interactions of the molecule. This is shown by the high conservation of base-pairings across diverse species.

One of the first attempts to predict RNA secondary structure was made by Ruth Nussinov and co-workers who used dynamic programming method for maximising the number of base-pairs (Nussinov 1978). However, there are several issues with this approach, most importantly the solution is not unique. Nussinov et al published an adaptation of their approach to use a simple nearest-neighbour energy model in 1980 (Nussinov 1980).

Michael Zuker and Patrick Stiegler in 1981 proposed using a slightly refined dynamic programming approach that models nearest neighbour energy interactions that directly incorporates stacking into the prediction (Zuker 1981). The energies that are minimized by the recursion are derived from empirical calorimetric experiments, the most up-to-date parameters were published in 2004 (Mathews 2004), although most software packages use the previous set assembled in 1999 (Mathews 1999). There has been recent progress in estimating "energy" parameters directly from known structures (Do 2006). Another approach researchers are using is to sample structures from the Boltzmann ensemble (McCaskill 1990, Ding 2003).

One of the issues when predicting RNA secondary structure is that the standard recursions (eg. Nussinov/Zuker-Stiegler) exclude pseudoknot. Elena Rivas and Sean Eddy published a dynamic programming algorithm that could handle pseudoknots (Rivas 1999). However, the time and memory requirements of the method are prohibitive. This has prompted several researchers to implement versions of the algorithm that restrict the classes of pseudoknots, resulting in gains in performance.

[edit] Comparative structure prediction

[edit] Align and fold

Evolution frequently preserves functional RNA structure better than RNA sequence. Hence, a common biological problem is to infer a common structure for two or more different RNA sequences. One of the first rigorous mathematical treatments of this problem was by David Sankoff (Sankoff 1985). However, this approach is notoriously computationally expensive. Some notable attempts at implementing restricted versions of Sankoff's algorithm are Foldalign (Havgaard 2005, Torarinsson 2007), Dynalign (Mathews 2002, Harmanci 2007), PMmulti/PMcomp (Hofacker 2004) and Stemloc (Holmes 2005).

[edit] Align then fold

A practical but heuristic approach is to use multiple sequence alignment tools to produce an alignment of several RNA sequences and then attempt to detect covarying (base paired) sites in the alignment. A common method is to use the mutual Information content (Chiu 1991, Gutell 1992), however, recent work has shown that this measure does not detect RNA structure covariation very well (Lindgreen 2006). Some alternative approaches using a sum of energetic and covariance terms (Hofacker 2002) or evolutionary SCFGs (Knudsen 2003) have been implemented.

[edit] Fold then align

A less widely used approach is to fold the sequences using single sequence structure prediction methods and align the resulting structures using tree-based metrics (Shapiro 1990). The fundamental weakness with this approach is that single sequence predictions are often inaccurate, thus all further analyses are affected.

[edit] See also

[edit] References

  • Chiu, D.K. and Kolodziejczak, T. (1991) Inferring consensus structure from nucleic acid sequences. Comput. Appl. Biosci, . 7, 347–352
  • Ding Y, Lawrence CE. (2003) A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res. 31(24):7280-301.
  • Do CB, Woods DA, Batzoglou S. (2006) CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics. 22(14):e90-8.
  • Gutell, R.R., et al. (1992) Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic Acids Res, . 20, 5785–5795
  • Harmanci AO, Sharma G, Mathews DH, (2007), Efficient Pairwise RNA Structure Prediction Using Probabilistic Alignment Constraints in Dynalign, BMC Bioinformatics, 8(130).
  • Havgaard JH, Lyngso RB, Stormo GD, Gorodkin J. (2005) Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics. 21(9):1815-24.
  • Hofacker IL, Bernhart SH, Stadler PF. (2004) Alignment of RNA base pairing probability matrices. Bioinformatics. 20(14):2222-7.
  • Hofacker IL, Fekete M, Stadler PF. (2002) Secondary structure prediction for aligned RNA sequences. J Mol Biol. 319(5):1059-66.
  • Holmes I. (2005) Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics. 2005 Mar 24;6:73.
  • Knudsen B, Hein J. (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res. 31(13):3423-8.
  • Lindgreen S, Gardner PP, Krogh A. (2006) Measuring covariation in RNA alignments: physical realism improves information measures. Bioinformatics. 22(24):2988-95.
  • Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, Turner DH. (2004) Incorporating chemical modification constraints into a dynamic programming algorothm for prediction of RNA secondary structure. Proceedings of the National Academy of Sciences USA. 101:7287-7292.
  • Mathews DH, Sabina J, Zuker M, Turner DH (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol. 288(5):911-40.
  • Mathews DH, Turner DH. (2002) Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J Mol Biol. 317(2):191-203.
  • McCaskill JS. (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers. 29(6-7):1105-19.
  • Nussinov R, Piecznik G, Grigg JR and Kleitman DJ (1978) Algorithms for loop matchings. SIAM Journal on Applied Mathematics.
  • Nussinov R and Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci U S A. 77(11):6309-13.
  • Rivas E, Eddy SR. (1999) A dynamic programming algorithm for RNA structure prediction including pseudoknots. J Mol Biol. 285(5):2053-68.
  • Sankoff D (1985) Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM Journal on Applied Mathematics. 45:810-825.
  • Torarinsson E, Havgaard JH, Gorodkin J. (2007) Multiple structural alignment and clustering of RNA sequences. Bioinformatics.
  • Zuker M, Stiegler P. (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. Jan 10;9(1):133-48.
  • Shapiro BA and Zhang K (1990) Comparing Multiple RNA Secondary Structures Using Tree Comparisons Computer Applications in the Biosciences, vol. 6, no. 4, pp. 309-318.

[edit] Original Source

This article was originally posted in Wikipedia.