Academic Journal of Computing & Information Science, 2020, 3(3); doi: 10.25236/AJCIS.2020.030306.
Lili Jia*, Tingting Sun
Zhejiang University of Science and Technology, Hangzhou 310023, China
*Corresponding author e-mail: [email protected]
RNA structure plays an important role in biological processes. In recent years, many machine learning methods have been developed to predict RNA secondary structure. In this paper, we use a Support Vector Machine (SVM) to predict the secondary structure of an RNA sequence, and we propose a sequence-based method that incorporates a new feature representation based on RNA long-range interactions. We first adopt E-NSSEL (extended NSSEL) labels to represent RNA secondary structure. Using a new feature vector defined from long-range interactions, the SVM model then predicts the E-NSSEL label sequence of a test sequence, and this label sequence is finally restored to a secondary structure. Results on RNA training and testing datasets show that the method with the long-range feature achieves higher prediction accuracy than the same approach without it. Moreover, it can handle long RNA sequences, which are difficult for traditional folding-based prediction. These results suggest that our method may provide a reliable tool for RNA secondary structure prediction, including for RNAs with pseudoknots.
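The pipeline described above (per-nucleotide features, a label sequence, restoration to a structure) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the E-NSSEL alphabet or the long-range feature vector, so the code assumes plain dot-bracket labels as a stand-in for E-NSSEL and a hypothetical long-range feature (the fraction of distant positions whose base could pair with the current one). All function names and parameters (`window_one_hot`, `long_range_feature`, `half_width`, `min_dist`) are illustrative assumptions. The SVM step itself is omitted; the feature vectors would be passed to any SVM implementation.

```python
from typing import List, Tuple

BASES = "ACGU"
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def window_one_hot(seq: str, i: int, half_width: int = 2) -> List[int]:
    """One-hot encode the bases in a window centred on position i.

    Positions that fall outside the sequence are encoded as all zeros.
    """
    vec: List[int] = []
    for j in range(i - half_width, i + half_width + 1):
        base = seq[j] if 0 <= j < len(seq) else None
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def long_range_feature(seq: str, i: int, min_dist: int = 4) -> float:
    """Hypothetical long-range feature (an assumption, not the paper's definition):
    fraction of positions at least min_dist away whose base could pair with seq[i].
    """
    partners = [j for j in range(len(seq)) if abs(i - j) >= min_dist]
    if not partners:
        return 0.0
    hits = sum((seq[i], seq[j]) in PAIRS for j in partners)
    return hits / len(partners)


def features(seq: str, half_width: int = 2, min_dist: int = 4) -> List[List[float]]:
    """Build one feature vector per nucleotide: local window plus long-range term."""
    return [
        window_one_hot(seq, i, half_width) + [long_range_feature(seq, i, min_dist)]
        for i in range(len(seq))
    ]


def labels_to_pairs(labels: str) -> List[Tuple[int, int]]:
    """Restore base pairs from a dot-bracket label sequence (stand-in for E-NSSEL).

    A single bracket type with a stack cannot express pseudoknots.
    """
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, lab in enumerate(labels):
        if lab == "(":
            stack.append(i)
        elif lab == ")":
            if not stack:
                raise ValueError("unbalanced label sequence")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced label sequence")
    return sorted(pairs)
```

In this sketch, `features("GGGAAAUCCC")` yields one 21-dimensional vector per nucleotide for the classifier, and `labels_to_pairs` turns a predicted label string back into a list of paired positions. Handling pseudoknots would need a richer label alphabet than the single bracket type assumed here.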
Machine learning, SVM, RNA secondary structure, long-range interaction
Lili Jia, Tingting Sun. RNA secondary structure prediction based on long-range interaction and Support Vector Machine. Academic Journal of Computing & Information Science (2020), Vol. 3, Issue 3: 43-52. https://doi.org/10.25236/AJCIS.2020.030306.