Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees

MicroRNAs (miRNAs) are non-protein-coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data, including genomic and transcriptomic sequences, are being investigated for novel miRNAs. A variety of machine learning methods have successfully predic...

Bibliographic Details
Main Authors:	Philip H. Williams, Rod Eyles, Georg Weiller
Format:	Article
Language:	English
Published:	Wiley 2012-01-01
Series:	Journal of Nucleic Acids
Online Access:	http://dx.doi.org/10.1155/2012/652979

author	Philip H. Williams, Rod Eyles, Georg Weiller
collection	DOAJ
description	MicroRNAs (miRNAs) are non-protein-coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data, including genomic and transcriptomic sequences, are being investigated for novel miRNAs. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other non-protein-coding sequences. MirTools, miRDeep2, and miRanalyzer require a “read count” to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor on a cross-section of species so that it accurately predicts miRNAs outside the training set. We wanted a system that did not require read counts for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. We developed a miRNA-predictive decision-tree model by supervised machine learning. It requires only that the corresponding genome or transcriptome be available within a sequence window that includes the precursor candidate, so that the required sequence features can be collected. Among the most critical features for training the predictor are the miRNA:miRNA* duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity, based on leave-one-out validation.
format	Article
id	doaj-art-6ac5ceea3fba4c87b371be843c3242d9
institution	Kabale University
issn	2090-0201 2090-021X
language	English
publishDate	2012-01-01
publisher	Wiley
record_format	Article
series	Journal of Nucleic Acids
affiliation	Division of Plant Sciences, Research School of Biology, College of Medicine, Biology & Environment, The Australian National University, Canberra, ACT 0200, Australia (all three authors)
title	Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees
url	http://dx.doi.org/10.1155/2012/652979
