MP3: a software tool for the prediction of pathogenic proteins in genomic and metagenomic data.

Bibliographic Details
Main Authors: Ankit Gupta, Rohan Kapil, Darshan B Dhakan, Vineet K Sharma
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access:https://doi.org/10.1371/journal.pone.0093907
Collection: DOAJ
The identification of virulent proteins in any de novo sequenced genome is useful for estimating its pathogenic potential and understanding the mechanism of pathogenesis. Similarly, identifying such proteins could be valuable for comparing the metagenomes of healthy and diseased individuals and for estimating the proportion of pathogenic species. However, both tasks share a common challenge: a significant proportion of genomic and metagenomic proteins are novel and not yet annotated, which makes virulent proteins difficult to identify. The currently available tools for identifying virulent proteins offer limited accuracy and cannot be applied to large datasets. We have therefore developed MP3, a standalone tool and web server for predicting pathogenic proteins in both genomic and metagenomic datasets. MP3 uses an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out fast, sensitive and accurate prediction of pathogenic proteins. On a blind dataset constructed from complete proteins, it achieved sensitivity, specificity, Matthews correlation coefficient (MCC) and accuracy values of 92%, 100%, 0.92 and 96%, respectively. On two metagenomic blind datasets (Blind A: 51-100 amino acids; Blind B: 30-50 amino acids), the corresponding values were 82.39%, 97.86%, 0.80 and 89.32% for Blind A, and 71.60%, 94.48%, 0.67 and 81.86% for Blind B. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets. To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100-150 bp) metagenomic reads, and it also performs exceptionally well on complete protein sequences. MP3 is publicly available at http://metagenomics.iiserb.ac.in/mp3/index.php.
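The four figures reported in the abstract (sensitivity, specificity, MCC and accuracy) follow the standard confusion-matrix definitions for a binary classifier. The Python sketch below is an illustrative assumption, not MP3's own evaluation code, which the abstract does not describe. The counts it uses (100 pathogenic and 100 non-pathogenic proteins) are hypothetical and chosen only to show that 92% sensitivity and 100% specificity on a balanced set give 96% accuracy and an MCC of about 0.92.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # pathogenic proteins predicted as pathogenic
    fn: int  # pathogenic proteins missed by the predictor
    tn: int  # non-pathogenic proteins correctly rejected
    fp: int  # non-pathogenic proteins wrongly flagged as pathogenic


def metrics(c: Confusion) -> dict[str, float]:
    """Return sensitivity, specificity, accuracy and MCC for a binary classifier."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    # MCC is conventionally taken as 0 when any row or column total is zero.
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical balanced blind set: 100 pathogenic + 100 non-pathogenic proteins.
    result = metrics(Confusion(tp=92, fn=8, tn=100, fp=0))
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

Because MCC draws on all four cells of the confusion matrix, it stays informative when the pathogenic and non-pathogenic sets differ in size, which is one reason it is commonly reported alongside accuracy.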
ISSN: 1932-6203
Citation: PLoS ONE 9(4): e93907