NLP-like deep learning aided in identification and validation of thiosulfinate tolerance clusters in diverse bacteria

ABSTRACT Allicin tolerance (alt) clusters in phytopathogenic bacteria, which provide resistance to thiosulfinates like allicin, are challenging to find using conventional approaches due to their varied architecture and the paradox of being vertically maintained within genera despite likely being hor...

Bibliographic Details
Main Authors: Brendon K. Myers, Anuj Lamichhane, Brian H. Kvitko, Bhabesh Dutta
Format: Article
Language: English
Published: American Society for Microbiology 2025-07-01
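Series: mSphere
Subjects: NLP (natural language processing); bacteriology; plant pathology; thiosulfinate; AI (artificial intelligence)
Online Access: https://journals.asm.org/doi/10.1128/msphere.00023-25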
author Brendon K. Myers
Anuj Lamichhane
Brian H. Kvitko
Bhabesh Dutta
collection DOAJ
description ABSTRACT Allicin tolerance (alt) clusters in phytopathogenic bacteria, which provide resistance to thiosulfinates like allicin, are challenging to find using conventional approaches due to their varied architecture and the paradox of being vertically maintained within genera despite likely being horizontally transferred. This results in significant sequence diversity that further complicates their identification. Natural language processing (NLP)-like techniques, such as those used in DeepBGC, offer a promising solution by treating gene clusters like a language, allowing gene clusters to be identified and collected based on patterns and relationships within the sequences. We curated and validated alt-like clusters in Pantoea ananatis 97-1R, Burkholderia gladioli pv. gladioli FDAARGOS 389, and Pseudomonas syringae pv. tomato DC3000. Leveraging sequences from the RefSeq bacterial database, we conducted comparative analyses of gene synteny, gene/protein sequences, protein structures, and predicted protein interactions. This approach enabled the discovery of several novel alt-like clusters previously undetectable by other methods, which were further validated experimentally. Our work highlights the effectiveness of NLP-like techniques for identifying underrepresented gene clusters and expands our understanding of the diversity and utility of alt-like clusters in diverse bacterial genera. This work demonstrates the potential of these techniques to simplify the identification process and enhance the applicability of biological data in real-world scenarios.
IMPORTANCE Thiosulfinates, like allicin, are potent antifeedants and antimicrobials produced by Allium species and pose a challenge for phytopathogenic bacteria. Phytopathogenic bacteria have been shown to utilize an allicin tolerance (alt) gene cluster to circumvent this host response, leading to economically significant yield losses. Due to the complexity of mining these clusters, we applied techniques akin to natural language processing to analyze Pfam domains and gene proximity. This approach led to the identification of novel alt-like gene clusters, showcasing the potential of artificial intelligence to reveal elusive and underrepresented genetic clusters and enhance our understanding of their diversity and role across various bacterial genera.
format Article
id doaj-art-6edeba083d2b403ba8254db298489be6
institution DOAJ
issn 2379-5042
language English
publishDate 2025-07-01
publisher American Society for Microbiology
record_format Article
series mSphere
spelling doaj-art-6edeba083d2b403ba8254db298489be6
2025-08-20T03:09:21Z
eng
American Society for Microbiology
mSphere
2379-5042
2025-07-01
10
7
10.1128/msphere.00023-25
NLP-like deep learning aided in identification and validation of thiosulfinate tolerance clusters in diverse bacteria
Brendon K. Myers (Department of Plant Pathology, The University of Georgia, Tifton, Georgia, USA)
Anuj Lamichhane (Department of Plant Pathology, The University of Georgia, Tifton, Georgia, USA)
Brian H. Kvitko (Department of Plant Pathology, The University of Georgia, Athens, Georgia, USA)
Bhabesh Dutta (Department of Plant Pathology, The University of Georgia, Tifton, Georgia, USA)
https://journals.asm.org/doi/10.1128/msphere.00023-25
NLP (natural language processing)
bacteriology
plant pathology
thiosulfinate
AI (artificial intelligence)
title NLP-like deep learning aided in identification and validation of thiosulfinate tolerance clusters in diverse bacteria
topic NLP (natural language processing)
bacteriology
plant pathology
thiosulfinate
AI (artificial intelligence)
url https://journals.asm.org/doi/10.1128/msphere.00023-25