Gene Surfing: An efficient and versatile tool for targeted enzyme mining in metagenomics

Bibliographic Details
Main Authors: Tong Xu, Danyang Huang, Tingting Huang, Yuxin Wang, Wanqiu Chen, Shijunyin Chen, Yurong Qian, Haitao Yue
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2025-12-01
Series: Synthetic and Systems Biotechnology
Online Access: http://www.sciencedirect.com/science/article/pii/S2405805X25001036
Description
Summary: Microbial community studies have established the pivotal catalytic roles of enzymes in ecosystem metabolism, yet cultivation-dependent methods cannot exploit the enzyme resources of uncultured microbes. Metagenomics overcomes this by directly accessing microbial genetic information, but the massive volume of data it generates complicates precise enzyme identification, which faces two main challenges: (1) restricted applicability across varied sample types and (2) narrow functional scope in target enzyme discovery. To address these challenges, we developed Gene Surfing, a bioinformatics workflow platform based on Snakemake. It integrates modules for data quality control (Fastp), genome assembly (MEGAHIT), assembly evaluation (QUAST and MetaQUAST), functional annotation (Prokka), and homologous sequence retrieval (MMseqs2). Gene Surfing offers scalability, reproducibility, and efficiency, addressing key challenges in enzyme identification. Validation results include: cellulose-degrading enzymes (GH5 family), where 1,311,316 potential lignocellulolytic enzyme sequences were identified and 127 sequences were functionally validated (84.25 % activity rate); polyethylene-degrading enzymes, where 705 candidate sequences were found and 38 were heterologously expressed, showing an 81.5 % activity rate (31/38); and endonucleases (HNH superfamily), where 585 potential sequences were retrieved and 4 of the 7 tested showed activity (57.1 % success rate).
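The summary describes Gene Surfing as a Snakemake workflow chaining Fastp, MEGAHIT, QUAST/MetaQUAST, Prokka, and MMseqs2. The Python sketch below is not the authors' Snakefile. It only illustrates, under stated assumptions, how a file-driven workflow engine in the spirit of Snakemake orders such steps: each rule declares its input and output files, and a rule runs only after the rules that produce its inputs. The rule names, file paths (including a hypothetical `reference_enzymes.faa` file of known target-enzyme sequences used as the search query), and command options are illustrative assumptions and should be checked against each tool's documentation. Commands are printed, not executed. The sketch requires Python 3.9+ for `graphlib`.

```python
"""Dry-run sketch of a file-driven workflow (illustrative, not Gene Surfing's code).

Hypothetical rule names, file paths and options; commands are printed, not run.
"""
from graphlib import TopologicalSorter

# Each hypothetical rule lists its input files, output files and a shell command.
RULES = {
    "qc": {
        "inputs": ["raw_R1.fq.gz", "raw_R2.fq.gz"],
        "outputs": ["clean_R1.fq.gz", "clean_R2.fq.gz"],
        "cmd": "fastp -i raw_R1.fq.gz -I raw_R2.fq.gz -o clean_R1.fq.gz -O clean_R2.fq.gz",
    },
    "assemble": {
        "inputs": ["clean_R1.fq.gz", "clean_R2.fq.gz"],
        "outputs": ["megahit_out/final.contigs.fa"],
        "cmd": "megahit -1 clean_R1.fq.gz -2 clean_R2.fq.gz -o megahit_out",
    },
    "evaluate": {
        "inputs": ["megahit_out/final.contigs.fa"],
        "outputs": ["metaquast_out"],  # output directory used as the marker
        "cmd": "metaquast.py megahit_out/final.contigs.fa -o metaquast_out",
    },
    "annotate": {
        "inputs": ["megahit_out/final.contigs.fa"],
        "outputs": ["prokka_out/sample.faa"],
        "cmd": "prokka --outdir prokka_out --prefix sample megahit_out/final.contigs.fa",
    },
    "search": {
        # reference_enzymes.faa is a hypothetical, user-supplied query set.
        "inputs": ["reference_enzymes.faa", "prokka_out/sample.faa"],
        "outputs": ["hits.m8"],
        "cmd": "mmseqs easy-search reference_enzymes.faa prokka_out/sample.faa hits.m8 tmp",
    },
}


def build_order(rules):
    """Return rule names so that each rule comes after the rules producing its inputs."""
    # Map every output file to the rule that creates it.
    producer = {}
    for name, rule in rules.items():
        for path in rule["outputs"]:
            producer[path] = name
    # Predecessors of a rule are the producers of its inputs; files with no
    # producer (raw reads, the reference set) are treated as pre-existing.
    graph = {
        name: {producer[p] for p in rule["inputs"] if p in producer}
        for name, rule in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in build_order(RULES):
        print(f"[{name}] {RULES[name]['cmd']}")
```

Because `evaluate` and `annotate` both depend only on the assembly, their relative order is not fixed; a real engine such as Snakemake could run them in parallel, which is one source of the efficiency such workflow managers provide.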
ISSN: 2405-805X