Gene Surfing: An efficient and versatile tool for targeted enzyme mining in metagenomics
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | KeAi Communications Co., Ltd., 2025-12-01 |
| Series: | Synthetic and Systems Biotechnology |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2405805X25001036 |
| Summary: | Microbial community studies have established the pivotal catalytic roles of enzymes in ecosystem metabolism, yet cultivation-dependent methods cannot exploit the enzyme resources of uncultured microbes. Metagenomics overcomes this by accessing microbial genetic information directly, but the volume of data it generates raises two challenges for precise enzyme identification: (1) restricted applicability across varied sample types and (2) a narrow functional scope in target enzyme discovery. To address this, we developed Gene Surfing, a bioinformatics workflow platform based on Snakemake. It integrates modules for data quality control (Fastp), genome assembly (MEGAHIT), assembly evaluation (QUAST and MetaQUAST), functional annotation (Prokka), and homologous sequence retrieval (MMseqs2). Gene Surfing offers scalability, reproducibility, and efficiency, addressing key challenges in enzyme identification. Validation results include: cellulose-degrading enzymes (GH5 family): 1,311,316 potential lignocellulolytic enzyme sequences were identified, with 127 sequences functionally validated (84.25 % activity rate); polyethylene-degrading enzymes: 705 candidate sequences were found, 38 of which were heterologously expressed, showing an 81.5 % activity rate (31/38); endonucleases (HNH superfamily): 585 potential sequences were retrieved, with 4 out of 7 tested showing activity (57.1 % success rate). |
|---|---|
| ISSN: | 2405-805X |
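
The activity rates quoted in the summary can be rechecked from the counts it reports. The sketch below is illustrative only: the function name `activity_rate` and the dictionary `reported` are assumptions introduced here and are not part of Gene Surfing. The GH5 result is left out because the summary gives only its rate, not the number of active sequences.

```python
# Counts as reported in the summary: (active, tested).
# The names below are hypothetical helpers, not part of Gene Surfing.
reported = {
    "polyethylene-degrading enzymes": (31, 38),
    "HNH endonucleases": (4, 7),
}


def activity_rate(active: int, tested: int) -> float:
    """Return the share of tested sequences that showed activity, in percent."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * active / tested


for name, (active, tested) in reported.items():
    rate = activity_rate(active, tested)
    print(f"{name}: {active}/{tested} = {rate:.2f} %")
```

Under these counts, 31/38 comes to about 81.58 %, so the 81.5 % in the summary appears to be truncated rather than rounded, while 4/7 matches the stated 57.1 %.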