FANTASIA leverages language models to decode the functional dark proteome across the animal tree of life

Bibliographic Details
Main Authors: Gemma I. Martínez-Redondo, Francisco M. Perez-Canales, Belén Carbonetto, José M. Fernández, Israel Barrios-Núñez, Marçal Vázquez-Valls, Ildefonso Cases, Ana M. Rojas, Rosa Fernández
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: Communications Biology
Online Access: https://doi.org/10.1038/s42003-025-08651-2
Description
Summary: Protein functional annotation is crucial in biology, but many protein-coding genes remain uncharacterized, especially in non-model organisms. FANTASIA (Functional ANnoTAtion based on embedding space SImilArity) integrates protein language models for large-scale functional annotation. Applied to ~1000 animal proteomes, FANTASIA predicts functions for virtually all proteins, including up to 50% that remained unannotated by traditional homology-based methods. This enables the discovery of novel gene functions, enhancing our understanding of molecular evolution and organismal biology. FANTASIA holds particular promise for functional discovery in non-model taxa, offering advantages over homology-based tools in sensitivity and generalizability. FANTASIA is available on GitHub at https://github.com/CBBIO/FANTASIA.
ISSN: 2399-3642