FANTASIA leverages language models to decode the functional dark proteome across the animal tree of life

Abstract Protein functional annotation is crucial in biology, but many protein-coding genes remain uncharacterized, especially in non-model organisms. FANTASIA (Functional ANnoTAtion based on embedding space SImilArity) integrates protein language models for large-scale functional annotation. Applie...

Bibliographic Details
Main Authors: Gemma I. Martínez-Redondo, Francisco M. Perez-Canales, Belén Carbonetto, José M. Fernández, Israel Barrios-Núñez, Marçal Vázquez-Valls, Ildefonso Cases, Ana M. Rojas, Rosa Fernández
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: Communications Biology
Online Access: https://doi.org/10.1038/s42003-025-08651-2
author Gemma I. Martínez-Redondo
Francisco M. Perez-Canales
Belén Carbonetto
José M. Fernández
Israel Barrios-Núñez
Marçal Vázquez-Valls
Ildefonso Cases
Ana M. Rojas
Rosa Fernández
collection DOAJ
description Abstract Protein functional annotation is crucial in biology, but many protein-coding genes remain uncharacterized, especially in non-model organisms. FANTASIA (Functional ANnoTAtion based on embedding space SImilArity) integrates protein language models for large-scale functional annotation. Applied to ~1000 animal proteomes, FANTASIA predicts functions for virtually all proteins, including up to 50% that remained unannotated by traditional homology-based methods. This enables the discovery of novel gene functions, enhancing our understanding of molecular evolution and organismal biology. FANTASIA holds particular promise for functional discovery in non-model taxa, offering advantages over homology-based tools in sensitivity and generalizability. FANTASIA is available on GitHub at https://github.com/CBBIO/FANTASIA.
format Article
id doaj-art-291f74d607be4a1a9eaedefec6d5b585
institution Kabale University
issn 2399-3642
language English
publishDate 2025-08-01
publisher Nature Portfolio
record_format Article
series Communications Biology
spelling doaj-art-291f74d607be4a1a9eaedefec6d5b585
2025-08-20T03:42:56Z
eng
Nature Portfolio
Communications Biology
2399-3642
2025-08-01
8118
10.1038/s42003-025-08651-2
FANTASIA leverages language models to decode the functional dark proteome across the animal tree of life
Gemma I. Martínez-Redondo: Metazoa Phylogenomics and Genome Evolution Lab, Institute of Evolutionary Biology (CSIC-UPF)
Francisco M. Perez-Canales: Computational Biology and Bioinformatics, Andalusian Center for Developmental Biology (CABD-CSIC)
Belén Carbonetto: Metazoa Phylogenomics and Genome Evolution Lab, Institute of Evolutionary Biology (CSIC-UPF)
José M. Fernández: Barcelona Supercomputing Center, Plaça d’Eusebi Güell
Israel Barrios-Núñez: Computational Biology and Bioinformatics, Andalusian Center for Developmental Biology (CABD-CSIC)
Marçal Vázquez-Valls: Metazoa Phylogenomics and Genome Evolution Lab, Institute of Evolutionary Biology (CSIC-UPF)
Ildefonso Cases: Computational Biology and Bioinformatics, Andalusian Center for Developmental Biology (CABD-CSIC)
Ana M. Rojas: Computational Biology and Bioinformatics, Andalusian Center for Developmental Biology (CABD-CSIC)
Rosa Fernández: Metazoa Phylogenomics and Genome Evolution Lab, Institute of Evolutionary Biology (CSIC-UPF)
https://doi.org/10.1038/s42003-025-08651-2
title FANTASIA leverages language models to decode the functional dark proteome across the animal tree of life
url https://doi.org/10.1038/s42003-025-08651-2