Building an end-to-end battery recipe knowledge base via transformer-based text mining

Bibliographic Details
Main Authors: Daeun Lee, Hiroshi Mizuseki, Jaewoong Choi, Byungju Lee
Format: Article
Language: English
Published: Nature Portfolio, 2025-05-01
Series: Communications Materials
Online Access: https://doi.org/10.1038/s43246-025-00825-z
Collection: DOAJ
Institution: OA Journals
ISSN: 2662-4443
Affiliation (all authors): Computational Science Research Center, Korea Institute of Science and Technology, Seongbuk-gu

Abstract

Recent studies have increasingly applied natural language processing to automatically extract experimental information from battery materials literature. Despite the complexity of battery manufacturing, from material synthesis to cell assembly, no comprehensive study has systematically organized this information. Here we present a language modeling-based protocol for extracting complete battery recipes from scientific papers. Using machine learning-based filtering and topic modeling, we identified 2174 relevant papers and extracted over 5800 paragraphs describing synthesis and assembly procedures. Deep learning-based named entity recognition models were trained to extract 30 entities with F1-scores of 88.18% and 94.61%. We also evaluated large language models, including GPT-4, using few-shot learning and fine-tuning. These results enabled the structured construction of 165 end-to-end recipes and identification of trends such as precursor–method associations. The resulting knowledge base supports flexible recipe retrieval and provides a scalable framework for organizing protocols across large volumes of publications, thereby accelerating literature review and data-driven battery design.
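The abstract reports named entity recognition F1-scores of 88.18% and 94.61%, but this record does not say how those scores were computed. The following is a minimal sketch of one common convention, exact-match, entity-level micro F1 over BIO-tagged tokens. The BIO tagging scheme, the lenient handling of orphan I- tags, and the labels PRECURSOR, METHOD and TEMP are illustrative assumptions, not the authors' tagging scheme or evaluation code.

```python
from typing import List, Optional, Set, Tuple

# A decoded entity: (start token index, end token index exclusive, label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode one sentence's BIO tags into a set of entity spans.

    Assumption: an I- tag that does not continue a span with the same
    label opens a new span (a lenient convention; strict schemes differ).
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.add((start, i, label))
            start, label = (None, None) if tag == "O" else (i, tag[2:])
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over sentences.

    A predicted entity counts as correct only if its start, end and label
    all match a gold entity.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # entities found exactly
        fp += len(p - g)  # spurious or misaligned predictions
        fn += len(g - p)  # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels and tags, for illustration only.
    gold = [["O", "B-PRECURSOR", "I-PRECURSOR", "O", "B-METHOD", "O", "B-TEMP"]]
    pred = [["O", "B-PRECURSOR", "I-PRECURSOR", "O", "B-METHOD", "O", "O"]]
    p, r, f = entity_f1(gold, pred)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=1.00 R=0.67 F1=0.80
```

Micro-averaging pools true positives, false positives and false negatives across all entity types before computing F1, so frequent entity types dominate the score; a per-label breakdown would need separate counts for each label.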