Building an end-to-end battery recipe knowledge base via transformer-based text mining

Bibliographic Details
Main Authors: Daeun Lee, Hiroshi Mizuseki, Jaewoong Choi, Byungju Lee
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Communications Materials
Online Access: https://doi.org/10.1038/s43246-025-00825-z
Description
Summary: Recent studies have increasingly applied natural language processing to automatically extract experimental information from battery materials literature. Despite the complexity of battery manufacturing, from material synthesis to cell assembly, no comprehensive study has systematically organized this information. Here we present a language modeling-based protocol for extracting complete battery recipes from scientific papers. Using machine learning-based filtering and topic modeling, we identified 2174 relevant papers and extracted over 5800 paragraphs describing synthesis and assembly procedures. Deep learning-based named entity recognition models were trained to extract 30 entities with F1-scores of 88.18% and 94.61%. We also evaluated large language models, including GPT-4, using few-shot learning and fine-tuning. These results enabled the structured construction of 165 end-to-end recipes and identification of trends such as precursor–method associations. The resulting knowledge base supports flexible recipe retrieval and provides a scalable framework for organizing protocols across large volumes of publications, thereby accelerating literature review and data-driven battery design.
ISSN: 2662-4443