Building an end-to-end battery recipe knowledge base via transformer-based text mining
Abstract Recent studies have increasingly applied natural language processing to automatically extract experimental information from battery materials literature. Despite the complexity of battery manufacturing—from material synthesis to cell assembly—no comprehensive study has systematically organi...
| Main Authors: | Daeun Lee, Hiroshi Mizuseki, Jaewoong Choi, Byungju Lee |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Communications Materials |
| Online Access: | https://doi.org/10.1038/s43246-025-00825-z |
| _version_ | 1850273491443515392 |
|---|---|
| author | Daeun Lee, Hiroshi Mizuseki, Jaewoong Choi, Byungju Lee |
| author_facet | Daeun Lee, Hiroshi Mizuseki, Jaewoong Choi, Byungju Lee |
| author_sort | Daeun Lee |
| collection | DOAJ |
| description | Abstract Recent studies have increasingly applied natural language processing to automatically extract experimental information from battery materials literature. Despite the complexity of battery manufacturing—from material synthesis to cell assembly—no comprehensive study has systematically organized this information. Here we present a language modeling-based protocol for extracting complete battery recipes from scientific papers. Using machine learning-based filtering and topic modeling, we identified 2174 relevant papers and extracted over 5800 paragraphs describing synthesis and assembly procedures. Deep learning-based named entity recognition models were trained to extract 30 entities with F1-scores of 88.18% and 94.61%. We also evaluated large language models, including GPT-4, using few-shot learning and fine-tuning. These results enabled the structured construction of 165 end-to-end recipes and identification of trends such as precursor–method associations. The resulting knowledge base supports flexible recipe retrieval and provides a scalable framework for organizing protocols across large volumes of publications, thereby accelerating literature review and data-driven battery design. |
| format | Article |
| id | doaj-art-556f7525d7be4e179b428da689a8f091 |
| institution | OA Journals |
| issn | 2662-4443 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Communications Materials |
| spelling | doaj-art-556f7525d7be4e179b428da689a8f091; 2025-08-20T01:51:28Z; eng; Nature Portfolio; Communications Materials; 2662-4443; 2025-05-01; 61113; 10.1038/s43246-025-00825-z; Building an end-to-end battery recipe knowledge base via transformer-based text mining; Daeun Lee, Hiroshi Mizuseki, Jaewoong Choi, Byungju Lee (all: Computational Science Research Center, Korea Institute of Science and Technology, Seongbuk-gu); https://doi.org/10.1038/s43246-025-00825-z |
| title | Building an end-to-end battery recipe knowledge base via transformer-based text mining |
| url | https://doi.org/10.1038/s43246-025-00825-z |