Dynamic in-context learning with conversational models for data extraction and materials property prediction

Bibliographic Details
Main Author: Chinedu E. Ekuma
Format: Article
Language: English
Published: AIP Publishing LLC 2025-03-01
Series: APL Machine Learning
Online Access: http://dx.doi.org/10.1063/5.0254406
Collection: DOAJ
Description: The advent of natural language processing and large language models (LLMs) has revolutionized the extraction of data from unstructured scholarly papers. However, ensuring data trustworthiness remains a significant challenge. In this paper, we introduce PropertyExtractor, an open-source tool that leverages advanced conversational LLMs such as Google gemini-pro and OpenAI gpt-4, blends zero-shot with few-shot in-context learning, and employs engineered prompts for the dynamic refinement of structured information hierarchies, enabling autonomous, efficient, scalable, and accurate identification, extraction, and verification of material property data. Our tests on material data demonstrate precision and recall that exceed 95% with an error rate of ∼9%, highlighting the effectiveness and versatility of the toolkit. Finally, databases for 2D material thicknesses, a critical parameter for device integration, and energy bandgap values are developed using PropertyExtractor. In particular, for the thickness database, the rapid evolution of the field has outpaced both experimental measurements and computational methods, creating a significant data gap. Our work addresses this gap and showcases the potential of PropertyExtractor as a reliable and efficient tool for the autonomous generation of various material property databases, advancing the field.
Record ID: doaj-art-13b70add299a4d66afaae18e0cfffe3c
Institution: OA Journals
ISSN: 2770-9019
Author Affiliation: Department of Physics, Lehigh University, Bethlehem, Pennsylvania 18015, USA
Volume: 3
Issue: 1
Pages: 016119 to 016119-9