AI-based nanotoxicity data extraction and prediction of nanotoxicity

Bibliographic Details
Main Authors:	Eunyong Ha, Seung Min Ha, Zayakhuu Gerelkhuu, Hyun-Yi Kim, Tae Hyun Yoon
Format:	Article
Language:	English
Published:	Elsevier 2025-01-01
Series:	Computational and Structural Biotechnology Journal
Subjects:	Nanotoxicity; Large Language Models; Data extraction; Prompt engineering; LangChain; Automated machine learning
Online Access:	http://www.sciencedirect.com/science/article/pii/S2001037025001175

author	Eunyong Ha; Seung Min Ha; Zayakhuu Gerelkhuu; Hyun-Yi Kim; Tae Hyun Yoon
author_facet	Eunyong Ha; Seung Min Ha; Zayakhuu Gerelkhuu; Hyun-Yi Kim; Tae Hyun Yoon
author_sort	Eunyong Ha
collection	DOAJ
description	With the growing use of nanomaterials (NMs), assessing their toxicity has become increasingly important. Among toxicity assessment methods, computational models for predicting nanotoxicity are emerging as alternatives to traditional in vitro and in vivo assays, which involve high costs and ethical concerns. As a result, the importance of both the quality and the quantity of data is now widely recognized. However, collecting large, high-quality datasets is both time-consuming and labor-intensive. Artificial intelligence (AI)-based data extraction techniques hold significant potential for extracting and organizing information from unstructured text, but the use of large language models (LLMs) and prompt engineering for nanotoxicity data extraction has not been widely studied. In this study, we developed an AI-based automated data extraction pipeline to facilitate efficient data collection. The automation process was implemented using the Python-based LangChain framework. We used 216 nanotoxicity research articles as training data to refine prompts and evaluate LLM performance. Subsequently, the most suitable LLM with refined prompts was used to extract test data from 605 research articles. Data extraction performance on the training data, measured as F1_D.E. (F1 score for data extraction), ranged from 84.6 % to 87.6 % across different LLMs. Furthermore, using the dataset extracted from the test set, we constructed automated machine learning (AutoML) models that achieved F1_N.P. (F1 score for nanotoxicity prediction) exceeding 86.1 % in predicting nanotoxicity. Additionally, we assessed the reliability and applicability of the models by comparing them in terms of ground truth, size, and balance. This study highlights the potential of AI-based data extraction and represents a significant contribution to nanotoxicity research.
format	Article
id	doaj-art-9cade041207a4cc786faa24f8c0bcf57
institution	DOAJ
issn	2001-0370
language	English
publishDate	2025-01-01
publisher	Elsevier
record_format	Article
series	Computational and Structural Biotechnology Journal
spelling	doaj-art-9cade041207a4cc786faa24f8c0bcf57 | 2025-08-20T03:08:20Z | eng | Elsevier | Computational and Structural Biotechnology Journal | ISSN 2001-0370 | 2025-01-01 | vol. 29, pp. 138-148 | DOI 10.1016/j.csbj.2025.03.052 | AI-based nanotoxicity data extraction and prediction of nanotoxicity
	Eunyong Ha (0): Department of Chemistry, Hanyang University, Seoul 04763, Republic of Korea
	Seung Min Ha (1): Department of Chemistry, Hanyang University, Seoul 04763, Republic of Korea
	Zayakhuu Gerelkhuu (2): Research Institute for Convergence of Basic Science, Hanyang University, Seoul 04763, Republic of Korea; Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Republic of Korea
	Hyun-Yi Kim (3): NGeneS Inc., Ansan-si 15495, Republic of Korea
	Tae Hyun Yoon (4): Department of Chemistry, Hanyang University, Seoul 04763, Republic of Korea; Research Institute for Convergence of Basic Science, Hanyang University, Seoul 04763, Republic of Korea; Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Republic of Korea; Yoon Idea Lab. Co. Ltd., Seoul 04763, Republic of Korea; Corresponding author at: Department of Chemistry, Hanyang University, Seoul 04763, Republic of Korea.
	http://www.sciencedirect.com/science/article/pii/S2001037025001175 | Nanotoxicity; Large Language Models; Data extraction; Prompt engineering; LangChain; Automated machine learning
title	AI-based nanotoxicity data extraction and prediction of nanotoxicity
topic	Nanotoxicity; Large Language Models; Data extraction; Prompt engineering; LangChain; Automated machine learning
url	http://www.sciencedirect.com/science/article/pii/S2001037025001175
