ChunkUIE: Chunked instruction-based unified information extraction.


Bibliographic Details
Main Authors:	Wei Li, Yingzhen Liu, Yinling Yang, Ting Zhang, Wei Men
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2025-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0326764

collection	DOAJ
description	Large language models (LLMs) have demonstrated remarkable performance across various linguistic tasks. However, existing LLMs perform inadequately in information extraction tasks for both Chinese and English. Numerous studies attempt to enhance model performance by increasing the scale of training data. However, discrepancies in the number and type of schemas used during training and evaluation can harm model effectiveness. To tackle this challenge, we propose ChunkUIE, a unified information extraction model that supports Chinese and English. We design a chunked instruction construction strategy that randomly and reproducibly divides all schemas into chunks containing an identical number of schemas. This approach ensures that the union of schemas across all chunks encompasses all schemas. By limiting the number of schemas in each instruction, this strategy effectively addresses the performance degradation caused by inconsistencies in schema counts between training and evaluation. Additionally, we construct some challenging negative schemas using a predefined hard schema dictionary, which mitigates the model's semantic confusion regarding similar schemas. Experimental results demonstrate that ChunkUIE enhances zero-shot performance in information extraction.
id	doaj-art-54be924bf664493389aafa53c01e970b
institution	Kabale University
issn	1932-6203
spelling	doaj-art-54be924bf664493389aafa53c01e970b; 2025-08-20T03:50:26Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2025-01-01; 20; 6; e0326764; 10.1371/journal.pone.0326764; ChunkUIE: Chunked instruction-based unified information extraction.; Wei Li; Yingzhen Liu; Yinling Yang; Ting Zhang; Wei Men; https://doi.org/10.1371/journal.pone.0326764
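
The chunked instruction construction strategy summarized in the description (a random but reproducible split of all schemas into equally sized chunks whose union covers every schema) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `chunk_schemas`, the `seed` parameter, and the rule for padding a short final chunk with schemas from other chunks are all assumptions made for illustration, since the record does not say how a remainder is handled.

```python
import random


def chunk_schemas(schemas, chunk_size, seed=0):
    """Split unique schema names into equally sized chunks in a seeded random order.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    rng = random.Random(seed)  # local generator, so the same seed gives the same split
    shuffled = list(schemas)
    rng.shuffle(shuffled)

    # Consecutive slices of the shuffled list: every schema lands in some chunk.
    chunks = [shuffled[i:i + chunk_size]
              for i in range(0, len(shuffled), chunk_size)]

    # Assumption: top up a short final chunk with schemas it does not already
    # contain, so that all chunks hold the same number of schemas.
    if chunks and len(chunks[-1]) < chunk_size:
        last = chunks[-1]
        pool = [s for s in shuffled if s not in last]
        need = chunk_size - len(last)
        last.extend(rng.sample(pool, min(need, len(pool))))

    return chunks


if __name__ == "__main__":
    entity_types = ["person", "organization", "location", "date",
                    "event", "product", "work"]
    for chunk in chunk_schemas(entity_types, chunk_size=3, seed=42):
        print(chunk)
```

Under these assumptions, each chunk would be placed in its own instruction, so the number of schemas per instruction stays fixed regardless of how many schemas a dataset defines.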


Similar Items