Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books
Narrative Question Answering (QA) requires understanding the context, events, and relationships within a narrative text to answer questions accurately. However, narrative books pose new challenges when using recent pretrained large language models, since such lengthy content incurs additional computational cost and leads to performance degradation. Moreover, identifying the most relevant passages for a given question is particularly challenging due to the lack of labeled question-passage pairs for training the retriever. This paper introduces the Unsupervised Context Linking Retriever (UCLR), a novel approach that efficiently retrieves relevant passages from long narrative texts without requiring labeled (question, passage) pairs. UCLR uses an encoder-decoder model to generate synthetic (question, answer) pairs, measuring the relevance of a passage by the error between the generated pair and the reference pair, which serves as a synthetic training signal. This method optimizes the retriever to identify passages with sufficient context to accurately reconstruct both the question and the answer, improving retrieval accuracy. UCLR also identifies key events surrounding each passage in the retrieved set and constructs a new set of passages from these key events, covering both broader narrative structure and finer details. Experimental results on the NarrativeQA benchmark show that UCLR achieves relative improvements of +8% on the validation set and +5% on the test set, outperforming state-of-the-art unsupervised retrievers. Additionally, combining UCLR with a simple reader model outperforms other state-of-the-art readers designed for processing lengthy documents, achieving a relative performance gain of 7.8% on the test set while being 5 times faster, since UCLR allows the reader model to focus on a pertinent subset of tokens.
| Main Authors: | Mohammad A. Ateeq, Sabrina Tiun, Hamed Abdelhaq, Wandeep Kaur |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Narrative comprehension, question answering, retriever-reader model, unsupervised retriever, long document comprehension |
| Online Access: | https://ieeexplore.ieee.org/document/11029186/ |
| _version_ | 1849421786313129984 |
|---|---|
| author | Mohammad A. Ateeq, Sabrina Tiun, Hamed Abdelhaq, Wandeep Kaur |
| author_facet | Mohammad A. Ateeq, Sabrina Tiun, Hamed Abdelhaq, Wandeep Kaur |
| author_sort | Mohammad A. Ateeq |
| collection | DOAJ |
| description | Narrative Question Answering (QA) requires understanding the context, events, and relationships within a narrative text to answer questions accurately. However, narrative books pose new challenges when using recent pretrained large language models, since such lengthy content incurs additional computational cost and leads to performance degradation. Moreover, identifying the most relevant passages for a given question is particularly challenging due to the lack of labeled question-passage pairs for training the retriever. This paper introduces the Unsupervised Context Linking Retriever (UCLR), a novel approach that efficiently retrieves relevant passages from long narrative texts without requiring labeled (question, passage) pairs. UCLR uses an encoder-decoder model to generate synthetic (question, answer) pairs, measuring the relevance of a passage by the error between the generated pair and the reference pair, which serves as a synthetic training signal. This method optimizes the retriever to identify passages with sufficient context to accurately reconstruct both the question and the answer, improving retrieval accuracy. UCLR also identifies key events surrounding each passage in the retrieved set and constructs a new set of passages from these key events, covering both broader narrative structure and finer details. Experimental results on the NarrativeQA benchmark show that UCLR achieves relative improvements of +8% on the validation set and +5% on the test set, outperforming state-of-the-art unsupervised retrievers. Additionally, combining UCLR with a simple reader model outperforms other state-of-the-art readers designed for processing lengthy documents, achieving a relative performance gain of 7.8% on the test set while being 5 times faster, since UCLR allows the reader model to focus on a pertinent subset of tokens. |
| format | Article |
| id | doaj-art-d950af7092da4de096ecd9a031aa1cd9 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-d950af7092da4de096ecd9a031aa1cd92025-08-20T03:31:23ZengIEEEIEEE Access2169-35362025-01-011310106610108810.1109/ACCESS.2025.357849711029186Unsupervised Context-Linking Retriever for Question Answering on Long Narrative BooksMohammad A. Ateeq0https://orcid.org/0000-0003-0296-5562Sabrina Tiun1https://orcid.org/0000-0002-1134-973XHamed Abdelhaq2https://orcid.org/0000-0003-4803-6689Wandeep Kaur3https://orcid.org/0000-0003-2025-3710Faculty of Information Science and Technology, Centre for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, MalaysiaFaculty of Information Science and Technology, Centre for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, MalaysiaDepartment of Computer Science, An-Najah National University, Nablus, PalestineFaculty of Information Science and Technology, Centre for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, MalaysiaNarrative Question Answering (QA) involves understanding the context, events, and relationships within narrative texts for accurate question answering. However, narrative books impose new challenges while utilizing recent pretrained large language models since such lengthy content requires additional computational costs and leads to performance degradation. Moreover, identifying the most relevant passages for a given question is particularly challenging due to the lack of labeled question-passage pairs for training the retriever. This paper introduces the Unsupervised Context Linking Retriever (UCLR), a novel approach that efficiently retrieves relevant passages from long narrative texts without requiring labeled (question, passage) pairs. UCLR uses an encoder-decoder model to generate synthetic (question, answer) pairs, measuring the relevance of passages by comparing the error between the generated pair and the reference pair, which serves as a synthetic training signal. This method optimizes the retriever to identify passages with sufficient context to accurately reconstruct both the question and the answer, improving retrieval accuracy. UCLR also identifies key events surrounding each passage in the retrieved set and constructs a new set of passages from these key events, enabling coverage of both broader narrative structures and finer details. Experimental results on the NarrativeQA benchmark show that UCLR achieves relative improvements of +8% on the validation set and +5% on the test set, outperforming state-of-the-art unsupervised retrievers. Additionally, the results demonstrate that combining UCLR with a simple reader model outperforms other state-of-the-art readers designed for processing lengthy documents, achieving a relative performance gain of 7.8% on the test set while being 5 times faster as UCLR allows the reader model to focus on a pertinent subset of tokens.https://ieeexplore.ieee.org/document/11029186/Narrative comprehensionquestion answeringretriever-reader modelunsupervised retrieverlong document comprehension |
| spellingShingle | Mohammad A. Ateeq Sabrina Tiun Hamed Abdelhaq Wandeep Kaur Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books IEEE Access Narrative comprehension question answering retriever-reader model unsupervised retriever long document comprehension |
| title | Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books |
| title_full | Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books |
| title_fullStr | Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books |
| title_full_unstemmed | Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books |
| title_short | Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books |
| title_sort | unsupervised context linking retriever for question answering on long narrative books |
| topic | Narrative comprehension, question answering, retriever-reader model, unsupervised retriever, long document comprehension |
| url | https://ieeexplore.ieee.org/document/11029186/ |
| work_keys_str_mv | AT mohammadaateeq unsupervisedcontextlinkingretrieverforquestionansweringonlongnarrativebooks AT sabrinatiun unsupervisedcontextlinkingretrieverforquestionansweringonlongnarrativebooks AT hamedabdelhaq unsupervisedcontextlinkingretrieverforquestionansweringonlongnarrativebooks AT wandeepkaur unsupervisedcontextlinkingretrieverforquestionansweringonlongnarrativebooks |
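Illustrative sketch (not from the record or the paper's code): the description above says UCLR scores a passage by how well an encoder-decoder model can reconstruct the reference (question, answer) pair when conditioned on that passage, and uses this error as a synthetic training signal. The Python snippet below sketches only that scoring idea under stated assumptions: a generic T5-style model from Hugging Face Transformers stands in for the paper's encoder-decoder, and the prompt format and helper names (`reconstruction_loss`, `rank_passages`) are hypothetical. The paper's actual training objective, synthetic pair generation, and key-event linking step are not reproduced here.

```python
# Minimal sketch of error-based passage relevance scoring (assumptions noted in text above).
# A passage is scored by the cross-entropy loss of reconstructing the reference
# (question, answer) pair conditioned on that passage: lower loss ~ more relevant.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "t5-small"  # assumption: any T5-style encoder-decoder would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).eval()


def reconstruction_loss(passage: str, question: str, answer: str) -> float:
    """Cross-entropy of generating the reference QA pair given the passage."""
    source = f"passage: {passage}"
    target = f"question: {question} answer: {answer}"
    enc = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
    labels = tokenizer(target, return_tensors="pt", truncation=True, max_length=128).input_ids
    with torch.no_grad():
        out = model(input_ids=enc.input_ids, attention_mask=enc.attention_mask, labels=labels)
    return out.loss.item()


def rank_passages(passages, question, answer, top_k=5):
    """Return the top_k passages with the lowest reconstruction loss."""
    scored = sorted(passages, key=lambda p: reconstruction_loss(p, question, answer))
    return scored[:top_k]


if __name__ == "__main__":
    passages = [
        "The captain steered the ship through the storm toward the northern harbor.",
        "A recipe for bread requires flour, water, salt, and yeast.",
    ]
    best = rank_passages(passages, "Where did the captain steer the ship?", "The northern harbor.", top_k=1)
    print(best[0])
```

A lower loss indicates the passage carries enough context to reconstruct the QA pair; in a full system such scores would serve as the synthetic signal for training a faster retriever rather than being computed exhaustively at query time.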