Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books


Bibliographic Details
Main Authors: Mohammad A. Ateeq, Sabrina Tiun, Hamed Abdelhaq, Wandeep Kaur
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/11029186/
_version_ 1849421786313129984
author Mohammad A. Ateeq
Sabrina Tiun
Hamed Abdelhaq
Wandeep Kaur
author_facet Mohammad A. Ateeq
Sabrina Tiun
Hamed Abdelhaq
Wandeep Kaur
author_sort Mohammad A. Ateeq
collection DOAJ
description Narrative Question Answering (QA) involves understanding the context, events, and relationships within narrative texts for accurate question answering. However, narrative books pose new challenges for recent pretrained large language models, since such lengthy content incurs additional computational cost and degrades performance. Moreover, identifying the most relevant passages for a given question is particularly challenging due to the lack of labeled question-passage pairs for training the retriever. This paper introduces the Unsupervised Context Linking Retriever (UCLR), a novel approach that efficiently retrieves relevant passages from long narrative texts without requiring labeled (question, passage) pairs. UCLR uses an encoder-decoder model to generate synthetic (question, answer) pairs, measuring the relevance of a passage by the error between the generated pair and the reference pair, which serves as a synthetic training signal. This signal optimizes the retriever to identify passages with sufficient context to accurately reconstruct both the question and the answer, improving retrieval accuracy. UCLR also identifies key events surrounding each passage in the retrieved set and constructs a new set of passages from these key events, enabling coverage of both broader narrative structures and finer details. Experimental results on the NarrativeQA benchmark show that UCLR achieves relative improvements of +8% on the validation set and +5% on the test set, outperforming state-of-the-art unsupervised retrievers. Additionally, the results demonstrate that combining UCLR with a simple reader model outperforms other state-of-the-art readers designed for processing lengthy documents, achieving a relative performance gain of 7.8% on the test set while being five times faster, as UCLR allows the reader model to focus on a pertinent subset of tokens.
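The passage-scoring idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the `pair_loss` function here is a hypothetical stand-in that counts reference tokens missing from a passage, whereas the paper uses the error of an encoder-decoder model reconstructing the reference (question, answer) pair; only the ranking logic (lower reconstruction error means more relevant) is what this sketch shows.

```python
# Sketch of UCLR-style unsupervised passage scoring (illustrative only).
# A passage is scored by the "error" made when reconstructing a reference
# (question, answer) pair from it; passages with lower error rank higher.

def pair_loss(passage: str, question: str, answer: str) -> float:
    """Stand-in for the encoder-decoder reconstruction loss.

    A real implementation would condition a seq2seq model on `passage`
    and return the loss of generating `question` and `answer`. This proxy
    counts reference tokens absent from the passage, so passages covering
    more of the pair's vocabulary get a lower "error"."""
    reference = set((question + " " + answer).lower().split())
    covered = set(passage.lower().split())
    missing = reference - covered
    return len(missing) / max(len(reference), 1)

def rank_passages(passages, question, answer, top_k=2):
    """Return the top_k passages with the lowest reconstruction error."""
    scored = sorted(passages, key=lambda p: pair_loss(p, question, answer))
    return scored[:top_k]

passages = [
    "The captain hid the letter beneath the floorboards of the old inn.",
    "Rain fell on the harbor for three days without pause.",
    "Years later she found the letter the captain had hidden at the inn.",
]
question = "Where did the captain hide the letter?"
answer = "Beneath the floorboards of the inn."

best = rank_passages(passages, question, answer, top_k=1)
print(best[0])
```

Because no (question, passage) labels are needed, the same scoring can serve as a synthetic training signal for the retriever, as the abstract describes.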
format Article
id doaj-art-d950af7092da4de096ecd9a031aa1cd9
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-d950af7092da4de096ecd9a031aa1cd9
2025-08-20T03:31:23Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
Volume 13, pages 101066-101088
10.1109/ACCESS.2025.3578497
11029186
Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books
Mohammad A. Ateeq (https://orcid.org/0000-0003-0296-5562), Faculty of Information Science and Technology, Centre for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
Sabrina Tiun (https://orcid.org/0000-0002-1134-973X), Faculty of Information Science and Technology, Centre for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
Hamed Abdelhaq (https://orcid.org/0000-0003-4803-6689), Department of Computer Science, An-Najah National University, Nablus, Palestine
Wandeep Kaur (https://orcid.org/0000-0003-2025-3710), Faculty of Information Science and Technology, Centre for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
https://ieeexplore.ieee.org/document/11029186/
Narrative comprehension; question answering; retriever-reader model; unsupervised retriever; long document comprehension
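The context-linking step mentioned in the abstract, in which UCLR "identifies key events surrounding each passage in the retrieved set and constructs a new set of passages," can be sketched as a neighbor-window expansion over the book's chunk sequence. This is an assumption for illustration: the paper's actual event-identification method may differ, and `link_context` is a hypothetical helper.

```python
# Sketch of context linking (illustrative assumption): approximate the
# "key events surrounding each passage" by the neighboring chunks in the
# book's original order, merged and deduplicated so the reader sees both
# the retrieved passages and their surrounding narrative.

def link_context(book_chunks, retrieved_ids, window=1):
    """Expand each retrieved chunk index by `window` neighbors on each
    side, then return the linked chunks in reading order without
    duplicates."""
    seen = set()
    for i in retrieved_ids:
        lo = max(0, i - window)
        hi = min(len(book_chunks) - 1, i + window)
        seen.update(range(lo, hi + 1))
    return [book_chunks[j] for j in sorted(seen)]

chunks = [f"chunk {i}" for i in range(10)]
expanded = link_context(chunks, [2, 7], window=1)
print(expanded)
```

Restricting the reader to this expanded-but-small subset of chunks is consistent with the abstract's claim that the reader model can focus on a pertinent subset of tokens.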
spellingShingle Mohammad A. Ateeq
Sabrina Tiun
Hamed Abdelhaq
Wandeep Kaur
Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books
IEEE Access
Narrative comprehension
question answering
retriever-reader model
unsupervised retriever
long document comprehension
title Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books
title_full Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books
title_fullStr Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books
title_full_unstemmed Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books
title_short Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books
title_sort unsupervised context linking retriever for question answering on long narrative books
topic Narrative comprehension
question answering
retriever-reader model
unsupervised retriever
long document comprehension
url https://ieeexplore.ieee.org/document/11029186/
work_keys_str_mv AT mohammadaateeq unsupervisedcontextlinkingretrieverforquestionansweringonlongnarrativebooks
AT sabrinatiun unsupervisedcontextlinkingretrieverforquestionansweringonlongnarrativebooks
AT hamedabdelhaq unsupervisedcontextlinkingretrieverforquestionansweringonlongnarrativebooks
AT wandeepkaur unsupervisedcontextlinkingretrieverforquestionansweringonlongnarrativebooks