Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books


Bibliographic Details
Main Authors: Mohammad A. Ateeq, Sabrina Tiun, Hamed Abdelhaq, Wandeep Kaur
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11029186/
Description
Summary: Narrative Question Answering (QA) requires understanding the context, events, and relationships within narrative texts to answer questions accurately. However, narrative books pose new challenges for recent pretrained large language models, since such lengthy content incurs additional computational cost and leads to performance degradation. Moreover, identifying the passages most relevant to a given question is particularly challenging because no labeled question-passage pairs are available for training the retriever. This paper introduces the Unsupervised Context Linking Retriever (UCLR), a novel approach that efficiently retrieves relevant passages from long narrative texts without requiring labeled (question, passage) pairs. UCLR uses an encoder-decoder model to generate synthetic (question, answer) pairs and measures a passage's relevance by the error between the generated pair and the reference pair, which serves as a synthetic training signal. This objective optimizes the retriever to identify passages with sufficient context to accurately reconstruct both the question and the answer, improving retrieval accuracy. UCLR also identifies key events surrounding each passage in the retrieved set and constructs a new set of passages from these key events, covering both broader narrative structures and finer details. Experimental results on the NarrativeQA benchmark show that UCLR achieves relative improvements of +8% on the validation set and +5% on the test set, outperforming state-of-the-art unsupervised retrievers. Additionally, combining UCLR with a simple reader model outperforms other state-of-the-art readers designed for processing lengthy documents, achieving a relative performance gain of 7.8% on the test set while being five times faster, as UCLR lets the reader model focus on a pertinent subset of tokens.
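The core idea in the summary, ranking passages by how well a (question, answer) pair can be reconstructed from them, can be sketched as follows. This is a minimal illustration, not the paper's implementation: UCLR uses an encoder-decoder model's generation error, whereas here a simple token-overlap score stands in for that error, and the passages and pair are invented examples.

```python
# Sketch of UCLR-style unsupervised passage scoring.
# Assumption: a token-overlap surrogate replaces the encoder-decoder
# reconstruction loss described in the paper.

def reconstruction_error(passage, reference_pair):
    """Stand-in for the error of regenerating the (question, answer)
    pair conditioned on the passage: passages sharing more tokens with
    the pair receive a lower error."""
    question, answer = reference_pair
    pair_tokens = set((question + " " + answer).lower().split())
    passage_tokens = set(passage.lower().split())
    overlap = len(pair_tokens & passage_tokens)
    return 1.0 - overlap / max(len(pair_tokens), 1)

def rank_passages(passages, reference_pair, k=2):
    """Keep the k passages with the lowest reconstruction error,
    mirroring the goal of retrieving passages with enough context to
    reconstruct both the question and the answer."""
    ranked = sorted(passages, key=lambda p: reconstruction_error(p, reference_pair))
    return ranked[:k]

# Toy narrative passages and a synthetic (question, answer) pair.
passages = [
    "The captain hid the letter beneath the floorboards of the cabin.",
    "A storm delayed the ship for three days off the coast.",
    "Years later the letter was found under the cabin floor.",
]
pair = ("Where was the letter hidden?", "Beneath the floorboards of the cabin")
top = rank_passages(passages, pair, k=1)
```

In this toy run the first passage scores best because it shares the most content words with the pair; the real retriever would additionally expand the retrieved set with passages drawn from key events surrounding each hit.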
ISSN: 2169-3536