Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books
Narrative Question Answering (QA) requires understanding the context, events, and relationships within a narrative text to answer questions accurately. However, narrative books pose new challenges when using recent pretrained large language models, since such lengthy content incurs additional computational cost and leads to performance degradation. Moreover, identifying the most relevant passages for a given question is particularly challenging due to the lack of labeled question-passage pairs for training the retriever. This paper introduces the Unsupervised Context Linking Retriever (UCLR), a novel approach that efficiently retrieves relevant passages from long narrative texts without requiring labeled (question, passage) pairs. UCLR uses an encoder-decoder model to generate synthetic (question, answer) pairs, measuring the relevance of a passage by the error between the generated pair and the reference pair, which serves as a synthetic training signal. This method optimizes the retriever to identify passages with sufficient context to accurately reconstruct both the question and the answer, improving retrieval accuracy. UCLR also identifies key events surrounding each passage in the retrieved set and constructs a new set of passages from these key events, covering both broader narrative structure and finer details. Experimental results on the NarrativeQA benchmark show that UCLR achieves relative improvements of +8% on the validation set and +5% on the test set, outperforming state-of-the-art unsupervised retrievers. Additionally, combining UCLR with a simple reader model outperforms other state-of-the-art readers designed for processing lengthy documents, achieving a relative performance gain of 7.8% on the test set while being 5 times faster, since UCLR allows the reader model to focus on a pertinent subset of tokens.
| Main Authors: | Mohammad A. Ateeq, Sabrina Tiun, Hamed Abdelhaq, Wandeep Kaur |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Narrative comprehension, question answering, retriever-reader model, unsupervised retriever, long document comprehension |
| Online Access: | https://ieeexplore.ieee.org/document/11029186/ |
| _version_ | 1849421786313129984 |
|---|---|
| author | Mohammad A. Ateeq, Sabrina Tiun, Hamed Abdelhaq, Wandeep Kaur |
| author_facet | Mohammad A. Ateeq, Sabrina Tiun, Hamed Abdelhaq, Wandeep Kaur |
| author_sort | Mohammad A. Ateeq |
| collection | DOAJ |
| description | Narrative Question Answering (QA) requires understanding the context, events, and relationships within a narrative text to answer questions accurately. However, narrative books pose new challenges when using recent pretrained large language models, since such lengthy content incurs additional computational cost and leads to performance degradation. Moreover, identifying the most relevant passages for a given question is particularly challenging due to the lack of labeled question-passage pairs for training the retriever. This paper introduces the Unsupervised Context Linking Retriever (UCLR), a novel approach that efficiently retrieves relevant passages from long narrative texts without requiring labeled (question, passage) pairs. UCLR uses an encoder-decoder model to generate synthetic (question, answer) pairs, measuring the relevance of a passage by the error between the generated pair and the reference pair, which serves as a synthetic training signal. This method optimizes the retriever to identify passages with sufficient context to accurately reconstruct both the question and the answer, improving retrieval accuracy. UCLR also identifies key events surrounding each passage in the retrieved set and constructs a new set of passages from these key events, covering both broader narrative structure and finer details. Experimental results on the NarrativeQA benchmark show that UCLR achieves relative improvements of +8% on the validation set and +5% on the test set, outperforming state-of-the-art unsupervised retrievers. Additionally, combining UCLR with a simple reader model outperforms other state-of-the-art readers designed for processing lengthy documents, achieving a relative performance gain of 7.8% on the test set while being 5 times faster, since UCLR allows the reader model to focus on a pertinent subset of tokens. |
| format | Article |
| id | doaj-art-d950af7092da4de096ecd9a031aa1cd9 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-d950af7092da4de096ecd9a031aa1cd92025-08-20T03:31:23ZengIEEEIEEE Access2169-35362025-01-011310106610108810.1109/ACCESS.2025.357849711029186Unsupervised Context-Linking Retriever for Question Answering on Long Narrative BooksMohammad A. Ateeq0https://orcid.org/0000-0003-0296-5562Sabrina Tiun1https://orcid.org/0000-0002-1134-973XHamed Abdelhaq2https://orcid.org/0000-0003-4803-6689Wandeep Kaur3https://orcid.org/0000-0003-2025-3710Faculty of Information Science and Technology, Centre for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, MalaysiaFaculty of Information Science and Technology, Centre for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, MalaysiaDepartment of Computer Science, An-Najah National University, Nablus, PalestineFaculty of Information Science and Technology, Centre for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, MalaysiaNarrative Question Answering (QA) involves understanding the context, events, and relationships within narrative texts for accurate question answering. However, narrative books impose new challenges while utilizing recent pretrained large language models since such lengthy content requires additional computational costs and leads to performance degradation. Moreover, identifying the most relevant passages for a given question is particularly challenging due to the lack of labeled question-passage pairs for training the retriever. This paper introduces the Unsupervised Context Linking Retriever (UCLR), a novel approach that efficiently retrieves relevant passages from long narrative texts without requiring labeled (question, passage) pairs. UCLR uses an encoder-decoder model to generate synthetic (question, answer) pairs, measuring the relevance of passages by comparing the error between the generated pair and the reference pair, which serves as a synthetic training signal. This method optimizes the retriever to identify passages with sufficient context to accurately reconstruct both the question and the answer, improving retrieval accuracy. UCLR also identifies key events surrounding each passage in the retrieved set and constructs a new set of passages from these key events, enabling coverage of both broader narrative structures and finer details. Experimental results on the NarrativeQA benchmark show that UCLR achieves relative improvements of +8% on the validation set and +5% on the test set, outperforming state-of-the-art unsupervised retrievers. Additionally, the results demonstrate that combining UCLR with a simple reader model outperforms other state-of-the-art readers designed for processing lengthy documents, achieving a relative performance gain of 7.8% on the test set while being 5 times faster as UCLR allows the reader model to focus on a pertinent subset of tokens.https://ieeexplore.ieee.org/document/11029186/Narrative comprehensionquestion answeringretriever-reader modelunsupervised retrieverlong document comprehension |
| spellingShingle | Mohammad A. Ateeq Sabrina Tiun Hamed Abdelhaq Wandeep Kaur Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books IEEE Access Narrative comprehension question answering retriever-reader model unsupervised retriever long document comprehension |
| title | Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books |
| title_full | Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books |
| title_fullStr | Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books |
| title_full_unstemmed | Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books |
| title_short | Unsupervised Context-Linking Retriever for Question Answering on Long Narrative Books |
| title_sort | unsupervised context linking retriever for question answering on long narrative books |
| topic | Narrative comprehension, question answering, retriever-reader model, unsupervised retriever, long document comprehension |
| url | https://ieeexplore.ieee.org/document/11029186/ |
| work_keys_str_mv | AT mohammadaateeq unsupervisedcontextlinkingretrieverforquestionansweringonlongnarrativebooks AT sabrinatiun unsupervisedcontextlinkingretrieverforquestionansweringonlongnarrativebooks AT hamedabdelhaq unsupervisedcontextlinkingretrieverforquestionansweringonlongnarrativebooks AT wandeepkaur unsupervisedcontextlinkingretrieverforquestionansweringonlongnarrativebooks |
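Illustrative sketch (not from the record or the paper's code): the description above says UCLR scores a passage by how well an encoder-decoder model can reconstruct the reference (question, answer) pair when conditioned on that passage, and uses this error as a synthetic training signal. The Python snippet below sketches only that scoring idea under stated assumptions: a generic T5-style model from Hugging Face Transformers stands in for the paper's encoder-decoder, and the prompt format and helper names (`reconstruction_loss`, `rank_passages`) are hypothetical. The paper's actual training objective, synthetic pair generation, and key-event linking step are not reproduced here.

```python
# Minimal sketch of error-based passage relevance scoring (assumptions noted in text above).
# A passage is scored by the cross-entropy loss of reconstructing the reference
# (question, answer) pair conditioned on that passage: lower loss ~ more relevant.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "t5-small"  # assumption: any T5-style encoder-decoder would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).eval()


def reconstruction_loss(passage: str, question: str, answer: str) -> float:
    """Cross-entropy of generating the reference QA pair given the passage."""
    source = f"passage: {passage}"
    target = f"question: {question} answer: {answer}"
    enc = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
    labels = tokenizer(target, return_tensors="pt", truncation=True, max_length=128).input_ids
    with torch.no_grad():
        out = model(input_ids=enc.input_ids, attention_mask=enc.attention_mask, labels=labels)
    return out.loss.item()


def rank_passages(passages, question, answer, top_k=5):
    """Return the top_k passages with the lowest reconstruction loss."""
    scored = sorted(passages, key=lambda p: reconstruction_loss(p, question, answer))
    return scored[:top_k]


if __name__ == "__main__":
    passages = [
        "The captain steered the ship through the storm toward the northern harbor.",
        "A recipe for bread requires flour, water, salt, and yeast.",
    ]
    best = rank_passages(passages, "Where did the captain steer the ship?", "The northern harbor.", top_k=1)
    print(best[0])
```

A lower loss indicates the passage carries enough context to reconstruct the QA pair; in a full system such scores would serve as the synthetic signal for training a faster retriever rather than being computed exhaustively at query time.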