Zero-shot reranking with dense encoder models for news background linking

Bibliographic Details
Main Authors: Marwa Essam, Tamer Elsayed
Format: Article
Language: English
Published: PeerJ Inc. 2025-01-01
Series: PeerJ Computer Science
Subjects: News linking; News recommendation; Text semantics; Ad-hoc retrieval; Semantic matching
Online Access: https://peerj.com/articles/cs-2534.pdf
collection DOAJ
description News background linking is the problem of finding useful links to resources that provide contextual background information for a given news article. Many systems have been proposed to address this problem, yet the most effective and reproducible method to date used the entire input article as a search query to retrieve background links by sparse retrieval. Although effective, that method is still far from optimal, and it only leverages the lexical matching signal between the input article and the candidate background links. Intuitively, however, there may be resources with useful background information that do not lexically overlap with the input article’s vocabulary. While many studies have proposed systems that adopt semantic matching for news background linking, none has outperformed the simple lexical matching method. In this paper, we investigate multiple methods to integrate lexical and semantic relevance signals for better reranking of candidate background links. To represent news articles in the semantic space, we compare multiple Transformer-based encoder models in a zero-shot setting that requires no labeled data. Our results show that hierarchically aggregating sentence-level representations yields a good semantic representation of news articles; integrating it with lexical matching achieves a new state-of-the-art solution for the problem. We further show that a significant performance improvement is potentially attainable if the degree to which a semantic relevance signal is needed is accurately predicted per input article.
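The abstract outlines two ingredients: an article vector built by hierarchically aggregating sentence-level embeddings, and a reranking step that combines this semantic signal with lexical matching. It does not state the aggregation function, the similarity measure, or the fusion rule, so the Python below is only a minimal sketch under assumed choices: mean pooling at the paragraph and article levels, cosine similarity, min-max normalization, and a linear interpolation weight alpha. All function names are hypothetical, and the sentence vectors and lexical scores are taken as given (in practice they would come from a Transformer encoder and a sparse retriever such as BM25, respectively).

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equal-length vectors."""
    if not vectors:
        raise ValueError("mean_pool needs at least one vector")
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def article_embedding(paragraphs: Sequence[Sequence[Vector]]) -> Vector:
    """Hierarchical aggregation (assumed: mean pooling at both levels).

    Sentence vectors are pooled into one vector per paragraph, then the
    paragraph vectors are pooled into one vector for the whole article.
    """
    return mean_pool([mean_pool(sentences) for sentences in paragraphs])


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so lexical and semantic scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def rerank(lexical: Dict[str, float], query_vec: Vector,
           cand_vecs: Dict[str, Vector], alpha: float = 0.5) -> List[str]:
    """Fuse normalized lexical and semantic scores by linear interpolation.

    Hypothetical fusion rule: alpha = 1.0 keeps the lexical ranking, and
    smaller alpha gives more weight to the semantic (cosine) signal.
    """
    semantic = {doc: cosine(query_vec, cand_vecs[doc]) for doc in lexical}
    lex_n, sem_n = min_max(lexical), min_max(semantic)
    fused = {doc: alpha * lex_n[doc] + (1 - alpha) * sem_n[doc] for doc in lexical}
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return [doc for doc, _ in ranked]


if __name__ == "__main__":
    # Toy input article: two paragraphs of 2-d "sentence embeddings".
    query = article_embedding([[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0]]])
    print(query)  # [0.5, 0.5]

    # Toy candidates: first-stage lexical scores plus their article vectors.
    lexical = {"d1": 10.0, "d2": 8.0, "d3": 5.0}
    vecs = {"d1": [1.0, 0.0], "d2": [1.0, 1.0], "d3": [0.0, 1.0]}
    print(rerank(lexical, query, vecs, alpha=1.0))  # ['d1', 'd2', 'd3']
    print(rerank(lexical, query, vecs, alpha=0.5))  # ['d2', 'd1', 'd3']
```

With these toy inputs the query article vector is [0.5, 0.5], whereas a flat mean over all three sentences would give roughly [0.67, 0.33]; pooling per paragraph first stops the longer paragraph from dominating. Setting alpha=1.0 reproduces the purely lexical order, while alpha=0.5 promotes d2. The abstract's final sentence suggests that choosing the semantic weight per input article could help; here alpha is simply a fixed argument.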
id doaj-art-4334cd3d57fa432da177a76a989adf51
institution Kabale University
issn 2376-5992
doi 10.7717/peerj-cs.2534
volume 11
article e2534
affiliation College of Engineering, Qatar University, Doha, Qatar (Marwa Essam; Tamer Elsayed)
topic News linking
News recommendation
Text semantics
Ad-hoc retrieval
Semantic matching