Zero-shot reranking with dense encoder models for news background linking
News background linking is the problem of finding useful links to resources that provide contextual background information for a given news article. Many systems were proposed to address this problem. Yet, the most effective and reproducible method, to date, used the entire input article as a search...
Main Authors: | Marwa Essam, Tamer Elsayed |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc. 2025-01-01 |
Series: | PeerJ Computer Science |
Subjects: | News linking; News recommendation; Text semantics; Ad-hoc retrieval; Semantic matching |
Online Access: | https://peerj.com/articles/cs-2534.pdf |
_version_ | 1832587207117897728 |
---|---|
author | Marwa Essam, Tamer Elsayed |
collection | DOAJ |
description | News background linking is the problem of finding useful links to resources that provide contextual background information for a given news article. Many systems were proposed to address this problem. Yet, the most effective and reproducible method, to date, used the entire input article as a search query to retrieve the background links by sparse retrieval. While being effective, that method is still far from being optimal. Furthermore, it only leverages the lexical matching signal between the input article and the candidate background links. Nevertheless, intuitively, there may exist resources with useful background information that do not lexically overlap with the input article’s vocabulary. While many studies proposed systems that adopt semantic matching for addressing news background linking, none were able to outperform the simple lexical-based matching method. In this paper, we investigate multiple methods to integrate both the lexical and semantic relevance signals for better reranking of candidate background links. To represent news articles in the semantic space, we compare multiple Transformer-based encoder models in a zero-shot setting without the need for any labeled data. Our results show that using a hierarchical aggregation of sentence-level representations generates a good semantic representation of news articles, which is then integrated with lexical matching to achieve a new state-of-the-art solution for the problem. We further show that a significant performance improvement is potentially attainable if the degree by which a semantic relevance signal is needed is accurately predicted per input article. |
format | Article |
id | doaj-art-4334cd3d57fa432da177a76a989adf51 |
institution | Kabale University |
issn | 2376-5992 |
language | English |
publishDate | 2025-01-01 |
publisher | PeerJ Inc. |
record_format | Article |
series | PeerJ Computer Science |
spelling | doaj-art-4334cd3d57fa432da177a76a989adf51; 2025-01-24T15:05:13Z; eng; PeerJ Inc.; PeerJ Computer Science; 2376-5992; 2025-01-01; 11; e2534; 10.7717/peerj-cs.2534; Zero-shot reranking with dense encoder models for news background linking; Marwa Essam (College of Engineering, Qatar University, Doha, Qatar); Tamer Elsayed (College of Engineering, Qatar University, Doha, Qatar); https://peerj.com/articles/cs-2534.pdf |
title | Zero-shot reranking with dense encoder models for news background linking |
topic | News linking; News recommendation; Text semantics; Ad-hoc retrieval; Semantic matching |
url | https://peerj.com/articles/cs-2534.pdf |
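
The description in this record mentions a hierarchical aggregation of sentence-level representations and an integration of lexical and semantic relevance signals for reranking. The sketch below is one possible illustration of that idea, not the authors' method: the record does not state how the aggregation or the integration is done, so mean pooling, min-max normalization, linear interpolation, and every name here (`aggregate_article`, `rerank`, `weight`) are assumptions made for the example. Sentence embeddings are assumed to be precomputed by some encoder, and lexical scores are assumed to come from a sparse retriever.

```python
import numpy as np


def aggregate_article(paragraphs):
    """Hierarchical mean pooling (an assumed choice): average sentence vectors
    within each paragraph, then average the paragraph vectors.

    paragraphs: list of arrays, each of shape (n_sentences, dim).
    """
    paragraph_vecs = [np.asarray(p, dtype=float).mean(axis=0) for p in paragraphs]
    return np.mean(paragraph_vecs, axis=0)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def min_max(scores):
    """Rescale scores to [0, 1]; all-equal scores map to zeros."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    return (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)


def rerank(lexical_scores, query_vec, candidate_vecs, weight=0.5):
    """Linearly interpolate normalized lexical and semantic scores.

    weight=0.0 keeps the lexical ranking; weight=1.0 uses only cosine similarity.
    Returns (candidate indices in descending combined score, combined scores).
    """
    lexical = min_max(lexical_scores)
    semantic = min_max([cosine(query_vec, c) for c in candidate_vecs])
    combined = (1.0 - weight) * lexical + weight * semantic
    order = np.argsort(-combined, kind="stable")
    return order.tolist(), combined
```

In this sketch, `weight` plays the role of the degree to which the semantic signal is used; choosing it per input article rather than globally would correspond loosely to the per-article prediction that the description says could yield further improvement.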