Data augmentation for dense passage retrieval using corpus-passage frequency-based token deletion

Abstract: This paper proposes a novel data augmentation method to address class imbalance in large-scale information retrieval systems. In particular, a corpus-passage frequency-based token deletion technique is introduced to improve the accuracy of Dense Passage Retrieval, which is a dense vector-ba...

Bibliographic Details
Main Authors: A-Seong Moon, Kyumin Kim, Jaesung Lee
Format: Article
Language: English
Published: SpringerOpen 2025-08-01
Series: Journal of Big Data
Online Access: https://doi.org/10.1186/s40537-025-01257-9