Needle in a haystack: Coarse-to-fine alignment network for moment retrieval from large-scale video collections.

Moment retrieval from large-scale video collections aims to search and localize the temporal boundary of a video moment from a collection of numerous videos according to the given natural language query. Existing methods for moment retrieval in a single video are too time-consuming to directly scale...

Bibliographic Details
Main Authors: Lingwen Meng, Fangyuan Liu, Mingyong Xin, Siqi Guo, Fu Zou
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0320661
collection DOAJ
description Moment retrieval from large-scale video collections aims to search and localize the temporal boundary of a video moment from a collection of numerous videos according to the given natural language query. Existing methods for moment retrieval in a single video are too time-consuming to scale directly to this task because of their sophisticated network architectures. In this paper, we decompose the original problem into two mutually boosting subtasks, video retrieval from video collections and moment retrieval in a single video, and propose the coarse-to-fine alignment network (CFAN), which includes a video alignment module, a cross-modal interaction module, and a flow of multi-level coarse-to-fine alignment information. Through the interaction of the multi-level information from the two subtasks, our method makes full use of the global contextual information in videos and the fine-grained alignment information between videos and queries. We perform extensive experiments on three public datasets (ActivityNet Captions, Charades-STA, and DiDeMo), and the evaluation results demonstrate the effectiveness of the proposed CFAN method.
id doaj-art-e8749761809f4191ade32ad103a6badb
institution DOAJ
issn 1932-6203
spelling doaj-art-e8749761809f4191ade32ad103a6badb; 2025-08-20T03:13:13Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2025-01-01; 20; 5; e0320661; 10.1371/journal.pone.0320661; Needle in a haystack: Coarse-to-fine alignment network for moment retrieval from large-scale video collections.; Lingwen Meng; Fangyuan Liu; Mingyong Xin; Siqi Guo; Fu Zou; https://doi.org/10.1371/journal.pone.0320661