Needle in a haystack: Coarse-to-fine alignment network for moment retrieval from large-scale video collections.
Moment retrieval from large-scale video collections aims to search for and localize the temporal boundaries of a video moment in a collection of numerous videos according to a given natural language query. Existing methods for moment retrieval in a single video are too time-consuming to directly scale...
| Main Authors: | Lingwen Meng, Fangyuan Liu, Mingyong Xin, Siqi Guo, Fu Zou |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2025-01-01 |
| Series: | PLoS ONE |
| Online Access: | https://doi.org/10.1371/journal.pone.0320661 |
| _version_ | 1849715765694955520 |
|---|---|
| author | Lingwen Meng Fangyuan Liu Mingyong Xin Siqi Guo Fu Zou |
| collection | DOAJ |
| description | Moment retrieval from large-scale video collections aims to search for and localize the temporal boundaries of a video moment in a collection of numerous videos according to a given natural language query. Existing methods for moment retrieval in a single video are too time-consuming to scale directly to this task because of their sophisticated network architectures. In this paper, we decompose the original problem into two mutually boosting subtasks, video retrieval from video collections and moment retrieval in a single video, and propose the coarse-to-fine alignment network (CFAN), which comprises a video alignment module, a cross-modal interaction module and a flow of multi-level coarse-to-fine alignment information. Through the interaction of the multi-level information from the two subtasks, our method makes full use of the global contextual information in videos and the fine-grained alignment information between videos and queries. We conduct extensive experiments on three public datasets (ActivityNet Captions, Charades-STA and DiDeMo), and the evaluation results demonstrate the effectiveness of the proposed CFAN method. |
| id | doaj-art-e8749761809f4191ade32ad103a6badb |
| institution | DOAJ |
| issn | 1932-6203 |
| title | Needle in a haystack: Coarse-to-fine alignment network for moment retrieval from large-scale video collections. |
| url | https://doi.org/10.1371/journal.pone.0320661 |
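
The two-subtask decomposition described in the abstract (video retrieval from the collection, then moment retrieval inside a single video) can be illustrated with a toy sketch. This is not the CFAN model: the record gives no implementation details, so mean pooling and cosine similarity are stand-ins chosen here for illustration, and every name below (`retrieve_videos`, `localize_moment`, `retrieve_moment`, the example collection) is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def retrieve_videos(query, collection, top_k):
    """Coarse stage (assumed): rank videos by query similarity to their mean clip feature."""
    scored = [(cosine(query, mean_vector(clips)), video_id)
              for video_id, clips in collection.items()]
    scored.sort(reverse=True)
    return [video_id for _, video_id in scored[:top_k]]


def localize_moment(query, clips, max_len):
    """Fine stage (assumed): best contiguous clip span [start, end) within one video."""
    best = (-2.0, 0, 1)  # cosine is never below -1, so any span beats this
    for start in range(len(clips)):
        last_end = min(len(clips), start + max_len)
        for end in range(start + 1, last_end + 1):
            score = cosine(query, mean_vector(clips[start:end]))
            if score > best[0]:
                best = (score, start, end)
    return best


def retrieve_moment(query, collection, top_k=2, max_len=3):
    """Run the coarse stage, then the fine stage only on the shortlisted videos."""
    candidates = retrieve_videos(query, collection, top_k)
    results = [(localize_moment(query, collection[video_id], max_len), video_id)
               for video_id in candidates]
    (score, start, end), video_id = max(results)
    return video_id, start, end, score


# Toy 2-D "features": each video is a list of clip vectors (made-up data).
collection = {
    "cooking": [[1, 0], [1, 0], [1, 0]],
    "soccer": [[1, 0], [0, 1], [-1, 0]],
    "news": [[-1, -1], [-1, 0]],
}
query = [1, 1]

print(retrieve_moment(query, collection))
```

The point of the sketch is the cost structure: the exhaustive span search runs only on the `top_k` shortlisted videos rather than on the whole collection, which is the motivation the abstract gives for splitting the task.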