Embedding-based pair generation for contrastive representation learning in audio-visual surveillance data
Smart cities deploy various sensors such as microphones and RGB cameras to collect data to improve the safety and comfort of the citizens. As data annotation is expensive, self-supervised methods such as contrastive learning are used to learn audio-visual representations for downstream tasks. Focusi...
Main Authors: | Wei-Cheng Wang, Sander De Coninck, Sam Leroux, Pieter Simoens |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2025-01-01 |
Series: | Frontiers in Robotics and AI |
Subjects: | |
Online Access: | https://www.frontiersin.org/articles/10.3389/frobt.2024.1490718/full |
Similar Items
- PENGGUNAAN MEDIA AUDIO VISUAL PADA MATA PELAJARAN PENDIDIKAN AGAMA ISLAM UNTUK MENINGKATKAN AKTIVITAS BELAJAR SISWA KELAS V SD N 09 PALEMBANG
  by: Ibrahim Ibrahim, et al.
  Published: (2024-01-01)
- Audio-Language Datasets of Scenes and Events: A Survey
  by: Gijs Wijngaard, et al.
  Published: (2025-01-01)
- Peningkatan Kedisiplinan Siswa Sekolah Dasar Melalui Pemanfaatan Media Audio Visual
  by: Siti Diyah Rachmatika, et al.
  Published: (2024-09-01)
- Multimodal MRI analysis of microstructural and functional connectivity brain changes following systematic audio-visual training in a virtual environment
  by: Kholoud Alwashmi, et al.
  Published: (2025-01-01)
- Learning through audio-visual aids: how does it work for students to delve into the English vowels?
  by: Zikril Mulia
  Published: (2022-11-01)