Audio-visual event localization with dual temporal-aware scene understanding and image-text knowledge bridging

Abstract: The audio-visual event localization (AVEL) task aims to judge and classify an audible and visible event. Existing methods pursue this goal by transferring pre-trained knowledge and by understanding the temporal dependencies and cross-modal correlations of the audio-visual scene. However, mos...

Bibliographic Details
Main Authors:	Pufen Zhang, Jiaxiang Wang, Meng Wan, Song Zhang, Jie Jing, Lianhong Ding, Peng Shi
Format:	Article
Language:	English
Published:	Springer 2024-11-01
Series:	Complex & Intelligent Systems
Subjects:	Audio-visual event localization; Multi-modal learning; Video scene understanding; Knowledge transfer
Online Access:	https://doi.org/10.1007/s40747-024-01654-2

Similar Items

Audio-Language Datasets of Scenes and Events: A Survey
by: Gijs Wijngaard, et al.
Published: (2025-01-01)

Deep Learning for Traffic Scene Understanding: A Review
by: Parya Dolatyabi, et al.
Published: (2025-01-01)

Video or audio listening tests for English language teaching context: which is more effective for classroom use?
by: Clara Herlina Karjo, et al.
Published: (2022-02-01)

A Dual-Channel and Frequency-Aware Approach for Lightweight Video Instance Segmentation
by: Mingzhu Liu, et al.
Published: (2025-01-01)

Spatial frequency preferences of representations of indoor and natural scene categories in scene-selective regions under different conditions of contrast
by: Yuanyuan Zhang, et al.
Published: (2025-02-01)

Peningkatan Kedisiplinan Siswa Sekolah Dasar Melalui Pemanfaatan Media Audio Visual
by: Siti Diyah Rachmatika, et al.
Published: (2024-09-01)

Deep convolutional neural networks for double compressed AMR audio detection
by: Aykut Büker, et al.
Published: (2021-06-01)

A Novel Audio Copy Move Forgery Detection Method With Classification of Graph-Based Representations
by: Beste Ustubioglu, et al.
Published: (2025-01-01)

Audiogmenter: a MATLAB toolbox for audio data augmentation
by: Gianluca Maguolo, et al.
Published: (2025-01-01)

Interior design assistant algorithm based on indoor scene analysis
by: Lu Zhang
Published: (2025-12-01)

Design of an Integrated Model for Video Summarization Using Multimodal Fusion and YOLO for Crime Scene Analysis
by: Sai Babu Veesam, et al.
Published: (2025-01-01)

Correlation-guided decoding strategy for low-resource Uyghur scene text recognition
by: Miaomiao Xu, et al.
Published: (2024-11-01)

Semantic-enhanced panoptic scene graph generation through hybrid and axial attentions
by: Xinhe Kuang, et al.
Published: (2024-12-01)

Authenticity at Risk: Key Factors in the Generation and Detection of Audio Deepfakes
by: Alba Martínez-Serrano, et al.
Published: (2025-01-01)

Advancements in End-to-End Audio Style Transformation: A Differentiable Approach for Voice Conversion and Musical Style Transfer
by: Shashwat Aggarwal, et al.
Published: (2025-01-01)

Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach
by: Andre O. Francani, et al.
Published: (2025-01-01)

PENGGUNAAN MEDIA AUDIO VISUAL PADA MATA PELAJARAN PENDIDIKAN AGAMA ISLAM UNTUK MENINGKATKAN AKTIVITAS BELAJAR SISWA KELAS V SD N 09 PALEMBANG
by: Ibrahim Ibrahim, et al.
Published: (2024-01-01)

The Preferred User: How Audio Description could Change Understandings of Australian Television Audiences and Media Technology
by: Ellis Katie, et al.
Published: (2018-07-01)

Obscene beasts: the stage behind the scenes in A Midsummer Night’s Dream
by: Mathilde La Cassagnère
Published: (2016-06-01)

Audio classification using grasshopper‐ride optimization algorithm‐based support vector machine
by: Suryabhan Pratap Singh, et al.
Published: (2021-08-01)

Learning through audio-visual aids: how does it work for students to delve into the English vowels?
by: Zikril Mulia
Published: (2022-11-01)

Pengaruh Bimbingan Kelompok dengan Media Audio Visual terhadap Motivasi Belajar Siswa di MTs Negeri 4 Jakarta
by: Rianka Anindya Rahmadhita, et al.
Published: (2023-07-01)

Apk2Audio4AndMal: Audio Based Malware Family Detection Framework
by: Oguz Emre Kural, et al.
Published: (2023-01-01)

Position Determination for Dynamic Scenes With Unsynchronized Image Sequences
by: Kai Guo, et al.
Published: (2025-01-01)

Spatial audio signal processing for augmented telepresence applications
by: Thomas Deppisch
Published: (2025-03-01)

Arabic Temporal Common Sense Understanding
by: Reem Alqifari, et al.
Published: (2024-12-01)

An IoT-enhanced automatic music composition system integrating audio-visual learning with transformer and SketchVAE
by: Yifei Zhang
Published: (2025-02-01)

Dual-Channel Deepfake Audio Detection: Leveraging Direct and Reverberant Waveforms
by: Gunwoo Lee, et al.
Published: (2025-01-01)

Frequency and Texture Aware Multi-Domain Feature Fusion for Remote Sensing Scene Classification
by: Russo Ashraf, et al.
Published: (2025-01-01)

Evaluation of Landscapes and Soundscapes in Traditional Villages in the Hakka Region of Guangdong Province Based on Audio-Visual Interactions
by: Dongxu Zhang, et al.
Published: (2025-01-01)

Performance Evaluation and Optimization of 3D Gaussian Splatting in Indoor Scene Generation and Rendering
by: Xinjian Fang, et al.
Published: (2025-01-01)

Mosaic-Mixed Attention-Based Unexpected Traffic Scene Classification
by: Sang-Hyun Lee, et al.
Published: (2025-01-01)

From Book to Playlist: How Open-Access Audio Archives are Renewing the Poetry Collection
by: Abigail Lang
Published: (2022-12-01)

Learning with semantic ambiguity for unbiased scene graph generation
by: Shanjin Zhong, et al.
Published: (2025-01-01)

The SPN Network for Digital Audio Data Based on Elliptic Curve Over a Finite Field
by: Ijaz Khalid, et al.
Published: (2022-01-01)

Accelerated Reconstruction of Scenes Using CUDA-Based Parallel Computing
by: Gui Zou, et al.
Published: (2025-01-01)

Bridging the gap: multi-granularity representation learning for text-based vehicle retrieval
by: Xue Bo, et al.
Published: (2024-11-01)

Zero-Shot Remote Sensing Scene Classification Based on Automatic Knowledge Graph and Dual-Branch Semantic Correlation Supervision
by: Chao Wang, et al.
Published: (2025-01-01)

Digital technologies in music education. Using Digital Audio Workstations (DAW) with Project-Based Learning (PBL)
by: María Elena Cuenca-Rodríguez, et al.
Published: (2025-01-01)

Integrated high-precision real scene 3D modeling of karst cave landscape based on laser scanning and photogrammetry
by: Congyuan Zhang, et al.
Published: (2024-09-01)