Real-Time Audio-Visual Analysis for Multiperson Videoconferencing

We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing for multiparty videoconferencing applications) in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application to spatial audio object coding; detection and tracking of faces; estimation of head pose and visual focus of attention; detection and localisation of verbal and paralinguistic events; and the association and fusion of these different events. Taken together, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (such as a virtual director). Various experiments have been performed to evaluate the performance of the system. The results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing different modalities in this scenario.

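The association and fusion step mentioned in the abstract (linking localised audio events to tracked faces so that a virtual director knows who is speaking) can be illustrated with a minimal sketch. The following Python fragment is not the authors' implementation: it assumes each tracked face and each acoustic event can be reduced to a horizontal azimuth, and the `FaceTrack` class and `associate_speaker` function are hypothetical names introduced only for illustration.

```python
# Illustrative sketch only (not the published system): associate an audio
# direction-of-arrival (DOA) estimate with tracked faces to label the
# active speaker, the kind of audio-visual association the abstract
# describes as input to a virtual director.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FaceTrack:
    """A tracked face, reduced to an ID and a horizontal azimuth in degrees."""
    track_id: int
    azimuth_deg: float


def associate_speaker(doa_deg: float,
                      faces: List[FaceTrack],
                      max_gap_deg: float = 15.0) -> Optional[int]:
    """Return the ID of the face closest in azimuth to the audio DOA,
    or None if no face lies within max_gap_deg (speaker off-camera)."""
    best_id, best_gap = None, max_gap_deg
    for face in faces:
        # Wrap-around-safe angular distance between the two azimuths.
        gap = abs((doa_deg - face.azimuth_deg + 180.0) % 360.0 - 180.0)
        if gap <= best_gap:
            best_id, best_gap = face.track_id, gap
    return best_id


if __name__ == "__main__":
    faces = [FaceTrack(1, -30.0), FaceTrack(2, 5.0), FaceTrack(3, 40.0)]
    # An audio event localised at 8 degrees is associated with face 2,
    # which a virtual director could then frame in a close-up.
    print(associate_speaker(doa_deg=8.0, faces=faces))
```

A complete system would also exploit temporal smoothing and additional cues such as visual focus of attention and paralinguistic events, as the abstract notes, rather than a single nearest-azimuth rule.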
Bibliographic Details
Main Authors: Petr Motlicek, Stefan Duffner, Danil Korchagin, Hervé Bourlard, Carl Scheffler, Jean-Marc Odobez, Giovanni Del Galdo, Markus Kallinger, Oliver Thiergart
Format: Article
Language: English
Published: Wiley 2013-01-01
Series: Advances in Multimedia
Online Access: http://dx.doi.org/10.1155/2013/175745
collection DOAJ
id doaj-art-18b6c31c366447629c94b26145796946
institution Kabale University
issn 1687-5680, 1687-5699
affiliations Idiap Research Institute, 1920 Martigny, Switzerland (Petr Motlicek, Stefan Duffner, Danil Korchagin, Hervé Bourlard, Carl Scheffler, Jean-Marc Odobez); Fraunhofer IIS, 91058 Erlangen, Germany (Giovanni Del Galdo, Markus Kallinger, Oliver Thiergart)