Real-Time Audio-Visual Analysis for Multiperson Videoconferencing
We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing for multiparty videoconferencing applications) in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application for spatial audio object coding, detection, and tracking of faces, estimation of head poses and visual focus of attention, detection and localisation of verbal and paralinguistic events, and the association and fusion of these different events. Combined all together, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (like a virtual director). Various experiments have been performed to evaluate the performance of the system. The obtained results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing different modalities in this scenario.
Saved in:
Main Authors: | Petr Motlicek, Stefan Duffner, Danil Korchagin, Hervé Bourlard, Carl Scheffler, Jean-Marc Odobez, Giovanni Del Galdo, Markus Kallinger, Oliver Thiergart |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2013-01-01 |
Series: | Advances in Multimedia |
Online Access: | http://dx.doi.org/10.1155/2013/175745 |
author | Petr Motlicek Stefan Duffner Danil Korchagin Hervé Bourlard Carl Scheffler Jean-Marc Odobez Giovanni Del Galdo Markus Kallinger Oliver Thiergart |
collection | DOAJ |
description | We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing for multiparty videoconferencing applications) in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application for spatial audio object coding, detection, and tracking of faces, estimation of head poses and visual focus of attention, detection and localisation of verbal and paralinguistic events, and the association and fusion of these different events. Combined all together, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (like a virtual director). Various experiments have been performed to evaluate the performance of the system. The obtained results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing different modalities in this scenario. |
format | Article |
id | doaj-art-18b6c31c366447629c94b26145796946 |
institution | Kabale University |
issn | 1687-5680 1687-5699 |
language | English |
publishDate | 2013-01-01 |
publisher | Wiley |
record_format | Article |
series | Advances in Multimedia |
spelling | Petr Motlicek, Stefan Duffner, Danil Korchagin, Hervé Bourlard, Carl Scheffler, Jean-Marc Odobez (Idiap Research Institute, 1920 Martigny, Switzerland); Giovanni Del Galdo, Markus Kallinger, Oliver Thiergart (Fraunhofer IIS, 91058 Erlangen, Germany) |
title | Real-Time Audio-Visual Analysis for Multiperson Videoconferencing |
url | http://dx.doi.org/10.1155/2013/175745 |