Video Scene Detection Using Compact Bag of Visual Word Models

Video segmentation into shots is the first step in video indexing and search. Video shots are usually very short and give little meaningful insight into the visual content. However, grouping shots by similar visual content gives a better understanding of the video scene; gro...

Bibliographic Details
Main Authors:	Muhammad Haroon, Junaid Baber, Ihsan Ullah, Sher Muhammad Daudpota, Maheen Bakhtyar, Varsha Devi
Format:	Article
Language:	English
Published:	Wiley 2018-01-01
Series:	Advances in Multimedia
Online Access:	http://dx.doi.org/10.1155/2018/2564963

collection	DOAJ
description	Segmenting a video into shots is the first step in video indexing and search. Individual shots are usually very short and give little meaningful insight into the visual content. Grouping shots with similar visual content gives a better understanding of the video scene; this grouping is known as scene boundary detection, or video segmentation into scenes. In this paper, we propose a model for segmenting video into visual scenes using the bag of visual words (BoVW) model. The video is first divided into shots, each of which is represented by a set of key frames. Each key frame is in turn represented by a BoVW feature vector that is short and compact compared with classical BoVW implementations. Two variations of the BoVW model are used: (1) the classical BoVW model and (2) the Vector of Locally Aggregated Descriptors (VLAD), an extension of the classical BoVW model. Shot similarity is computed from the distances between key-frame feature vectors within a sliding window of length L = 4, rather than by comparing each shot against a very long list of shots, as in previous practice. Experiments on cinematic and drama videos show the effectiveness of the proposed framework. In the proposed model, the BoVW representation is a 25000-dimensional vector and the VLAD representation is only a 2048-dimensional vector. BoVW achieves a segmentation accuracy of 0.90, whereas VLAD achieves 0.83.
id	doaj-art-d8dfb7cbc3194ad9a35481436cf8a204
institution	Kabale University
issn	1687-5680 1687-5699
spelling	doaj-art-d8dfb7cbc3194ad9a35481436cf8a204 | 2025-08-20T03:24:16Z | eng | Wiley | Advances in Multimedia | 1687-5680 | 1687-5699 | 2018-01-01 | 2018 | 10.1155/2018/2564963 | 2564963 | Video Scene Detection Using Compact Bag of Visual Word Models
	Muhammad Haroon (0): Department of Computer Science & IT, University of Balochistan, Pakistan
	Junaid Baber (1): Department of Computer Science & IT, University of Balochistan, Pakistan
	Ihsan Ullah (2): Department of Computer Science & IT, University of Balochistan, Pakistan
	Sher Muhammad Daudpota (3): Department of Computer Science, Sukkur IBA University, Pakistan
	Maheen Bakhtyar (4): Department of Computer Science & IT, University of Balochistan, Pakistan
	Varsha Devi (5): Department of Computer Science, Sardar Bahadur Khan Women’s University, Pakistan
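
The sliding-window comparison mentioned in the description can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' implementation: the record does not state the distance metric, the decision rule, or any threshold, so the Euclidean distance, the closest-key-frame rule, the `threshold` value, and all function names below are assumptions made for illustration. Only the window length L = 4 comes from the record.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shot_distance(shot_a, shot_b):
    """Distance between two shots, each a list of key-frame vectors.

    Assumption: two shots are as close as their closest pair of key frames.
    """
    return min(euclidean(u, v) for u in shot_a for v in shot_b)


def detect_scene_boundaries(shots, window=4, threshold=0.5):
    """Return the indices of shots that start a new scene.

    Each shot is compared only with up to `window` preceding shots of the
    current scene, instead of with every earlier shot in the video.
    `threshold` is a hypothetical cut-off, not a value from the record.
    """
    boundaries = []
    scene_start = 0
    for i in range(1, len(shots)):
        first = max(scene_start, i - window)
        nearest = min(shot_distance(shots[i], shots[j]) for j in range(first, i))
        if nearest > threshold:
            # No recent shot in the current scene looks similar: new scene.
            boundaries.append(i)
            scene_start = i
    return boundaries


# Toy 2-dimensional "feature vectors"; real BoVW or VLAD vectors would have
# thousands of dimensions (25000 and 2048 according to the description).
toy_shots = [
    [[0.0, 0.0]],
    [[0.1, 0.0]],
    [[0.0, 0.1], [0.05, 0.05]],
    [[5.0, 5.0]],
    [[5.1, 5.0]],
    [[0.0, 0.0]],
]

print(detect_scene_boundaries(toy_shots))
```

On the toy data this prints `[3, 5]`: shots 0 to 2 form one scene, shots 3 and 4 a second, and shot 5 starts a third because it is compared only with shots of the current scene inside the window.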
