Storm-based distributed sampling system for multi-source stream environment

Bibliographic Details
Main Authors: Wonhyeong Cho, Myeong-Seon Gil, Mi-Jung Choi, Yang-Sae Moon
Format: Article
Language: English
Published: Wiley 2018-11-01
Series: International Journal of Distributed Sensor Networks
Online Access:https://doi.org/10.1177/1550147718812698
Collection: DOAJ
ISSN: 1550-1477

Description
As large volumes of data streams are generated rapidly in recent applications such as social network services, the Internet of Things, and smart factories, sampling techniques have attracted much attention as a way to handle such streams efficiently. In this article, we address the performance of binary Bernoulli sampling in a multi-source stream environment. Binary Bernoulli sampling has an n:1 structure in which n sites transmit data to a single coordinator. However, as the number of sites or the volume of the input stream grows sharply, the coordinator can become a severe bottleneck. In addition, bidirectional communication between the coordinator and the sites over different networks may incur excessive communication overhead. To solve these coordinator-bottleneck and communication-overhead problems, we propose a novel distributed processing model of binary Bernoulli sampling. We first present a multiple-coordinator structure to resolve the coordinator bottleneck. We then present a new sampling model that uses an integrated framework and shared memory to reduce the communication overhead. To verify the effectiveness and scalability of the proposed model, we implement it in Apache Storm, a real-time distributed stream processing system. Experimental results show that our Storm-based binary Bernoulli sampling improves performance by up to 1.8 times over the legacy method and maintains high performance even when the input stream increases substantially. These results indicate that the proposed distributed processing model effectively solves the performance degradation problem of binary Bernoulli sampling, and the implementation on Apache Storm demonstrates its advantage in practice.
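The description states the n:1 structure only at a high level. As a rough illustration, the following Python sketch simulates one coordinator and n sites in a single process, assuming a round-based variant of binary Bernoulli sampling: each item draws a level equal to the number of leading zero bits of a random binary string (so it reaches level r with probability 2^-r), sites forward only items at or above the current level r, and the coordinator raises r and notifies every site whenever its sample exceeds a bound. This is not the authors' Storm implementation; the names `draw_level`, `Site`, `Coordinator`, `simulate`, the bound `sample_size`, and the level-raising rule are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass, field


def draw_level(rng: random.Random, max_level: int = 32) -> int:
    """Level of an item = number of leading zero bits of a random binary
    string, so P(level >= i) = 2**-i for i <= max_level."""
    level = 0
    while level < max_level and rng.random() < 0.5:
        level += 1
    return level


@dataclass
class Site:
    level: int = 0   # local copy of the coordinator's current level r
    sent: int = 0    # site-to-coordinator messages

    def observe(self, item, rng: random.Random, coordinator: "Coordinator") -> None:
        item_level = draw_level(rng)
        # Forward only items whose level is at least r (probability 2**-r).
        if item_level >= self.level:
            self.sent += 1
            coordinator.receive(item, item_level)


@dataclass
class Coordinator:
    sample_size: int                               # hypothetical bound s
    sites: list = field(default_factory=list)
    level: int = 0                                 # current level r
    sample: list = field(default_factory=list)     # (item, item_level) pairs
    downlink: int = 0                              # coordinator-to-site messages

    def receive(self, item, item_level: int) -> None:
        if item_level < self.level:
            return  # message sent before the site learned of a level increase
        self.sample.append((item, item_level))
        # While the sample is too large: raise r, keep only items with
        # level >= r (about half of them), and send the new r to every site.
        while len(self.sample) > self.sample_size:
            self.level += 1
            self.sample = [(x, lv) for x, lv in self.sample if lv >= self.level]
            for site in self.sites:
                site.level = self.level
            self.downlink += len(self.sites)


def simulate(num_sites: int = 4, items_per_site: int = 10_000,
             sample_size: int = 100, seed: int = 7) -> Coordinator:
    rng = random.Random(seed)
    coordinator = Coordinator(sample_size=sample_size)
    coordinator.sites = [Site() for _ in range(num_sites)]
    # Round-robin arrivals stand in for n concurrent input streams.
    for t in range(items_per_site):
        for sid, site in enumerate(coordinator.sites):
            site.observe((sid, t), rng, coordinator)
    return coordinator


if __name__ == "__main__":
    c = simulate()
    uplink = sum(s.sent for s in c.sites)
    print(f"r={c.level} sample={len(c.sample)} uplink={uplink} downlink={c.downlink}")
```

In this sketch every forwarded item and every level broadcast passes through the single `Coordinator`, and each level increase costs one message per site. That makes the two costs named in the description visible: the coordinator bottleneck, which the multiple-coordinator structure targets, and the coordinator-to-site traffic, which the shared-memory model is meant to reduce.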