SPMgr: Dynamic workflow manager for sampling and filtering data streams over Apache Storm

Bibliographic Details
Main Authors: Youngkuk Kim, Siwoon Son, Yang-Sae Moon
Format: Article
Language: English
Published: Wiley 2019-07-01
Series: International Journal of Distributed Sensor Networks
ISSN: 1550-1477
Online Access: https://doi.org/10.1177/1550147719862206

Description

In this article, we address dynamic workflow management for sampling and filtering data streams in Apache Storm. Because many sensors generate data streams continuously, we often use sampling to choose representative data or filtering to remove unnecessary data. Apache Storm is a real-time distributed processing platform suitable for handling large data streams. However, whenever the input data structure or the processing algorithm changes, Storm must stop the entire job, because the programs have to be modified, redistributed, and restarted. In addition, for effective data processing we often use Storm together with Kafka and databases, but it is difficult to use these platforms in an integrated manner. In this article, we identify the problems that arise when applying sampling and filtering algorithms to Storm and propose a dynamic workflow management model that solves them. First, we present the concept of a plan consisting of input, processing, and output modules of a data stream. Second, we propose Storm Plan Manager (SPMgr), which can operate Storm, Kafka, and databases as a single integrated system. Storm Plan Manager is an integrated workflow manager that dynamically controls sampling and filtering of data streams through plans. Third, as a key feature, Storm Plan Manager provides a Web client interface for visually creating, executing, and monitoring plans. We show the usefulness of the proposed Storm Plan Manager by presenting its design, implementation, and experimental results, in that order.
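The description defines a plan as input, processing, and output modules. Its processing step (sampling or filtering) is meant to be changed without stopping the whole job. The following is a minimal, in-memory Python sketch of that idea under stated assumptions. Plain iterables stand in for the input module, such as a Kafka topic. A Python list stands in for the output module, such as a database table. `Plan`, `PlanManager`, `systematic_sampler`, and `range_filter` are hypothetical names chosen for illustration, not SPMgr's actual API. Nothing here talks to Storm, Kafka, or a database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical processing module: takes one tuple and returns it
# (pass it downstream) or None (drop it).
Processor = Callable[[dict], Optional[dict]]


def systematic_sampler(k: int) -> Processor:
    """Deterministic sampling: keep the 1st, (k+1)-th, (2k+1)-th, ... tuple."""
    state = {"n": 0}

    def process(t: dict) -> Optional[dict]:
        keep = state["n"] % k == 0
        state["n"] += 1
        return t if keep else None

    return process


def range_filter(key: str, low: float, high: float) -> Processor:
    """Filtering: drop tuples whose value for `key` lies outside [low, high]."""
    def process(t: dict) -> Optional[dict]:
        return t if low <= t[key] <= high else None

    return process


@dataclass
class Plan:
    """Illustrative plan: input (fed via `feed`), one processor, and an output sink."""
    name: str
    processor: Processor
    output: List[dict] = field(default_factory=list)  # stands in for a DB table

    def feed(self, tuples: Iterable[dict]) -> None:
        # Input module: push each incoming tuple through the current processor.
        for t in tuples:
            result = self.processor(t)
            if result is not None:
                self.output.append(result)


class PlanManager:
    """Registers plans and swaps a plan's processing module in place."""

    def __init__(self) -> None:
        self.plans: Dict[str, Plan] = {}

    def submit(self, plan: Plan) -> None:
        self.plans[plan.name] = plan

    def update_processor(self, name: str, processor: Processor) -> None:
        # Only the processing module changes; input and output are untouched,
        # so tuples already written to the output are kept.
        self.plans[name].processor = processor


# Usage: sample a stream, then switch the same plan to filtering.
mgr = PlanManager()
plan = Plan("temperature", systematic_sampler(2))
mgr.submit(plan)
plan.feed({"temp": v} for v in [10, 20, 30, 40])  # sampler keeps 10 and 30
mgr.update_processor("temperature", range_filter("temp", 25, 45))
plan.feed({"temp": v} for v in [20, 30, 50])  # filter keeps only 30
print([t["temp"] for t in plan.output])  # [10, 30, 30]
```

The sketch swaps a function reference between batches, which is the simplest way to show "change the processing module, keep the plan". A real Storm deployment would have to propagate such a change to distributed workers. That part of the design is what the article addresses and is not modeled here.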