Variable size sampling to support high uniformity confidence in sensor data streams

In order to rapidly process large amounts of sensor stream data, it is effective to extract and use samples that reflect the characteristics and patterns of the data stream well. In this article, we focus on improving the uniformity confidence of KSample, which has the characteristics of random samp...

Bibliographic Details
Main Authors: Hajin Kim, Myeong-Seon Gil, Yang-Sae Moon, Mi-Jung Choi
Format: Article
Language: English
Published: Wiley 2018-04-01
Series: International Journal of Distributed Sensor Networks
Online Access: https://doi.org/10.1177/1550147718773999
collection DOAJ
description In order to rapidly process large amounts of sensor stream data, it is effective to extract and use samples that reflect the characteristics and patterns of the data stream well. In this article, we focus on improving the uniformity confidence of KSample, which has the characteristics of random sampling in the stream environment. For this, we first analyze the uniformity confidence of KSample and then derive two uniformity confidence degradation problems: (1) initial degradation, which rapidly decreases the uniformity confidence in the initial stage, and (2) continuous degradation, which gradually decreases the uniformity confidence in the later stages. We note that the initial degradation is caused by the sample range limitation and the past sample invariance, and the continuous degradation by the sampling range increase. For each problem, we present a corresponding solution, that is, we provide the sample range extension for sample range limitation, the past sample change for past sample invariance, and the use of UC-window for sampling range increase. By reflecting these solutions, we then propose a novel sampling method, named UC-KSample, which largely improves the uniformity confidence. Experimental results show that UC-KSample improves the uniformity confidence over KSample by 2.2 times on average, and it always keeps the uniformity confidence higher than the user-specified threshold. We also note that the sampling accuracy of UC-KSample is higher than that of KSample in both numeric sensor data and text data. The uniformity confidence is an important sampling metric in sensor data streams, and this is the first attempt to apply uniformity confidence to KSample. We believe that the proposed UC-KSample is an excellent approach that adopts an advantage of KSample, dynamic sampling over a fixed sampling ratio, while improving the uniformity confidence.
id doaj-art-c184816495694eefae602620bfad762e
institution Kabale University
issn 1550-1477