Detecting Invalid Associations between Fare Machines and Metro Stations Using Smart Card Data
Data quality is essential for the reliable use of data in analysis and applications. The large volume of automatically collected data inevitably suffers from data quality issues, including missing and invalid data. This paper deals with an invalid data problem in the automated fare collection (AFC) datab...
Main Authors: | Pengfei Zhang, Zhenliang Ma, Xiaoxiong Weng |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2021-01-01 |
Series: | Journal of Advanced Transportation |
Online Access: | http://dx.doi.org/10.1155/2021/5283283 |
_version_ | 1832549798970916864 |
---|---|
author | Pengfei Zhang, Zhenliang Ma, Xiaoxiong Weng |
collection | DOAJ |
description | Data quality is essential for the reliable use of data in analysis and applications. The large volume of automatically collected data inevitably suffers from data quality issues, including missing and invalid data. This paper deals with an invalid data problem in the automated fare collection (AFC) database caused by erroneous associations between fare machines and metro stations, e.g., a fare machine located at Station A is wrongly associated with Station B in the AFC database. Such errors can lead to inappropriate fare charges in a distance-based fare system and bias analyses used in planning and operations practice. We propose a tensor decomposition and isolation forest-based approach to detect and correct invalidly associated fare machines in the system. The tensor decomposition extracts features of the passenger flows and travel times passing through fare machines. The isolation forest, coupled with a neural network (NN), takes these features as inputs to detect the wrongly associated fare machines and infer the stations they should be associated with. Case studies using data from a metro system show that the proposed approach achieves over 90% accuracy in detecting invalid associations when up to 35% of associations are invalid. The inferred association has a 90% accuracy even when the invalid association ratio reaches 40%. The proposed data-driven method for detecting invalid data is useful for large-scale data management, in terms of both checking and fixing data quality. |
format | Article |
id | doaj-art-67d97a927ecd41a1899ba1d6df55c350 |
institution | Kabale University |
issn | 0197-6729 2042-3195 |
language | English |
publishDate | 2021-01-01 |
publisher | Wiley |
record_format | Article |
series | Journal of Advanced Transportation |
title | Detecting Invalid Associations between Fare Machines and Metro Stations Using Smart Card Data |
url | http://dx.doi.org/10.1155/2021/5283283 |