Detecting Invalid Associations between Fare Machines and Metro Stations Using Smart Card Data

Data quality is essential for the reliable use of data in analysis and applications. Large volumes of automatically collected data inevitably suffer from quality issues, including missing and invalid data. This paper deals with an invalid data problem in the automated fare collection (AFC) datab...

Bibliographic Details
Main Authors: Pengfei Zhang, Zhenliang Ma, Xiaoxiong Weng
Format: Article
Language: English
Published: Wiley 2021-01-01
Series: Journal of Advanced Transportation
Online Access: http://dx.doi.org/10.1155/2021/5283283
author Pengfei Zhang
Zhenliang Ma
Xiaoxiong Weng
collection DOAJ
description Data quality is essential for the reliable use of data in analysis and applications. Large volumes of automatically collected data inevitably suffer from quality issues, including missing and invalid data. This paper deals with an invalid data problem in the automated fare collection (AFC) database caused by erroneous associations between fare machines and metro stations, e.g., a fare machine located at Station A is wrongly associated with Station B in the AFC database. Such errors can lead to inappropriate fare charges in a distance-based fare system and bias analyses used in planning and operations. We propose a tensor decomposition and isolation forest-based approach to detect and correct invalidly associated fare machines in the system. The tensor decomposition extracts features of the passenger flows and travel times passing through fare machines. The isolation forest, coupled with a neural network (NN), takes these features as inputs to detect the wrongly associated fare machines and infer their correct stations. Case studies using data from a metro system show that the proposed detection approach achieves over 90% accuracy in detecting invalid associations when up to 35% of associations are invalid. The inferred associations remain 90% accurate even when the invalid association ratio reaches 40%. The proposed data-driven detection method is useful for large-scale data management, both for checking data quality and for fixing it.
format Article
id doaj-art-67d97a927ecd41a1899ba1d6df55c350
institution Kabale University
issn 0197-6729
2042-3195
language English
publishDate 2021-01-01
publisher Wiley
record_format Article
series Journal of Advanced Transportation
spelling doaj-art-67d97a927ecd41a1899ba1d6df55c350
2025-02-03T06:08:33Z
eng
Wiley
Journal of Advanced Transportation
0197-6729
2042-3195
2021-01-01
2021
10.1155/2021/5283283
5283283
Detecting Invalid Associations between Fare Machines and Metro Stations Using Smart Card Data
Pengfei Zhang (0)
Zhenliang Ma (1)
Xiaoxiong Weng (2)
(0) School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510000, China
(1) Institute of Transport Studies, Department of Civil Engineering, Monash University, Clayton, VIC 3168, Australia
(2) School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510000, China
title Detecting Invalid Associations between Fare Machines and Metro Stations Using Smart Card Data
url http://dx.doi.org/10.1155/2021/5283283