Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches

This work presents a reliable approach to trace teas’ geographical origins despite changes in teas caused by different harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were analyzed by FT-NIR. Seven classifiers trained on the 2014 d...

Bibliographic Details
Main Authors: Xue-Zhen Hong, Xian-Shu Fu, Zheng-Liang Wang, Li Zhang, Xiao-Ping Yu, Zi-Hong Ye
Format: Article
Language: English
Published: Wiley 2019-01-01
Series: Journal of Analytical Methods in Chemistry
Online Access: http://dx.doi.org/10.1155/2019/1537568
author Xue-Zhen Hong
Xian-Shu Fu
Zheng-Liang Wang
Li Zhang
Xiao-Ping Yu
Zi-Hong Ye
collection DOAJ
description This work presents a reliable approach to tracing teas’ geographical origins despite changes in the teas across harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were analyzed by FT-NIR. Seven classifiers trained on the 2014 dataset all succeeded in tracing the origins of samples collected in 2014; however, they all failed to predict the origins of the 2015 samples because of differing data distributions and an imbalanced dataset. Three outlier-detection-based undersampling approaches, namely one-class SVM (OC-SVM), isolation forest, and elliptic envelope, were then proposed; as a result, the highest macro average recall (MAR) for the 2015 dataset improved from 56.86% to 73.95% (with the SVM classifier). A model updating approach was also applied, and the prediction MAR improved significantly as the updating rate increased. The best MAR (90.31%) was first achieved by OC-SVM undersampling combined with the SVM classifier at a 50% updating rate.
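The description reports results as macro average recall (MAR). As a minimal sketch of how that metric is commonly computed (the unweighted mean of per-class recalls), the following may help show why it suits imbalanced data. The function name, the toy labels "A" and "B", and the sample counts are illustrative assumptions and are not taken from the article.

```python
from collections import defaultdict


def macro_average_recall(y_true, y_pred):
    """Return (MAR, per-class recall dict).

    MAR is the unweighted mean of per-class recalls, so a small class
    counts as much as a large one.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = {label: correct[label] / total[label] for label in total}
    return sum(recalls.values()) / len(recalls), recalls


# Hypothetical imbalanced toy data: 8 samples of origin "A", 2 of origin "B".
y_true = ["A"] * 8 + ["B"] * 2
# A degenerate classifier that always predicts the majority origin.
y_pred = ["A"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
mar, per_class = macro_average_recall(y_true, y_pred)
```

In this toy case the plain accuracy is 0.8 while the MAR is 0.5, because the minority class is never recovered. That gap is the reason a macro-averaged metric is informative when class sizes differ.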
format Article
id doaj-art-365939c3ab27499aa4d34e0fbb6ec0b6
institution OA Journals
issn 2090-8865
2090-8873
language English
publishDate 2019-01-01
publisher Wiley
record_format Article
series Journal of Analytical Methods in Chemistry
spelling doaj-art-365939c3ab27499aa4d34e0fbb6ec0b6
timestamp 2025-08-20T02:08:15Z
volume 2019
article number 1537568
affiliations Xue-Zhen Hong: College of Quality & Safety Engineering, China Jiliang University, Xueyuan Street, Xiasha Higher Education District, Hangzhou 310018, China
Xian-Shu Fu: Zhejiang Provincial Key Laboratory of Biometrology and Inspection & Quarantine, College of Life Sciences, China Jiliang University, Xueyuan Street, Xiasha Higher Education District, Hangzhou 310018, China
Zheng-Liang Wang: Zhejiang Provincial Key Laboratory of Biometrology and Inspection & Quarantine, College of Life Sciences, China Jiliang University, Xueyuan Street, Xiasha Higher Education District, Hangzhou 310018, China
Li Zhang: Department of Computer Science, Zhejiang University, Hangzhou 310027, China
Xiao-Ping Yu: Zhejiang Provincial Key Laboratory of Biometrology and Inspection & Quarantine, College of Life Sciences, China Jiliang University, Xueyuan Street, Xiasha Higher Education District, Hangzhou 310018, China
Zi-Hong Ye: Zhejiang Provincial Key Laboratory of Biometrology and Inspection & Quarantine, College of Life Sciences, China Jiliang University, Xueyuan Street, Xiasha Higher Education District, Hangzhou 310018, China
title Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches
url http://dx.doi.org/10.1155/2019/1537568