Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches
This work presents a reliable approach for tracing teas’ geographical origins despite changes in teas caused by different harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were analyzed by FT-NIR. Seven classifiers trained on the 2014 d...
| Main Authors: | Xue-Zhen Hong, Xian-Shu Fu, Zheng-Liang Wang, Li Zhang, Xiao-Ping Yu, Zi-Hong Ye |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2019-01-01 |
| Series: | Journal of Analytical Methods in Chemistry |
| Online Access: | http://dx.doi.org/10.1155/2019/1537568 |
| _version_ | 1850216625574248448 |
|---|---|
| author | Xue-Zhen Hong Xian-Shu Fu Zheng-Liang Wang Li Zhang Xiao-Ping Yu Zi-Hong Ye |
| author_sort | Xue-Zhen Hong |
| collection | DOAJ |
| description | This work presents a reliable approach for tracing teas’ geographical origins despite changes in teas caused by different harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were analyzed by FT-NIR. Seven classifiers trained on the 2014 dataset all succeeded in tracing the origins of samples collected in 2014; however, they all failed to predict origins for the 2015 samples because of differing data distributions and an imbalanced dataset. Three undersampling approaches based on outlier detection (one-class SVM (OC-SVM), isolation forest, and elliptic envelope) were then proposed; as a result, the highest macro average recall (MAR) for the 2015 dataset improved from 56.86% to 73.95% (with the SVM classifier). A model updating approach was also applied, and the prediction MAR improved significantly as the updating rate increased. The best MAR (90.31%) was first achieved by the SVM classifier combined with OC-SVM undersampling at a 50% updating rate. |
| format | Article |
| id | doaj-art-365939c3ab27499aa4d34e0fbb6ec0b6 |
| institution | OA Journals |
| issn | 2090-8865 2090-8873 |
| language | English |
| publishDate | 2019-01-01 |
| publisher | Wiley |
| record_format | Article |
| series | Journal of Analytical Methods in Chemistry |
| spelling | doaj-art-365939c3ab27499aa4d34e0fbb6ec0b6; 2025-08-20T02:08:15Z; eng; Wiley; Journal of Analytical Methods in Chemistry; 2090-8865; 2090-8873; 2019-01-01; 2019; 10.1155/2019/1537568; 1537568; Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches; Xue-Zhen Hong [0], Xian-Shu Fu [1], Zheng-Liang Wang [2], Li Zhang [3], Xiao-Ping Yu [4], Zi-Hong Ye [5]; [0] College of Quality & Safety Engineering, China Jiliang University, Xueyuan Street, Xiasha Higher Education District, Hangzhou 310018, China; [1, 2, 4, 5] Zhejiang Provincial Key Laboratory of Biometrology and Inspection & Quarantine, College of Life Sciences, China Jiliang University, Xueyuan Street, Xiasha Higher Education District, Hangzhou 310018, China; [3] Department of Computer Science, Zhejiang University, Hangzhou 310027, China; http://dx.doi.org/10.1155/2019/1537568 |
| title | Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches |
| url | http://dx.doi.org/10.1155/2019/1537568 |
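
The description above reports results as macro average recall (MAR) and refers to a model updating rate. The sketch below illustrates both ideas in plain Python. It is not the authors' code: the function names `macro_average_recall` and `split_for_update` are hypothetical, the toy labels are invented for illustration, and the random selection of updating samples is an assumption, since the record does not say how the updating samples were chosen.

```python
import random
from collections import defaultdict


def macro_average_recall(y_true, y_pred):
    """Average the per-class recalls so every class counts equally,
    regardless of how many samples it has."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def split_for_update(new_samples, rate, seed=0):
    """Assumed scheme: randomly move a fraction `rate` of the new-year
    samples into the training pool and keep the rest for testing."""
    rng = random.Random(seed)
    shuffled = list(new_samples)
    rng.shuffle(shuffled)
    k = int(rate * len(shuffled))
    return shuffled[:k], shuffled[k:]


# Toy imbalanced example: 8 samples of origin "A", 2 of origin "B".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 9 + ["B"]  # one "B" sample is misclassified as "A"
print(macro_average_recall(y_true, y_pred))  # 0.75, although accuracy is 0.9

# A 50% updating rate on ten placeholder sample indices.
for_training, for_testing = split_for_update(range(10), rate=0.5)
print(len(for_training), len(for_testing))  # 5 5
```

In the toy example, plain accuracy would be 0.9 while MAR is 0.75, which shows why MAR is the more informative score when classes are imbalanced.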