Quantitative Analysis of a Weak Correlation between Complicated Data on the Basis of Principal Component Analysis

The mining of weak correlation information between two data matrices with high complexity is a very challenging task. A new method named principal component analysis-based multiconfidence ellipse analysis (PCA/MCEA) was proposed in this study, which first applied a confidence ellipse to describe the...

Bibliographic Details
Main Authors: Tao Pang, Haitao Zhang, Liliang Wen, Jun Tang, Bing Zhou, Qianxu Yang, Yong Li, Jiajun Wang, Aiming Chen, Zhongda Zeng
Format: Article
Language: English
Published: Wiley 2021-01-01
Series: Journal of Analytical Methods in Chemistry
Online Access: http://dx.doi.org/10.1155/2021/8874827
author Tao Pang
Haitao Zhang
Liliang Wen
Jun Tang
Bing Zhou
Qianxu Yang
Yong Li
Jiajun Wang
Aiming Chen
Zhongda Zeng
collection DOAJ
description The mining of weak correlation information between two highly complex data matrices is a very challenging task. This study proposes a new method, principal component analysis-based multiconfidence ellipse analysis (PCA/MCEA). The method first applies PCA to a single target data matrix and uses confidence ellipses to describe the differences and correlations among categories of objects/samples. This makes it possible to count the objects that fall in the overlapping and nonoverlapping areas of the ellipses obtained from the PCA runs. A quantitative evaluation index of the correlation between data matrices is then defined by comparing the PCA results of more than one data matrix, and the similarities and differences between the data matrices are further quantified through a comprehensive analysis of the outcomes. Complicated tobacco agriculture data, comprising climate, altitude, and chemical composition features of tobacco leaves, were used as an example to illustrate the method. These data contain 171,516 objects with 14, 4, and 5 descriptors of climate, altitude, and chemicals, respectively. With the new method, the complex but weak relationship between these independent and dependent variables was studied. Three widely used conventional methods were applied for comparison. The results demonstrate the ability of the new method to discover weak correlations between complicated data.
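The ellipse-counting step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`pca_scores`, `inside_ellipse`, `overlap_counts`), the synthetic data, the 95% confidence level, and the assumption that PCA scores are roughly Gaussian within each category are all illustrative choices. The abstract does not define the quantitative evaluation index, so the sketch stops at counting objects in the overlapping and nonoverlapping areas of two ellipses.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project autoscaled columns of X onto the leading principal components."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def inside_ellipse(points, group, confidence=0.95):
    """Mask of `points` lying inside the confidence ellipse fitted to `group`."""
    mean = group.mean(axis=0)
    cov = np.cov(group, rowvar=False)
    diff = points - mean
    # Squared Mahalanobis distance of every point to the group centre.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    # Assumption: 2-D Gaussian scores, so d2 follows a chi-square law with
    # 2 degrees of freedom, whose quantile is -2 * ln(1 - confidence).
    return d2 <= -2.0 * np.log(1.0 - confidence)


def overlap_counts(scores, labels, a, b, confidence=0.95):
    """Count objects of categories a and b by ellipse membership."""
    in_a = inside_ellipse(scores, scores[labels == a], confidence)
    in_b = inside_ellipse(scores, scores[labels == b], confidence)
    member = (labels == a) | (labels == b)
    return {
        "overlap": int(np.sum(member & in_a & in_b)),
        "only_a": int(np.sum(member & in_a & ~in_b)),
        "only_b": int(np.sum(member & ~in_a & in_b)),
        "outside": int(np.sum(member & ~in_a & ~in_b)),
    }


# Synthetic stand-in for one data matrix: two categories, five descriptors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 5)), rng.normal(0.5, 1.0, (200, 5))])
labels = np.array([0] * 200 + [1] * 200)

scores = pca_scores(X)
counts = overlap_counts(scores, labels, 0, 1)
print(counts)
```

Running the same counting on the PCA scores of a second data matrix, with the same category labels, would give a second set of counts to compare against the first; how the paper turns such counts into its correlation index is not stated in this record.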
format Article
id doaj-art-87a37c0418e640459d6fcd8220808f8a
institution Kabale University
issn 2090-8865
2090-8873
language English
publishDate 2021-01-01
publisher Wiley
record_format Article
series Journal of Analytical Methods in Chemistry
spelling doaj-art-87a37c0418e640459d6fcd8220808f8a
2025-08-20T03:37:43Z
eng
Wiley
Journal of Analytical Methods in Chemistry
2090-8865
2090-8873
2021-01-01
2021
10.1155/2021/8874827
8874827
Quantitative Analysis of a Weak Correlation between Complicated Data on the Basis of Principal Component Analysis
Tao Pang (0): Yunnan Academy of Tobacco Agriculture Science, Yuxi, Yunnan 653100, China
Haitao Zhang (1): China Tobacco Yunnan Industrial Co., Ltd., Kunming, Yunnan 650202, China
Liliang Wen (2): Dalian ChemDataSolution Information Technology Co. Ltd., Dalian 116023, China
Jun Tang (3): China Tobacco Yunnan Industrial Co., Ltd., Kunming, Yunnan 650202, China
Bing Zhou (4): China Tobacco Yunnan Industrial Co., Ltd., Kunming, Yunnan 650202, China
Qianxu Yang (5): China Tobacco Yunnan Industrial Co., Ltd., Kunming, Yunnan 650202, China
Yong Li (6): Yunnan Academy of Tobacco Agriculture Science, Yuxi, Yunnan 653100, China
Jiajun Wang (7): China Tobacco Yunnan Industrial Co., Ltd., Kunming, Yunnan 650202, China
Aiming Chen (8): Dalian ChemDataSolution Information Technology Co. Ltd., Dalian 116023, China
Zhongda Zeng (9): Dalian ChemDataSolution Information Technology Co. Ltd., Dalian 116023, China
http://dx.doi.org/10.1155/2021/8874827
title Quantitative Analysis of a Weak Correlation between Complicated Data on the Basis of Principal Component Analysis
url http://dx.doi.org/10.1155/2021/8874827