A Gene Clustering Algorithm Based on the CCA-Hierarchical Clustering

Bibliographic Details
Main Author: LIN Qianmin
Format: Article
Language: Chinese (zho)
Published: Harbin University of Science and Technology Publications 2023-10-01
Series:Journal of Harbin University of Science and Technology
Online Access:https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2261
Description
Summary: Gene chip (microarray) technology produces massive volumes of gene expression data. To fully mine the biological information and potential biological mechanisms these data contain, this paper proposes a gene clustering algorithm based on CCA-hierarchical clustering (CCA-Hc). The algorithm introduces canonical correlation analysis (CCA) into hierarchical clustering and optimizes how the similarity matrix is computed. First, CCA measures the correlation between genes by combining multiple kinds of feature information for each gene, which yields a gene similarity matrix. This similarity matrix is then used as the neighbor (proximity) matrix for agglomerative hierarchical clustering. The clustering performance of CCA-Hc was tested on a gene expression dataset of Oryza sativa L. (rice). Compared with the traditional hierarchical clustering algorithm using Euclidean distance (EUC-Hc), CCA-Hc performs better on both internal stability indices and biological functional indices, shows better robustness and clustering accuracy, and is more conducive to discovering co-expression relationships between genes.
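The two steps in the summary (a CCA-based gene similarity matrix, then agglomerative clustering on it) can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on assumptions the summary does not state: each gene is represented by an n x p matrix of paired observations of several features; the similarity between two genes is their first (largest) canonical correlation; distance is 1 minus similarity; and average linkage is used. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def first_canonical_correlation(x, y):
    """Largest canonical correlation between the columns of x (n, p) and y (n, q).

    Rows are paired observations. After centring, the canonical correlations
    are the singular values of Qx^T Qy, where Qx and Qy are orthonormal bases
    of the two column spaces. Assumes both matrices have full column rank.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)  # sorted in descending order
    # Clip guards against tiny floating-point overshoot above 1.
    return float(np.clip(s[0], 0.0, 1.0))


def cca_similarity_matrix(genes):
    """Pairwise CCA similarity for a list of per-gene feature matrices (assumed representation)."""
    g = len(genes)
    sim = np.eye(g)
    for i in range(g):
        for j in range(i + 1, g):
            sim[i, j] = sim[j, i] = first_canonical_correlation(genes[i], genes[j])
    return sim


def agglomerative_average(dist, n_clusters):
    """Average-linkage agglomerative clustering on a precomputed distance matrix."""
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(dist.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Hypothetical synthetic data: two groups of 4 genes, each gene observed
# n_obs times on n_feat features driven by a shared latent profile.
rng = np.random.default_rng(0)
n_obs, n_feat = 40, 3
z1, z2 = rng.normal(size=n_obs), rng.normal(size=n_obs)


def hypothetical_gene(latent):
    # Each feature column is a positive multiple of the latent profile plus noise.
    w = 1.0 + rng.random(n_feat)
    return np.outer(latent, w) + 0.2 * rng.normal(size=(n_obs, n_feat))


genes = [hypothetical_gene(z1) for _ in range(4)] + [hypothetical_gene(z2) for _ in range(4)]
sim = cca_similarity_matrix(genes)                      # step 1: similarity matrix
labels = agglomerative_average(1.0 - sim, n_clusters=2)  # step 2: agglomerative clustering
print(labels)
```

Replacing `1.0 - sim` with a pairwise Euclidean distance matrix over flattened expression profiles gives a rough analogue of the EUC-Hc baseline the summary compares against; the paper's actual feature construction and linkage criterion may differ.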
ISSN:1007-2683