Clustering Cu-S based compounds using periodic table representation and compositional Wasserstein distance

Abstract Crystal structure similarity is useful for the chemical analysis of today’s large materials databases and for data mining of new materials. Here we propose using the two-dimensional Wasserstein distance (earth mover’s distance) to measure the compositional similarity between different compounds, based...

Bibliographic Details
Main Authors: Shuyan Hao, Ting Xia, Ruizhi Zhang, Meng Guo
Format: Article
Language: English
Published: Nature Portfolio 2024-12-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-79126-3
author Shuyan Hao
Ting Xia
Ruizhi Zhang
Meng Guo
collection DOAJ
description Abstract Crystal structure similarity is useful for the chemical analysis of today’s large materials databases and for data mining of new materials. Here we propose using the two-dimensional Wasserstein distance (earth mover’s distance) to measure the compositional similarity between different compounds, based on the periodic table representation of compositions. To demonstrate the effectiveness of our approach, 1586 Cu-S based compounds are taken from the Inorganic Crystal Structure Database (ICSD) to form a validation dataset. Using local structure order parameters as a geometrical similarity metric, a similarity matrix that includes both compositional and geometrical similarities is calculated. All the Cu-S compounds are then clustered into 86 groups using the similarity matrix and the “density-based spatial clustering of applications with noise” (DBSCAN) algorithm. Some selected groups are analyzed through crystal structure visualization of hundreds of compounds, which provides chemical insight into the similarity metrics and shows the effectiveness of the clustering. A group of rare-earth-containing layered Cu-S compounds is proposed for further experimental investigation as potential thermoelectric materials, based on the structure-property consideration that similar structures tend to have similar properties. The unsupervised clustering approach in this work can be readily applied to other datasets, which will aid chemical understanding of materials datasets and help discover new materials with similar properties based on the similarity metrics.
format Article
id doaj-art-b136f27a123b4c668660c93f0fe18102
institution Kabale University
issn 2045-2322
language English
publishDate 2024-12-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-b136f27a123b4c668660c93f0fe18102 2025-01-05T12:27:24Z eng Nature Portfolio Scientific Reports 2045-2322 2024-12-01 14118 10.1038/s41598-024-79126-3
Clustering Cu-S based compounds using periodic table representation and compositional Wasserstein distance
Shuyan Hao (0)
Ting Xia (1)
Ruizhi Zhang (2)
Meng Guo (3)
(0) Key Laboratory of Computing Power Network and Information Security, Shandong Computer Science Center (National Supercomputing Center in Jinan), Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences)
(1) Jinan Key Laboratory of High-Performance Industrial Software, Jinan Institute of Supercomputing Technology
(2) Jinan Key Laboratory of High-Performance Industrial Software, Jinan Institute of Supercomputing Technology
(3) Key Laboratory of Computing Power Network and Information Security, Shandong Computer Science Center (National Supercomputing Center in Jinan), Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences)
https://doi.org/10.1038/s41598-024-79126-3
title Clustering Cu-S based compounds using periodic table representation and compositional Wasserstein distance
url https://doi.org/10.1038/s41598-024-79126-3
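
The compositional part of the workflow summarized in the abstract (a two-dimensional Wasserstein distance on a periodic table grid, followed by DBSCAN on the resulting distance matrix) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the `POSITION` lookup of (period, group) cells, the Euclidean ground distance, the integer element counts, the `eps` and `min_samples` values, and the function names are all hypothetical, and the geometrical similarity based on local structure order parameters is left out.

```python
import math
from collections import deque

# Hypothetical (period, group) grid cells; the paper's exact table layout and
# ground distance are not given in this record.
POSITION = {"Cu": (4, 11), "Ag": (5, 11), "S": (3, 16), "Se": (4, 16)}


def min_cost_flow(n_nodes, edges, source, sink):
    """Total cost of a min-cost max-flow (successive shortest paths, Bellman-Ford)."""
    graph = [[] for _ in range(n_nodes)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])      # forward edge
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])   # residual edge
    total = 0.0
    while True:
        dist = [math.inf] * n_nodes
        prev = [None] * n_nodes
        dist[source] = 0.0
        for _ in range(n_nodes - 1):
            changed = False
            for u in range(n_nodes):
                if dist[u] == math.inf:
                    continue
                for i, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + cost, (u, i), True
            if not changed:
                break
        if dist[sink] == math.inf:
            return total  # no augmenting path left: all mass is transported
        push, v = math.inf, sink
        while v != source:  # bottleneck capacity along the path
            u, i = prev[v]
            push, v = min(push, graph[u][i][1]), u
        v = sink
        while v != source:  # apply the augmentation
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        total += push * dist[sink]


def composition_emd(comp_a, comp_b):
    """Earth mover's distance between compositions given as {element: integer count}."""
    els_a, els_b = sorted(comp_a), sorted(comp_b)
    tot_a, tot_b = sum(comp_a.values()), sum(comp_b.values())
    n, m = len(els_a), len(els_b)
    source, sink = 0, n + m + 1
    edges = []
    for i, a in enumerate(els_a):
        # Scaling by the other total gives both sides the same integer mass.
        edges.append((source, 1 + i, comp_a[a] * tot_b, 0.0))
        for j, b in enumerate(els_b):
            (r1, c1), (r2, c2) = POSITION[a], POSITION[b]
            ground = math.hypot(r1 - r2, c1 - c2)
            edges.append((1 + i, 1 + n + j, tot_a * tot_b, ground))
    for j, b in enumerate(els_b):
        edges.append((1 + n + j, sink, comp_b[b] * tot_a, 0.0))
    return min_cost_flow(n + m + 2, edges, source, sink) / (tot_a * tot_b)


def dbscan(dist, eps, min_samples):
    """DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = [k for k in range(n) if dist[j][k] <= eps]
            if len(reach) >= min_samples:
                queue.extend(reach)  # j is a core point, keep expanding
    return labels


compounds = {
    "CuS": {"Cu": 1, "S": 1},
    "CuSe": {"Cu": 1, "Se": 1},
    "Cu2S": {"Cu": 2, "S": 1},
    "Cu2Se": {"Cu": 2, "Se": 1},
    "Ag2S": {"Ag": 2, "S": 1},
}
names = list(compounds)
dist = [[composition_emd(compounds[a], compounds[b]) for b in names] for a in names]
labels = dbscan(dist, eps=0.6, min_samples=2)
print(dict(zip(names, labels)))
```

With this toy grid, swapping S for Se moves mass by one cell, while changing the Cu:S ratio moves mass across five groups, so compounds with the same stoichiometry and neighbouring elements end up closer than compounds with the same elements and different ratios. For datasets of realistic size, an optimal transport library and a library DBSCAN with a precomputed metric would be the more practical choice than this pure-Python solver.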