Clustering Cu-S based compounds using periodic table representation and compositional Wasserstein distance
Abstract Crystal structure similarity is useful for the chemical analysis of today's large materials databases and for data mining new materials. Here we propose to use the two-dimensional Wasserstein distance (earth mover’s distance) to measure the compositional similarity between different compounds, based...
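As a rough illustration of the compositional distance named in the abstract, the sketch below places each element of a formula on a (period, group) grid and computes the earth mover's distance between two compositions. It is a minimal sketch under stated assumptions, not the authors' implementation: the `POSITION` lookup, the function names, the Euclidean ground distance between grid cells, and the choice of atomic fractions as weights are all hypothetical choices made for this example.

```python
from fractions import Fraction
from math import hypot, inf

# Hypothetical minimal lookup: element -> (period, group) on the periodic table.
POSITION = {"O": (2, 16), "S": (3, 16), "Fe": (4, 8), "Cu": (4, 11),
            "Se": (4, 16), "Ag": (5, 11)}

def to_distribution(formula):
    """Map {element: integer count} to {(period, group): atomic fraction}."""
    total = sum(formula.values())
    return {POSITION[el]: Fraction(n, total) for el, n in formula.items()}

def wasserstein_2d(formula_a, formula_b):
    """Earth mover's distance between two compositions on the grid.

    Solved as a small transport problem with successive shortest
    augmenting paths (assumed ground distance: Euclidean on the grid).
    """
    a, b = to_distribution(formula_a), to_distribution(formula_b)
    src, dst = list(a), list(b)
    n, m = len(src), len(dst)
    supply = [a[p] for p in src]
    demand = [b[q] for q in dst]
    cost = [[hypot(p[0] - q[0], p[1] - q[1]) for q in dst] for p in src]
    flow = [[Fraction(0)] * m for _ in range(n)]
    while any(supply):
        # Bellman-Ford on the residual graph: nodes 0..n-1 are sources,
        # nodes n..n+m-1 are sinks; backward edges exist where flow > 0.
        dist = [0.0 if supply[i] else inf for i in range(n)] + [inf] * m
        prev = [None] * (n + m)
        for _ in range(n + m):
            for i in range(n):
                for j in range(m):
                    if dist[i] + cost[i][j] < dist[n + j] - 1e-12:
                        dist[n + j], prev[n + j] = dist[i] + cost[i][j], i
                    if flow[i][j] and dist[n + j] - cost[i][j] < dist[i] - 1e-12:
                        dist[i], prev[i] = dist[n + j] - cost[i][j], n + j
        # Cheapest reachable sink that still has unmet demand.
        end = min((n + j for j in range(m) if demand[j]), key=lambda v: dist[v])
        path = [end]
        while prev[path[-1]] is not None:
            path.append(prev[path[-1]])
        path.reverse()
        start = path[0]
        # Bottleneck: remaining supply, remaining demand, cancellable flow.
        amount = min(supply[start], demand[end - n])
        for u, v in zip(path, path[1:]):
            if u >= n:
                amount = min(amount, flow[v][u - n])
        for u, v in zip(path, path[1:]):
            if u < n:
                flow[u][v - n] += amount   # forward edge: ship mass
            else:
                flow[v][u - n] -= amount   # backward edge: cancel mass
        supply[start] -= amount
        demand[end - n] -= amount
    return sum(float(flow[i][j]) * cost[i][j] for i in range(n) for j in range(m))

# CuS vs Cu2S: one sixth of the mass moves from S to Cu.
print(round(wasserstein_2d({"Cu": 1, "S": 1}, {"Cu": 2, "S": 1}), 3))  # about 0.85
```

Under these assumptions, replacing Cu with Ag in Cu2S moves two thirds of the mass by one period, so the distance is 2/3; chemically similar substitutions stay close on the grid, which is the intuition behind using a periodic table layout.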
Main Authors: | Shuyan Hao, Ting Xia, Ruizhi Zhang, Meng Guo |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio 2024-12-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-024-79126-3 |
_version_ | 1841559512835686400 |
---|---|
author | Shuyan Hao Ting Xia Ruizhi Zhang Meng Guo |
author_sort | Shuyan Hao |
collection | DOAJ |
description | Abstract Crystal structure similarity is useful for the chemical analysis of today's large materials databases and for data mining new materials. Here we propose to use the two-dimensional Wasserstein distance (earth mover’s distance) to measure the compositional similarity between different compounds, based on the periodic table representation of compositions. To demonstrate the effectiveness of our approach, 1586 Cu-S based compounds are taken from the Inorganic Crystal Structure Database (ICSD) to form a validation dataset. By using local structure order parameters as a geometrical similarity metric, a similarity matrix including both compositional and geometrical similarities is calculated. All the Cu-S compounds are then clustered into 86 groups using the similarity matrix and the “density-based spatial clustering of applications with noise” (DBSCAN) algorithm. Some selected groups are analyzed using crystal structure visualization of hundreds of compounds, which provides chemical insight into the similarity metrics and shows the effectiveness of the clustering. A group of rare-earth-containing layered Cu-S compounds is proposed for further experimental investigation as potential thermoelectric materials, based on the structure-property consideration that similar structures tend to have similar properties. The unsupervised clustering approach in this work can be easily applied to other datasets, which will help with the chemical understanding of materials datasets and with the discovery of new materials with similar properties based on the similarity metrics. |
format | Article |
id | doaj-art-b136f27a123b4c668660c93f0fe18102 |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2024-12-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-b136f27a123b4c668660c93f0fe18102 2025-01-05T12:27:24Z eng Nature Portfolio Scientific Reports 2045-2322 2024-12-01 14118 10.1038/s41598-024-79126-3 Clustering Cu-S based compounds using periodic table representation and compositional Wasserstein distance. Authors: Shuyan Hao (0), Ting Xia (1), Ruizhi Zhang (2), Meng Guo (3). Affiliations: (0) Key Laboratory of Computing Power Network and Information Security, Shandong Computer Science Center (National Supercomputing Center in Jinan), Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences); (1) Jinan Key Laboratory of High-Performance Industrial Software, Jinan Institute of Supercomputing Technology; (2) Jinan Key Laboratory of High-Performance Industrial Software, Jinan Institute of Supercomputing Technology; (3) Key Laboratory of Computing Power Network and Information Security, Shandong Computer Science Center (National Supercomputing Center in Jinan), Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences). https://doi.org/10.1038/s41598-024-79126-3 |
title | Clustering Cu-S based compounds using periodic table representation and compositional Wasserstein distance |
url | https://doi.org/10.1038/s41598-024-79126-3 |
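The record's description mentions clustering compounds with DBSCAN from a pairwise similarity matrix. The sketch below shows how DBSCAN can run directly on a precomputed distance matrix. It is a minimal illustration, not the authors' code: the `dbscan` function, the `eps` and `min_samples` values, and the toy one-dimensional positions used to build the matrix are all assumptions made for this example, and how the paper combines compositional and geometrical distances into one matrix is not stated in this record.

```python
def dbscan(dist, eps, min_samples):
    """Cluster points from a precomputed distance matrix.

    Returns one label per point: 0, 1, ... for clusters, -1 for noise.
    A point counts itself among its neighbors.
    """
    n = len(dist)
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

# Toy stand-in for a compound-by-compound distance matrix (made-up numbers).
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
dist = [[abs(p - q) for q in positions] for p in positions]
print(dbscan(dist, eps=0.15, min_samples=2))  # [0, 0, 0, 1, 1, -1]
```

Because DBSCAN only needs pairwise distances, any combined compositional and geometrical distance can be plugged in as `dist` without embedding the compounds in a vector space; points with too few close neighbors are left unclustered as noise.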