JSVE: JSON Schema Variant Extraction Based on Clustering and Formal Concept Analysis
The JSON data format is widely used for data representation and exchange because of its flexibility. JSON data is usually schemaless, which keeps it lightweight and flexible. However, the lack of schema information causes problems for effective data proces...
| Main Authors: | Ziyi Zhang, Teng Lv |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | JSON; schema variant; schema extraction; cluster; formal concept analysis |
| Online Access: | https://ieeexplore.ieee.org/document/11127057/ |
| _version_ | 1849223107809640448 |
|---|---|
| author | Ziyi Zhang; Teng Lv |
| author_facet | Ziyi Zhang; Teng Lv |
| author_sort | Ziyi Zhang |
| collection | DOAJ |
| description | The JSON data format is widely used for data representation and exchange because of its flexibility. JSON data is usually schemaless, which keeps it lightweight and flexible. However, the lack of schema information causes problems for effective data processing, data management, and data integration. Most current research focuses on identifying a global schema for a JSON dataset, which cannot describe the detailed structure of the data and therefore increases implementation and maintenance costs. To solve this problem, this paper proposes JSON Schema Variant Extraction (JSVE), an improved method that extracts precise JSON schema variants from datasets using clustering and formal concept analysis. Because the complex structures of large, heterogeneous JSON data cannot be analyzed directly, JSVE addresses this limitation by flattening field names and clustering similar documents. It then applies an algorithm based on formal concept analysis to identify the schema variants in each cluster. Experimental evaluations in both stand-alone and distributed environments on four real-world datasets (DBLP, Tweets, TV Series, and BestBuy) demonstrate that JSVE is more fine-grained and more efficient at extracting schema variants from a collection of JSON documents than the current state-of-the-art method, ClustVariants. |
| format | Article |
| id | doaj-art-ebfd790be5aa4bec8701fb85b569b17b |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-ebfd790be5aa4bec8701fb85b569b17b; 2025-08-25T23:12:21Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; vol. 13, pp. 145517-145527; doi:10.1109/ACCESS.2025.3599650; 11127057; JSVE: JSON Schema Variant Extraction Based on Clustering and Formal Concept Analysis; Ziyi Zhang (https://orcid.org/0009-0009-4518-7976), School of Electronic and Information Engineering, Anhui Jianzhu University, Hefei, Anhui, China; Teng Lv (https://orcid.org/0000-0003-1862-5802), School of Big Data and Artificial Intelligence, Anhui Xinhua University, Hefei, Anhui, China; https://ieeexplore.ieee.org/document/11127057/; JSON; schema variant; schema extraction; cluster; formal concept analysis |
| spellingShingle | Ziyi Zhang; Teng Lv; JSVE: JSON Schema Variant Extraction Based on Clustering and Formal Concept Analysis; IEEE Access; JSON; schema variant; schema extraction; cluster; formal concept analysis |
| title | JSVE: JSON Schema Variant Extraction Based on Clustering and Formal Concept Analysis |
| topic | JSON; schema variant; schema extraction; cluster; formal concept analysis |
| url | https://ieeexplore.ieee.org/document/11127057/ |
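The abstract in this record describes JSVE as three steps: flattening field names, clustering similar documents, and applying an algorithm based on formal concept analysis within each cluster. The sketch below illustrates that pipeline on a toy scale and is not the authors' implementation. Everything specific in it is an assumption made for illustration: the dotted-path flattening, the greedy Jaccard clustering with a `threshold` of 0.5, and the naive concept enumeration, as are the function names `flatten_fields`, `cluster_documents`, and `formal_concepts`.

```python
def flatten_fields(value, prefix=""):
    """Return the set of dotted field paths in a JSON value (assumed flattening)."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(child, (dict, list)):
                # An empty container contributes its own path.
                paths |= flatten_fields(child, path) or {path}
            else:
                paths.add(path)
    elif isinstance(value, list):
        # Array items share the path of the array itself.
        for item in value:
            paths |= flatten_fields(item, prefix)
    elif prefix:
        paths.add(prefix)
    return frozenset(paths)


def jaccard(a, b):
    """Jaccard similarity of two field sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_documents(field_sets, threshold=0.5):
    """Greedy leader clustering (an assumption, not the paper's algorithm).

    A document joins the first cluster whose first member is similar enough,
    otherwise it starts a new cluster. Returns lists of document indices.
    """
    clusters = []
    for index, fields in enumerate(field_sets):
        for members in clusters:
            if jaccard(fields, field_sets[members[0]]) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


def formal_concepts(field_sets):
    """Naively enumerate formal concepts with a non-empty extent.

    Objects are documents and attributes are field paths. Concept intents are
    the document field sets closed under intersection; the extent of an intent
    is every document that contains all of its fields.
    """
    intents = set(field_sets)
    frontier = set(intents)
    while frontier:
        frontier = {a & b for a in frontier for b in intents} - intents
        intents |= frontier
    concepts = []
    for intent in intents:
        extent = [i for i, fs in enumerate(field_sets) if intent <= fs]
        concepts.append((extent, sorted(intent)))
    # Most general concepts (largest extent) first.
    return sorted(concepts, key=lambda c: (-len(c[0]), c[1]))


# Hypothetical sample documents, not taken from the paper's datasets.
docs = [
    {"title": "A", "author": {"name": "X"}},
    {"title": "B", "author": {"name": "Y"}, "year": 2020},
    {"id": 1, "text": "hi", "user": {"id": 7}},
    {"title": "C", "author": {"name": "Z"}},
]
field_sets = [flatten_fields(d) for d in docs]
for members in cluster_documents(field_sets):
    print(members, formal_concepts([field_sets[i] for i in members]))
```

In this sketch each concept pairs a group of documents (indices local to the cluster) with the fields they all share, so the concepts with the largest field sets correspond to the most specific structural variants. The closure step is exponential in the worst case, which is acceptable only for small clusters.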