JSVE: JSON Schema Variant Extraction Based on Clustering and Formal Concept Analysis
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11127057/ |
| Summary: | The JSON data format is widely used for data representation and exchange because of its flexibility. JSON data is usually schemaless, which keeps it lightweight and flexible. However, the lack of schema information complicates data processing, data management, and data integration. Most existing research focuses on identifying a single global schema for a JSON dataset; a global schema cannot describe the detailed structure of the data, which increases implementation and maintenance costs. To address this problem, this paper proposes an improved method, JSON Schema Variant Extraction (JSVE), which extracts precise JSON schema variants from datasets using clustering and formal concept analysis. Because the complex structures of large, heterogeneous JSON data cannot be analyzed directly, JSVE first flattens field names and clusters similar documents. It then applies an algorithm based on formal concept analysis to identify the schema variants in each cluster. Experimental evaluations in both stand-alone and distributed environments on four real-world datasets (DBLP, Tweets, TV Series, and BestBuy) show that JSVE extracts schema variants from a collection of JSON documents at a finer granularity and more efficiently than the current state-of-the-art method, ClustVariants. |
|---|---|
| ISSN: | 2169-3536 |
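
The summary mentions flattening field names before documents are grouped. The record gives no algorithmic detail, so the following is only a minimal sketch of that general idea and not the JSVE method itself. It assumes nested field names are flattened into dotted paths and, as a deliberate simplification standing in for the clustering and formal concept analysis steps, treats documents with identical flattened field sets as sharing one schema variant. The names `flatten_fields` and `group_by_field_set` are hypothetical.

```python
from collections import defaultdict


def flatten_fields(value, prefix=""):
    """Return the set of dotted field paths found in a JSON-like value.

    Assumption: array elements contribute to the path of the array itself,
    so {"tags": ["a", "b"]} yields {"tags"}.
    """
    if isinstance(value, dict):
        paths = set()
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            # An empty object or array still counts as a field.
            paths |= flatten_fields(child, path) or {path}
        return paths
    if isinstance(value, list):
        paths = set()
        for item in value:
            paths |= flatten_fields(item, prefix)
        return paths or ({prefix} if prefix else set())
    # Scalar leaf: the path that led here is the field name.
    return {prefix} if prefix else set()


def group_by_field_set(docs):
    """Map each distinct flattened field set to the indices of its documents.

    This exact-match grouping is a stand-in for illustration only; it is
    neither similarity clustering nor formal concept analysis.
    """
    groups = defaultdict(list)
    for index, doc in enumerate(docs):
        groups[frozenset(flatten_fields(doc))].append(index)
    return dict(groups)


if __name__ == "__main__":
    sample_docs = [
        {"title": "A", "author": {"name": "x"}},
        {"title": "B", "author": {"name": "y"}},
        {"title": "C", "year": 2020, "tags": ["a", "b"]},
    ]
    for fields, indices in group_by_field_set(sample_docs).items():
        print(sorted(fields), indices)
```

In this sketch, documents that differ in even one field land in separate groups, which is why a real approach would need a similarity-based clustering step before looking for variants.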