JSVE: JSON Schema Variant Extraction Based on Clustering and Formal Concept Analysis

The JSON data format is widely used for data representation and exchange because of its flexibility. JSON data is usually schemaless, which keeps it lightweight and flexible. However, the lack of schema information brings some problems in effective data proces...

Bibliographic Details
Main Authors: Ziyi Zhang, Teng Lv
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: JSON; schema variant; schema extraction; cluster; formal concept analysis
Online Access: https://ieeexplore.ieee.org/document/11127057/
author Ziyi Zhang
Teng Lv
collection DOAJ
description The JSON data format is widely used for data representation and exchange because of its flexibility. JSON data is usually schemaless, which keeps it lightweight and flexible. However, the lack of schema information complicates data processing, data management, and data integration. Most current research focuses on identifying a global schema for a JSON dataset, which cannot describe the detailed structure of the data and therefore increases implementation and maintenance costs. To solve this problem, this paper proposes an improved method called JSON Schema Variant Extraction (JSVE), which extracts precise JSON schema variants from datasets based on clustering and formal concept analysis. Because the complex structures of large, heterogeneous JSON data cannot be analyzed directly, JSVE addresses this limitation by flattening field names and clustering similar documents. It then applies an algorithm based on the idea of formal concept analysis to identify schema variants within each cluster. Experimental evaluations in both stand-alone and distributed environments on four real-world datasets (DBLP, Tweets, TV Series, and BestBuy) demonstrate that JSVE is more fine-grained and more efficient at extracting schema variants from a collection of JSON documents than the current state-of-the-art method, ClustVariants.
format Article
id doaj-art-ebfd790be5aa4bec8701fb85b569b17b
institution Kabale University
issn 2169-3536
spelling doaj-art-ebfd790be5aa4bec8701fb85b569b17b
2025-08-25T23:12:21Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
Volume 13, pages 145517-145527
DOI 10.1109/ACCESS.2025.3599650
Document number 11127057
JSVE: JSON Schema Variant Extraction Based on Clustering and Formal Concept Analysis
Ziyi Zhang (https://orcid.org/0009-0009-4518-7976)
Teng Lv (https://orcid.org/0000-0003-1862-5802)
School of Electronic and Information Engineering, Anhui Jianzhu University, Hefei, Anhui, China
School of Big Data and Artificial Intelligence, Anhui Xinhua University, Hefei, Anhui, China
https://ieeexplore.ieee.org/document/11127057/
JSON; schema variant; schema extraction; cluster; formal concept analysis
title JSVE: JSON Schema Variant Extraction Based on Clustering and Formal Concept Analysis
topic JSON
schema variant
schema extraction
cluster
formal concept analysis
url https://ieeexplore.ieee.org/document/11127057/
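
Illustrative note: the description mentions flattening field names before documents are compared. The Python sketch below shows one plausible reading of that idea and is not the authors' JSVE implementation. The function names `flatten_paths` and `group_by_field_set`, the dot-separated path notation, and the `[]` marker for array elements are all assumptions made for illustration. Grouping documents by identical field sets is a deliberate simplification that stands in for the clustering and formal concept analysis steps, which the record does not specify.

```python
from collections import defaultdict


def flatten_paths(value, prefix=""):
    """Return the set of flattened field paths found in a JSON-like value.

    Nested objects are joined with '.', and array elements are marked
    with '[]' (both notations are assumptions, not taken from the paper).
    """
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            nested = flatten_paths(child, path)
            # A scalar, empty object, or array of scalars contributes its own path.
            paths |= nested if nested else {path}
    elif isinstance(value, list):
        for item in value:
            paths |= flatten_paths(item, prefix + "[]")
    return paths


def group_by_field_set(docs):
    """Group document indices by their exact set of flattened paths.

    This is a simplified stand-in for variant extraction: each distinct
    field set is treated as one structural variant.
    """
    groups = defaultdict(list)
    for index, doc in enumerate(docs):
        groups[frozenset(flatten_paths(doc))].append(index)
    return dict(groups)


# Hypothetical sample documents, invented for this sketch.
docs = [
    {"title": "A", "author": {"name": "X"}},
    {"title": "B", "author": {"name": "Y"}},
    {"title": "C", "year": 2020, "tags": ["x"]},
]
variants = group_by_field_set(docs)
```

In this sketch the first two documents share a field set and fall into one group, while the third forms its own. A method such as the one described would go further by clustering similar (not only identical) documents before identifying variants.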