A KeyBERT-Enhanced Pipeline for Electronic Information Curriculum Knowledge Graphs: Design, Evaluation, and Ontology Alignment

Bibliographic Details
Main Authors: Guanghe Zhuang, Xiang Lu
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Information
Online Access: https://www.mdpi.com/2078-2489/16/7/580
Description
Summary: Electronic Information Engineering curricula cover diverse and rapidly evolving topics, yet existing knowledge graphs often overlook multi-word concepts and more nuanced semantic relationships. To address this gap, this paper proposes a KeyBERT-enhanced method for constructing a knowledge graph of the electronic information curriculum system, with the aim of improving the structured representation and relational analysis of educational content. Using teaching plans, syllabi, and approximately 500,000 words of course materials from 17 courses, we first extracted 500 knowledge points with the Term Frequency–Inverse Document Frequency (TF-IDF) algorithm to build a baseline course–knowledge matrix, and visualized the preliminary graph using Graph Convolutional Networks (GCN) and Neo4j. We then applied KeyBERT to extract about 1000 knowledge points, approximately 65% of which were multi-word phrases, and augmented the graph with co-occurrence and semantic-similarity edges. Comparative experiments show a ~20% increase in non-zero matrix coverage and a ~40% increase in edge count (from 5100 to 7100), which markedly improves graph connectivity. A sensitivity analysis of the extraction thresholds shows that the pair (co-occurrence ≥ 5, similarity ≥ 0.7) maximizes the F1-score at 0.83. A hyperparameter ablation over the n-gram ranges (1,1), (1,2), and (1,3) and the top_n values 5, 10, and 15 identifies n-gram range (1,3) with top_n = 10 as optimal (Precision = 0.86, Recall = 0.81, F1 = 0.83). Finally, downstream GCN tests show that, despite higher sparsity (KeyBERT 64% vs. TF-IDF 40%), KeyBERT features achieve Accuracy = 0.78 and F1 = 0.75, outperforming TF-IDF’s 0.66/0.69. This approach offers a novel, rigorously evaluated solution for optimizing the electronic information curriculum system and can be extended through terminology standardization or the integration of larger datasets.
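
The edge-augmentation step and the reported F1 value can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_edges`, `f1_score`), the choice to count co-occurrence per text unit, the toy two-dimensional embeddings, and the decision to keep co-occurrence and similarity edges as separate edge types are all assumptions made for illustration. Only the thresholds (co-occurrence ≥ 5, similarity ≥ 0.7) and the precision and recall figures come from the summary.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_edges(units, embeddings, min_cooc=5, min_sim=0.7):
    """Return a set of (term_a, term_b, edge_type) triples.

    units:      list of term sets, one per text unit (hypothetical granularity).
    embeddings: dict mapping each term to a vector (toy values in this sketch).
    """
    # Count how many units contain each unordered pair of terms.
    cooc = {}
    for terms in units:
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] = cooc.get((a, b), 0) + 1

    edges = set()
    # Co-occurrence edges: pairs seen together in at least min_cooc units.
    for (a, b), count in cooc.items():
        if count >= min_cooc:
            edges.add((a, b, "co-occurrence"))
    # Semantic-similarity edges: pairs whose cosine similarity reaches min_sim.
    for a, b in combinations(sorted(embeddings), 2):
        if cosine(embeddings[a], embeddings[b]) >= min_sim:
            edges.add((a, b, "similarity"))
    return edges


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration, not taken from the paper).
units = [{"fourier transform", "sampling theorem"}] * 5 + [
    {"op amp", "sampling theorem"}
]
embeddings = {
    "fourier transform": [1.0, 0.0],
    "sampling theorem": [0.9, 0.1],
    "op amp": [0.0, 1.0],
}
edges = build_edges(units, embeddings)
reported_f1 = round(f1_score(0.86, 0.81), 2)
```

With this toy data, only the "fourier transform" and "sampling theorem" pair clears either threshold, so it receives one edge of each type. The `f1_score` helper applied to the reported Precision = 0.86 and Recall = 0.81 rounds to 0.83, consistent with the F1 stated in the summary.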
ISSN: 2078-2489