Deep Clustering of Student Code Strategies Using Multi-View Code Representation (CMVAE)

Bibliographic Details
Main Authors: Zhengting Tang, Shizhou Wang, Liangyu Chen
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/7/3462
Description
Summary: In programming education, students commonly submit unlabeled solutions to algorithmic problems that implement the same functionality, which makes it difficult to identify which submissions employ similar strategies. Students approach problem-solving in diverse ways, and each problem can be solved using multiple programming strategies. Existing code representation methods typically rely on labeled datasets and task-specific training, which limits their generalizability. To address this, the paper proposes CMVAE, a deep clustering model that leverages multi-view representations to group student code by problem-solving strategy. The model captures structural features by transforming code into tree graphs and extracting centrality measures, while CodeBERT provides semantic embeddings. By jointly optimizing a reconstruction loss and a clustering loss, the model integrates the multiple code representations. Experimental results on C# and Python datasets show that CMVAE outperforms traditional and deep clustering baselines, producing more compact and better-separated clusters. CMVAE can help educators analyze student approaches, provide targeted feedback for optimization, and improve programming pedagogy.
ISSN: 2076-3417
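
Illustration: the structural view mentioned in the summary (code turned into a tree graph, then summarized with centrality measures) can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation: the record does not say which parser or which centrality measures the paper uses, so Python's standard `ast` module and plain degree centrality are assumed here, and the names `ast_to_tree`, `degree_centrality`, and `structural_features` are hypothetical.

```python
import ast


def ast_to_tree(source):
    """Parse Python source into (labels, edges), one entry per AST node.

    Assumption: the standard-library parser stands in for whatever
    tree construction the paper actually uses.
    """
    labels, edges = [], []

    def visit(node, parent):
        idx = len(labels)                  # fresh index for every visit
        labels.append(type(node).__name__)
        if parent is not None:
            edges.append((parent, idx))    # parent -> child edge
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(ast.parse(source), None)
    return labels, edges


def degree_centrality(n_nodes, edges):
    """Degree of each node divided by the largest possible degree (n - 1)."""
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if n_nodes <= 1:
        return [0.0] * n_nodes
    return [d / (n_nodes - 1) for d in degree]


def structural_features(source):
    """Summarize one submission as a small dict of structural statistics."""
    labels, edges = ast_to_tree(source)
    cent = degree_centrality(len(labels), edges)
    return {
        "n_nodes": len(labels),
        "max_centrality": max(cent),
        "mean_centrality": sum(cent) / len(cent),
    }


# Two hypothetical student submissions solving the same task differently.
LOOP_SOLUTION = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

RECURSIVE_SOLUTION = """
def total(xs):
    if not xs:
        return 0
    return xs[0] + total(xs[1:])
"""

loop_features = structural_features(LOOP_SOLUTION)
recursive_features = structural_features(RECURSIVE_SOLUTION)
```

In a multi-view setup like the one the summary describes, a vector of such structural statistics would sit alongside a semantic embedding of the same submission (CodeBERT in the paper) before clustering. How the two views are combined is not stated in this record.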