SelfCode 2.0: An Annotated Corpus of Student and Expert Line-by-Line Explanations of Code Examples for Automated Assessment

Assessing student responses is a critical task in adaptive educational systems. More specifically, automatically evaluating students' self-explanations contributes to understanding their knowledge state, which is needed for personalized instruction, the crux of adaptive educational systems. To...

Bibliographic Details
Main Authors: Jeevan Chapagain, Arun Balajiee Lekshmi Narayanan, Kamil Akhuseyinoglu, Peter Brusilovsky, Vasile Rus
Format: Article
Language: English
Published: LibraryPress@UF 2025-05-01
Series: Proceedings of the International Florida Artificial Intelligence Research Society Conference
Online Access: https://journals.flvc.org/FLAIRS/article/view/138727
author Jeevan Chapagain
Arun Balajiee Lekshmi Narayanan
Kamil Akhuseyinoglu
Peter Brusilovsky
Vasile Rus
collection DOAJ
description Assessing student responses is a critical task in adaptive educational systems. More specifically, automatically evaluating students' self-explanations contributes to understanding their knowledge state, which is needed for personalized instruction, the crux of adaptive educational systems. To facilitate the development of Artificial Intelligence (AI) and Machine Learning models for automated assessment of learners' self-explanations, annotated datasets are essential. In response to this need, we developed the SelfCode 2.0 corpus, which consists of 3,019 pairs of student and expert explanations of Java code snippets, each annotated with semantic similarity, correctness, and completeness scores provided by experts. Alongside the dataset, we also provide performance results obtained with several baseline models based on TF-IDF and Sentence-BERT vectorial representations. This work aims to enhance the effectiveness of automated assessment tools in programming education and to contribute to better understanding and support of student learning of programming.
format Article
id doaj-art-ea723b035f17441ba5fa16d2affe9659
institution OA Journals
issn 2334-0754
2334-0762
language English
publishDate 2025-05-01
publisher LibraryPress@UF
series Proceedings of the International Florida Artificial Intelligence Research Society Conference
spelling doaj-art-ea723b035f17441ba5fa16d2affe9659
timestamp 2025-08-20T01:51:58Z
volume 38, issue 1
doi 10.32473/flairs.38.1.138727
orcid Jeevan Chapagain: https://orcid.org/0009-0009-7185-0815
Arun Balajiee Lekshmi Narayanan: https://orcid.org/0000-0002-7735-5008
Kamil Akhuseyinoglu: https://orcid.org/0000-0002-7761-9755
affiliation University of Memphis
title SelfCode 2.0: An Annotated Corpus of Student and Expert Line-by-Line Explanations of Code Examples for Automated Assessment
url https://journals.flvc.org/FLAIRS/article/view/138727