Messy Data in Education: Enhancing Data Science Literacy Through Real-World Datasets in a Master’s Program

The increasing importance of data science in today’s world highlights the need to prepare students for the complexities of real-world data. This paper presents insights and findings from 15 years of teaching Data Mining and Business Intelligence in a Computer Science Master’s program, where a key co...

Bibliographic Details
Main Author: Iraklis Varlamis
Format: Article
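Language: English
Published: MDPI AG 2025-04-01
Series: Education Sciences
Subjects: data wrangling; real-world datasets; data science education; data preprocessing; problem-solving skills
Online Access: https://www.mdpi.com/2227-7102/15/4/500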
author Iraklis Varlamis
collection DOAJ
description The increasing importance of data science in today’s world highlights the need to prepare students for the complexities of real-world data. This paper presents insights and findings from 15 years of teaching Data Mining and Business Intelligence in a Computer Science Master’s program, where a key component of the course is a semester-long assignment involving publicly available, messy, and often incomplete datasets. These datasets include examples such as publicly accessible datasets on accidents or fines from data.gov.uk, data from data contest platforms like Kaggle, and house rental data from platforms like Airbnb. Through these assignments, students are tasked with not only applying algorithmic tools but also addressing challenges like missing information, noisy inputs, and inconsistencies. They also learn the importance of finding and integrating supplementary open data sources to enhance the value and depth of their analyses. The primary objective of this approach is to enhance students’ problem-solving abilities by engaging them in complex, real-world data scenarios where they must navigate and resolve issues related to data quality and completeness. This approach cultivates critical skills such as data wrangling, preprocessing, and the extraction of meaningful insights, along with the ability to understand and articulate the business value of the data. Working hypotheses, such as the impact of data quality on analysis outcomes, are explored, and the paper demonstrates how addressing these challenges improves students’ decision-making processes in data-driven tasks. By engaging with real-world datasets, students develop resilience, adaptability, and problem-solving abilities, which are essential for navigating the complexities of data science in professional settings. 
This paper highlights the educational benefits of using messy data to bridge the gap between theoretical knowledge and real-world application while also demonstrating how this method explicitly improves students’ problem-solving and critical thinking skills in the context of data science.
format Article
id doaj-art-b3da9351bbdc43b2bc4c7e8eed8c3352
institution DOAJ
issn 2227-7102
language English
publishDate 2025-04-01
publisher MDPI AG
record_format Article
series Education Sciences
spelling doaj-art-b3da9351bbdc43b2bc4c7e8eed8c3352; 2025-08-20T03:13:30Z; eng; MDPI AG; Education Sciences; ISSN 2227-7102; 2025-04-01; vol. 15, no. 4, art. 500; doi:10.3390/educsci15040500; Messy Data in Education: Enhancing Data Science Literacy Through Real-World Datasets in a Master’s Program; Iraklis Varlamis (Department of Informatics and Telematics, Harokopio University of Athens, 17778 Athens, Greece); https://www.mdpi.com/2227-7102/15/4/500; data wrangling; real-world datasets; data science education; data preprocessing; problem-solving skills
title Messy Data in Education: Enhancing Data Science Literacy Through Real-World Datasets in a Master’s Program
topic data wrangling
real-world datasets
data science education
data preprocessing
problem-solving skills
url https://www.mdpi.com/2227-7102/15/4/500