Towards a Dataset of Digitalized Historical German VET and CVET Regulations

Bibliographic Details
Main Authors: Thomas Reiser, Jens Dörpinghaus, Petra Steiner, Michael Tiemann
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Data
Online Access: https://www.mdpi.com/2306-5729/9/11/128
Description
Summary: The digitization of historical documents has gained particular interest in recent years in the digital humanities. The goal is to digitize historical documents by extracting and structuring text from scanned images. Here, we focus on the processing of historical German VET (vocational education and training) and CVET (continuing vocational education and training) regulations to support educational research. This dataset contains data from 1908 to the present and includes 2125 documents as PDF, 983 fully converted XML documents, and additional metadata for 7090 documents from the archive. We present an overview of the historical background and the challenges of processing different historical documents from three different federal states.
ISSN: 2306-5729