Neither Corpus Nor Edition: Building a Pipeline to Make Data Analysis Possible on Medieval Arabic Commentary Traditions

We have built a suite of Python tools to analyze text reuse and intertextuality in a specific set of medieval Arabic texts (commentaries) available in print. We take these printed editions, scan them, pre-process the images, pass them to an OCR engine, clean the results, and s...

Bibliographic Details
Main Authors: Cornelis van Lit, Dirk Roorda
Format: Article
Language: English
Published: Department of Languages, Literatures, and Cultures at McGill University 2024-06-01
Series: Journal of Cultural Analytics
Online Access: https://doi.org/10.22148/001c.116372