Simplifying Subject Indexing: A Python-Powered Approach in KBR, the National Library of Belgium

Bibliographic Details
Main Authors: Hannes Lowagie, Julie Van Woensel
Format: Article
Language: English
Published: Code4Lib 2024-10-01
Series: Code4Lib Journal
Online Access: https://journal.code4lib.org/articles/18103
Description
Summary: This paper details how the National Library of Belgium (KBR) explored automating subject indexing for its extensive collection using Python scripts. The initial exploration involved creating a reference dataset and automating the classification process using MARCXML files. The focus is on demonstrating the practicality, adaptability, and user-friendliness of the Python-based solution. The authors introduce their approach, which bases subject determination on the semantically significant words. The paper outlines the Python workflow, from creating the reference dataset to generating enriched bibliographic records. It discusses criteria for an optimal workflow: ease of creating and maintaining the dataset, transparency, and correctness of suggestions. The paper reports promising results and showcases two specific scripts, one that creates the reference dataset and one that automates subject indexing. The authors emphasize the flexibility and user-friendliness of the Python solution and present it as a compelling choice for libraries seeking efficient, maintainable solutions for subject indexing projects.
ISSN: 1940-5758
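
Illustration: The summary describes a two-step workflow: build a reference dataset from MARCXML records, then use semantically significant words to suggest subjects. The sketch below shows one way such a workflow could look; it is not the authors' scripts. Everything specific in it is an assumption made for illustration: reading titles from 245 $a and subjects from 650 $a, the tiny stopword list, the co-occurrence count used as a score, and the function names `build_reference` and `suggest_subjects`. Each suggestion carries the words that triggered it, as a nod to the transparency criterion mentioned in the summary.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}

# Illustrative stopword list only; a real list would be language-specific.
STOPWORDS = {"the", "and", "for", "with", "from"}

# Hypothetical sample records, invented for this example.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">History of Belgian railways</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="7">
      <subfield code="a">Transport</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Railways and the city</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="7">
      <subfield code="a">Transport</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="7">
      <subfield code="a">Urban planning</subfield>
    </datafield>
  </record>
</collection>"""


def significant_words(text):
    """Lowercase the text and keep letter-only words that are not stopwords."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def subfields(record, tag, code):
    """Return the text of every subfield `code` in datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text.strip() for el in record.findall(path, NS) if el.text]


def build_reference(marcxml):
    """Map each significant title word to a Counter of co-occurring subjects."""
    reference = defaultdict(Counter)
    root = ET.fromstring(marcxml)
    for record in root.iter(f"{{{MARC_NS}}}record"):
        subjects = subfields(record, "650", "a")
        for title in subfields(record, "245", "a"):
            for word in set(significant_words(title)):
                reference[word].update(subjects)
    return reference


def suggest_subjects(title, reference, top_n=3):
    """Score subjects by summed co-occurrence counts of the title's words.

    Returns (subject, score, matching_words) tuples so that a cataloguer
    can see why each subject was suggested.
    """
    scores = Counter()
    evidence = defaultdict(list)
    for word in set(significant_words(title)):
        for subject, count in reference.get(word, {}).items():
            scores[subject] += count
            evidence[subject].append(word)
    return [(s, c, sorted(evidence[s])) for s, c in scores.most_common(top_n)]


if __name__ == "__main__":
    ref = build_reference(SAMPLE)
    for suggestion in suggest_subjects("Night railways", ref):
        print(suggestion)
```

Keeping the reference dataset as a plain word-to-subject mapping makes it easy to inspect and correct by hand, which fits the ease-of-maintenance criterion the summary names; a production workflow would likely need language-aware stopwords and a score threshold before writing suggestions back into records.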