Simplifying Subject Indexing: A Python-Powered Approach in KBR, the National Library of Belgium
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Code4Lib, 2024-10-01 |
| Series: | Code4Lib Journal |
| Online Access: | https://journal.code4lib.org/articles/18103 |
| Summary: | This paper describes how the National Library of Belgium (KBR) explored using Python scripts to automate subject indexing for its extensive collection. The initial work involved creating a reference dataset and automating classification from MARCXML files. The focus is on demonstrating the practicality, adaptability, and user-friendliness of the Python-based solution. The authors introduce their approach, which gives particular weight to semantically significant words when determining subjects. The paper outlines the Python workflow, from creating the reference dataset to generating enriched bibliographic records, and discusses criteria for an optimal workflow: ease of creating and maintaining the dataset, transparency, and correctness of suggestions. It reports promising results and presents two scripts, one that creates the reference dataset and one that automates subject indexing. The authors argue that the flexibility and user-friendliness of the Python solution make it a compelling choice for libraries seeking efficient, maintainable approaches to subject indexing projects. |
| ISSN: | 1940-5758 |
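The summary describes a two-step workflow: build a reference dataset from already indexed MARCXML records, then suggest subjects for new records based on semantically significant words. The sketch below is not KBR's code and does not reproduce the authors' scripts. It is a minimal illustration under explicit assumptions: titles are read from MARC field 245 $a, topical subjects from 650 $a, "significant words" are approximated by dropping a small hypothetical stopword list and very short tokens, and candidate subjects are ranked by how often they co-occurred with those words in the reference records. The names `build_reference`, `suggest_subjects`, `significant_words`, `STOPWORDS`, and the sample records are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Standard MARCXML namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical stopword list; a production workflow would need a curated, multilingual one.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "on", "for", "to", "les", "het", "van", "een"}


def subfield_values(record, tag, code):
    """Return the text of every subfield `code` in datafields carrying `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text.strip() for sf in record.findall(path, MARC_NS) if sf.text]


def significant_words(text):
    """Lowercase, keep letter runs, drop stopwords and tokens of two letters or fewer.

    This is only a crude stand-in for 'semantically significant' words (assumption).
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return {w for w in tokens if w not in STOPWORDS and len(w) > 2}


def build_reference(marcxml):
    """Map each significant title word (245 $a) to a Counter of co-occurring subjects (650 $a)."""
    root = ET.fromstring(marcxml)
    reference = defaultdict(Counter)
    for record in root.iter(f"{{{MARC_NS['marc']}}}record"):
        subjects = subfield_values(record, "650", "a")
        for title in subfield_values(record, "245", "a"):
            for word in significant_words(title):
                reference[word].update(subjects)
    return reference


def suggest_subjects(title, reference, top_n=3):
    """Rank subjects by summed co-occurrence counts over the title's significant words."""
    scores = Counter()
    for word in significant_words(title):
        scores.update(reference.get(word, Counter()))
    return [subject for subject, _ in scores.most_common(top_n)]


# Hypothetical sample records, for illustration only.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">History of Belgian railways</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Railroads</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Belgium</subfield></datafield>
  </record>
  <record>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Railways and industry</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Railroads</subfield></datafield>
  </record>
</collection>"""

if __name__ == "__main__":
    ref = build_reference(SAMPLE)
    print(suggest_subjects("Belgian railways", ref))  # ['Railroads', 'Belgium']
```

Keeping the reference dataset as a plain word-to-subject count table is one way to serve the transparency criterion the summary mentions: every suggestion can be traced back to the words and records that produced it. The sketch has no stemming, so "Belgium" and "Belgian" are treated as unrelated words.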