Large Language Models for Machine-Readable Citation Data: Towards an Automated Metadata Curation Pipeline for Scholarly Journals

Bibliographic Details
Main Author: Aerith Y. Netzer
Format: Article
Language: English
Published: Code4Lib 2025-04-01
Series: Code4Lib Journal
Online Access: https://journal.code4lib.org/articles/18368
Description
Summary: Northwestern University spent far too much time and effort curating citation data by hand. Here, we show that large language models can be an efficient way to convert plain-text citations to BibTeX for use in machine-actionable metadata. Further, we prove that these models can be run locally, without cloud compute cost. With these tools, university-owned publishing operations can increase their operating efficiency and, when the output is combined with human review, do so with no effect on quality.
ISSN: 1940-5758