Large Language Models for Machine-Readable Citation Data: Towards an Automated Metadata Curation Pipeline for Scholarly Journals

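Northwestern University spent far too much time and effort curating citation data by hand. Here, we show that large language models can be an efficient way to convert plain-text citations to BibTeX for use in machine-actionable metadata. Further, we prove that these models can be run locally, without cloud compute cost. With these tools, university-owned publishing operations can increase their operating efficiency which, when combined with human review, has no effect on quality.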

Bibliographic Details
Main Author: Aerith Y. Netzer
Format: Article
Language: English
Published: Code4Lib 2025-04-01
Series: Code4Lib Journal
Online Access: https://journal.code4lib.org/articles/18368
author Aerith Y. Netzer
collection DOAJ
description Northwestern University spent far too much time and effort curating citation data by hand. Here, we show that large language models can be an efficient way to convert plain-text citations to BibTeX for use in machine-actionable metadata. Further, we prove that these models can be run locally, without cloud compute cost. With these tools, university-owned publishing operations can increase their operating efficiency which, when combined with human review, has no effect on quality.
format Article
id doaj-art-2b812e5a758d42debb0f2935735d195b
institution OA Journals
issn 1940-5758
language English
publishDate 2025-04-01
publisher Code4Lib
record_format Article
series Code4Lib Journal
title Large Language Models for Machine-Readable Citation Data: Towards an Automated Metadata Curation Pipeline for Scholarly Journals
url https://journal.code4lib.org/articles/18368