Large Language Models for Machine-Readable Citation Data: Towards an Automated Metadata Curation Pipeline for Scholarly Journals
Northwestern University spent far too much time and effort curating citation data by hand. Here, we show that large language models can be an efficient way to convert plain-text citations to BibTeX for use in machine-actionable metadata. Further, we prove that these models can be run locally, withou...
Saved in:
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
Code4Lib
2025-04-01
|
| Series: | Code4Lib Journal |
| Online Access: | https://journal.code4lib.org/articles/18368 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Northwestern University spent far too much time and effort curating citation data by hand. Here, we show that large language models can be an efficient way to convert plain-text citations to BibTeX for use in machine-actionable metadata. Further, we prove that these models can be run locally, without cloud compute cost. With these tools, university-owned publishing operations can increase their operating efficiency which, when combined with human review, has no effect on quality. |
|---|---|
| ISSN: | 1940-5758 |