Domain-specific text embedding model for accelerator physics

Bibliographic Details
Main Authors: Thorsten Hellert, João Montenegro, Marco Venturini, Andrea Pollastro
Format: Article
Language: English
Published: American Physical Society, 2025-04-01
Series: Physical Review Accelerators and Beams
Online Access: http://doi.org/10.1103/PhysRevAccelBeams.28.044601
Description
Summary: Accelerator physics presents unique challenges for natural language processing (NLP) due to its specialized terminology and complex concepts. A key component in overcoming these challenges is the development of robust text embedding models that transform textual data into dense vector representations, facilitating efficient information retrieval and semantic understanding. In this work, we introduce AccPhysBERT, a sentence embedding model fine-tuned specifically for accelerator physics. Our model demonstrates superior performance across a range of downstream NLP tasks, surpassing existing models in capturing the domain-specific nuances of the field. We further showcase its practical applications, including semantic paper-reviewer matching and integration into retrieval-augmented generation systems, highlighting its potential to enhance information retrieval and knowledge discovery in accelerator physics.
ISSN: 2469-9888
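
The summary describes embedding-based applications such as semantic paper-reviewer matching and retrieval-augmented generation. As an illustration only, the sketch below shows how a domain-specific sentence embedding model of this kind could be used with the sentence-transformers library to rank candidate reviewers by cosine similarity; the model identifier is a placeholder and not a confirmed checkpoint name for AccPhysBERT.

```python
# Minimal sketch (not the authors' code): semantic paper-reviewer matching with a
# fine-tuned sentence embedding model and cosine similarity.
from sentence_transformers import SentenceTransformer, util

# Hypothetical model id; substitute the actually published AccPhysBERT checkpoint.
model = SentenceTransformer("your-org/AccPhysBERT")

# Candidate reviewer profiles (e.g., concatenated abstracts of their past papers).
reviewers = [
    "Beam dynamics and collective effects in fourth-generation storage rings.",
    "Machine learning for anomaly detection in superconducting RF cavities.",
]
submission = "Bayesian optimization of injection efficiency in a synchrotron light source."

# Encode texts into dense, L2-normalized vectors and rank reviewers by cosine similarity.
reviewer_vecs = model.encode(reviewers, convert_to_tensor=True, normalize_embeddings=True)
paper_vec = model.encode(submission, convert_to_tensor=True, normalize_embeddings=True)
scores = util.cos_sim(paper_vec, reviewer_vecs)[0]

for score, profile in sorted(zip(scores.tolist(), reviewers), reverse=True):
    print(f"{score:.3f}  {profile}")
```

The same embedding-and-similarity step would serve as the retrieval stage of a retrieval-augmented generation pipeline, with retrieved passages passed to a language model as context.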