VIOLET: Vectorized Invariance Optimization for Language Embeddings Using Twins

Bibliographic Details
Main Authors: Mikhail E. Ram (ORCID: 0009-0006-0140-9360), G. Manju (ORCID: 0000-0002-3870-8210)
Affiliation: School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access, vol. 13, pp. 136312-136319
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3590971
Subjects: Sentence embeddings; Barlow Twins; mixup regularisation; redundancy reduction; BERT; self-supervised learning
Online Access: https://ieeexplore.ieee.org/document/11086585/

Description
We present VIOLET, a novel positive-pair-based information-maximisation strategy for fine-tuning BERT to generate robust, invariant, and semantically meaningful sentence embeddings. VIOLET extends the Barlow Twins framework by addressing both redundancy reduction and invariance preservation within the embedding space. This is achieved through a combination of text-specific augmentations tailored to the nuances of natural language and a mixup-based regularisation mechanism that promotes smoother representation learning. Unlike conventional contrastive learning methods, which rely on large batch sizes and hard negative mining to achieve strong performance, VIOLET operates exclusively on positive pairs, eliminating the need for complex sampling strategies and significantly reducing training overhead. A key strength of VIOLET is that it performs consistently and robustly even with smaller batch sizes, making it an appealing choice for training on limited computational resources. Empirical results on the Semantic Textual Similarity Benchmark (STS-B) demonstrate that VIOLET achieves correlation scores on par with or exceeding several state-of-the-art sentence embedding models. These findings underscore the method's effectiveness, scalability, and practical utility across a wide range of downstream natural language understanding tasks, particularly in settings where efficiency and stability are critical. Our implementation is available at https://github.com/mikhail-ram/VIOLET.
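
The description names the method's main ingredients: a Barlow Twins objective (an invariance term on the diagonal of a cross-correlation matrix plus a redundancy-reduction term on its off-diagonal), training on positive pairs only, and a mixup-based regulariser. A minimal PyTorch sketch of such an objective follows; the function names, hyperparameters (lambd, alpha), the standardisation step, and the way mixup is applied are illustrative assumptions, not the authors' implementation (see the linked repository for the actual code).

# Illustrative sketch only -- all names and hyperparameters below are
# assumptions for exposition, not the VIOLET implementation.
import torch

def barlow_twins_loss(z1: torch.Tensor, z2: torch.Tensor,
                      lambd: float = 5e-3) -> torch.Tensor:
    # z1, z2: (batch, dim) embeddings of two augmented views of the same
    # sentences; the objective uses positive pairs only, no negatives.
    n = z1.size(0)
    # Standardise each embedding dimension across the batch.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    # Cross-correlation matrix between the two views: (dim, dim).
    c = (z1.T @ z2) / n
    # Invariance: pull diagonal entries toward 1.
    on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()
    # Redundancy reduction: push off-diagonal entries toward 0.
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag

def mixup_embeddings(z: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    # One plausible embedding-level mixup: convexly blend each embedding
    # with a shuffled batch mate; the paper's exact mechanism may differ.
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(z.size(0))
    return lam * z + (1.0 - lam) * z[perm]

# Hypothetical usage, assuming an encode() that mean-pools BERT outputs:
#   z1, z2 = encode(view_a), encode(view_b)   # two text augmentations
#   loss = barlow_twins_loss(z1, z2) + barlow_twins_loss(z1, mixup_embeddings(z2))

Because the loss depends only on second-order statistics of matched pairs, it requires no negative sampling or batch-size-dependent contrast term, which is consistent with the description's claim of robustness at small batch sizes.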