VIOLET: Vectorized Invariance Optimization for Language Embeddings Using Twins

Bibliographic Details
Main Authors: Mikhail E. Ram (ORCID: 0009-0006-0140-9360), G. Manju (ORCID: 0000-0002-3870-8210)
Affiliation: School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access, vol. 13, pp. 136312-136319
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3590971
Subjects: Sentence embeddings; Barlow Twins; mixup regularisation; redundancy reduction; BERT; self-supervised learning
Online Access: https://ieeexplore.ieee.org/document/11086585/

Description
We present VIOLET, a novel positive-pair-based information-maximisation strategy for fine-tuning BERT to generate robust, invariant, and semantically meaningful sentence embeddings. VIOLET extends the Barlow Twins framework by addressing both redundancy reduction and invariance preservation within the embedding space. This is achieved through a combination of text-specific augmentations tailored to the nuances of natural language and a mixup-based regularisation mechanism that promotes smoother representation learning. Unlike conventional contrastive learning methods, which rely on large batch sizes and hard negative mining to achieve strong performance, VIOLET operates exclusively on positive pairs, eliminating the need for complex sampling strategies and significantly reducing training overhead. A key strength of VIOLET is that it performs consistently and robustly even with smaller batch sizes, making it an appealing choice for training on limited computational resources. Empirical results on the Semantic Textual Similarity Benchmark (STS-B) demonstrate that VIOLET achieves correlation scores on par with or exceeding several state-of-the-art sentence embedding models. These findings underscore the method's effectiveness, scalability, and practical utility across a wide range of downstream natural language understanding tasks, particularly in settings where efficiency and stability are critical. Our implementation is available at https://github.com/mikhail-ram/VIOLET.
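
The description names the method's main ingredients: a Barlow Twins objective (an invariance term on the diagonal of a cross-correlation matrix plus a redundancy-reduction term on its off-diagonal), training on positive pairs only, and a mixup-based regulariser. A minimal PyTorch sketch of such an objective follows; the function names, hyperparameters (lambd, alpha), the standardisation step, and the way mixup is applied are illustrative assumptions, not the authors' implementation (see the linked repository for the actual code).

# Illustrative sketch only -- all names and hyperparameters below are
# assumptions for exposition, not the VIOLET implementation.
import torch

def barlow_twins_loss(z1: torch.Tensor, z2: torch.Tensor,
                      lambd: float = 5e-3) -> torch.Tensor:
    # z1, z2: (batch, dim) embeddings of two augmented views of the same
    # sentences; the objective uses positive pairs only, no negatives.
    n = z1.size(0)
    # Standardise each embedding dimension across the batch.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    # Cross-correlation matrix between the two views: (dim, dim).
    c = (z1.T @ z2) / n
    # Invariance: pull diagonal entries toward 1.
    on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()
    # Redundancy reduction: push off-diagonal entries toward 0.
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag

def mixup_embeddings(z: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    # One plausible embedding-level mixup: convexly blend each embedding
    # with a shuffled batch mate; the paper's exact mechanism may differ.
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(z.size(0))
    return lam * z + (1.0 - lam) * z[perm]

# Hypothetical usage, assuming an encode() that mean-pools BERT outputs:
#   z1, z2 = encode(view_a), encode(view_b)   # two text augmentations
#   loss = barlow_twins_loss(z1, z2) + barlow_twins_loss(z1, mixup_embeddings(z2))

Because the loss depends only on second-order statistics of matched pairs, it requires no negative sampling or batch-size-dependent contrast term, which is consistent with the description's claim of robustness at small batch sizes.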