Information-Theoretical Analysis of a Transformer-Based Generative AI Model

Large Language Models have shown a remarkable ability to “converse” with humans in natural language across myriad topics. Despite the proliferation of these models, a deep understanding of how they work under the hood remains elusive. The core of these Generative AI models is composed of layers of neural networks that employ the Transformer architecture. This architecture learns from large amounts of training data and creates new content in response to user input. In this study, we analyze the internals of the Transformer using Information Theory. To quantify the amount of information passing through a layer, we view it as an information transmission channel and compute the capacity of the channel. The highlight of our study is that, using Information-Theoretic tools, we develop techniques to visualize on an Information plane how the Transformer encodes the relationship between words in sentences while these words are projected into a high-dimensional vector space. We use Information Geometry to analyze the high-dimensional vectors in the Transformer layer and infer relationships between words based on the length of the geodesic connecting these vector distributions on a Riemannian manifold. Our tools reveal more information about these relationships than attention scores. In this study, we also show how Information-Theoretic analysis can help in troubleshooting learning problems in the Transformer layers.

Bibliographic Details
Main Authors: Manas Deb, Tokunbo Ogunfunmi (Department of Electrical and Computer Engineering, Santa Clara University, Santa Clara, CA 95053, USA)
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Series: Entropy, Vol. 27, No. 6, Article 589
ISSN: 1099-4300
DOI: 10.3390/e27060589
Subjects: machine learning; generative AI; transformer; information theory; mutual information estimation; information geometry
Online Access: https://www.mdpi.com/1099-4300/27/6/589
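
To make the methodology in the abstract more concrete, below is a minimal sketch, not taken from the paper, that illustrates two of the quantities it refers to in a one-dimensional toy setting: a histogram estimate of the mutual information through a layer treated as a transmission channel, and the closed-form Fisher-Rao geodesic distance between two univariate Gaussians standing in for word-embedding distributions. The helper names (mutual_information, fisher_rao_gaussian) and the synthetic data are illustrative assumptions; the paper works with high-dimensional vectors and its own estimators.

# Illustrative sketch (not the authors' code): toy, 1-D versions of two ideas
# from the abstract, using only NumPy.
#   1. Treat a layer as a channel and estimate I(X; Y) between its scalar
#      input and output with a simple histogram (plug-in) estimator.
#   2. Summarize two token-embedding samples as 1-D Gaussians and compute the
#      closed-form Fisher-Rao geodesic distance between them.

import numpy as np


def mutual_information(x, y, bins=32):
    """Histogram estimate of I(X; Y) in bits for 1-D samples x and y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, bins)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])))


def fisher_rao_gaussian(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    num = (mu1 - mu2) ** 2 / 2.0 + (sigma1 - sigma2) ** 2
    return np.sqrt(2.0) * np.arccosh(1.0 + num / (2.0 * sigma1 * sigma2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # "Layer as a channel": y is a noisy nonlinear function of x.
    x = rng.normal(size=20_000)
    y = np.tanh(x) + 0.1 * rng.normal(size=x.size)
    print(f"I(X; Y) ~ {mutual_information(x, y):.2f} bits")

    # Two hypothetical token-embedding samples, summarized as 1-D Gaussians.
    emb_a = rng.normal(loc=0.0, scale=1.0, size=5_000)
    emb_b = rng.normal(loc=0.5, scale=1.3, size=5_000)
    d = fisher_rao_gaussian(emb_a.mean(), emb_a.std(), emb_b.mean(), emb_b.std())
    print(f"Fisher-Rao distance ~ {d:.3f}")

The Gaussian distance formula follows from the Fisher information metric of the univariate normal family, which makes the (mu, sigma) half-plane hyperbolic; a larger distance indicates embedding distributions that are easier to tell apart, which is the intuition behind using geodesic length to infer word relationships.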