Augmenting Naïve Bayes Classifiers with <i>k</i>-Tree Topology


Bibliographic Details
Main Authors: Fereshteh R. Dastjerdi, Liming Cai
Format: Article
Language: English
Published: MDPI AG, 2025-07-01
Series: Mathematics
Subjects: Bayesian networks; Naïve Bayes augmentation; classification; mutual information; KL-divergence; treewidth
Online Access: https://www.mdpi.com/2227-7390/13/13/2185
Collection: DOAJ
Description: The Bayesian network is a directed acyclic graphical model that offers a structured description of the probabilistic dependencies among random variables. As powerful tools for classification tasks, Bayesian classifiers often require computing joint probability distributions, which can be computationally intractable due to potential full dependencies among feature variables. At the other extreme, Naïve Bayes, which assumes zero dependencies among features, trades accuracy for efficiency and often underperforms. As a result, non-zero dependency structures, such as trees, are often used as more feasible probabilistic graph approximations; in particular, Tree Augmented Naïve Bayes (TAN) has been demonstrated to outperform Naïve Bayes and has become a popular choice. For applications where a variable is strongly influenced by multiple other features, TAN has been further extended to the <i>k</i>-dependency Bayesian classifier (KDB), in which one feature can depend on up to <i>k</i> other features (for a given <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>k</mi><mo>≥</mo><mn>2</mn></mrow></semantics></math></inline-formula>). In such cases, however, the selection of the <i>k</i> parent features for each variable is often made through heuristic search methods (such as sorting), which do not guarantee an optimal approximation of the network topology. In this paper, the novel notion of <i>k</i>-tree Augmented Naïve Bayes (<i>k</i>-TAN) is introduced to augment Naïve Bayesian classifiers with <i>k</i>-tree topology as an approximation of Bayesian networks. It is proved that, under the Kullback–Leibler divergence measure, a <i>k</i>-tree topology approximation of a Bayesian classifier loses the minimum amount of information when its topology is that of a maximum spanning <i>k</i>-tree, where the edge weights of the graph are the mutual information between random variables conditional upon the class label.
In addition, while finding a maximum spanning <i>k</i>-tree is in general NP-hard for fixed <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>k</mi><mo>≥</mo><mn>2</mn></mrow></semantics></math></inline-formula>, this work shows that the approximation problem can be solved in time <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>O</mi><mo>(</mo><msup><mi>n</mi><mrow><mi>k</mi><mo>+</mo><mn>1</mn></mrow></msup><mo>)</mo></mrow></semantics></math></inline-formula> if the spanning <i>k</i>-tree is also required to retain a given Hamiltonian path in the graph. This algorithm can therefore be employed to ensure efficient approximation of Bayesian networks by <i>k</i>-tree augmented Naïve Bayesian classifiers with guaranteed minimum loss of information.
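The construction the abstract builds on — weighting feature pairs by class-conditional mutual information and extracting a maximum spanning structure — can be illustrated for the basic TAN (tree, i.e. <i>k</i> = 1) case. The sketch below is illustrative only: the function names and the plug-in probability estimator are not from the paper, and the paper's <i>k</i>-TAN generalizes the spanning tree to a maximum spanning <i>k</i>-tree.

```python
import numpy as np

def conditional_mutual_information(x, y, c):
    """Plug-in estimate of I(X; Y | C) from discrete samples."""
    x, y, c = np.asarray(x), np.asarray(y), np.asarray(c)
    cmi = 0.0
    for cv in np.unique(c):
        mask = c == cv
        p_c = mask.mean()                      # P(C = cv)
        xs, ys = x[mask], y[mask]
        for xv in np.unique(xs):
            for yv in np.unique(ys):
                p_xy = np.mean((xs == xv) & (ys == yv))   # P(x, y | c)
                if p_xy == 0.0:
                    continue
                p_x = np.mean(xs == xv)                   # P(x | c)
                p_y = np.mean(ys == yv)                   # P(y | c)
                cmi += p_c * p_xy * np.log(p_xy / (p_x * p_y))
    return cmi

def max_spanning_tree(n_vars, weights):
    """Kruskal's algorithm: maximum-weight spanning tree over
    variables 0..n_vars-1, given weights as {(i, j): w}."""
    parent = list(range(n_vars))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i
    tree = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:                           # joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree

def tan_edges(features, labels):
    """Undirected skeleton of a TAN structure: maximum spanning tree
    under class-conditional mutual information edge weights."""
    n = features.shape[1]
    weights = {(i, j): conditional_mutual_information(features[:, i],
                                                      features[:, j],
                                                      labels)
               for i in range(n) for j in range(i + 1, n)}
    return max_spanning_tree(n, weights)
```

Rooting the resulting tree and adding the class node as a parent of every feature yields the TAN classifier; <i>k</i>-TAN replaces the spanning tree with a maximum spanning <i>k</i>-tree under the same edge weights, computed by the Hamiltonian-path-constrained algorithm described in the abstract.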
Record ID: doaj-art-5317493ffbcb4d01a955f0378e9173cc
Institution: Kabale University
ISSN: 2227-7390
DOI: 10.3390/math13132185
Citation: Mathematics, vol. 13, no. 13, art. 2185, 2025-07-01
Author Affiliations: Fereshteh R. Dastjerdi and Liming Cai, School of Computing, University of Georgia, Athens, GA 30602, USA