Learning on Bandwidth Constrained Multi-Source Data With MIMO-Inspired DPP MAP Inference

Determinantal Point Process (DPP) is a powerful technique to enhance data diversity by promoting the repulsion of similar elements in the selected samples. Particularly, DPP-based Maximum A Posteriori (MAP) inference is used to identify subsets with the highest diversity. However, a commonly adopted...

Bibliographic Details
Main Authors: Xiwen Chen, Huayu Li, Rahul Amin, Abolfazl Razi
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Transactions on Machine Learning in Communications and Networking
Subjects: Determinantal point process; data diversification; distributed learning; distributed sources
Online Access: https://ieeexplore.ieee.org/document/10580972/
_version_ 1850036107946754048
author Xiwen Chen
Huayu Li
Rahul Amin
Abolfazl Razi
collection DOAJ
description The Determinantal Point Process (DPP) is a powerful technique for enhancing data diversity by promoting repulsion among similar elements in the selected samples. In particular, DPP-based Maximum A Posteriori (MAP) inference is used to identify the subsets with the highest diversity. However, the common assumption that all data samples are available at a single point hinders its applicability to real-world scenarios in which data samples are distributed across distinct sources with intermittent and bandwidth-limited connections. This paper proposes a distributed version of DPP inference to enhance multi-source data diversification under limited communication budgets. First, we convert the lower bound of the diversity-maximized distributed sample selection from a matrix determinant optimization into a simpler sum of individual terms. Next, the sink forms a determinant-preserving sparse representation of the selected samples as a surrogate for the collected samples and sends it back to the sources as lightweight messages, eliminating the need for raw data exchange. Our approach is inspired by the channel orthogonalization process of Multiple-Input Multiple-Output (MIMO) systems based on Channel State Information (CSI). Extensive experiments verify the superiority of our scalable method over the most commonly used data selection methods, including GreeDi, Greedymax, random selection, and stratified sampling, with a reduction of at least 12% in Relative Diversity Error (RDE). This enhanced diversity translates into a substantial improvement in the performance of various downstream learning tasks, including multi-level classification (2%-4% gain in accuracy), object detection (2% gain in mAP), and multiple-instance learning (1.3% gain in AUC).
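For readers unfamiliar with DPP MAP inference, the following is a minimal sketch of the standard centralized greedy approximation, in which items are added one at a time to maximize the log-determinant of the selected kernel submatrix. It is not the distributed method proposed in the article: the RBF kernel, the function names `rbf_kernel` and `greedy_dpp_map`, and the toy two-cluster data are all assumptions made here for illustration only.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Similarity kernel (an assumed choice): L[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def greedy_dpp_map(L, k):
    """Greedily pick up to k indices that maximize log det of the selected submatrix of L."""
    selected = []
    remaining = list(range(L.shape[0]))
    for _ in range(k):
        best_item, best_logdet = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            # Only candidates that keep the determinant positive are admissible.
            if sign > 0 and logdet > best_logdet:
                best_item, best_logdet = i, logdet
        if best_item is None:
            break  # no candidate keeps the submatrix positive definite
        selected.append(best_item)
        remaining.remove(best_item)
    return selected


# Hypothetical toy data: two tight clusters of five 2-D points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.05, (5, 2)), rng.normal(5.0, 0.05, (5, 2))])
L = rbf_kernel(X, gamma=0.5)
subset = greedy_dpp_map(L, 2)
```

With two items requested, the selection takes one point from each cluster, because two near-duplicate points yield a near-singular submatrix with a determinant close to zero. This naive loop recomputes a determinant for every candidate and needs the full kernel in one place, which is exactly the centralized assumption the abstract argues against.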
format Article
id doaj-art-d82e467198cb40ada7d88895b814cbb4
institution DOAJ
issn 2831-316X
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Transactions on Machine Learning in Communications and Networking
spelling doaj-art-d82e467198cb40ada7d88895b814cbb4
2025-08-20T02:57:17Z
eng
IEEE
IEEE Transactions on Machine Learning in Communications and Networking
2831-316X
2024-01-01
Volume 2, pp. 1341-1356
DOI: 10.1109/TMLCN.2024.3421907
IEEE document 10580972
Learning on Bandwidth Constrained Multi-Source Data With MIMO-Inspired DPP MAP Inference
Xiwen Chen (https://orcid.org/0000-0002-8006-4383), School of Computing, Clemson University, Clemson, SC, USA
Huayu Li (https://orcid.org/0000-0001-9143-4741), Department of Electrical and Computer Engineering, The University of Arizona, Tucson, AZ, USA
Rahul Amin, Tactical Networks Group, MIT Lincoln Laboratory, Lexington, MA, USA
Abolfazl Razi (https://orcid.org/0000-0002-3330-6132), School of Computing, Clemson University, Clemson, SC, USA
https://ieeexplore.ieee.org/document/10580972/
Determinantal point process
data diversification
distributed learning
distributed sources
title Learning on Bandwidth Constrained Multi-Source Data With MIMO-Inspired DPP MAP Inference
topic Determinantal point process
data diversification
distributed learning
distributed sources
url https://ieeexplore.ieee.org/document/10580972/