Learning on Bandwidth Constrained Multi-Source Data With MIMO-Inspired DPP MAP Inference

Determinantal Point Process (DPP) is a powerful technique to enhance data diversity by promoting the repulsion of similar elements in the selected samples. Particularly, DPP-based Maximum A Posteriori (MAP) inference is used to identify subsets with the highest diversity. However, a commonly adopted...
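
To make the idea of DPP-based MAP inference concrete, the following is a minimal sketch of the standard centralized greedy heuristic, which repeatedly adds the item that most increases the log-determinant of the kernel submatrix. It is not the distributed, MIMO-inspired method proposed in the article. The RBF kernel, the `gamma` value, the function names, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X, gamma=0.5):
    """Similarity kernel L[i, j] = exp(-gamma * ||x_i - x_j||^2) (assumed choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def greedy_dpp_map(L, k):
    """Greedily pick up to k indices that approximately maximize det(L[S, S])."""
    selected = []
    remaining = list(range(L.shape[0]))
    for _ in range(k):
        best_i, best_logdet = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            # slogdet is numerically safer than det for near-singular submatrices
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_i, best_logdet = i, logdet
        if best_i is None:  # no candidate keeps the submatrix positive definite
            break
        selected.append(best_i)
        remaining.remove(best_i)
    return selected


# Toy data (hypothetical): points 0 and 1 are near-duplicates.
X = np.array([[0.0, 0.0], [0.01, 0.0], [3.0, 3.0], [0.0, 3.0]])
chosen = greedy_dpp_map(rbf_kernel(X), k=3)
print(chosen)
```

In this toy example the near-duplicate point (index 1) is left out, because including it alongside index 0 drives the determinant toward zero. This is the "repulsion of similar elements" the abstract refers to.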

Bibliographic Details
Main Authors:	Xiwen Chen, Huayu Li, Rahul Amin, Abolfazl Razi
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Transactions on Machine Learning in Communications and Networking
Subjects:	Determinantal point process; data diversification; distributed learning; distributed sources
Online Access:	https://ieeexplore.ieee.org/document/10580972/

_version_	1850036107946754048
author	Xiwen Chen; Huayu Li; Rahul Amin; Abolfazl Razi
author_facet	Xiwen Chen; Huayu Li; Rahul Amin; Abolfazl Razi
author_sort	Xiwen Chen
collection	DOAJ
description	Determinantal Point Process (DPP) is a powerful technique to enhance data diversity by promoting the repulsion of similar elements in the selected samples. Particularly, DPP-based Maximum A Posteriori (MAP) inference is used to identify subsets with the highest diversity. However, a commonly adopted presumption of all data samples being available at one point hinders its applicability to real-world scenarios where data samples are distributed across distinct sources with intermittent and bandwidth-limited connections. This paper proposes a distributed version of DPP inference to enhance multi-source data diversification under limited communication budgets. First, we convert the lower bound of the diversity-maximized distributed sample selection from matrix determinant optimization to a simpler form of the sum of individual terms. Next, a determinant-preserved sparse representation of selected samples is formed by the sink as a surrogate for collected samples and sent back to sources as lightweight messages to eliminate the need for raw data exchange. Our approach is inspired by the channel orthogonalization process of Multiple-Input Multiple-Output (MIMO) systems based on the Channel State Information (CSI). Extensive experiments verify the superiority of our scalable method over the most commonly used data selection methods, including GreeDi, Greedymax, random selection, and stratified sampling by a substantial gain of at least 12% reduction in Relative Diversity Error (RDE). This enhanced diversity translates to a substantial improvement in the performance of various downstream learning tasks, including multi-level classification (2%-4% gain in accuracy), object detection (2% gain in mAP), and multiple-instance learning (1.3% gain in AUC).
format	Article
id	doaj-art-d82e467198cb40ada7d88895b814cbb4
institution	DOAJ
issn	2831-316X
language	English
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Transactions on Machine Learning in Communications and Networking
spelling	doaj-art-d82e467198cb40ada7d88895b814cbb4; 2025-08-20T02:57:17Z; eng; IEEE; IEEE Transactions on Machine Learning in Communications and Networking; ISSN 2831-316X; 2024-01-01; vol. 2, pp. 1341-1356; DOI 10.1109/TMLCN.2024.3421907; IEEE document 10580972
	Title: Learning on Bandwidth Constrained Multi-Source Data With MIMO-Inspired DPP MAP Inference
	Authors: Xiwen Chen (https://orcid.org/0000-0002-8006-4383); Huayu Li (https://orcid.org/0000-0001-9143-4741); Rahul Amin; Abolfazl Razi (https://orcid.org/0000-0002-3330-6132)
	Affiliations (as listed): School of Computing, Clemson University, Clemson, SC, USA; Department of Electrical and Computer Engineering, The University of Arizona, Tucson, AZ, USA; Tactical Networks Group, MIT Lincoln Laboratory, Lexington, MA, USA; School of Computing, Clemson University, Clemson, SC, USA
	Online access: https://ieeexplore.ieee.org/document/10580972/
	Keywords: Determinantal point process; data diversification; distributed learning; distributed sources
title	Learning on Bandwidth Constrained Multi-Source Data With MIMO-Inspired DPP MAP Inference
title_sort	learning on bandwidth constrained multi source data with mimo inspired dpp map inference
topic	Determinantal point process; data diversification; distributed learning; distributed sources
url	https://ieeexplore.ieee.org/document/10580972/

Similar Items