Learning on Bandwidth Constrained Multi-Source Data With MIMO-Inspired DPP MAP Inference
Determinantal Point Process (DPP) is a powerful technique to enhance data diversity by promoting the repulsion of similar elements in the selected samples. Particularly, DPP-based Maximum A Posteriori (MAP) inference is used to identify subsets with the highest diversity. However, a commonly adopted...
| Main Authors: | Xiwen Chen, Huayu Li, Rahul Amin, Abolfazl Razi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Transactions on Machine Learning in Communications and Networking |
| Subjects: | Determinantal point process; data diversification; distributed learning; distributed sources |
| Online Access: | https://ieeexplore.ieee.org/document/10580972/ |
| _version_ | 1850036107946754048 |
|---|---|
| author | Xiwen Chen, Huayu Li, Rahul Amin, Abolfazl Razi |
| author_facet | Xiwen Chen, Huayu Li, Rahul Amin, Abolfazl Razi |
| author_sort | Xiwen Chen |
| collection | DOAJ |
| description | Determinantal Point Process (DPP) is a powerful technique to enhance data diversity by promoting the repulsion of similar elements in the selected samples. Particularly, DPP-based Maximum A Posteriori (MAP) inference is used to identify subsets with the highest diversity. However, a commonly adopted presumption of all data samples being available at one point hinders its applicability to real-world scenarios where data samples are distributed across distinct sources with intermittent and bandwidth-limited connections. This paper proposes a distributed version of DPP inference to enhance multi-source data diversification under limited communication budgets. First, we convert the lower bound of the diversity-maximized distributed sample selection from matrix determinant optimization to a simpler form of the sum of individual terms. Next, a determinant-preserved sparse representation of selected samples is formed by the sink as a surrogate for collected samples and sent back to sources as lightweight messages to eliminate the need for raw data exchange. Our approach is inspired by the channel orthogonalization process of Multiple-Input Multiple-Output (MIMO) systems based on the Channel State Information (CSI). Extensive experiments verify the superiority of our scalable method over the most commonly used data selection methods, including GreeDi, Greedymax, random selection, and stratified sampling by a substantial gain of at least 12% reduction in Relative Diversity Error (RDE). This enhanced diversity translates to a substantial improvement in the performance of various downstream learning tasks, including multi-level classification (2%-4% gain in accuracy), object detection (2% gain in mAP), and multiple-instance learning (1.3% gain in AUC). |
| format | Article |
| id | doaj-art-d82e467198cb40ada7d88895b814cbb4 |
| institution | DOAJ |
| issn | 2831-316X |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Transactions on Machine Learning in Communications and Networking |
| spelling | doaj-art-d82e467198cb40ada7d88895b814cbb4; 2025-08-20T02:57:17Z; eng; IEEE; IEEE Transactions on Machine Learning in Communications and Networking; 2831-316X; 2024-01-01; vol. 2, pp. 1341-1356; 10.1109/TMLCN.2024.3421907; 10580972; Learning on Bandwidth Constrained Multi-Source Data With MIMO-Inspired DPP MAP Inference; Xiwen Chen (https://orcid.org/0000-0002-8006-4383), School of Computing, Clemson University, Clemson, SC, USA; Huayu Li (https://orcid.org/0000-0001-9143-4741), Department of Electrical and Computer Engineering, The University of Arizona, Tucson, AZ, USA; Rahul Amin, Tactical Networks Group, MIT Lincoln Laboratory, Lexington, MA, USA; Abolfazl Razi (https://orcid.org/0000-0002-3330-6132), School of Computing, Clemson University, Clemson, SC, USA; https://ieeexplore.ieee.org/document/10580972/; Determinantal point process; data diversification; distributed learning; distributed sources |
| title | Learning on Bandwidth Constrained Multi-Source Data With MIMO-Inspired DPP MAP Inference |
| topic | Determinantal point process; data diversification; distributed learning; distributed sources |
| url | https://ieeexplore.ieee.org/document/10580972/ |