Adaptive CoCoLasso for High-Dimensional Measurement Error Models


Bibliographic Details
Main Author: Qin Yu
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Entropy
Online Access: https://www.mdpi.com/1099-4300/27/2/97
Description
Summary: A significant portion of theoretical and empirical work in high-dimensional regression has concentrated on clean datasets. In many practical scenarios, however, data are corrupted by missing values and measurement errors that cannot be ignored. Despite substantial progress in high-dimensional regression with contaminated covariates, methods that achieve an effective trade-off among prediction accuracy, feature selection, and computational efficiency remain underexplored. We introduce the adaptive convex conditioned Lasso (Adaptive CoCoLasso), a new estimator for high-dimensional linear models with error-prone measurements. The estimator combines a projection onto the nearest positive semi-definite matrix with an adaptively weighted ℓ1 penalty. Theoretical guarantees are provided by establishing error bounds for the estimator. Synthetic data analysis indicates that the Adaptive CoCoLasso performs strongly in prediction accuracy and mean squared error, particularly under both additive and multiplicative measurement noise. Although certain methods, such as Hard, can slightly outperform it in reducing the number of incorrectly selected covariates, its strength lies in offering a more favorable trade-off between prediction accuracy and sparse modeling.
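The two ingredients named in the abstract, a projection onto the nearest positive semi-definite matrix and an adaptively weighted ℓ1 penalty, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`nearest_psd`, `weighted_lasso_cd`, `adaptive_cocolasso`), the eigenvalue floor, the adaptive-weight formula `1/(|b_init| + eps_w)^gamma`, and the plain coordinate-descent solver are all illustrative assumptions in the spirit of CoCoLasso-style and adaptive-Lasso-style methods.

```python
import numpy as np

def nearest_psd(sigma, eps=1e-4):
    """Project a symmetric matrix onto the positive semi-definite cone
    by clipping its eigenvalues from below (floor at eps)."""
    sym = (sigma + sigma.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)
    return eigvecs @ np.diag(np.maximum(eigvals, eps)) @ eigvecs.T

def weighted_lasso_cd(sigma, rho, lam, weights, n_iter=1000, tol=1e-8):
    """Coordinate descent for the weighted Lasso objective
    0.5 * b' sigma b - rho' b + lam * sum_j weights[j] * |b_j|."""
    p = rho.shape[0]
    b = np.zeros(p)
    for _ in range(n_iter):
        b_old = b.copy()
        for j in range(p):
            # Partial residual excluding coordinate j, then soft-threshold.
            r = rho[j] - sigma[j] @ b + sigma[j, j] * b[j]
            b[j] = np.sign(r) * max(abs(r) - lam * weights[j], 0.0) / sigma[j, j]
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b

def adaptive_cocolasso(W, y, lam, gamma=1.0, eps_w=1e-3):
    """Hypothetical two-stage sketch: (1) a CoCoLasso-style fit using the
    PSD-projected surrogate Gram matrix of the observed covariates W;
    (2) a refit with adaptive weights 1/(|b_init| + eps_w)^gamma."""
    n, _ = W.shape
    sigma = nearest_psd(W.T @ W / n)   # surrogate Gram matrix, made PSD
    rho = W.T @ y / n                  # surrogate covariance with the response
    b_init = weighted_lasso_cd(sigma, rho, lam, np.ones_like(rho))
    weights = 1.0 / (np.abs(b_init) + eps_w) ** gamma
    return weighted_lasso_cd(sigma, rho, lam, weights)
```

The adaptive reweighting penalizes coordinates that the first-stage fit estimates near zero much more heavily, which is the mechanism behind the sparsity/prediction trade-off the abstract highlights; the PSD projection keeps the quadratic objective convex even when the error-corrupted Gram matrix is indefinite.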
ISSN:1099-4300