Regression for Astronomical Data with Realistic Distributions, Errors, and Nonlinearity
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IOP Publishing, 2025-01-01 |
| Series: | The Astronomical Journal |
| Online Access: | https://doi.org/10.3847/1538-3881/add891 |
| Summary: | We have developed a new regression technique, the maximum likelihood (ML)–based method and its variant, the Kolmogorov–Smirnov (KS) test–based method, designed to obtain unbiased regression results from typical astronomical data. A normalizing flow model is employed to automatically estimate the unobservable intrinsic distribution of the independent variable and the unobservable correlation between uncertainty level and intrinsic value of both independent and dependent variables from the observed data points in a variational-inference-based empirical Bayes approach. By incorporating these estimated distributions, our method comprehensively accounts for the uncertainties associated with both independent and dependent variables. Our tests on both mock data and real astronomical data from PHANGS-ALMA and PHANGS-JWST demonstrate that, given a sufficiently large sample size (>1000), both the ML-based method and the KS-test-based method significantly outperform existing widely used methods, particularly at low signal-to-noise ratios. The KS-test-based method exhibits remarkable robustness against deviations from underlying assumptions, complex intrinsic distributions, varying correlations between uncertainty levels and intrinsic values, inaccuracies in uncertainty estimates, outliers, and saturation effects. For sample sizes between 300 and 1000, the ML-based method yields the best performance. In the low-data regime (<300), the ML-based method performs comparably to other state-of-the-art methods. A GPU-compatible Python implementation of our methods, nicknamed “raddest,” will be made publicly available upon acceptance of this paper. |
| ISSN: | 1538-3881 |
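The Summary describes a maximum likelihood regression that marginalizes over the unobservable intrinsic distribution of the independent variable while accounting for errors in both variables. The following is a minimal sketch of that general idea only, not the authors' raddest implementation. It assumes a straight-line relation, Gaussian measurement errors with known per-point uncertainties, and replaces the normalizing flow with a single Gaussian for the intrinsic x distribution, so the marginal likelihood of each point is a closed-form bivariate normal. It omits the empirical Bayes estimation of the correlation between uncertainty level and intrinsic value, and the KS-test-based variant. The function names `neg_log_likelihood` and `fit_ml` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def neg_log_likelihood(params, x_obs, y_obs, sx, sy):
    """Mean negative log-likelihood of a linear errors-in-variables model.

    Assumed (illustrative) model:
        x_true ~ N(mu, tau^2)          # Gaussian stand-in for the normalizing flow
        y_true = a + b * x_true + N(0, sig^2)
        x_obs = x_true + N(0, sx^2),  y_obs = y_true + N(0, sy^2)
    Integrating out x_true leaves a bivariate normal for each (x_obs, y_obs).
    """
    a, b, mu, log_tau, log_sig = params
    tau2, sig2 = np.exp(2 * log_tau), np.exp(2 * log_sig)
    # Per-point 2x2 covariance; sx and sy may differ from point to point.
    cxx = tau2 + sx**2
    cyy = b**2 * tau2 + sig2 + sy**2
    cxy = b * tau2
    det = cxx * cyy - cxy**2
    dx = x_obs - mu
    dy = y_obs - (a + b * mu)
    # Quadratic form d^T C^{-1} d, written out for a 2x2 matrix.
    quad = (cyy * dx**2 - 2 * cxy * dx * dy + cxx * dy**2) / det
    return 0.5 * np.mean(quad + np.log(det) + 2 * np.log(2 * np.pi))


def fit_ml(x_obs, y_obs, sx, sy):
    """Maximize the likelihood, starting from the (biased) OLS line."""
    b0, a0 = np.polyfit(x_obs, y_obs, 1)
    start = np.array([a0, b0, x_obs.mean(), np.log(x_obs.std()), 0.0])
    res = minimize(neg_log_likelihood, start,
                   args=(x_obs, y_obs, sx, sy), method="L-BFGS-B")
    return res.x  # [a, b, mu, log_tau, log_sig]


# Mock data: true intercept 1, slope 2, large heteroscedastic errors in x and y.
rng = np.random.default_rng(0)
n = 5000
x_true = rng.normal(0.0, 1.0, n)
y_true = 1.0 + 2.0 * x_true + rng.normal(0.0, 0.5, n)
sx = rng.uniform(0.5, 1.5, n)  # per-point uncertainties, assumed known
sy = rng.uniform(0.5, 1.5, n)
x_obs = x_true + rng.normal(0.0, sx)
y_obs = y_true + rng.normal(0.0, sy)

ols_slope = np.polyfit(x_obs, y_obs, 1)[0]
a_ml, b_ml = fit_ml(x_obs, y_obs, sx, sy)[:2]
print(f"OLS slope: {ols_slope:.2f}, ML slope: {b_ml:.2f} (true value 2.0)")
```

With x errors this large, ordinary least squares attenuates the slope toward zero by roughly Var(x_true)/Var(x_obs), about one half here, while the marginal-likelihood fit should recover a slope near 2. A realistic version would replace the single Gaussian with a flexible density estimator such as a normalizing flow, as the Summary describes.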