A Novel Bias-Adjusted Estimator Based on Synthetic Confusion Matrix (BAESCM) for Subregion Area Estimation

Bibliographic Details
Main Authors: Bo Zhang, Xuehong Chen, Xihong Cui, Miaogen Shen
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Remote Sensing
Subjects: classification map; subregion area estimation; bias-adjusted estimator; synthetic confusion matrix
Online Access: https://www.mdpi.com/2072-4292/17/7/1145
collection DOAJ
description Accurate area estimation of specific land cover/use types within administrative or natural units is crucial for many applications. However, land cover areas derived directly from remote sensing classification maps by pixel counting often exhibit non-negligible bias. Various design-based area estimators (e.g., the bias-adjusted estimator, the model-assisted difference estimator, and the model-assisted ratio estimator derived from the confusion matrix), which combine ground truth samples with the classification map, have therefore been applied to provide more accurate area estimates and uncertainty inference. These estimators work well for a region with sufficient ground truth samples, but they struggle when areas must be estimated for multiple subregions that each contain only a few samples. To overcome this limitation, we propose a novel Bias-Adjusted Estimator based on the Synthetic Confusion Matrix (BAESCM), which estimates land cover areas in subregions by downscaling the global sample information to the subregion scale. First, several clusters are generated from remote sensing data using the K-means method (with the number of clusters much smaller than the number of subregions). Then, a confusion matrix is estimated for each cluster from the samples it contains. Assuming that the classification error distribution within each cluster is consistent across subregions, the confusion matrix of a subregion is synthesized as a weighted sum of the cluster confusion matrices, with the cluster abundances in the subregion as weights. Finally, the classification bias at the subregion scale is estimated from the synthetic confusion matrix, and the area counted from the classification map is corrected accordingly.
Moreover, we introduce a semi-empirical method for inferring confidence intervals of the estimated areas that accounts for both the sampling variance due to sampling randomness and the downscaling variance due to heterogeneity in the classification error distribution within each cluster. We tested the method in simulated experiments on county-level area estimation of soybean in Nebraska, USA. The root mean square errors (RMSEs) of the subregion area estimates from BAESCM are 21–64% lower than those of estimates based on pixel counting from the classification map. Additionally, the true coverages of the estimated confidence intervals approximately match their nominal coverages. Compared with traditional design-based estimators, BAESCM achieves better estimation accuracy for subregion areas when the sample size is limited. The method is therefore particularly recommended for studies of subregion land cover areas when ground truth samples are inadequate.
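The synthesis and correction steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each cluster confusion matrix is stored as joint proportions (rows = map class, columns = reference class, entries summing to 1), that the bias of a class is the difference between its map share and its reference share in the synthetic matrix, and that the function names and all numbers are hypothetical.

```python
import numpy as np


def synthetic_confusion_matrix(cluster_cms, abundances):
    """Weighted sum of cluster confusion matrices for one subregion.

    cluster_cms: shape (K, C, C), joint proportions per cluster
                 (rows = map class, columns = reference class).
    abundances:  shape (K,), share of the subregion's pixels in each cluster.
    """
    cms = np.asarray(cluster_cms, dtype=float)
    w = np.asarray(abundances, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    return np.tensordot(w, cms, axes=1)  # contracts the cluster axis -> (C, C)


def bias_adjusted_area(mapped_area, total_area, synth_cm, cls):
    """Correct a pixel-counted area by the bias implied by the synthetic matrix."""
    map_share = synth_cm[cls, :].sum()  # share labelled `cls` on the map
    ref_share = synth_cm[:, cls].sum()  # share that is `cls` in the reference
    bias = (map_share - ref_share) * total_area
    return mapped_area - bias


# Made-up example: two clusters, two classes (0 = other, 1 = target crop).
cluster_cms = [
    [[0.60, 0.05], [0.10, 0.25]],  # cluster A: map over-counts the target class
    [[0.80, 0.10], [0.02, 0.08]],  # cluster B: map under-counts the target class
]
abundances = [0.5, 0.5]  # the subregion is split evenly between the clusters

synth = synthetic_confusion_matrix(cluster_cms, abundances)
corrected = bias_adjusted_area(mapped_area=230.0, total_area=1000.0,
                               synth_cm=synth, cls=1)
print(synth)
print(corrected)
```

Because the weights are the cluster abundances of the subregion, two subregions with different cluster mixes receive different corrections even though they share the same cluster-level sample information.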
format Article
id doaj-art-1ccdbe2a131041a2a0c929114a85e12a
institution DOAJ
issn 2072-4292
spelling doaj-art-1ccdbe2a131041a2a0c929114a85e12a 2025-08-20T03:08:56Z
eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2025-03-01; vol. 17, no. 7, art. 1145
doi 10.3390/rs17071145
affiliation Bo Zhang: State Key Laboratory of Remote Sensing and Digital Earth, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
affiliation Xuehong Chen: State Key Laboratory of Remote Sensing and Digital Earth, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
affiliation Xihong Cui: State Key Laboratory of Remote Sensing and Digital Earth, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
affiliation Miaogen Shen: Institute of Land Surface System and Sustainable Development, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
topic classification map
subregion area estimation
bias-adjusted estimator
synthetic confusion matrix