Advancing ALS Applications with Large-Scale Pre-Training: Framework, Dataset, and Downstream Assessment

Bibliographic Details
Main Authors: Haoyi Xiu, Xin Liu, Taehoon Kim, Kyoung-Sook Kim
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Series: Remote Sensing
Subjects:
Online Access: https://www.mdpi.com/2072-4292/17/11/1859
Description
Summary: The pre-training and fine-tuning paradigm has significantly advanced satellite remote sensing applications. However, its potential remains largely underexplored for airborne laser scanning (ALS), a key technology in domains such as forest management and urban planning. In this study, we address this gap by constructing a large-scale ALS point cloud dataset and evaluating its effectiveness in downstream applications. We first propose a simple, generalizable framework for dataset construction, designed to maximize land cover and terrain diversity while allowing flexible control over dataset size. We instantiate this framework using ALS, land cover, and terrain data collected across the contiguous United States, resulting in a dataset geographically covering more than 17,000 km² (184 billion points) with diverse land cover and terrain types included. As a baseline self-supervised learning model, we adopt BEV-MAE, a state-of-the-art masked autoencoder for 3D outdoor point clouds, and pre-train it on the constructed dataset. The resulting models are fine-tuned for several downstream tasks, including tree species classification, terrain scene recognition, and point cloud semantic segmentation. Our results show that pre-trained models consistently outperform their counterparts trained from scratch across all downstream tasks, demonstrating the strong transferability of the learned representations. Additionally, we find that scaling the dataset using the proposed framework leads to consistent performance improvements, whereas datasets constructed via random sampling fail to achieve comparable gains.
ISSN: 2072-4292
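
Note: the abstract describes the dataset construction framework only at a high level. As a rough, hypothetical sketch (not the authors' published implementation), the Python snippet below shows one way a budget-controlled, diversity-maximizing tile selection could be organized. The tile metadata fields, the greedy strategy, and the entropy criterion are assumptions introduced here purely for illustration.

# Hypothetical sketch: greedily pick ALS tiles so that the joint mix of
# land-cover and terrain classes in the selected set stays as diverse as
# possible, while a tile budget controls the final dataset size.
from collections import Counter
import math

def entropy(counts):
    """Shannon entropy of a class-count mapping (higher = more diverse)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)

def select_tiles(tiles, budget):
    """Greedy selection: at each step add the tile that most increases the
    combined land-cover/terrain entropy of the selected set.

    tiles: list of dicts like {"id": str, "land_cover": str, "terrain": str}
    budget: number of tiles to keep (controls dataset size)
    """
    selected, lc_counts, tr_counts = [], Counter(), Counter()
    remaining = list(tiles)
    while remaining and len(selected) < budget:
        def gain(tile):
            lc = lc_counts.copy(); lc[tile["land_cover"]] += 1
            tr = tr_counts.copy(); tr[tile["terrain"]] += 1
            return entropy(lc) + entropy(tr)
        best = max(remaining, key=gain)
        remaining.remove(best)
        selected.append(best)
        lc_counts[best["land_cover"]] += 1
        tr_counts[best["terrain"]] += 1
    return selected

# Toy usage: with a budget of 3, the selection favors covering distinct
# land-cover/terrain combinations over repeating similar tiles.
tiles = [
    {"id": "a", "land_cover": "forest", "terrain": "hilly"},
    {"id": "b", "land_cover": "forest", "terrain": "flat"},
    {"id": "c", "land_cover": "urban", "terrain": "flat"},
    {"id": "d", "land_cover": "cropland", "terrain": "mountain"},
]
print([t["id"] for t in select_tiles(tiles, budget=3)])

In this sketch, the budget parameter plays the role of the abstract's "flexible control over dataset size", while the entropy objective stands in for "maximizing land cover and terrain diversity"; the actual criterion and sampling unit used in the paper may differ.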