LSV-MAE: A Masked-Autoencoder Pre-Training Approach for Large-Scale 3D Point Cloud Data

Bibliographic Details
Main Authors: Nuo Cheng, Chuanyu Luo, Han Li, Sikun Ma, Shengguang Lei, Pu Li
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11105382/
Description
Summary: Masked language modeling (MLM) and masked image modeling (MIM) pre-training paradigms have achieved remarkable success in both natural language processing (NLP) and computer vision (CV). However, extending MIM to large-scale outdoor point cloud data presents significant challenges due to the inherent sparsity and wide spatial coverage of such data. To address these challenges, we develop a masked autoencoding pre-training model, LSV-MAE, which makes it possible to train detection models on large volumes of unlabeled point cloud data. Our approach pre-trains the backbone to reconstruct masked voxel features extracted by PointNN. To enhance the feature extraction capability of the encoder, the point cloud is voxelized with different voxel sizes at different pre-training stages. Meanwhile, to avoid the effect of masking key points, the masked voxel features are re-integrated into the decoder during pre-training. To verify the proposed approach, experiments are conducted on well-known datasets, showing that our method not only avoids tedious labeling work but also improves detection accuracy by up to 18% compared with the same model without pre-training, across datasets of different sizes.
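As a rough illustration of the masked voxel-feature autoencoding the abstract describes, the following minimal PyTorch sketch masks a random subset of per-voxel feature vectors, encodes only the visible ones, and reconstructs the masked ones. All names and hyperparameters (MaskedVoxelAutoencoder, feat_dim, d_model, mask_ratio) are illustrative assumptions, not the authors' code; the input voxel features stand in for those extracted by PointNN, and a learnable mask token approximates the paper's re-integration of masked voxel features into the decoder, whose exact mechanism the abstract does not specify.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedVoxelAutoencoder(nn.Module):
    """MAE-style pre-training over per-voxel feature vectors (illustrative sketch)."""

    def __init__(self, feat_dim=64, d_model=256, mask_ratio=0.7):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(feat_dim, d_model)  # voxel feature -> token
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4)
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.head = nn.Linear(d_model, feat_dim)  # reconstruct the voxel feature

    def forward(self, voxel_feats):
        # voxel_feats: (B, N, feat_dim), one feature vector per occupied voxel.
        B, N, _ = voxel_feats.shape
        n_keep = int(N * (1.0 - self.mask_ratio))

        # Randomly split voxels into visible and masked sets.
        perm = torch.rand(B, N, device=voxel_feats.device).argsort(dim=1)
        keep_idx, mask_idx = perm[:, :n_keep], perm[:, n_keep:]

        tokens = self.embed(voxel_feats)
        d = tokens.size(-1)
        visible = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))

        # Encode only the visible voxels (the standard MAE efficiency trick).
        latent = self.encoder(visible)

        # Decoder input: encoded visible tokens plus placeholders for masked ones.
        # (LSV-MAE re-integrates the masked voxel features here; a learnable
        # mask token is used instead as a simplifying assumption.)
        mask_tokens = self.mask_token.expand(B, N - n_keep, d)
        pred = self.head(self.decoder(torch.cat([latent, mask_tokens], dim=1)))

        # Reconstruction loss only on the masked voxels.
        target = torch.gather(
            voxel_feats, 1, mask_idx.unsqueeze(-1).expand(-1, -1, voxel_feats.size(-1)))
        return F.mse_loss(pred[:, n_keep:], target)

A forward pass on, say, two scans with 1024 occupied voxels and 64-dimensional features would be loss = MaskedVoxelAutoencoder()(torch.randn(2, 1024, 64)). The multi-scale voxelization the abstract mentions (different voxel sizes at different pre-training stages) is a data-preparation step upstream of this sketch and is not modeled here.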
ISSN: 2169-3536