LSV-MAE: A Masked-Autoencoder Pre-Training Approach for Large-Scale 3D Point Cloud Data
| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11105382/ |
| Summary: | Masked language modeling (MLM) and masked image modeling (MIM) pre-training paradigms have achieved remarkable success in natural language processing (NLP) and computer vision (CV). However, extending MIM to large-scale outdoor point cloud data presents significant challenges due to the inherent sparsity and wide spatial coverage of such data. To address this issue, we develop a masked autoencoding pre-training model, LSV-MAE, which makes it possible to train detection models on large volumes of unlabeled point cloud data. Our approach pre-trains the backbone to reconstruct masked voxel features extracted by PointNN. To enhance the feature extraction capability of the encoder, the point cloud is voxelized with different voxel sizes at different pre-training stages. Meanwhile, to avoid the effect of masking key points, the masked voxel features are re-integrated into the decoder during pre-training. To verify the proposed approach, experiments are conducted on well-known datasets, showing that our method not only avoids tedious labeling work but also improves detection accuracy by up to 18%, compared with the same model without pre-training, across datasets of different sizes. |
|---|---|
| ISSN: | 2169-3536 |
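The summary describes a pipeline of voxelizing the point cloud (at different voxel sizes per pre-training stage), masking a subset of voxels, and reconstructing features of the masked voxels. The sketch below illustrates only the voxelization-and-masking step in NumPy; the function names, the masking ratio, and the use of a simple voxel centroid as the reconstruction target are illustrative assumptions, not the paper's actual PointNN features or training procedure.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Group 3D points into axis-aligned voxels of the given edge length.

    Returns a dict mapping integer voxel index (tuple) -> (n_i, 3) point array.
    A larger voxel_size gives a coarser grid, mimicking the coarse-to-fine
    stages mentioned in the abstract (an assumption about the exact scheme).
    """
    idx = np.floor(points / voxel_size).astype(int)
    voxels = {}
    for i, key in enumerate(map(tuple, idx)):
        voxels.setdefault(key, []).append(points[i])
    return {k: np.stack(v) for k, v in voxels.items()}

def mask_voxels(voxels, mask_ratio, rng):
    """Randomly split voxels into a visible set and masked targets.

    The visible voxels would feed the encoder; the masked voxels provide
    reconstruction targets for the decoder. Here the target is just the
    voxel centroid, a toy stand-in for learned voxel features.
    """
    keys = list(voxels)
    n_mask = int(round(mask_ratio * len(keys)))
    masked_keys = {keys[i] for i in rng.permutation(len(keys))[:n_mask]}
    visible = {k: voxels[k] for k in keys if k not in masked_keys}
    targets = {k: voxels[k].mean(axis=0) for k in masked_keys}
    return visible, targets

# Toy usage: 1000 random points, coarse 2.0 m voxels, 60% masking.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 10.0, size=(1000, 3))
vox = voxelize(pts, voxel_size=2.0)
visible, targets = mask_voxels(vox, mask_ratio=0.6, rng=rng)
```

Re-integrating the masked voxel features into the decoder, as the summary states, would mean passing `targets`-side features (not just mask tokens) to the decoder; that step depends on the model architecture and is omitted here.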