LSV-MAE: A Masked-Autoencoder Pre-Training Approach for Large-Scale 3D Point Cloud Data
Masked language modeling (MLM) and masked image modeling (MIM) pretraining paradigms have achieved remarkable success in both natural language processing (NLP) and computer vision (CV). However, extending MIM to large-scale outdoor point cloud data presents significant challenges due to the inherent sparsity and wide spatial coverage of such data. To address this issue, we develop a masked autoencoding pre-training model, LSV-MAE, which makes it possible to train detection models on large volumes of unlabeled point cloud data. Our approach pre-trains the backbone to reconstruct masked voxel features extracted by PointNN. To enhance the feature extraction capability of the encoder, the point cloud is voxelized with different voxel sizes at different pre-training stages. Meanwhile, to avoid the effect of masking key points, the masked voxel features are re-integrated into the decoder during pretraining. To verify the proposed approach, experiments are conducted on well-known datasets, showing that our method not only avoids tedious labeling work but also improves detection accuracy by up to 18% compared with the same model without pre-training, across datasets of different sizes.
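The abstract outlines the core data-side mechanics: voxelize the point cloud at stage-dependent voxel sizes, mask a fraction of the voxels, and train the network to reconstruct the masked voxel features. As a rough illustration of just that preparation step (not the authors' actual pipeline, which uses PointNN-extracted features and a learned encoder/decoder), a minimal pure-Python sketch with hypothetical helper names might look like:

```python
import random
from collections import defaultdict

def voxelize(points, voxel_size):
    """Group 3D points into cubic voxels of edge length voxel_size.
    Returns integer voxel coordinates and a simple per-voxel feature
    (the centroid of the points that fall inside each voxel)."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(int(c // voxel_size) for c in p)
        buckets[key].append(p)
    coords = sorted(buckets)
    feats = [tuple(sum(axis) / len(buckets[k]) for axis in zip(*buckets[k]))
             for k in coords]
    return coords, feats

def mask_voxels(n_voxels, mask_ratio, rng):
    """Randomly split voxel indices into masked and visible index sets."""
    order = list(range(n_voxels))
    rng.shuffle(order)
    n_mask = int(n_voxels * mask_ratio)
    return order[:n_mask], order[n_mask:]

if __name__ == "__main__":
    rng = random.Random(0)
    points = [(rng.uniform(0, 50), rng.uniform(0, 50), rng.uniform(0, 5))
              for _ in range(2000)]
    # Stage-wise voxel sizes, coarse to fine, as the abstract describes.
    for voxel_size in (4.0, 2.0, 1.0):
        coords, feats = voxelize(points, voxel_size)
        masked, visible = mask_voxels(len(coords), 0.6, rng)
        # An encoder would embed only the visible voxels; the decoder would
        # then regress the features of the masked ones, which serve as the
        # reconstruction targets for the pre-training loss.
        print(f"voxel={voxel_size}: {len(coords)} voxels, "
              f"{len(masked)} masked, {len(visible)} visible")
```

The mask ratio (0.6 here) and the centroid feature are illustrative placeholders only; the paper's actual masking ratio, voxel sizes, and feature extractor are not specified in this record.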
Saved in:
| Main Authors: | Nuo Cheng, Chuanyu Luo, Han Li, Sikun Ma, Shengguang Lei, Pu Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Pre-training, autonomous driving, point cloud, transformer, KITTI, nuScenes |
| Online Access: | https://ieeexplore.ieee.org/document/11105382/ |
| _version_ | 1850034108486385664 |
|---|---|
| author | Nuo Cheng; Chuanyu Luo; Han Li; Sikun Ma; Shengguang Lei; Pu Li |
| author_facet | Nuo Cheng; Chuanyu Luo; Han Li; Sikun Ma; Shengguang Lei; Pu Li |
| author_sort | Nuo Cheng |
| collection | DOAJ |
| description | Masked language modeling (MLM) and masked image modeling (MIM) pretraining paradigms have achieved remarkable success in both natural language processing (NLP) and computer vision (CV). However, extending MIM to large-scale outdoor point cloud data presents significant challenges due to the inherent sparsity and wide spatial coverage of such data. To address this issue, we develop a masked autoencoding pre-training model, LSV-MAE, which makes it possible to train detection models on large volumes of unlabeled point cloud data. Our approach pre-trains the backbone to reconstruct masked voxel features extracted by PointNN. To enhance the feature extraction capability of the encoder, the point cloud is voxelized with different voxel sizes at different pre-training stages. Meanwhile, to avoid the effect of masking key points, the masked voxel features are re-integrated into the decoder during pretraining. To verify the proposed approach, experiments are conducted on well-known datasets, showing that our method not only avoids tedious labeling work but also improves detection accuracy by up to 18% compared with the same model without pre-training, across datasets of different sizes. |
| format | Article |
| id | doaj-art-3a10747ffbcc4b448a9b37623bde1839 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-3a10747ffbcc4b448a9b37623bde1839; 2025-08-20T02:57:57Z; English; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 135708-135721; DOI 10.1109/ACCESS.2025.3594614; IEEE document 11105382; LSV-MAE: A Masked-Autoencoder Pre-Training Approach for Large-Scale 3D Point Cloud Data; Nuo Cheng (https://orcid.org/0000-0002-4748-4554), Chuanyu Luo (https://orcid.org/0000-0001-8496-8550), Han Li, Sikun Ma, Shengguang Lei, Pu Li (https://orcid.org/0000-0001-6481-9961); affiliations: Process Optimization Group, Technische Universität Ilmenau, Ilmenau, Germany (Cheng, Luo, P. Li); LiangDao GmbH, Berlin, Germany (H. Li, Ma, Lei); abstract as in the description field above; https://ieeexplore.ieee.org/document/11105382/; keywords: Pre-training; autonomous driving; point cloud; transformer; KITTI; nuScenes |
| spellingShingle | Nuo Cheng; Chuanyu Luo; Han Li; Sikun Ma; Shengguang Lei; Pu Li; LSV-MAE: A Masked-Autoencoder Pre-Training Approach for Large-Scale 3D Point Cloud Data; IEEE Access; Pre-training; autonomous driving; point cloud; transformer; KITTI; nuScenes |
| title | LSV-MAE: A Masked-Autoencoder Pre-Training Approach for Large-Scale 3D Point Cloud Data |
| title_full | LSV-MAE: A Masked-Autoencoder Pre-Training Approach for Large-Scale 3D Point Cloud Data |
| title_fullStr | LSV-MAE: A Masked-Autoencoder Pre-Training Approach for Large-Scale 3D Point Cloud Data |
| title_full_unstemmed | LSV-MAE: A Masked-Autoencoder Pre-Training Approach for Large-Scale 3D Point Cloud Data |
| title_short | LSV-MAE: A Masked-Autoencoder Pre-Training Approach for Large-Scale 3D Point Cloud Data |
| title_sort | lsv mae a masked autoencoder pre training approach for large scale 3d point cloud data |
| topic | Pre-training; autonomous driving; point cloud; transformer; KITTI; nuScenes |
| url | https://ieeexplore.ieee.org/document/11105382/ |
| work_keys_str_mv | AT nuocheng lsvmaeamaskedautoencoderpretrainingapproachforlargescale3dpointclouddata AT chuanyuluo lsvmaeamaskedautoencoderpretrainingapproachforlargescale3dpointclouddata AT hanli lsvmaeamaskedautoencoderpretrainingapproachforlargescale3dpointclouddata AT sikunma lsvmaeamaskedautoencoderpretrainingapproachforlargescale3dpointclouddata AT shengguanglei lsvmaeamaskedautoencoderpretrainingapproachforlargescale3dpointclouddata AT puli lsvmaeamaskedautoencoderpretrainingapproachforlargescale3dpointclouddata |