PSNet: Patch-Based Self-Attention Network for 3D Point Cloud Semantic Segmentation

LiDAR-captured 3D point clouds are widely used in self-driving cars and smart cities. Point-based semantic segmentation methods make more efficient use of the rich geometric information contained in 3D point clouds, so they have gradually replaced other methods as the mainstream deep learning meth...

Bibliographic Details
Main Authors: Hong Yi, Yaru Liu, Ming Wang
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Remote Sensing
Subjects: patch-based self-attention; 3D point cloud; semantic segmentation; receptive field
Online Access: https://www.mdpi.com/2072-4292/17/12/2012
_version_ 1849426005509275648
author Hong Yi
Yaru Liu
Ming Wang
collection DOAJ
description LiDAR-captured 3D point clouds are widely used in self-driving cars and smart cities. Point-based semantic segmentation methods make more efficient use of the rich geometric information contained in 3D point clouds, so they have gradually replaced other methods as the mainstream deep learning approach to 3D point cloud semantic segmentation. However, existing methods suffer from limited receptive fields and feature misalignment due to hierarchical downsampling. To address these challenges, we propose PSNet, a novel patch-based self-attention network that significantly expands the receptive field while ensuring feature alignment through a patch-aggregation paradigm. PSNet combines patch-based self-attention feature extraction with common point feature aggregation (CPFA) to implicitly model large-scale spatial relationships. The framework first divides the point cloud into overlapping patches and extracts local features via multi-head self-attention, then aggregates the features of common points across patches to capture long-range context. Extensive experiments on the Toronto-3D and Complex Scene Point Cloud (CSPC) datasets validate PSNet’s state-of-the-art performance, with overall accuracies (OAs) of 98.4% and 97.2%, respectively, and significant improvements in challenging categories (e.g., +32.1% IoU for fences). Experimental results on the S3DIS dataset show that PSNet attains a competitive mIoU (71.2%) while maintaining lower inference latency (7.03 s). The PSNet architecture covers a larger receptive field than existing methods, which is a significant advantage. This work not only reveals the mechanism by which patch-based self-attention enlarges the receptive field but also provides insights into attention-based 3D geometric learning and semantic segmentation architectures. Furthermore, it serves as a useful reference for applications in autonomous vehicle navigation and smart city infrastructure management.
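The patch-then-aggregate idea in the description above can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say how patches are formed or how common-point features are combined, so the index-window patches, the unlearned single-head attention (Q = K = V), the mean used for aggregation, and all function names below are assumptions chosen only for illustration.

```python
import math


def make_overlapping_patches(num_points, patch_size, stride):
    """Split indices 0..num_points-1 into windows; they overlap when stride < patch_size.

    Assumption: points are already ordered so that index windows are spatially local.
    """
    if stride <= 0 or stride > patch_size:
        raise ValueError("need 0 < stride <= patch_size so every point is covered")
    patches = []
    start = 0
    while True:
        end = min(start + patch_size, num_points)
        patches.append(list(range(start, end)))
        if end == num_points:
            break
        start += stride
    return patches


def self_attention(features):
    """Unlearned single-head scaled dot-product attention with Q = K = V = features.

    A stand-in for the multi-head self-attention named in the abstract.
    """
    dim = len(features[0])
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in features]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, features)) / total
                    for d in range(dim)])
    return out


def aggregate_common_points(num_points, patches, patch_features):
    """Average each point's features over all patches that contain it (assumed rule)."""
    dim = len(patch_features[0][0])
    sums = [[0.0] * dim for _ in range(num_points)]
    counts = [0] * num_points
    for idxs, feats in zip(patches, patch_features):
        for i, f in zip(idxs, feats):
            counts[i] += 1
            for d in range(dim):
                sums[i][d] += f[d]
    fused = [[s / counts[i] for s in sums[i]] for i in range(num_points)]
    return fused, counts


# Toy input: ten points on a line, using raw coordinates as features.
points = [[float(i), 0.0, 0.0] for i in range(10)]
patches = make_overlapping_patches(len(points), patch_size=4, stride=2)
patch_features = [self_attention([points[i] for i in idxs]) for idxs in patches]
fused, counts = aggregate_common_points(len(points), patches, patch_features)
```

In this toy setup, interior points fall into two patches each, so their fused features mix context from both neighbouring windows; that overlap is what lets information travel beyond a single patch.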
format Article
id doaj-art-261260dc7a6940c8bc5e9bf06f35eb77
institution Kabale University
issn 2072-4292
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj-art-261260dc7a6940c8bc5e9bf06f35eb77
2025-08-20T03:29:35Z
eng
MDPI AG
Remote Sensing
2072-4292
2025-06-01
volume 17, issue 12, article 2012
10.3390/rs17122012
PSNet: Patch-Based Self-Attention Network for 3D Point Cloud Semantic Segmentation
Hong Yi (0), Yaru Liu (1), Ming Wang (2)
0: College of Geographical Sciences, Harbin Normal University, Harbin 150025, China
1: Guangdong Urban-Rural Planning and Design Research Institute Technology Group Co., Ltd., Guangzhou 510290, China
2: Inspur Cloud Information Technology Co., Ltd., Jinan 250101, China
https://www.mdpi.com/2072-4292/17/12/2012
patch-based self-attention; 3D point cloud; semantic segmentation; receptive field
title PSNet: Patch-Based Self-Attention Network for 3D Point Cloud Semantic Segmentation
topic patch-based self-attention
3D point cloud
semantic segmentation
receptive field
url https://www.mdpi.com/2072-4292/17/12/2012