TY - GEN
T1 - MambaPan3D
T2 - 37th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2025
AU - Zhou, Ruishen
AU - Law, K. L. Eddie
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - With the advent of autonomous intelligent systems such as humanoid robots, environmental perception requires rapid and accurate real-time 3D scene interpretation. LiDAR sensors are core, accurate distance-measuring components, but the unstructured, sparse, and unevenly distributed nature of LiDAR point cloud data makes it difficult to process while meeting real-time object classification needs. To address the limitations of current 3D LiDAR-based panoptic segmentation methods, we propose MambaPan3D, a hybrid architecture that integrates Mamba and Transformer models for efficient and accurate 3D point cloud understanding. Our framework addresses two key challenges: 1) geometric ambiguity caused by sparse and irregular LiDAR point cloud distributions, and 2) inefficient long-range dependency modeling in large-scale scenes. Specifically, we introduce CartPolar-KAN embedding, a novel positional encoding strategy that bridges Cartesian and polar coordinates through a Kolmogorov-Arnold network (KAN) with learnable B-spline basis functions. The module dynamically fuses multi-coordinate features to overcome the limitations of fixed Bird's-Eye View (BEV) quantization. Additionally, our Mamba-Transformer Decoder combines the global attention capability of the Transformer with the linear computational efficiency of the Mamba state-space model, achieving real-time inference while maintaining a global receptive field. Extensive experiments on the SemanticKITTI dataset demonstrate state-of-the-art performance: panoptic quality (PQ) reaches 63.3% in complex urban scenes, 1.3% higher than the current best baseline. The proposed MambaPan3D framework thus offers a robust solution for real-time situational awareness in autonomous driving systems, balancing accuracy, efficiency, and scalability.
KW - 3D LiDAR point cloud sensors
KW - Kolmogorov-Arnold Networks
KW - panoptic segmentation
KW - state-space model
KW - Transformer model
UR - https://www.scopus.com/pages/publications/105031899538
U2 - 10.1109/ICTAI66417.2025.00080
DO - 10.1109/ICTAI66417.2025.00080
M3 - Conference contribution
AN - SCOPUS:105031899538
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 544
EP - 551
BT - Proceedings - 2025 IEEE 37th International Conference on Tools with Artificial Intelligence, ICTAI 2025
PB - IEEE Computer Society
Y2 - 3 November 2025 through 5 November 2025
ER -