Learning better representations for crowded pedestrians in offboard LiDAR-camera 3D tracking-by-detection

Perceiving pedestrians in highly crowded urban environments is a difficult long-tail problem for learning-based autonomous perception. Speeding up 3D ground-truth generation for such challenging scenes is critical yet difficult: the captured pedestrian point clouds are sparse, and suitable benchmarks for studying this specific system design are lacking. To tackle these challenges, we first collect a new multi-view LiDAR-camera 3D multiple-object-tracking benchmark of highly crowded pedestrians for in-depth analysis. We then build an offboard auto-labeling system that reconstructs pedestrian trajectories from LiDAR point clouds and multi-view images. To improve generalization to crowded scenes and performance on small objects, we propose learning high-resolution representations that are density-aware and relationship-aware. Extensive experiments validate that our approach significantly improves 3D pedestrian tracking performance, enabling higher auto-labeling efficiency. The code will be publicly available at this HTTP URL.
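The abstract does not detail how the tracking-by-detection stage associates per-frame detections with existing trajectories. As a purely illustrative sketch (not the authors' method), a minimal 3D tracking-by-detection step often matches detected object centers to track centers greedily by Euclidean distance under a gating threshold; all names and the `max_dist` parameter here are assumptions:

```python
import numpy as np

def associate(track_centers, det_centers, max_dist=1.0):
    """Greedy nearest-neighbor matching of 3D detection centers to track centers.

    track_centers: (T, 3) array of existing track positions.
    det_centers:   (D, 3) array of current-frame detection positions.
    Returns (matches, unmatched_detections, unmatched_tracks).
    """
    T, D = len(track_centers), len(det_centers)
    if T == 0 or D == 0:
        return [], list(range(D)), list(range(T))

    # Pairwise Euclidean distances between every track and every detection.
    cost = np.linalg.norm(track_centers[:, None, :] - det_centers[None, :, :], axis=-1)

    matches, used_t, used_d = [], set(), set()
    # Visit candidate pairs from cheapest to most expensive (greedy matching).
    pairs = sorted(((t, d) for t in range(T) for d in range(D)), key=lambda p: cost[p])
    for t, d in pairs:
        if t in used_t or d in used_d or cost[t, d] > max_dist:
            continue  # already matched, or beyond the gating distance
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)

    unmatched_d = [d for d in range(D) if d not in used_d]
    unmatched_t = [t for t in range(T) if t not in used_t]
    return matches, unmatched_d, unmatched_t
```

Unmatched detections would typically spawn new tracks and unmatched tracks would age out; an offboard system can additionally smooth trajectories with future frames, since it is not constrained to run causally.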
@article{li2025_2505.16029,
  title   = {Learning better representations for crowded pedestrians in offboard LiDAR-camera 3D tracking-by-detection},
  author  = {Shichao Li and Peiliang Li and Qing Lian and Peng Yun and Xiaozhi Chen},
  journal = {arXiv preprint arXiv:2505.16029},
  year    = {2025}
}