Towards Stable 3D Object Detection (ECCV 2024)

Summary

“current state-of-the-art detectors predominantly emphasize improving single-shot detection accuracy”
- “the temporal stability of 3D object detection greatly impacts the driving safety”
Unstable detection -> adverse effects on downstream tracking
“Current metrics in measuring detection accuracy, such as mAP [14], usually overlook temporal information, which is fundamental for stability assessment. On the other hand, metrics designed for temporal object tracking (e.g., MOTA and MOTP [2]) are tailored to evaluate how well objects are tracked over time.”

SI (Stability Index)
- I_l = localization
- I_e = extent (= size)
- I_h = heading
- I_c = confidence
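
To make the four components concrete, here is a minimal Python sketch of how per-object terms could be combined into one score. None of it is copied from the paper or the repository. `confidence_stability` and `heading_stability` are illustrative stand-ins, and the aggregation in `stability_index` (the confidence term scaling the mean of the three box terms) is an assumption of these notes; check the paper and the linked code for the exact definitions.

```python
import math

def confidence_stability(score_a, score_b):
    """Illustrative stand-in: 1 when the two scores agree, 0 when they differ by 1."""
    return 1.0 - abs(score_a - score_b)

def heading_stability(yaw_a, yaw_b):
    """Illustrative stand-in: 1 for identical headings, 0 for opposite headings."""
    # Wrap the difference into [-pi, pi] before taking its magnitude.
    diff = abs(math.atan2(math.sin(yaw_a - yaw_b), math.cos(yaw_a - yaw_b)))
    return 1.0 - diff / math.pi

def stability_index(i_c, i_l, i_e, i_h):
    """Assumed aggregation: I_c scales the mean of I_l, I_e and I_h."""
    return i_c * (i_l + i_e + i_h) / 3.0

# Hypothetical values for one object seen in two consecutive frames.
i_c = confidence_stability(0.9, 0.7)
i_h = heading_stability(0.0, math.radians(9.0))
si = stability_index(i_c, i_l=0.8, i_e=0.95, i_h=i_h)
print(f"I_c={i_c:.2f} I_h={i_h:.2f} SI={si:.2f}")
```

Under this assumed form, a detector with perfectly stable boxes still gets a low SI if its confidence flickers, which is in line with the confidence-related observations in the notes.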

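CenterPoint having a low SI matches intuition well
- DSVT looks like the strongest overall
- TransFusion-CL has low confidence stability
  - Low confidence stability for a method that uses cameras makes a lot of sense
  - Its stability on everything other than confidence is high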

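Contribution of each PCL component
- The confidence SI contributes the most
- In the end, the most sensible design is to make confidence express how certain the detector is that it can detect the object stably, which matches intuition well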

https://github.com/jbwang1997/StabilityIndex/blob/master/pcdet/datasets/waymo/waymo_stability_index.py
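- Main body of the code
- Not very long
The Appendix contains various mathematical analyses
Appendix A.1: analysis of IoU
- When the box is offset in XY, yaw = 0 (zero heading error) does not give the maximum IoU
- So measuring with IoU is questionable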
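
The yaw observation can be checked numerically. The sketch below is not taken from the paper or the repository. It is a self-contained bird's-eye-view IoU built on Sutherland-Hodgman polygon clipping, with a hypothetical 4 m x 2 m ground-truth box and a same-size prediction shifted by 0.5 m in both x and y. All names (`box_corners`, `clip_convex`, `bev_iou`, `iou_at`) and the box sizes are assumptions made for illustration.

```python
import math

def box_corners(cx, cy, length, width, yaw):
    """Corners of a bird's-eye-view box, in counter-clockwise order."""
    c, s = math.cos(yaw), math.sin(yaw)
    half = [(length / 2, width / 2), (-length / 2, width / 2),
            (-length / 2, -width / 2), (length / 2, -width / 2)]
    return [(cx + c * x - s * y, cy + s * x + c * y) for x, y in half]

def polygon_area(poly):
    """Shoelace formula; returns 0.0 for an empty polygon."""
    twice_area = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def clip_convex(subject, clipper):
    """Sutherland-Hodgman: clip convex `subject` by convex CCW `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]

        def side(p):
            # > 0 when p lies to the left of the directed edge a -> b (inside).
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

        previous, output = output, []
        for j in range(len(previous)):
            p, q = previous[j], previous[(j + 1) % len(previous)]
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)
            if sp * sq < 0:  # the segment p -> q crosses the clipping line
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output

def bev_iou(box_a, box_b):
    """IoU of two (cx, cy, length, width, yaw) boxes in the ground plane."""
    pa, pb = box_corners(*box_a), box_corners(*box_b)
    inter = polygon_area(clip_convex(pa, pb))
    return inter / (polygon_area(pa) + polygon_area(pb) - inter)

gt_box = (0.0, 0.0, 4.0, 2.0, 0.0)  # hypothetical 4 m x 2 m ground-truth box

def iou_at(yaw_deg, dx=0.5, dy=0.5):
    """IoU of a same-size prediction shifted by (dx, dy) and rotated by yaw_deg."""
    return bev_iou(gt_box, (dx, dy, 4.0, 2.0, math.radians(yaw_deg)))

yaw_grid = [0.5 * k for k in range(-40, 41)]  # -20 deg .. +20 deg
best_yaw = max(yaw_grid, key=iou_at)
print(f"IoU at yaw 0: {iou_at(0.0):.3f}")
print(f"best yaw on grid: {best_yaw:.1f} deg, IoU {iou_at(best_yaw):.3f}")
```

With the diagonal offset, the grid search lands on a non-zero yaw, so a prediction with a small heading error scores a higher IoU than one with the correct heading. With no offset (`dx=0, dy=0`) the IoU at yaw 0 is 1, as expected.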

For Transformer-based methods, PCL has less effect
- “The transformer-based model is capable of generating more stable estimations for heading and localization.”

Validation using auto labeling
- Applied CTRL (“Once detected, never lost: Surpassing human performance in offline lidar based 3d object detection”)
- Heading stability does not improve, so heading stability can be considered the hardest problem