Towards Stable 3D Object Detection (ECCV2024)
Summary
Method
Background
- “current state-of-the-art detectors predominantly emphasize improving single-shot detection accuracy”
- “the temporal stability of 3D object detection greatly impacts the driving safety”
- Unstable detection -> adversely affects tracking
- “Current metrics in measuring detection accuracy, such as mAP [14], usually overlook temporal information, which is fundamental for stability assessment. On the other hand, metrics designed for temporal object tracking (e.g., MOTA and MOTP [2]) are tailored to evaluate how well objects are tracked over time.”
Stability Index (SI)
- SI consists of four per-attribute stability terms:
  - I_l = localization
  - I_e = extent (= size)
  - I_h = heading
  - I_c = confidence
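To make the four terms concrete, here is a minimal sketch of per-attribute stability between two consecutive frames for one object. The formulas are simplified assumptions, not the paper's definitions: each term looks at how the prediction's error relative to a matched ground-truth box changes between frames t and t+1, so that real object motion is not penalized. `Box`, `stability_terms`, `stability_index`, and the unweighted mean used for SI are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical BEV box: center (x, y) [m], extent (l, w) [m], heading yaw [rad], confidence."""
    x: float
    y: float
    l: float
    w: float
    yaw: float
    score: float = 1.0


def wrap_angle(a: float) -> float:
    """Wrap an angle to [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def stability_terms(pred_t: Box, gt_t: Box, pred_t1: Box, gt_t1: Box) -> dict:
    """Per-attribute stability in [0, 1] (1 = perfectly stable) for one object at frames t and t+1.

    Simplified, hypothetical formulas (not the paper's): each term measures how much the
    prediction's error w.r.t. the matched GT box changes between the two frames.
    """
    # I_l (localization): change of the center error, normalized by the GT box diagonal.
    dex = (pred_t1.x - gt_t1.x) - (pred_t.x - gt_t.x)
    dey = (pred_t1.y - gt_t1.y) - (pred_t.y - gt_t.y)
    i_l = max(0.0, 1.0 - math.hypot(dex, dey) / math.hypot(gt_t1.l, gt_t1.w))

    # I_e (extent): change of the log size ratio pred/GT, summed over length and width.
    dl = math.log(pred_t1.l / gt_t1.l) - math.log(pred_t.l / gt_t.l)
    dw = math.log(pred_t1.w / gt_t1.w) - math.log(pred_t.w / gt_t.w)
    i_e = math.exp(-(abs(dl) + abs(dw)))

    # I_h (heading): change of the heading error, handling wrap-around at +-pi.
    err_t = wrap_angle(pred_t.yaw - gt_t.yaw)
    err_t1 = wrap_angle(pred_t1.yaw - gt_t1.yaw)
    i_h = 1.0 - abs(wrap_angle(err_t1 - err_t)) / math.pi

    # I_c (confidence): absolute change of the score.
    i_c = 1.0 - abs(pred_t1.score - pred_t.score)

    return {"I_l": i_l, "I_e": i_e, "I_h": i_h, "I_c": i_c}


def stability_index(terms: dict) -> float:
    """Assumption: SI as the unweighted mean of the four terms."""
    return sum(terms.values()) / len(terms)


# Usage: the object moves 1 m forward and the prediction keeps the same 0.1 m offset,
# but its confidence drops from 0.9 to 0.7, so only I_c drops.
gt0, gt1 = Box(0.0, 0.0, 4.0, 2.0, 0.0), Box(1.0, 0.0, 4.0, 2.0, 0.0)
p0 = Box(0.1, 0.0, 4.0, 2.0, 0.0, score=0.9)
p1 = Box(1.1, 0.0, 4.0, 2.0, 0.0, score=0.7)
terms = stability_terms(p0, gt0, p1, gt1)
si = stability_index(terms)
```

Measuring the change of the error rather than the raw change of the box is the key design choice here: a car driving at constant speed with a constant small offset should count as perfectly stable.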
Prediction Consistency Learning (PCL)
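As I understand the paper, PCL roughly regularizes predictions of the same object to be consistent across different timestamps and augmentations. The sketch below only shows what such a consistency term could look like, assuming the K objects in the two views are already matched row by row and expressed in one coordinate frame (augmentation and ego motion undone). `pcl_consistency_loss`, the per-attribute distances, and the weights `w_loc`, `w_ext`, `w_head`, `w_conf` are hypothetical, not the paper's loss. In training this would live in the detector's framework (e.g. PyTorch) so gradients flow; NumPy is used here only to show the arithmetic.

```python
import numpy as np


def pcl_consistency_loss(a: dict, b: dict, w_loc=1.0, w_ext=1.0, w_head=1.0, w_conf=1.0):
    """Hypothetical consistency loss between two predictions of the same K objects.

    a, b: dicts with "center" (K, 2), "size" (K, 2), "yaw" (K,), "score" (K,).
    Row k of a and b is assumed to be the same object in a shared coordinate frame.
    """
    loc = np.abs(a["center"] - b["center"]).sum(axis=1).mean()              # L1 on centers
    ext = np.abs(np.log(a["size"]) - np.log(b["size"])).sum(axis=1).mean()  # L1 on log sizes
    head = (1.0 - np.cos(a["yaw"] - b["yaw"])).mean()                        # wrap-safe heading term
    conf = np.abs(a["score"] - b["score"]).mean()                            # L1 on confidence
    terms = {"loc": float(loc), "ext": float(ext), "head": float(head), "conf": float(conf)}
    total = w_loc * loc + w_ext * ext + w_head * head + w_conf * conf
    return float(total), terms


view_a = {
    "center": np.array([[10.0, 2.0]]),
    "size": np.array([[4.0, 2.0]]),
    "yaw": np.array([0.1]),
    "score": np.array([0.9]),
}
# Same object in a second view: identical geometry, lower confidence.
view_b = {**view_a, "score": np.array([0.6])}
total, terms = pcl_consistency_loss(view_a, view_b)
```

Returning the per-term breakdown makes it easy to switch individual components on and off, which is the kind of per-component ablation discussed under Experiment.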
Experiment
- CenterPoint's low SI matches my intuition well
- DSVT seems to be the best overall
- TransFusion-CL has low confidence stability
  - Low confidence stability for a method that uses the camera makes a lot of sense
  - Its stability in every attribute other than confidence is high
- Multi-frame input
  - Both mAP and SI improve slightly
  - For pedestrians only, 2 frames appears to be the best setting
- Effect of PCL
  - n = 0 appears to be sufficient
  - Increasing n raises SI but lowers mAPH
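- Ablation of each PCL component
  - The confidence SI term contributes the most
  - It matches my intuition that, in the end, the most sensible approach is to make confidence mean “how stably the object can be detected”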
- This visualization seems very useful for analyzing stability
- Plotting stability itself as a time series could help pinpoint the moments when detection becomes unstable
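A minimal sketch of the time-series idea, assuming we already have the per-frame confidence of one tracked object. It computes a simplified confidence stability, 1 - |c_t - c_{t-1}|, for each pair of consecutive frames and flags frames where it falls below a threshold. The function names, the 0.8 threshold, and the scores are hypothetical and made up for illustration; passing the series to a plotting library such as matplotlib would give the plot itself.

```python
def confidence_stability_series(scores):
    """1 - |c_t - c_{t-1}| for each pair of consecutive frames of one track (simplified, hypothetical)."""
    return [1.0 - abs(cur - prev) for prev, cur in zip(scores, scores[1:])]


def unstable_frames(scores, threshold=0.8):
    """Indices of frames whose stability w.r.t. the previous frame falls below `threshold`."""
    series = confidence_stability_series(scores)
    # series[i] compares frame i and frame i+1, so report the later frame i+1.
    return [i + 1 for i, s in enumerate(series) if s < threshold]


track_scores = [0.91, 0.90, 0.88, 0.45, 0.87, 0.89]  # made-up per-frame confidences
print(unstable_frames(track_scores))  # [3, 4]
```

A single dip flags two frames (the drop into frame 3 and the recovery into frame 4), which is what a pairwise metric should report for a one-frame glitch.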
Discussion
- Since SI is faster to compute, it can be considered an effective metric
- random translation and random point dropping
- For Transformer-based methods, PCL has a smaller effect
  - “The transformer-based model is capable of generating more stable estimations for heading and localization.”
- Evaluation with auto-labeling
  - Applied CTRL (“Once detected, never lost: Surpassing human performance in offline lidar based 3d object detection”)
  - Heading stability does not improve, so heading stability can be considered the most difficult challenge
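The Discussion mentions random translation and random point dropping. The note does not say how these perturbations were applied (as augmentations or as a robustness test) or with what magnitudes, so the following is only a sketch of what the two operations do to a LiDAR point cloud. The function names and the defaults `max_shift=0.5` m and `drop_ratio=0.1` are hypothetical.

```python
import numpy as np


def random_translation(points, max_shift=0.5, rng=None):
    """Shift the whole cloud by one random (dx, dy, dz) drawn from U(-max_shift, max_shift) [m].

    points: (N, C) array whose first three columns are x, y, z.
    Returns the shifted copy and the shift; the same shift must be applied to GT boxes.
    """
    rng = np.random.default_rng() if rng is None else rng
    shift = rng.uniform(-max_shift, max_shift, size=3)
    out = points.copy()
    out[:, :3] += shift
    return out, shift


def random_point_dropping(points, drop_ratio=0.1, rng=None):
    """Drop each point independently with probability `drop_ratio`."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(points.shape[0]) >= drop_ratio
    return points[keep]


rng = np.random.default_rng(0)
cloud = rng.uniform(-50.0, 50.0, size=(1000, 4))  # synthetic x, y, z, intensity
shifted, shift = random_translation(cloud, rng=rng)
thinned = random_point_dropping(cloud, drop_ratio=0.2, rng=rng)
```

Translation moves every point (and, consistently, every box) by the same offset, so a perfectly stable detector should shift its outputs by exactly that amount; point dropping instead thins the evidence per object, which tends to hit small or distant objects hardest.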