Towards Stable 3D Object Detection (ECCV2024)
Summary
Method
Background
- “current state-of-the-art detectors predominantly emphasize improving single-shot detection accuracy”
- “the temporal stability of 3D object detection greatly impacts the driving safety”
- Unstable detection -> negative impact on downstream tracking
- “Current metrics in measuring detection accuracy, such as mAP [14], usually overlook temporal information, which is fundamental for stability assessment. On the other hand, metrics designed for temporal object tracking (e.g., MOTA and MOTP [2]) are tailored to evaluate how well objects are tracked over time.”
Stability Index (SI)
- SI
- I_l = localization
- I_e = extent (= size)
- I_h = heading
- I_c = confidence
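As a rough illustration of how the four sub-indicators could be folded into a single score, the sketch below assumes SI is the confidence term multiplied by the mean of the three box terms, with every term in [0, 1]. That combination rule is an assumption made for illustration and is not recorded in these notes, so it should be checked against the formula in the paper. `stability_index` is a hypothetical helper name.

```python
def stability_index(i_c: float, i_l: float, i_e: float, i_h: float) -> float:
    """Combine per-object sub-indicators into one stability score.

    ASSUMPTION (not from the notes): SI = I_c * mean(I_l, I_e, I_h),
    with every sub-indicator already normalized to [0, 1].
    """
    terms = {"i_c": i_c, "i_l": i_l, "i_e": i_e, "i_h": i_h}
    for name, value in terms.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Confidence acts as a multiplicative weight on the averaged box terms.
    return i_c * (i_l + i_e + i_h) / 3.0


# Example: stable confidence, progressively less stable box terms.
print(stability_index(i_c=0.9, i_l=0.8, i_e=0.7, i_h=0.6))  # roughly 0.63
```

Under this assumed form, a low confidence term pulls the whole score down no matter how stable the box is.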
Prediction Consistency Learning (PCL)
Experiment
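- CenterPoint's low SI matches intuition well
- DSVT appears to be the strongest overall
- TransFusion-CL has low confidence stability
- It makes sense that a camera-based method has low confidence stability
- Its stability on everything other than confidence is high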
- Multi-frame input
- Both mAP and SI improve slightly
- Only for pedestrians, 2 frames appears to work best
- Effect of PCL
- n = 0 looks sufficient
- Increasing n raises SI but lowers mAPH
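- Individual PCL components
- Confidence stability contributes the most
- In the end, defining confidence as the degree of certainty that an object can be detected stably seems the most sensible approach, which matches intuition well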
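- This visualization is a very good one for analyzing stability
- Plotting stability itself as a time series would likely help pinpoint the moments when detection becomes unstable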
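The time-series idea could be sketched as follows. This assumes a single tracked object with one confidence score per frame, and uses 1 − |c_t − c_{t+1}| as a stand-in stability score per pair of adjacent frames. That score is an illustrative choice and not necessarily the paper's definition. The function names and the 0.8 threshold are hypothetical.

```python
from typing import List


def pairwise_confidence_stability(scores: List[float]) -> List[float]:
    """Stability of each adjacent frame pair: 1 - |c_t - c_{t+1}|.

    ASSUMPTION: a simple stand-in score, not necessarily the paper's formula.
    """
    return [1.0 - abs(b - a) for a, b in zip(scores, scores[1:])]


def unstable_transitions(stability: List[float], threshold: float = 0.8) -> List[int]:
    """Indices t where the transition from frame t to t+1 falls below the threshold."""
    return [t for t, s in enumerate(stability) if s < threshold]


# Confidence of one tracked object over five frames; frame 2 is a sudden dip.
scores = [0.90, 0.88, 0.35, 0.87, 0.89]
series = pairwise_confidence_stability(scores)
print(unstable_transitions(series))  # [1, 2]: the transitions into and out of frame 2
```

The `series` list can be handed to any plotting library to get the per-object curve. The flagged indices mark where to look in the raw frames.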
Discussion
- SI is faster to compute, so it can be considered effective
- random translation and random point dropping
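A minimal sketch of the two perturbations named above, written in pure Python over a list of (x, y, z) points. The shift range, the drop ratio, and the function names are hypothetical values chosen for illustration, since the notes do not record the paper's settings.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def random_translation(points: List[Point], max_shift: float = 0.5,
                       rng: Optional[random.Random] = None) -> List[Point]:
    """Shift the whole point cloud by one random offset per axis."""
    if rng is None:
        rng = random.Random()
    dx, dy, dz = (rng.uniform(-max_shift, max_shift) for _ in range(3))
    return [(x + dx, y + dy, z + dz) for x, y, z in points]


def random_point_dropping(points: List[Point], drop_ratio: float = 0.1,
                          rng: Optional[random.Random] = None) -> List[Point]:
    """Drop each point independently with probability drop_ratio."""
    if rng is None:
        rng = random.Random()
    return [p for p in points if rng.random() >= drop_ratio]


# Usage: perturb a toy cloud with a fixed seed so the run is repeatable.
rng = random.Random(0)
cloud = [(float(i), 0.0, 1.0) for i in range(10)]
perturbed = random_point_dropping(random_translation(cloud, rng=rng), rng=rng)
print(len(cloud), len(perturbed))
```

Passing an explicit `random.Random` instance keeps the perturbation reproducible, which helps when comparing predictions on the original and perturbed inputs.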
- PCL has less effect on Transformer-based methods
- “The transformer-based model is capable of generating more stable estimations for heading and localization.”
- Validation using auto labeling
- Applied CTRL (“Once detected, never lost: Surpassing human performance in offline lidar based 3d object detection”)
- Heading stability does not improve, so heading stability can be considered the hardest challenge
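When headings are compared across frames, angle wrap-around has to be handled, otherwise two nearly identical headings on either side of ±π look maximally different. The sketch below assumes a heading stability of the form 1 − Δθ/π. That form is an assumption for illustration and is not taken from these notes. Both function names are hypothetical.

```python
import math


def heading_difference(theta_a: float, theta_b: float) -> float:
    """Smallest absolute angle between two headings (radians), in [0, pi]."""
    diff = (theta_a - theta_b) % (2.0 * math.pi)  # wrapped into [0, 2*pi)
    return min(diff, 2.0 * math.pi - diff)


def heading_stability(theta_a: float, theta_b: float) -> float:
    """ASSUMPTION: stability = 1 - delta_theta / pi, so 1 is identical and 0 is flipped."""
    return 1.0 - heading_difference(theta_a, theta_b) / math.pi


# Headings just either side of the +/-pi boundary differ by only 0.1 rad.
print(heading_difference(math.pi - 0.05, -math.pi + 0.05))
# A full 180-degree flip gives the lowest stability.
print(heading_stability(0.0, math.pi))
```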