Find n’ Propagate: Open-Vocabulary 3D Object Detection in Urban Environments (arXiv 2024/03, ECCV 2024)

Summary

  • Open-Vocabulary 3D Object Detection
  • https://github.com/djamahl99/findnpropagate
  • Contributions
      1. Frustum-based method using 2D VLMs
      1. Greedy Box Seeker
      • Segments each frustum and searches the space for candidate boxes
      1. Greedy Box Oracle
      • A mechanism that corrects boxes via multi-view alignment and density ranking
      1. Remote Propagator
      • Novel pseudo labels at long range -> they become sparse
      • Stores them in memory and reuses them

Background

  • 2D Open-vocabulary learning
    • (1) distilling knowledge from large vision-language models (VLMs) such as CLIP [29] for feature map matching [9], region prompting [36, 42], bipartite matching [20]
    • (2) employing pseudo-labelled boxes [49,50] or auxiliary grounding data [15, 23, 24, 46] as weak supervision in self-training
  • 3D
    • (1) Top-down Projection
    • (2) Top-down Self-train
    • (3) Top-down Clustering
    • (4) Bottom-up Weakly-supervised 3D detection

  • About the Bottom-up approach

The Bottom-up approach presents a cost-effective alternative akin to weakly supervised 3D object detection, lifting 2D annotations to construct 3D bounding boxes. Different from Top-down counterparts, this approach is training-free and does not rely on any base annotations, potentially making it more generalisable and capable of finding objects with diverse shapes and densities. In Baseline IV, we study FGR [35] as an exemplar of Bottom-up Weakly-supervised and evaluate its effectiveness in generating novel proposals. FGR starts with removing background points such as the ground plane, then incorporates the human prior into key-vertex localization to refine box regression. However, their study was limited to regressing car objects, as their vertex localization assumes rectangular objects, which does not hold for other classes.
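
As a rough illustration of the "removing background points such as the ground plane" step, here is a minimal sketch of a generic RANSAC plane fit in NumPy. It is not FGR's actual code; the function name `remove_ground_ransac` and the parameters `n_iters` and `dist_thresh` are hypothetical, and the sketch assumes the ground is the single dominant plane in the scene.

```python
import numpy as np

def remove_ground_ransac(points, n_iters=100, dist_thresh=0.2, seed=0):
    """Return the points that are NOT on the dominant plane (assumed to be the ground).

    points: (N, 3) array. All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Sample 3 distinct points and fit the plane passing through them.
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # degenerate (collinear) sample, skip it
            continue
        normal = normal / norm
        # Perpendicular distance of every point to the candidate plane.
        dist = np.abs((points - p0) @ normal)
        inliers = dist < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Keep everything that is off the best plane.
    return points[~best_inliers]
```

The remaining points would then be the input to whatever box-fitting step follows; a height threshold alone would be simpler but fails on sloped roads, which is why a plane fit is used in this sketch.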

Method

    1. Frustum-based method using 2D VLMs
    • region VLMs (GLIP) or off-the-shelf OV-2D detectors (OWL-ViT)
    • k: think of it as generating proposals
    1. Greedy Box Seeker
    • Segments each frustum and searches the space for candidate boxes
    1. Greedy Box Oracle
    • Density ranking
      • distinguishing between the foreground and background is crucial
    • multi-view alignment
      • Expands the bbox by a suitable margin
    1. Remote Propagator
    • Changes geometry (position and angle) through simulation
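
The frustum and density ideas in the list above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the LiDAR-to-camera matrix `T_cam_from_lidar`, the pinhole intrinsics `K`, and the scoring rule (points per unit box volume) are all assumptions introduced here, and the paper's actual density criterion may differ.

```python
import numpy as np

def points_in_frustum(points, T_cam_from_lidar, K, box2d):
    """Mask of LiDAR points whose image projection falls inside a 2D box.

    points: (N, 3) in the LiDAR frame; T_cam_from_lidar: (4, 4); K: (3, 3);
    box2d: (x1, y1, x2, y2) in pixels. The camera looks along +z.
    """
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (homo @ T_cam_from_lidar.T)[:, :3]
    z = cam[:, 2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)  # avoid dividing by zero or negatives
    uv = (cam @ K.T)[:, :2] / safe_z[:, None]
    x1, y1, x2, y2 = box2d
    inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    return in_front & inside

def points_in_box(points, center, size, yaw):
    """Mask of points inside a 3D box rotated by yaw about the z axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    d = points - np.asarray(center, dtype=float)
    # Rotate offsets by -yaw to express them in the box frame.
    local_x = c * d[:, 0] + s * d[:, 1]
    local_y = -s * d[:, 0] + c * d[:, 1]
    local = np.stack([local_x, local_y, d[:, 2]], axis=1)
    return np.all(np.abs(local) <= np.asarray(size, dtype=float) / 2.0, axis=1)

def rank_by_density(points, candidates):
    """Sort candidate boxes (center, size, yaw) by points per unit volume, densest first."""
    scored = []
    for center, size, yaw in candidates:
        n = int(points_in_box(points, center, size, yaw).sum())
        scored.append((n / float(np.prod(size)), (center, size, yaw)))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored
```

A typical use would be to filter the point cloud with `points_in_frustum` for each 2D detection, enumerate candidate boxes inside that frustum, and keep the top entry of `rank_by_density`. Normalizing by volume is one simple way to avoid always preferring the largest box.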

Experiment

  • Baseline
    • Top-down Clustering: DBSCAN etc. + CLIP2Scene
    • Bottom-up weakly supervised: FGR
  • Results

  • All classes are novel

  • 3 classes -> 3 + 7 classes

  • Greedy Box Seeker comparison

  • Visualization

  • Comparison
    • The rotation and point-cloud sparsification augmentations are effective
    • Greedy Box Oracle helps only slightly
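
The rotation and sparsification augmentations noted above can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's Remote Propagator: the function names, the uniform random point dropping, and the `keep_ratio` parameter are hypothetical choices made here.

```python
import numpy as np

def rotate_about_z(points, center, angle):
    """Rotate points by `angle` (radians) about the vertical axis through `center`."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    center = np.asarray(center, dtype=float)
    return (points - center) @ R.T + center

def sparsify(points, keep_ratio, rng):
    """Randomly keep a fraction of the points to mimic a sparser, more distant object."""
    n_keep = max(1, int(round(len(points) * keep_ratio)))
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]

def simulate_remote_object(points, center, angle, new_center, keep_ratio, rng):
    """Rotate an object's points, move them to `new_center`, then thin them out."""
    rotated = rotate_about_z(points, center, angle)
    shift = np.asarray(new_center, dtype=float) - np.asarray(center, dtype=float)
    return sparsify(rotated + shift, keep_ratio, rng)
```

In practice `keep_ratio` would probably be tied to the target distance, since LiDAR returns thin out with range; a fixed ratio is used here only to keep the sketch short.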

Discussion