
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models (arXiv 2024/02, CoRL 2024)

Summary

Method

  • DriveVLM-Dual architecture
    • Output
      • Meta-action
      • Decision
      • Waypoints
    • The "traditional pipeline" here refers to an E2E model
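
The three-level output and the slow/fast split can be sketched as below. This is a minimal illustration, not the paper's implementation: `HierarchicalPlan`, `refine`, `run_dual`, `vlm_period`, and `alpha` are hypothetical names, both planners are stub functions, and the weighted-average `refine` is a stand-in for the traditional planner's actual refinement step. It only shows the control flow: the slow VLM (System 2) is queried every `vlm_period` ticks, while the fast pipeline (System 1) runs every tick and is pulled toward the most recent VLM waypoints.

```python
from dataclasses import dataclass

Waypoint = tuple[float, float]  # (x, y) in the ego frame; units assumed to be metres


@dataclass
class HierarchicalPlan:
    """Hypothetical container for the VLM's three-level output."""
    meta_actions: list[str]    # e.g. ["slow down", "keep lane"]
    decision: str              # free-text description of the decision
    waypoints: list[Waypoint]  # coarse, low-frequency trajectory


def refine(reference, fast, alpha=0.5):
    """Stand-in refinement: weighted average of fast and reference waypoints.

    alpha = 0 keeps the fast planner's output, alpha = 1 copies the VLM reference.
    """
    return [
        ((1 - alpha) * fx + alpha * rx, (1 - alpha) * fy + alpha * ry)
        for (fx, fy), (rx, ry) in zip(fast, reference)
    ]


def run_dual(num_ticks, vlm_period, slow_vlm, fast_planner, alpha=0.5):
    """Fast planner every tick; slow VLM only every `vlm_period` ticks."""
    latest_plan = None
    trajectories = []
    for t in range(num_ticks):
        if t % vlm_period == 0:
            latest_plan = slow_vlm(t)  # low-frequency System 2 call
        fast_traj = fast_planner(t)    # high-frequency System 1 output
        trajectories.append(refine(latest_plan.waypoints, fast_traj, alpha))
    return trajectories


# Usage with stub planners.
calls = []


def slow_vlm(t):
    calls.append(t)
    return HierarchicalPlan(["keep lane"], "follow the lead vehicle", [(1.0, 0.0), (2.0, 0.0)])


def fast_planner(t):
    return [(1.0, 0.2), (2.0, 0.4)]


out = run_dual(6, 3, slow_vlm, fast_planner)
print(calls)   # [0, 3]
print(out[0])  # [(1.0, 0.1), (2.0, 0.2)]
```

Because the VLM is only called at ticks 0 and 3, every fast-path output between those calls reuses the last available plan, which is the property that lets the system keep a high output rate despite a slow VLM.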

  • Integrating 3D Perception
    • Project the 3D detections onto the 2D image so they can be matched to, and handled as, the critical objects identified by the VLM
  • High-frequency Trajectory Refinement
    • Trajectory refinement is performed to achieve real-time, high-frequency inference (the VLM's low-frequency trajectory is refined by the faster traditional pipeline)
  • Scene Understanding for Planning (SUP-AD) Dataset
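
The 3D-to-2D step can be illustrated with a pinhole projection followed by IoU matching. This is a sketch under stated assumptions: 3D boxes are axis-aligned and already in the camera frame with every corner in front of the camera (z > 0), the VLM's critical objects come with 2D pixel boxes, and matching is a per-object best-IoU search with a hypothetical threshold of 0.5. The function names, the intrinsics `K`, and the dict layout of `detections` are illustrative, not taken from the paper.

```python
Box2D = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def project_points(points_cam, K):
    """Pinhole projection of camera-frame points (x, y, z) with a 3x3 intrinsic matrix K."""
    pixels = []
    for x, y, z in points_cam:
        w = K[2][0] * x + K[2][1] * y + K[2][2] * z  # equals z for a standard K
        u = (K[0][0] * x + K[0][1] * y + K[0][2] * z) / w
        v = (K[1][0] * x + K[1][1] * y + K[1][2] * z) / w
        pixels.append((u, v))
    return pixels


def box_corners(center, size):
    """8 corners of an axis-aligned 3D box (yaw ignored for brevity)."""
    cx, cy, cz = center
    hx, hy, hz = (s / 2 for s in size)
    return [(cx + sx * hx, cy + sy * hy, cz + sz * hz)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def project_box(center, size, K) -> Box2D:
    """Enclosing 2D box of the projected corners (assumes every corner has z > 0)."""
    us, vs = zip(*project_points(box_corners(center, size), K))
    return (min(us), min(vs), max(us), max(vs))


def iou(a: Box2D, b: Box2D) -> float:
    """Intersection over union of two axis-aligned 2D boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_critical_objects(critical_boxes, detections, K, threshold=0.5):
    """For each VLM critical-object box, index of the best-IoU 3D detection, or None."""
    projected = [project_box(d["center"], d["size"], K) for d in detections]
    matches = []
    for cbox in critical_boxes:
        scores = [iou(cbox, p) for p in projected]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        matches.append(best if best is not None and scores[best] >= threshold else None)
    return matches


# Usage with made-up intrinsics, detections, and VLM boxes.
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
detections = [
    {"center": (0.0, 0.0, 10.0), "size": (2.0, 1.5, 4.0)},  # car straight ahead
    {"center": (5.0, 0.0, 20.0), "size": (2.0, 1.5, 4.0)},  # car further to the right
]
critical_boxes = [(510.0, 260.0, 770.0, 460.0), (0.0, 0.0, 50.0, 50.0)]  # hypothetical VLM boxes
print(project_box(detections[0]["center"], detections[0]["size"], K))  # (515.0, 266.25, 765.0, 453.75)
print(match_critical_objects(critical_boxes, detections, K))           # [0, None]
```

A matched detection would then let its 3D attributes accompany the corresponding critical object; an unmatched VLM box (the second one here) simply gets `None`. A real pipeline would also handle box yaw, clipping at the image border, and one-to-one assignment (e.g. Hungarian matching); those are omitted to keep the example short.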

Experiment

  • Both System 1 (the fast traditional pipeline) and System 2 (the slow VLM) contribute substantially to performance

Discussion