概要
- 流れ
- PointNet -> PointNet++
- TangentConv
- SPLATNet: high-dimensional sparse lattice: not scale
- SuperPoint Graph
- Online Lidar
- SqueezeSeg and SqueezeSegV2
- spherical projection, a conditional random field (CRF)
- limit: 90deg, CRFをlabelの探索をすべての点群に対して行うものに変更
手法
III.A . range image化
- 点群 ($p_i$ = (x,y,z)) (3D) -> 球面投影されたデータ (u v) ( = puesudo range image, 2D)
- 列: 同時刻 行; 異なる時間のずれは自動車レベルの速度なら無視できる仮定 -> ほんと?
- vehicle motion
- $$ \begin{pmatrix} u \\v \end{pmatrix} = \begin{pmatrix} \frac{1}{2} (1-\arctan (y, x) \pi^{-1} w \\ {1-(\arcsin (z r^{-1})+f_{u p}) f^{-1}) h} \end{pmatrix} $$
- Tensor (5 × h × w) r, x, y, z, and remission
- (u, v)
III.B . 2D CNN semantic segmentation
- 球面投影されたデータ -> Semantic segmantation (2D)
- SqueezeSeg base: encoder-decoder hour-glass-shaped architecture
- downsampling is 32
- Loss function: $$\mathcal{L}=-\sum_{c=1}^{C} w_{c} y_{c} \log (\hat{y}_{c}), \text { where } w_{c}=\frac{1}{\log (f_{c}+\epsilon) }$$
- fc: inverse of its frequency, よく出てくるラベルの影響力を下げる
- アーキテクチャの流れ Darknet53 (Yolov3) -> RangeNet53 -> Rangenet++
- data size
- velodyne 64列を使っているのでh = 64
- w = 2048 - 512で評価
III.C. 2Dto3D semantic transfer
- Semantic segmantation (2D) -> rawoutput (3D)
- (u, v)と各点群のペアの情報を用いる
- 最後の層見ると特徴量n層?
III.D. 3Dpost-processing
rawoutput (3D) -> filtered output (3D)
- rawoutput (3D) + rangeImage (2D) -> filtered output (3D)
- そのままのoutputだとshadow-like artifactsが出現する
- range image based 3Dpost-processing to clean the point cloud from undesired discretization and inference artifacts, using a fast, GPU-based kNN-search operating on all points.
- Efficient Projective Nearest Neighbor Search for Point Labels
- k-Nearest-Neighbor (kNN) base, 速い
- 閾値を追加する cut-off閾値 近い点群とする最大距離
- 並列化可能で速い
- Data
- Range Image $I_{range}$ of size W × H,
- Label Image $I_{label}$ of predictions of size W × H,
- Ranges R for each point p ∈ P of size N ,
- Image coordinates (u, v) of each point in R.
- Result: Labels L consensus for each point of size N .
- algorithm
- Get S^2 neighbors N'
- Get neighbors N
- Fill in real point ranges
- Label neighbors L 0 for each pixel
- Get label neighbors L for each point
- Distances to neighbors D for each point
- Compute inverse Gaussian Kernel
- Weight neighbors with inverse Gaussian kernel
- Find k-nearest neighbors S for each point
- Gather votes.
- Accumulate votes.
- Find maximum consensus.
- ハイパラ
- (i) S which the size of the search window
- (ii) k which is number of nearest neighbors
- (iii) cut-off which is the maximum allowed range difference for the k
- (iv) σ for the inverse gaussian.
- 4パラ総当りは結構しんどそう
- III.Cまでのハイパラにも依存していそう(特に解像度)
実験
- SemanticKITTI http://semantic-kitti.org/
- SuMa++ (Semantic SLAM)と合わせてslam+localization
- dynamic objectにたいして取り除くような処理
- 64*2048 12fps
- border-IoU: how far a point is to the self occlusion of the sensor
考察
- 全体として読みやすい
- range image base
- 実機応用する時、センサ依存が気になる
- そもそもSemantic segmantationをしているけどdetectionに比べるとやっていることがリッチ過ぎる?
- personのdetectionにはゆーてセマセグくらい必要?
- 実験として
- 1つのデータセットのみでの評価
次読むやつ
- J. Behley, C. Stachnis. Efficient Surfel-Based SLAM using 3D Laser Range Data in Urban Environments, Proc. of Robotics: Science and Systems (RSS), 2018.