[Beginner Depth Tutorial 1.3] Fusing KITTI Images with Dense Point Clouds Using the BP-Net Depth Completion Network (with Python Code)
In the previous installment, we showed how to project the sparse point cloud (depth) captured by a LiDAR onto the image:
[Beginner Depth Tutorial 1.2] An Introduction to KITTI Point Cloud and Image Data, and How to Project the Point Cloud onto the Image (with Python Code)
However, the sparse point cloud covers only a small fraction of the image pixels (roughly 5%). What if we want a depth value at every pixel?
In this post, we use Depth Completion to turn the sparse point-cloud depth into a dense depth map. The result looks like this:
1. Installing BP-Net
Introduction
BP-Net propagates the target depth from nearby depth measurements through a non-linear model whose coefficients are generated by a multilayer perceptron and conditioned on both radiometric difference and spatial distance. By combining this bilateral propagation with multi-modal fusion and depth refinement in a multi-scale framework, BP-Net performs well in both indoor and outdoor scenes, and ranked first on the KITTI depth completion benchmark at the time of submission.
For a detailed analysis of the method, see:
[CVPR 2024] BP-Net: A New SOTA for Depth Completion!
Installation
Clone the code:

```bash
git clone https://github.com/kakaxi314/BP-Net.git
```
Create the conda environment and activate it:

```bash
conda env create -f environment.yml
conda activate bp
```

This gives us a Python environment named bp.
Compilation
Then compile the C++ and CUDA extensions:

```bash
cd exts
python setup.py install
```
Dataset
Download the KITTI depth completion dataset directly. Once the download finishes, the directory layout should be:
```
└── datas
    └── kitti
        ├── data_depth_annotated
        │   ├── train
        │   └── val
        ├── data_depth_velodyne
        │   ├── train
        │   └── val
        ├── raw
        │   ├── 2011_09_26
        │   ├── 2011_09_28
        │   ├── 2011_09_29
        │   ├── 2011_09_30
        │   └── 2011_10_03
        ├── test_depth_completion_anonymous
        │   ├── image
        │   ├── intrinsics
        │   └── velodyne_raw
        └── val_selection_cropped
            ├── groundtruth_depth
            ├── image
            ├── intrinsics
            └── velodyne_raw
```
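A quick way to sanity-check this layout (a small convenience snippet, not part of the original post; it assumes the datas/kitti root shown above):

```python
import os

root = "datas/kitti"
expected = [
    "data_depth_annotated/train",
    "data_depth_annotated/val",
    "data_depth_velodyne/train",
    "data_depth_velodyne/val",
    "raw",
    "test_depth_completion_anonymous",
    "val_selection_cropped",
]
for sub in expected:
    status = "OK     " if os.path.isdir(os.path.join(root, sub)) else "MISSING"
    print(status, sub)
```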
2. Parsing the Camera Intrinsics
```python
import numpy as np

def read_calib_file(filepath):
    """Read in a calibration file and parse into a dictionary."""
    data = {}
    with open(filepath, 'r') as f:
        for line in f.readlines():
            key, value = line.split(':', 1)
            # The only non-float values in these files are dates, which
            # we don't care about anyway
            try:
                data[key] = np.array([float(x) for x in value.split()])
            except ValueError:
                pass
    return data

def pull_K_cam(calib_path):
    filedata = read_calib_file(calib_path)
    # P_rect_02 is the 3x4 projection matrix of the left color camera (cam 2);
    # its left 3x3 block is the rectified intrinsic matrix K
    P_rect_20 = np.reshape(filedata['P_rect_02'], (3, 4))
    K_cam = P_rect_20[0:3, 0:3]
    return K_cam

c2c_filepath = "datas/kitti/raw/2011_09_26/calib_cam_to_cam.txt"
K_cam = pull_K_cam(c2c_filepath)
K_cam = K_cam.astype(np.float32)
print(K_cam)
```
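If everything is set up correctly, the printed matrix for the 2011_09_26 sequences should roughly match the values stored in calib_cam_to_cam.txt:

```
[[721.5377    0.      609.5593]
 [  0.      721.5377  172.854 ]
 [  0.        0.        1.    ]]
```

These are the raw (uncropped) intrinsics; step 4 below adjusts them for the crop.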
3. Loading the Sparse Depth (LiDAR)
(Note: the path below reads the semi-dense ground truth from data_depth_annotated; for the truly sparse raw LiDAR projection, use the matching file under data_depth_velodyne/.../proj_depth/velodyne_raw instead.)
```python
import matplotlib.pyplot as plt
from PIL import Image

def pull_DEPTH(path):
    # KITTI stores depth as uint16 PNG scaled by 256 (0 = no measurement)
    depth_png = np.array(Image.open(path), dtype=int)
    assert (np.max(depth_png) > 255)  # guard against loading an 8-bit image
    depth_image = (depth_png / 256.).astype(np.float32)  # depth in meters
    return depth_image

depth_path = "datas/kitti/data_depth_annotated/val/2011_09_26_drive_0005_sync/proj_depth/groundtruth/image_02/0000000005.png"
lidar_map = pull_DEPTH(depth_path)
plt.imshow(lidar_map)
plt.show()
```
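As a quick sanity check (a small addition, not part of the original script), we can measure how many pixels actually carry a depth value; the annotated ground truth is considerably denser than the raw velodyne projection (~5%, as mentioned above):

```python
valid = lidar_map > 0
print(f"valid depth pixels: {valid.mean():.1%}")
```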
We can also look at the corresponding image:
```python
path = "datas/kitti/raw/2011_09_26/2011_09_26_drive_0005_sync/image_02/data/0000000005.png"
img = np.array(Image.open(path).convert('RGB'), dtype=np.uint8)
plt.imshow(img)
plt.show()
```
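To see how the two modalities line up, we can also overlay the valid depth pixels on the image (a small visualization sketch, not part of the original post):

```python
plt.imshow(img)
v, u = np.nonzero(lidar_map > 0)  # pixel coordinates that carry a depth value
plt.scatter(u, v, c=lidar_map[v, u], s=1, cmap='jet')
plt.show()
```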
4. Data Preprocessing
BP-Net expects a fixed 256 × 1216 input, so we take a bottom-center crop of the image and depth map, and shift the principal point of the intrinsics accordingly:
```python
height, width = 256, 1216
tp = img.shape[0] - height        # crop rows from the top, keep the bottom
lp = (img.shape[1] - width) // 2  # crop columns symmetrically
img = img[tp:tp + height, lp:lp + width]
lidar_map = lidar_map[tp:tp + height, lp:lp + width]
K_cam[0, 2] -= lp                 # the principal point moves with the crop
K_cam[1, 2] -= tp
print(img.shape, lidar_map.shape, K_cam)
```
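For reuse, the same crop can be wrapped in a small helper (a convenience sketch, not part of the BP-Net codebase; the function name is our own):

```python
def bottom_center_crop(img, depth, K, height=256, width=1216):
    """Bottom-center crop to (height, width), shifting the principal point.

    Works for HWC images and HW depth maps; returns the cropped views
    and an adjusted copy of the intrinsics.
    """
    tp = img.shape[0] - height        # crop from the top
    lp = (img.shape[1] - width) // 2  # center horizontally
    K = K.copy()
    K[0, 2] -= lp                     # cx shifts by the left crop offset
    K[1, 2] -= tp                     # cy shifts by the top crop offset
    return (img[tp:tp + height, lp:lp + width],
            depth[tp:tp + height, lp:lp + width],
            K)
```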
5. Completing the Depth with BP-Net
Here we write our own inference script: it loads the image, the LiDAR depth, and the camera intrinsics, and predicts the dense depth:
```python
import hydra
from utils import *  # provides Trainer from the BP-Net repo
import torch
from PIL import Image
import numpy as np

def read_calib_file(filepath):
    """Read in a calibration file and parse into a dictionary."""
    data = {}
    with open(filepath, 'r') as f:
        for line in f.readlines():
            key, value = line.split(':', 1)
            # The only non-float values in these files are dates, which
            # we don't care about anyway
            try:
                data[key] = np.array([float(x) for x in value.split()])
            except ValueError:
                pass
    return data

def pull_K_cam(calib_path):
    filedata = read_calib_file(calib_path)
    P_rect_20 = np.reshape(filedata['P_rect_02'], (3, 4))
    K_cam = P_rect_20[0:3, 0:3]
    return K_cam

c2c_filepath = "datas/kitti/raw/2011_09_26/calib_cam_to_cam.txt"
K_cam = pull_K_cam(c2c_filepath)
K_cam = K_cam.astype(np.float32)
print(K_cam)

def pull_DEPTH(path):
    depth_png = np.array(Image.open(path), dtype=int)
    assert (np.max(depth_png) > 255)
    depth_image = (depth_png / 256.).astype(np.float32)
    return depth_image

depth_path = "datas/kitti/data_depth_annotated/val/2011_09_26_drive_0005_sync/proj_depth/groundtruth/image_02/0000000005.png"
lidar_map = pull_DEPTH(depth_path)

path = "datas/kitti/raw/2011_09_26/2011_09_26_drive_0005_sync/image_02/data/0000000005.png"
img = np.array(Image.open(path).convert('RGB'), dtype=np.uint8)

# Normalize the image with BP-Net's dataset statistics
mean = (90.9950, 96.2278, 94.3213)
std = (79.2382, 80.5267, 82.1483)

class Norm(object):
    def __init__(self, mean, std):
        self.mean = np.array(mean)
        self.std = np.array(std)

    def __call__(self, rgb):
        rgb = (rgb - self.mean) / self.std
        return rgb

img_transform = Norm(mean, std)
img = img_transform(img)
print(img.shape, lidar_map.shape)

# Convert to channels-first and crop to 256 x 1216 (crop the last two axes)
height, width = 256, 1216
img = img.transpose(2, 0, 1).astype(np.float32)
lidar_map = np.expand_dims(lidar_map, axis=2)
lidar_map = lidar_map.transpose(2, 0, 1).astype(np.float32)
tp = img.shape[1] - height
lp = (img.shape[2] - width) // 2
img = img[:, tp:tp + height, lp:lp + width]
lidar_map = lidar_map[:, tp:tp + height, lp:lp + width]
K_cam[0, 2] -= lp
K_cam[1, 2] -= tp
print(img.shape, lidar_map.shape, K_cam)

@hydra.main(config_path='configs', config_name='config', version_base='1.2')
def get_output(cfg):
    with Trainer(cfg) as run:
        global img
        global lidar_map
        global K_cam
        img = torch.tensor(img).cuda()
        lidar_map = torch.tensor(lidar_map).cuda()
        K_cam = torch.tensor(K_cam).cuda()
        net = run.net_ema.module  # use the EMA weights for inference
        net.eval()
        output = net(
            img[None, :],        # add a batch dimension
            None,
            lidar_map[None, :],
            K_cam[None, :]
        )[-1]                    # the last element is the final prediction
        np.save("output.npy", output.detach().cpu().squeeze().numpy())

get_output()
```
After running, the completed dense depth is saved to the file output.npy.
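To quickly inspect the result (a small addition to the original post), load the file and plot it:

```python
import numpy as np
import matplotlib.pyplot as plt

dense_depth = np.load("output.npy")  # (256, 1216), depth in meters
plt.imshow(dense_depth, cmap='jet')
plt.colorbar(label='depth (m)')
plt.show()
```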
6. Back-Projecting the Dense Depth to a 3D Point Cloud
The general steps for converting a depth map into a 3D point cloud are:
- Prepare the camera intrinsic matrix: after the crop in step 4, we already know
$$K_{cam} = \begin{bmatrix} 721.5377 & 0 & 596.5593 \\ 0 & 721.5377 & 53.854004 \\ 0 & 0 & 1 \end{bmatrix}$$
- Extract the pixel coordinates: for each pixel $(u, v)$, read the corresponding depth value $d$.
- Back-project to 3D space: use the pinhole camera model to map the image coordinates $(u, v)$ and depth $d$ to the 3D coordinates $(X, Y, Z)$:
$$Z = d$$
$$X = (u - c_x) \times \frac{Z}{f_x}$$
$$Y = (v - c_y) \times \frac{Z}{f_y}$$
where $f_x$ and $f_y$ are the focal lengths (on the diagonal of the intrinsic matrix), and $c_x$ and $c_y$ are the principal point coordinates.
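As a quick worked example with the cropped intrinsics above: the pixel $(u, v) = (600, 128)$ with depth $d = 20$ m back-projects to $Z = 20$, $X = (600 - 596.5593) \times 20 / 721.5377 \approx 0.095$ m, and $Y = (128 - 53.854) \times 20 / 721.5377 \approx 2.055$ m.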
We use the following code to visualize the point cloud and save it to disk. Here dense_depth is the array loaded from output.npy, and img and K_cam are the cropped image and intrinsics from step 4 (before normalization):
```python
import numpy as np
import open3d as o3d  # for point cloud visualization

dense_depth = np.load("output.npy")  # dense depth from BP-Net

# Image dimensions
h, w = dense_depth.shape

# Build the lists of 3D points and their colors
points = []
colors = []

# Iterate over every pixel
for v in range(h):
    for u in range(w):
        # BP-Net outputs depth in meters; rescale here if your depth map
        # uses other units (e.g. / 1000.0 for millimeters)
        Z = dense_depth[v, u]
        if Z == 0:  # skip pixels without depth
            continue
        X = (u - K_cam[0, 2]) * Z / K_cam[0, 0]
        Y = (v - K_cam[1, 2]) * Z / K_cam[1, 1]
        points.append([X, Y, Z])
        colors.append(img[v, u] / 255.0)  # normalize colors to [0, 1]

# Convert to arrays
points = np.array(points)
colors = np.array(colors)

# Create the point cloud with open3d
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
pcd.colors = o3d.utility.Vector3dVector(colors)

# Visualize the point cloud
o3d.visualization.draw_geometries([pcd])

# Save the point cloud to a local .ply file
output_filename = "output_point_cloud.ply"
o3d.io.write_point_cloud(output_filename, pcd)
```
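The double loop above is easy to follow but slow in Python; the same back-projection can be vectorized with NumPy (an equivalent sketch, using the same dense_depth, img, and K_cam):

```python
v, u = np.nonzero(dense_depth > 0)     # coordinates of valid pixels
Z = dense_depth[v, u]
X = (u - K_cam[0, 2]) * Z / K_cam[0, 0]
Y = (v - K_cam[1, 2]) * Z / K_cam[1, 1]
points = np.stack([X, Y, Z], axis=1)   # (N, 3)
colors = img[v, u] / 255.0             # (N, 3), RGB in [0, 1]
```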