[Beginner Depth Tutorial 1.3] Fusing KITTI Images with Dense Point Clouds Using the BP-Net Depth Completion Network (with Python Code)
In the previous installment, we showed how to project the sparse point cloud (depth) captured by a LiDAR onto the image:
[Beginner Depth Tutorial 1.2] An Introduction to KITTI Point Cloud and Image Data, and How to Project the Point Cloud onto the Image (with Python Code)
However, the sparse point cloud covers only a small fraction of the image pixels (roughly 5%). What if we want a depth value at every pixel?
In this post, we use Depth Completion to turn the sparse point-cloud depth into a dense depth map. The result looks like this:
1. Installing BP-Net
Introduction
BP-Net propagates the target depth from nearby depth measurements through a non-linear model whose coefficients are generated by a multilayer perceptron and conditioned on both radiometric difference and spatial distance. By combining this bilateral propagation with multi-modal fusion and depth refinement in a multi-scale framework, BP-Net performs well in both indoor and outdoor scenes, and ranked first on the KITTI depth completion benchmark at the time of submission.
For a detailed analysis of the method, see:
[CVPR 2024] BP-Net: A New SOTA for Depth Completion!
Installation
Clone the code:

```bash
git clone https://github.com/kakaxi314/BP-Net.git
```
Create the conda environment and activate it:

```bash
conda env create -f environment.yml
conda activate bp
```

This gives us a Python environment named bp.
Compilation
Then compile the C++ and CUDA extensions:

```bash
cd exts
python setup.py install
```
Dataset
Download the KITTI depth completion dataset directly. Once the download finishes, the directory layout should be:
```
└── datas
    └── kitti
        ├── data_depth_annotated
        │   ├── train
        │   └── val
        ├── data_depth_velodyne
        │   ├── train
        │   └── val
        ├── raw
        │   ├── 2011_09_26
        │   ├── 2011_09_28
        │   ├── 2011_09_29
        │   ├── 2011_09_30
        │   └── 2011_10_03
        ├── test_depth_completion_anonymous
        │   ├── image
        │   ├── intrinsics
        │   └── velodyne_raw
        └── val_selection_cropped
            ├── groundtruth_depth
            ├── image
            ├── intrinsics
            └── velodyne_raw
```
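A quick way to sanity-check this layout (a small convenience snippet, not part of the original post; it assumes the datas/kitti root shown above):

```python
import os

root = "datas/kitti"
expected = [
    "data_depth_annotated/train",
    "data_depth_annotated/val",
    "data_depth_velodyne/train",
    "data_depth_velodyne/val",
    "raw",
    "test_depth_completion_anonymous",
    "val_selection_cropped",
]
for sub in expected:
    status = "OK     " if os.path.isdir(os.path.join(root, sub)) else "MISSING"
    print(status, sub)
```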
2. Parsing the Camera Intrinsics
```python
import numpy as np

def read_calib_file(filepath):
    """Read in a calibration file and parse into a dictionary."""
    data = {}
    with open(filepath, 'r') as f:
        for line in f.readlines():
            key, value = line.split(':', 1)
            # The only non-float values in these files are dates, which
            # we don't care about anyway
            try:
                data[key] = np.array([float(x) for x in value.split()])
            except ValueError:
                pass
    return data

def pull_K_cam(calib_path):
    filedata = read_calib_file(calib_path)
    # P_rect_02 is the 3x4 projection matrix of the left color camera (cam 2);
    # its left 3x3 block is the rectified intrinsic matrix K
    P_rect_20 = np.reshape(filedata['P_rect_02'], (3, 4))
    K_cam = P_rect_20[0:3, 0:3]
    return K_cam

c2c_filepath = "datas/kitti/raw/2011_09_26/calib_cam_to_cam.txt"
K_cam = pull_K_cam(c2c_filepath)
K_cam = K_cam.astype(np.float32)
print(K_cam)
```
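If everything is set up correctly, the printed matrix for the 2011_09_26 sequences should roughly match the values stored in calib_cam_to_cam.txt:

```
[[721.5377    0.      609.5593]
 [  0.      721.5377  172.854 ]
 [  0.        0.        1.    ]]
```

These are the raw (uncropped) intrinsics; step 4 below adjusts them for the crop.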
3. Loading the Sparse Depth (LiDAR)
(Note: the path below reads the semi-dense ground truth from data_depth_annotated; for the truly sparse raw LiDAR projection, use the matching file under data_depth_velodyne/.../proj_depth/velodyne_raw instead.)
```python
import matplotlib.pyplot as plt
from PIL import Image

def pull_DEPTH(path):
    # KITTI stores depth as uint16 PNG scaled by 256 (0 = no measurement)
    depth_png = np.array(Image.open(path), dtype=int)
    assert (np.max(depth_png) > 255)  # guard against loading an 8-bit image
    depth_image = (depth_png / 256.).astype(np.float32)  # depth in meters
    return depth_image

depth_path = "datas/kitti/data_depth_annotated/val/2011_09_26_drive_0005_sync/proj_depth/groundtruth/image_02/0000000005.png"
lidar_map = pull_DEPTH(depth_path)
plt.imshow(lidar_map)
plt.show()
```
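As a quick sanity check (a small addition, not part of the original script), we can measure how many pixels actually carry a depth value; the annotated ground truth is considerably denser than the raw velodyne projection (~5%, as mentioned above):

```python
valid = lidar_map > 0
print(f"valid depth pixels: {valid.mean():.1%}")
```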
We can also look at the corresponding image:
```python
path = "datas/kitti/raw/2011_09_26/2011_09_26_drive_0005_sync/image_02/data/0000000005.png"
img = np.array(Image.open(path).convert('RGB'), dtype=np.uint8)
plt.imshow(img)
plt.show()
```
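To see how the two modalities line up, we can also overlay the valid depth pixels on the image (a small visualization sketch, not part of the original post):

```python
plt.imshow(img)
v, u = np.nonzero(lidar_map > 0)  # pixel coordinates that carry a depth value
plt.scatter(u, v, c=lidar_map[v, u], s=1, cmap='jet')
plt.show()
```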
4. Data Preprocessing
BP-Net expects a fixed 256 × 1216 input, so we take a bottom-center crop of the image and depth map, and shift the principal point of the intrinsics accordingly:
```python
height, width = 256, 1216
tp = img.shape[0] - height        # crop rows from the top, keep the bottom
lp = (img.shape[1] - width) // 2  # crop columns symmetrically
img = img[tp:tp + height, lp:lp + width]
lidar_map = lidar_map[tp:tp + height, lp:lp + width]
K_cam[0, 2] -= lp                 # the principal point moves with the crop
K_cam[1, 2] -= tp
print(img.shape, lidar_map.shape, K_cam)
```
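For reuse, the same crop can be wrapped in a small helper (a convenience sketch, not part of the BP-Net codebase; the function name is our own):

```python
def bottom_center_crop(img, depth, K, height=256, width=1216):
    """Bottom-center crop to (height, width), shifting the principal point.

    Works for HWC images and HW depth maps; returns the cropped views
    and an adjusted copy of the intrinsics.
    """
    tp = img.shape[0] - height        # crop from the top
    lp = (img.shape[1] - width) // 2  # center horizontally
    K = K.copy()
    K[0, 2] -= lp                     # cx shifts by the left crop offset
    K[1, 2] -= tp                     # cy shifts by the top crop offset
    return (img[tp:tp + height, lp:lp + width],
            depth[tp:tp + height, lp:lp + width],
            K)
```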
5. Completing the Depth with BP-Net
Here we write our own inference script: it loads the image, the LiDAR depth, and the camera intrinsics, and predicts the dense depth:
```python
import hydra
from utils import *  # provides Trainer from the BP-Net repo
import torch
from PIL import Image
import numpy as np

def read_calib_file(filepath):
    """Read in a calibration file and parse into a dictionary."""
    data = {}
    with open(filepath, 'r') as f:
        for line in f.readlines():
            key, value = line.split(':', 1)
            # The only non-float values in these files are dates, which
            # we don't care about anyway
            try:
                data[key] = np.array([float(x) for x in value.split()])
            except ValueError:
                pass
    return data

def pull_K_cam(calib_path):
    filedata = read_calib_file(calib_path)
    P_rect_20 = np.reshape(filedata['P_rect_02'], (3, 4))
    K_cam = P_rect_20[0:3, 0:3]
    return K_cam

c2c_filepath = "datas/kitti/raw/2011_09_26/calib_cam_to_cam.txt"
K_cam = pull_K_cam(c2c_filepath)
K_cam = K_cam.astype(np.float32)
print(K_cam)

def pull_DEPTH(path):
    depth_png = np.array(Image.open(path), dtype=int)
    assert (np.max(depth_png) > 255)
    depth_image = (depth_png / 256.).astype(np.float32)
    return depth_image

depth_path = "datas/kitti/data_depth_annotated/val/2011_09_26_drive_0005_sync/proj_depth/groundtruth/image_02/0000000005.png"
lidar_map = pull_DEPTH(depth_path)

path = "datas/kitti/raw/2011_09_26/2011_09_26_drive_0005_sync/image_02/data/0000000005.png"
img = np.array(Image.open(path).convert('RGB'), dtype=np.uint8)

# Normalize the image with BP-Net's dataset statistics
mean = (90.9950, 96.2278, 94.3213)
std = (79.2382, 80.5267, 82.1483)

class Norm(object):
    def __init__(self, mean, std):
        self.mean = np.array(mean)
        self.std = np.array(std)

    def __call__(self, rgb):
        rgb = (rgb - self.mean) / self.std
        return rgb

img_transform = Norm(mean, std)
img = img_transform(img)
print(img.shape, lidar_map.shape)

# Convert to channels-first and crop to 256 x 1216 (crop the last two axes)
height, width = 256, 1216
img = img.transpose(2, 0, 1).astype(np.float32)
lidar_map = np.expand_dims(lidar_map, axis=2)
lidar_map = lidar_map.transpose(2, 0, 1).astype(np.float32)
tp = img.shape[1] - height
lp = (img.shape[2] - width) // 2
img = img[:, tp:tp + height, lp:lp + width]
lidar_map = lidar_map[:, tp:tp + height, lp:lp + width]
K_cam[0, 2] -= lp
K_cam[1, 2] -= tp
print(img.shape, lidar_map.shape, K_cam)

@hydra.main(config_path='configs', config_name='config', version_base='1.2')
def get_output(cfg):
    with Trainer(cfg) as run:
        global img
        global lidar_map
        global K_cam
        img = torch.tensor(img).cuda()
        lidar_map = torch.tensor(lidar_map).cuda()
        K_cam = torch.tensor(K_cam).cuda()
        net = run.net_ema.module  # use the EMA weights for inference
        net.eval()
        output = net(
            img[None, :],        # add a batch dimension
            None,
            lidar_map[None, :],
            K_cam[None, :]
        )[-1]                    # the last element is the final prediction
        np.save("output.npy", output.detach().cpu().squeeze().numpy())

get_output()
```
After running, the completed dense depth is saved to the file output.npy.
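To quickly inspect the result (a small addition to the original post), load the file and plot it:

```python
import numpy as np
import matplotlib.pyplot as plt

dense_depth = np.load("output.npy")  # (256, 1216), depth in meters
plt.imshow(dense_depth, cmap='jet')
plt.colorbar(label='depth (m)')
plt.show()
```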
6. Back-Projecting the Dense Depth to a 3D Point Cloud
The general steps for converting a depth map into a 3D point cloud are:
- Prepare the camera intrinsic matrix: after the crop in step 4, we already know
$$K_{cam} = \begin{bmatrix} 721.5377 & 0 & 596.5593 \\ 0 & 721.5377 & 53.854004 \\ 0 & 0 & 1 \end{bmatrix}$$
- Extract the pixel coordinates: for each pixel $(u, v)$, read the corresponding depth value $d$.
- Back-project to 3D space: use the pinhole camera model to map the image coordinates $(u, v)$ and depth $d$ to the 3D coordinates $(X, Y, Z)$:
$$Z = d$$
$$X = (u - c_x) \times \frac{Z}{f_x}$$
$$Y = (v - c_y) \times \frac{Z}{f_y}$$
where $f_x$ and $f_y$ are the focal lengths (on the diagonal of the intrinsic matrix), and $c_x$ and $c_y$ are the principal point coordinates.
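As a quick worked example with the cropped intrinsics above: the pixel $(u, v) = (600, 128)$ with depth $d = 20$ m back-projects to $Z = 20$, $X = (600 - 596.5593) \times 20 / 721.5377 \approx 0.095$ m, and $Y = (128 - 53.854) \times 20 / 721.5377 \approx 2.055$ m.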
We use the following code to visualize the point cloud and save it to disk. Here dense_depth is the array loaded from output.npy, and img and K_cam are the cropped image and intrinsics from step 4 (before normalization):
```python
import numpy as np
import open3d as o3d  # for point cloud visualization

dense_depth = np.load("output.npy")  # dense depth from BP-Net

# Image dimensions
h, w = dense_depth.shape

# Build the lists of 3D points and their colors
points = []
colors = []

# Iterate over every pixel
for v in range(h):
    for u in range(w):
        # BP-Net outputs depth in meters; rescale here if your depth map
        # uses other units (e.g. / 1000.0 for millimeters)
        Z = dense_depth[v, u]
        if Z == 0:  # skip pixels without depth
            continue
        X = (u - K_cam[0, 2]) * Z / K_cam[0, 0]
        Y = (v - K_cam[1, 2]) * Z / K_cam[1, 1]
        points.append([X, Y, Z])
        colors.append(img[v, u] / 255.0)  # normalize colors to [0, 1]

# Convert to arrays
points = np.array(points)
colors = np.array(colors)

# Create the point cloud with open3d
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
pcd.colors = o3d.utility.Vector3dVector(colors)

# Visualize the point cloud
o3d.visualization.draw_geometries([pcd])

# Save the point cloud to a local .ply file
output_filename = "output_point_cloud.ply"
o3d.io.write_point_cloud(output_filename, pcd)
```
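The double loop above is easy to follow but slow in Python; the same back-projection can be vectorized with NumPy (an equivalent sketch, using the same dense_depth, img, and K_cam):

```python
v, u = np.nonzero(dense_depth > 0)     # coordinates of valid pixels
Z = dense_depth[v, u]
X = (u - K_cam[0, 2]) * Z / K_cam[0, 0]
Y = (v - K_cam[1, 2]) * Z / K_cam[1, 1]
points = np.stack([X, Y, Z], axis=1)   # (N, 3)
colors = img[v, u] / 255.0             # (N, 3), RGB in [0, 1]
```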