[Beginner's Deep Tutorial 1.2] KITTI Point Cloud and Image Data: An Introduction, and How to Project the Point Cloud onto the Image (with Python Code)
This tutorial shows how to interpret the KITTI data and use Python to project the point cloud onto the image, as shown in the figure:
1. Introduction
The KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset is one of the best-known datasets in autonomous driving and computer vision. It provides researchers with a wide range of perception data, including images, point clouds, and IMU measurements, and is mainly used to develop and evaluate autonomous-driving algorithms.
The dataset contains data from several sensors, including:
- Cameras (image data): forward-facing stereo cameras, providing left and right color and grayscale images.
- LiDAR (point cloud data): a Velodyne HDL-64E 3D laser scanner that captures high-resolution 3D point clouds of the environment.
- GPS/IMU: provides high-precision localization and attitude information.
The KITTI dataset can be downloaded free of charge from the official KITTI website.
2. Preparing the Dataset
We will use one sample sequence:
KITTI 2011_09_26_drive_0005 dataset
Let's download the sequence; you can fetch it directly with wget:
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_09_26_drive_0005/2011_09_26_drive_0005_sync.zip
3. Parsing the Data
Our goal is to parse the 3D LiDAR point cloud and project it into 2D image coordinates, producing an effect similar to an RGB-D camera.
First, unzip the dataset:
unzip 2011_09_26_drive_0005_sync.zip
This produces the following data.
The files we will need are:
- Velodyne point cloud file: velodyne_points/data/0000000000.bin
- Left RGB camera (cam02) image: image_02/data/0000000000.png
We also need to download the calibration files, including the LiDAR-to-camera matrix:
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_09_26_calib.zip
unzip 2011_09_26_calib.zip
Among these, calib_cam_to_cam.txt stores the intrinsics and extrinsics of the four cameras (two grayscale, two color):
calib_time: 09-Jan-2012 13:57:47
corner_dist: 9.950000e-02
S_00: 1.392000e+03 5.120000e+02
K_00: 9.842439e+02 0.000000e+00 6.900000e+02 0.000000e+00 9.808141e+02 2.331966e+02 0.000000e+00 0.000000e+00 1.000000e+00
D_00: -3.728755e-01 2.037299e-01 2.219027e-03 1.383707e-03 -7.233722e-02
R_00: 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00
T_00: 2.573699e-16 -1.059758e-16 1.614870e-16
S_rect_00: 1.242000e+03 3.750000e+02
R_rect_00: 9.999239e-01 9.837760e-03 -7.445048e-03 -9.869795e-03 9.999421e-01 -4.278459e-03 7.402527e-03 4.351614e-03 9.999631e-01
P_rect_00: 7.215377e+02 0.000000e+00 6.095593e+02 0.000000e+00 0.000000e+00 7.215377e+02 1.728540e+02 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
S_01: 1.392000e+03 5.120000e+02
K_01: 9.895267e+02 0.000000e+00 7.020000e+02 0.000000e+00 9.878386e+02 2.455590e+02 0.000000e+00 0.000000e+00 1.000000e+00
D_01: -3.644661e-01 1.790019e-01 1.148107e-03 -6.298563e-04 -5.314062e-02
R_01: 9.993513e-01 1.860866e-02 -3.083487e-02 -1.887662e-02 9.997863e-01 -8.421873e-03 3.067156e-02 8.998467e-03 9.994890e-01
T_01: -5.370000e-01 4.822061e-03 -1.252488e-02
S_rect_01: 1.242000e+03 3.750000e+02
R_rect_01: 9.996878e-01 -8.976826e-03 2.331651e-02 8.876121e-03 9.999508e-01 4.418952e-03 -2.335503e-02 -4.210612e-03 9.997184e-01
P_rect_01: 7.215377e+02 0.000000e+00 6.095593e+02 -3.875744e+02 0.000000e+00 7.215377e+02 1.728540e+02 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
S_02: 1.392000e+03 5.120000e+02
K_02: 9.597910e+02 0.000000e+00 6.960217e+02 0.000000e+00 9.569251e+02 2.241806e+02 0.000000e+00 0.000000e+00 1.000000e+00
D_02: -3.691481e-01 1.968681e-01 1.353473e-03 5.677587e-04 -6.770705e-02
R_02: 9.999758e-01 -5.267463e-03 -4.552439e-03 5.251945e-03 9.999804e-01 -3.413835e-03 4.570332e-03 3.389843e-03 9.999838e-01
T_02: 5.956621e-02 2.900141e-04 2.577209e-03
S_rect_02: 1.242000e+03 3.750000e+02
R_rect_02: 9.998817e-01 1.511453e-02 -2.841595e-03 -1.511724e-02 9.998853e-01 -9.338510e-04 2.827154e-03 9.766976e-04 9.999955e-01
P_rect_02: 7.215377e+02 0.000000e+00 6.095593e+02 4.485728e+01 0.000000e+00 7.215377e+02 1.728540e+02 2.163791e-01 0.000000e+00 0.000000e+00 1.000000e+00 2.745884e-03
S_03: 1.392000e+03 5.120000e+02
K_03: 9.037596e+02 0.000000e+00 6.957519e+02 0.000000e+00 9.019653e+02 2.242509e+02 0.000000e+00 0.000000e+00 1.000000e+00
D_03: -3.639558e-01 1.788651e-01 6.029694e-04 -3.922424e-04 -5.382460e-02
R_03: 9.995599e-01 1.699522e-02 -2.431313e-02 -1.704422e-02 9.998531e-01 -1.809756e-03 2.427880e-02 2.223358e-03 9.997028e-01
T_03: -4.731050e-01 5.551470e-03 -5.250882e-03
S_rect_03: 1.242000e+03 3.750000e+02
R_rect_03: 9.998321e-01 -7.193136e-03 1.685599e-02 7.232804e-03 9.999712e-01 -2.293585e-03 -1.683901e-02 2.415116e-03 9.998553e-01
P_rect_03: 7.215377e+02 0.000000e+00 6.095593e+02 -3.395242e+02 0.000000e+00 7.215377e+02 1.728540e+02 2.199936e+00 0.000000e+00 0.000000e+00 1.000000e+00 2.729905e-03
where:
- S_xx: 1x2 size of image xx before rectification
- K_xx: 3x3 calibration (intrinsic) matrix of camera xx before rectification
- D_xx: 1x5 distortion coefficients of camera xx before rectification
- R_xx: 3x3 (extrinsic) rotation matrix (from camera 0 to camera xx)
- T_xx: 3x1 (extrinsic) translation vector (from camera 0 to camera xx)
- S_rect_xx: 1x2 size of image xx after rectification
- R_rect_xx: 3x3 rectifying rotation matrix (makes the image planes coplanar)
- P_rect_xx: 3x4 projection matrix after rectification
Here xx ∈ {00, 01, 02, 03} is the camera index: 00 is the left grayscale camera, 01 the right grayscale camera, 02 the left color camera, and 03 the right color camera.
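Each line of these calibration files is just a key followed by whitespace-separated numbers, so it can be parsed directly with NumPy. A minimal sketch (the sample line is the K_02 entry from the file above):

```python
import numpy as np

# One line from calib_cam_to_cam.txt (the K_02 entry shown above)
line = ("K_02: 9.597910e+02 0.000000e+00 6.960217e+02 "
        "0.000000e+00 9.569251e+02 2.241806e+02 "
        "0.000000e+00 0.000000e+00 1.000000e+00")

key, val = line.split(':', 1)
K_02 = np.array(val.split(), dtype=np.float64).reshape(3, 3)

print(key)         # K_02
print(K_02[0, 0])  # 959.791  (the focal length f_x, in pixels)
```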
Next, calib_velo_to_cam.txt stores the LiDAR-to-camera transformation:
calib_time: 15-Mar-2012 11:37:16
R: 7.533745e-03 -9.999714e-01 -6.166020e-04 1.480249e-02 7.280733e-04 -9.998902e-01 9.998621e-01 7.523790e-03 1.480755e-02
T: -4.069766e-03 -7.631618e-02 -2.717806e-01
delta_f: 0.000000e+00 0.000000e+00
delta_c: 0.000000e+00 0.000000e+00
Since we only need to project the LiDAR points onto the image, two things suffice: 1) the left camera's intrinsics, and 2) the LiDAR-to-camera transformation.
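These two ingredients can be folded into a single 3x4 matrix that maps homogeneous LiDAR points straight to pixel coordinates. A minimal sketch with placeholder values (R, T, and P below are illustrative stand-ins, not the real calibration):

```python
import numpy as np

# Placeholder calibration values for illustration only;
# the real R, T, P come from the calibration files above.
P = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])   # 3x3 left block of a P_rect matrix
R = np.eye(3)                     # velodyne -> camera rotation (stand-in)
T = np.zeros((3, 1))              # velodyne -> camera translation (stand-in)

M = P @ np.hstack((R, T))         # single 3x4 LiDAR-to-image matrix

X_h = np.array([0.0, 0.0, 10.0, 1.0])  # homogeneous point 10 m along the camera axis
uvw = M @ X_h                          # homogeneous pixel coordinates (s*u, s*v, s)
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
print(u, v)  # 600.0 180.0
```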
4. Visualizing the Data
First, let's look at the RGB image:
import numpy as np
import cv2
import os
import matplotlib.pyplot as plt
%matplotlib inline
# read the left camera image
base = "2011_09_26/2011_09_26_drive_0005_sync"
image_type = 'color' # 'grayscale' or 'color' image
mode = '00' if image_type == 'grayscale' else '02'
image = cv2.imread(os.path.join(base, 'image_'+ mode +'/data/0000000000.png'))
width, height = image.shape[1], image.shape[0]
plt.imshow(image[:, :, ::-1])
print(image.shape)
KITTI images have a fixed size: 375 pixels high and 1242 pixels wide.
Next, let's look at the point cloud:
def load_from_bin(bin_path):
obj = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
# ignore reflectivity info
return obj[:,:3]
# bin file -> numpy array
velo_points = load_from_bin(os.path.join(base, 'velodyne_points/data/0000000000.bin'))
print(velo_points.shape)
After dropping the reflectivity channel, the point cloud is stored as an (N, 3) matrix.
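On disk, each .bin file is a flat array of float32 values in (x, y, z, reflectance) order, which is why load_from_bin reshapes to (-1, 4). A self-contained round-trip check with a tiny synthetic file (the file name and values are made up):

```python
import numpy as np
import os, tempfile

# Write a tiny synthetic point cloud in KITTI's layout: float32 (x, y, z, reflectance)
pts = np.array([[1.0, 2.0, 3.0, 0.5],
                [4.0, 5.0, 6.0, 0.9]], dtype=np.float32)
path = os.path.join(tempfile.gettempdir(), 'fake_velo.bin')
pts.tofile(path)

# Same parsing as load_from_bin: reshape to (N, 4), then drop reflectance
obj = np.fromfile(path, dtype=np.float32).reshape(-1, 4)
xyz = obj[:, :3]
print(xyz.shape)  # (2, 3)
```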
Now let's visualize the point cloud:
from mpl_toolkits.mplot3d import Axes3D
# Create a new figure
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')
# Extract the x, y, z coordinates
x = velo_points[:, 0]
y = velo_points[:, 1]
z = velo_points[:, 2]
# Plot the point cloud
ax.scatter(x, y, z, c=z, cmap='viridis', s=1)  # c=z colors by height, s sets the point size
# Set the axis labels
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.set_zlabel('Z Axis')
# Show the figure
plt.show()
The result looks quite strange: without the image, it is hard to get an intuitive sense of the 3D structure from a sparse point cloud.
So next, we will use the camera and LiDAR parameters to fuse the data from these two sensors.
5. Point Cloud-Image Fusion
First, let's define some helper functions for visualization and parsing:
def depth_color(val, min_d=0, max_d=120):
    """
    Return a color (HSV hue) corresponding to distance (m):
    close distance = red, far distance = blue
    """
    np.clip(val, 0, max_d, out=val)  # clip distances at max_d; farther points saturate
    return (((val - min_d) / (max_d - min_d)) * 120).astype(np.uint8)
def in_h_range_points(points, m, n, fov):
""" extract horizontal in-range points """
return np.logical_and(np.arctan2(n,m) > (-fov[1] * np.pi / 180), \
np.arctan2(n,m) < (-fov[0] * np.pi / 180))
def in_v_range_points(points, m, n, fov):
""" extract vertical in-range points """
return np.logical_and(np.arctan2(n,m) < (fov[1] * np.pi / 180), \
np.arctan2(n,m) > (fov[0] * np.pi / 180))
def fov_setting(points, x, y, z, dist, h_fov, v_fov):
""" filter points based on h,v FOV """
if h_fov[1] == 180 and h_fov[0] == -180 and v_fov[1] == 2.0 and v_fov[0] == -24.9:
return points
if h_fov[1] == 180 and h_fov[0] == -180:
return points[in_v_range_points(points, dist, z, v_fov)]
elif v_fov[1] == 2.0 and v_fov[0] == -24.9:
return points[in_h_range_points(points, x, y, h_fov)]
else:
h_points = in_h_range_points(points, x, y, h_fov)
v_points = in_v_range_points(points, dist, z, v_fov)
return points[np.logical_and(h_points, v_points)]
def in_range_points(points, size):
""" extract in-range points """
return np.logical_and(points > 0, points < size)
def velo_points_filter(points, v_fov, h_fov):
""" extract points corresponding to FOV setting """
# Projecting to 2D
x = points[:, 0]
y = points[:, 1]
z = points[:, 2]
dist = np.sqrt(x ** 2 + y ** 2 + z ** 2)
if h_fov[0] < -90:
h_fov = (-90,) + h_fov[1:]
if h_fov[1] > 90:
h_fov = h_fov[:1] + (90,)
x_lim = fov_setting(x, x, y, z, dist, h_fov, v_fov)[:,None]
y_lim = fov_setting(y, x, y, z, dist, h_fov, v_fov)[:,None]
z_lim = fov_setting(z, x, y, z, dist, h_fov, v_fov)[:,None]
# Stack arrays in sequence horizontally
xyz_ = np.hstack((x_lim, y_lim, z_lim))
xyz_ = xyz_.T
# stack (1,n) arrays filled with the number 1
one_mat = np.full((1, xyz_.shape[1]), 1)
xyz_ = np.concatenate((xyz_, one_mat),axis = 0)
# need dist info for points color
dist_lim = fov_setting(dist, x, y, z, dist, h_fov, v_fov)
color = depth_color(dist_lim, 0, 70)
return xyz_, color
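To get a feel for what depth_color returns, here is a standalone check (the function is repeated in a copy-based variant so the snippet runs on its own; note that velo_points_filter calls it with max_d=70):

```python
import numpy as np

def depth_color(val, min_d=0, max_d=120):
    """Map distance (m) to an HSV hue in [0, 120]: near = red, far = blue."""
    val = np.clip(val, 0, max_d)  # copy-based clip, so the input is not modified
    return (((val - min_d) / (max_d - min_d)) * 120).astype(np.uint8)

dist = np.array([0.0, 35.0, 70.0, 200.0])  # distances in metres (made-up sample)
hue = depth_color(dist, 0, 70)             # same max_d=70 as in velo_points_filter
print(hue)  # [  0  60 120 120]
```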
Here are the paths to the two calibration files we will use:
v2c_filepath = './calib_velo_to_cam.txt'
c2c_filepath = './calib_cam_to_cam.txt'
Next, we define functions to read these parameters:
def calib_velo2cam(filepath):
"""
get Rotation(R : 3x3), Translation(T : 3x1) matrix info
using R,T matrix, we can convert velodyne coordinates to camera coordinates
"""
with open(filepath, "r") as f:
file = f.readlines()
for line in file:
(key, val) = line.split(':',1)
if key == 'R':
R = np.fromstring(val, sep=' ')
R = R.reshape(3, 3)
if key == 'T':
T = np.fromstring(val, sep=' ')
T = T.reshape(3, 1)
return R, T
def calib_cam2cam(filepath, mode):
"""
If your image is 'rectified image' :
get only Projection(P : 3x4) matrix is enough
but if your image is 'distorted image'(not rectified image) :
you need undistortion step using distortion coefficients(5 : D)
in this code, I'll get P matrix since I'm using rectified image
"""
with open(filepath, "r") as f:
file = f.readlines()
for line in file:
(key, val) = line.split(':',1)
if key == ('P_rect_' + mode):
P_ = np.fromstring(val, sep=' ')
P_ = P_.reshape(3, 4)
# erase 4th column ([0,0,0])
P_ = P_[:3, :3]
return P_
def velo3d_2_camera2d_points(points, v_fov, h_fov, vc_path, cc_path, mode='02'):
""" print velodyne 3D points corresponding to camera 2D image """
# R_vc = Rotation matrix ( velodyne -> camera )
# T_vc = Translation matrix ( velodyne -> camera )
R_vc, T_vc = calib_velo2cam(vc_path)
# P_ = Projection matrix ( camera coordinates 3d points -> image plane 2d points )
P_ = calib_cam2cam(cc_path, mode)
"""
xyz_v - 3D velodyne points corresponding to h, v FOV in the velodyne coordinates
c_ - color value(HSV's Hue) corresponding to distance(m)
[x_1 , x_2 , .. ]
xyz_v = [y_1 , y_2 , .. ]
[z_1 , z_2 , .. ]
[ 1 , 1 , .. ]
"""
xyz_v, c_ = velo_points_filter(points, v_fov, h_fov)
"""
RT_ - rotation matrix & translation matrix
( velodyne coordinates -> camera coordinates )
[r_11 , r_12 , r_13 , t_x ]
RT_ = [r_21 , r_22 , r_23 , t_y ]
[r_31 , r_32 , r_33 , t_z ]
"""
RT_ = np.concatenate((R_vc, T_vc),axis = 1)
# convert velodyne coordinates(X_v, Y_v, Z_v) to camera coordinates(X_c, Y_c, Z_c)
for i in range(xyz_v.shape[1]):
xyz_v[:3,i] = np.matmul(RT_, xyz_v[:,i])
"""
xyz_c - 3D velodyne points corresponding to h, v FOV in the camera coordinates
[x_1 , x_2 , .. ]
xyz_c = [y_1 , y_2 , .. ]
[z_1 , z_2 , .. ]
"""
xyz_c = np.delete(xyz_v, 3, axis=0)
# convert camera coordinates(X_c, Y_c, Z_c) image(pixel) coordinates(x,y)
for i in range(xyz_c.shape[1]):
xyz_c[:,i] = np.matmul(P_, xyz_c[:,i])
"""
xy_i - 3D velodyne points corresponding to h, v FOV in the image(pixel) coordinates before scale adjustment
ans - 3D velodyne points corresponding to h, v FOV in the image(pixel) coordinates
[s_1*x_1 , s_2*x_2 , .. ]
xy_i = [s_1*y_1 , s_2*y_2 , .. ] ans = [x_1 , x_2 , .. ]
[ s_1 , s_2 , .. ] [y_1 , y_2 , .. ]
"""
    xy_i = xyz_c / xyz_c[2]           # perspective division by the depth row
    ans = np.delete(xy_i, 2, axis=0)
"""
width = 1242
height = 375
w_range = in_range_points(ans[0], width)
h_range = in_range_points(ans[1], height)
ans_x = ans[0][np.logical_and(w_range,h_range)][:,None].T
ans_y = ans[1][np.logical_and(w_range,h_range)][:,None].T
c_ = c_[np.logical_and(w_range,h_range)]
ans = np.vstack((ans_x, ans_y))
"""
return ans, c_
Finally, we project the point cloud onto the image to obtain the fused result:
def print_projection_cv2(points, color, image):
""" project converted velodyne points into camera image """
hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
for i in range(points.shape[1]):
cv2.circle(hsv_image, (np.int32(points[0][i]),np.int32(points[1][i])),2, (int(color[i]),255,255),-1)
return cv2.cvtColor(hsv_image, cv2.COLOR_HSV2BGR)
def print_projection_plt(points, color, image):
""" project converted velodyne points into camera image """
hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
for i in range(points.shape[1]):
cv2.circle(hsv_image, (np.int32(points[0][i]),np.int32(points[1][i])),2, (int(color[i]),255,255),-1)
return cv2.cvtColor(hsv_image, cv2.COLOR_HSV2RGB)
image_type = 'color' # 'grayscale' or 'color' image
mode = '00' if image_type == 'grayscale' else '02' # image_00 = 'grayscale image' , image_02 = 'color image'
image = cv2.imread(os.path.join(base, 'image_' + mode + '/data/0000000000.png'))
ans, c_ = velo3d_2_camera2d_points(velo_points, v_fov=(-24.9, 2.0), h_fov=(-45,45), \
vc_path=v2c_filepath, cc_path=c2c_filepath, mode=mode)
image = print_projection_plt(points=ans, color=c_, image=image)
# display result image
plt.subplots(1,1, figsize = (13,3) )
plt.title("Velodyne points to camera image Result")
plt.imshow(image)
And with that, we're done!
If you're interested, read on for the underlying theory.
6. Theory
Let a point in the LiDAR point cloud have coordinates $(X, Y, Z)$ in the LiDAR frame, and let $(u, v)$ be the pixel coordinates it maps to in the image. The mapping consists of the following steps:

1. Coordinate transformation: convert the LiDAR-frame point $(X, Y, Z)$ to the camera frame $(X_c, Y_c, Z_c)$:

$$
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
= \mathbf{T}_{\text{lidar\_to\_camera}}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
$$

where $\mathbf{T}_{\text{lidar\_to\_camera}}$ is the LiDAR-to-camera transformation matrix, consisting of a rotation $\mathbf{R}$ and a translation $\mathbf{t}$.

2. Intrinsic projection: project the camera-frame point $(X_c, Y_c, Z_c)$ onto the image plane to obtain the pixel coordinates $(u, v)$:

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \mathbf{K} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}
$$

where $\mathbf{K}$ is the camera intrinsic matrix, defined as:

$$
\mathbf{K} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$

- $f_x$, $f_y$: the camera focal lengths, in pixels.
- $c_x$, $c_y$: the principal point (optical center), usually near the image center.

3. Final formula: the complete mapping from a LiDAR point $(X, Y, Z)$ to image pixel coordinates $(u, v)$ is:

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \mathbf{K} \left( \mathbf{R} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + \mathbf{t} \right)
$$

where the scale factor $s = Z_c$ is divided out to obtain $u$ and $v$.
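The final formula can be sketched in a few lines of NumPy (K, R, and t below are placeholder values, not the actual KITTI calibration):

```python
import numpy as np

# Placeholder intrinsics and extrinsics for illustration only
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                    # LiDAR -> camera rotation (identity stand-in)
t = np.zeros(3)                  # LiDAR -> camera translation (zero stand-in)

X = np.array([2.0, -0.5, 20.0])  # a LiDAR point, 20 m deep along the camera axis

p = K @ (R @ X + t)              # s * [u, v, 1]^T
u, v = p[0] / p[2], p[1] / p[2]  # divide out the scale factor s = Z_c
print(round(u, 4), round(v, 4))  # 681.75 154.8625
```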