[Beginner's Deep Tutorial 1.2] KITTI Point Cloud and Image Data: An Introduction, and How to Project the Point Cloud onto the Image (with Python Code)
This tutorial shows how to interpret the KITTI data and use Python to project the point cloud onto the image, as shown in the figure:
1. Introduction
The KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset is one of the best-known datasets in autonomous driving and computer vision. It provides researchers with a wide range of perception data, including images, point clouds, and IMU measurements, and is mainly used to develop and evaluate autonomous-driving algorithms.
The dataset contains data from several sensors, including:
- Cameras (image data): forward-facing stereo cameras, providing left and right color and grayscale images.
- LiDAR (point cloud data): a Velodyne HDL-64E 3D laser scanner that captures high-resolution 3D point clouds of the environment.
- GPS/IMU: provides high-precision localization and attitude information.
The KITTI dataset can be downloaded free of charge from the official KITTI website.
2. Preparing the Dataset
We will use one sample sequence:
KITTI 2011_09_26_drive_0005 dataset
Let's download the sequence; you can fetch it directly with wget:
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_09_26_drive_0005/2011_09_26_drive_0005_sync.zip
3. Parsing the Data
Our goal is to parse the 3D LiDAR point cloud and project it into 2D image coordinates, producing an effect similar to an RGB-D camera.
First, unzip the dataset:
unzip 2011_09_26_drive_0005_sync.zip
This produces the following data.
The files we will need are:
- Velodyne point cloud file: velodyne_points/data/0000000000.bin
- Left RGB camera (cam02) image: image_02/data/0000000000.png
We also need to download the calibration files, including the LiDAR-to-camera matrix:
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_09_26_calib.zip
unzip 2011_09_26_calib.zip
Among these, calib_cam_to_cam.txt stores the intrinsics and extrinsics of the four cameras (two grayscale, two color):
calib_time: 09-Jan-2012 13:57:47
corner_dist: 9.950000e-02
S_00: 1.392000e+03 5.120000e+02
K_00: 9.842439e+02 0.000000e+00 6.900000e+02 0.000000e+00 9.808141e+02 2.331966e+02 0.000000e+00 0.000000e+00 1.000000e+00
D_00: -3.728755e-01 2.037299e-01 2.219027e-03 1.383707e-03 -7.233722e-02
R_00: 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00
T_00: 2.573699e-16 -1.059758e-16 1.614870e-16
S_rect_00: 1.242000e+03 3.750000e+02
R_rect_00: 9.999239e-01 9.837760e-03 -7.445048e-03 -9.869795e-03 9.999421e-01 -4.278459e-03 7.402527e-03 4.351614e-03 9.999631e-01
P_rect_00: 7.215377e+02 0.000000e+00 6.095593e+02 0.000000e+00 0.000000e+00 7.215377e+02 1.728540e+02 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
S_01: 1.392000e+03 5.120000e+02
K_01: 9.895267e+02 0.000000e+00 7.020000e+02 0.000000e+00 9.878386e+02 2.455590e+02 0.000000e+00 0.000000e+00 1.000000e+00
D_01: -3.644661e-01 1.790019e-01 1.148107e-03 -6.298563e-04 -5.314062e-02
R_01: 9.993513e-01 1.860866e-02 -3.083487e-02 -1.887662e-02 9.997863e-01 -8.421873e-03 3.067156e-02 8.998467e-03 9.994890e-01
T_01: -5.370000e-01 4.822061e-03 -1.252488e-02
S_rect_01: 1.242000e+03 3.750000e+02
R_rect_01: 9.996878e-01 -8.976826e-03 2.331651e-02 8.876121e-03 9.999508e-01 4.418952e-03 -2.335503e-02 -4.210612e-03 9.997184e-01
P_rect_01: 7.215377e+02 0.000000e+00 6.095593e+02 -3.875744e+02 0.000000e+00 7.215377e+02 1.728540e+02 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
S_02: 1.392000e+03 5.120000e+02
K_02: 9.597910e+02 0.000000e+00 6.960217e+02 0.000000e+00 9.569251e+02 2.241806e+02 0.000000e+00 0.000000e+00 1.000000e+00
D_02: -3.691481e-01 1.968681e-01 1.353473e-03 5.677587e-04 -6.770705e-02
R_02: 9.999758e-01 -5.267463e-03 -4.552439e-03 5.251945e-03 9.999804e-01 -3.413835e-03 4.570332e-03 3.389843e-03 9.999838e-01
T_02: 5.956621e-02 2.900141e-04 2.577209e-03
S_rect_02: 1.242000e+03 3.750000e+02
R_rect_02: 9.998817e-01 1.511453e-02 -2.841595e-03 -1.511724e-02 9.998853e-01 -9.338510e-04 2.827154e-03 9.766976e-04 9.999955e-01
P_rect_02: 7.215377e+02 0.000000e+00 6.095593e+02 4.485728e+01 0.000000e+00 7.215377e+02 1.728540e+02 2.163791e-01 0.000000e+00 0.000000e+00 1.000000e+00 2.745884e-03
S_03: 1.392000e+03 5.120000e+02
K_03: 9.037596e+02 0.000000e+00 6.957519e+02 0.000000e+00 9.019653e+02 2.242509e+02 0.000000e+00 0.000000e+00 1.000000e+00
D_03: -3.639558e-01 1.788651e-01 6.029694e-04 -3.922424e-04 -5.382460e-02
R_03: 9.995599e-01 1.699522e-02 -2.431313e-02 -1.704422e-02 9.998531e-01 -1.809756e-03 2.427880e-02 2.223358e-03 9.997028e-01
T_03: -4.731050e-01 5.551470e-03 -5.250882e-03
S_rect_03: 1.242000e+03 3.750000e+02
R_rect_03: 9.998321e-01 -7.193136e-03 1.685599e-02 7.232804e-03 9.999712e-01 -2.293585e-03 -1.683901e-02 2.415116e-03 9.998553e-01
P_rect_03: 7.215377e+02 0.000000e+00 6.095593e+02 -3.395242e+02 0.000000e+00 7.215377e+02 1.728540e+02 2.199936e+00 0.000000e+00 0.000000e+00 1.000000e+00 2.729905e-03
where:
- S_xx: 1x2 size of image xx before rectification
- K_xx: 3x3 calibration (intrinsic) matrix of camera xx before rectification
- D_xx: 1x5 distortion coefficients of camera xx before rectification
- R_xx: 3x3 (extrinsic) rotation matrix (from camera 0 to camera xx)
- T_xx: 3x1 (extrinsic) translation vector (from camera 0 to camera xx)
- S_rect_xx: 1x2 size of image xx after rectification
- R_rect_xx: 3x3 rectifying rotation matrix (makes the image planes coplanar)
- P_rect_xx: 3x4 projection matrix after rectification
Here xx ∈ {00, 01, 02, 03} is the camera index: 00 is the left grayscale camera, 01 the right grayscale camera, 02 the left color camera, and 03 the right color camera.
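Each line of these calibration files is just a key followed by whitespace-separated numbers, so it can be parsed directly with NumPy. A minimal sketch (the sample line is the K_02 entry from the file above):

```python
import numpy as np

# One line from calib_cam_to_cam.txt (the K_02 entry shown above)
line = ("K_02: 9.597910e+02 0.000000e+00 6.960217e+02 "
        "0.000000e+00 9.569251e+02 2.241806e+02 "
        "0.000000e+00 0.000000e+00 1.000000e+00")

key, val = line.split(':', 1)
K_02 = np.array(val.split(), dtype=np.float64).reshape(3, 3)

print(key)         # K_02
print(K_02[0, 0])  # 959.791  (the focal length f_x, in pixels)
```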
Next, calib_velo_to_cam.txt stores the LiDAR-to-camera transformation:
calib_time: 15-Mar-2012 11:37:16
R: 7.533745e-03 -9.999714e-01 -6.166020e-04 1.480249e-02 7.280733e-04 -9.998902e-01 9.998621e-01 7.523790e-03 1.480755e-02
T: -4.069766e-03 -7.631618e-02 -2.717806e-01
delta_f: 0.000000e+00 0.000000e+00
delta_c: 0.000000e+00 0.000000e+00
Since we only need to project the LiDAR points onto the image, two things suffice: 1) the left camera's intrinsics, and 2) the LiDAR-to-camera transformation.
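These two ingredients can be folded into a single 3x4 matrix that maps homogeneous LiDAR points straight to pixel coordinates. A minimal sketch with placeholder values (R, T, and P below are illustrative stand-ins, not the real calibration):

```python
import numpy as np

# Placeholder calibration values for illustration only;
# the real R, T, P come from the calibration files above.
P = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])   # 3x3 left block of a P_rect matrix
R = np.eye(3)                     # velodyne -> camera rotation (stand-in)
T = np.zeros((3, 1))              # velodyne -> camera translation (stand-in)

M = P @ np.hstack((R, T))         # single 3x4 LiDAR-to-image matrix

X_h = np.array([0.0, 0.0, 10.0, 1.0])  # homogeneous point 10 m along the camera axis
uvw = M @ X_h                          # homogeneous pixel coordinates (s*u, s*v, s)
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
print(u, v)  # 600.0 180.0
```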
4. Visualizing the Data
First, let's look at the RGB image:
import numpy as np
import cv2
import os
import matplotlib.pyplot as plt
%matplotlib inline
# read the left camera image
base = "2011_09_26/2011_09_26_drive_0005_sync"
image_type = 'color' # 'grayscale' or 'color' image
mode = '00' if image_type == 'grayscale' else '02'
image = cv2.imread(os.path.join(base, 'image_'+ mode +'/data/0000000000.png'))
width, height = image.shape[1], image.shape[0]
plt.imshow(image[:, :, ::-1])
print(image.shape)
KITTI images have a fixed size: 375 pixels high and 1242 pixels wide.
Next, let's look at the point cloud:
def load_from_bin(bin_path):
obj = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
# ignore reflectivity info
return obj[:,:3]
# bin file -> numpy array
velo_points = load_from_bin(os.path.join(base, 'velodyne_points/data/0000000000.bin'))
print(velo_points.shape)
After dropping the reflectivity channel, the point cloud is stored as an (N, 3) matrix.
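On disk, each .bin file is a flat array of float32 values in (x, y, z, reflectance) order, which is why load_from_bin reshapes to (-1, 4). A self-contained round-trip check with a tiny synthetic file (the file name and values are made up):

```python
import numpy as np
import os, tempfile

# Write a tiny synthetic point cloud in KITTI's layout: float32 (x, y, z, reflectance)
pts = np.array([[1.0, 2.0, 3.0, 0.5],
                [4.0, 5.0, 6.0, 0.9]], dtype=np.float32)
path = os.path.join(tempfile.gettempdir(), 'fake_velo.bin')
pts.tofile(path)

# Same parsing as load_from_bin: reshape to (N, 4), then drop reflectance
obj = np.fromfile(path, dtype=np.float32).reshape(-1, 4)
xyz = obj[:, :3]
print(xyz.shape)  # (2, 3)
```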
Now let's visualize the point cloud:
from mpl_toolkits.mplot3d import Axes3D
# Create a new figure
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')
# Extract the x, y, z coordinates
x = velo_points[:, 0]
y = velo_points[:, 1]
z = velo_points[:, 2]
# Plot the point cloud
ax.scatter(x, y, z, c=z, cmap='viridis', s=1)  # c=z colors by height, s sets the point size
# Set the axis labels
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.set_zlabel('Z Axis')
# Show the figure
plt.show()
The result looks quite strange: without the image, it is hard to get an intuitive sense of the 3D structure from a sparse point cloud.
So next, we will use the camera and LiDAR parameters to fuse the data from these two sensors.
5. Point Cloud-Image Fusion
First, let's define some helper functions for visualization and parsing:
def depth_color(val, min_d=0, max_d=120):
    """
    Return a color (HSV hue) corresponding to distance (m):
    close distance = red, far distance = blue
    """
    np.clip(val, 0, max_d, out=val)  # clip distances at max_d; farther points saturate
    return (((val - min_d) / (max_d - min_d)) * 120).astype(np.uint8)
def in_h_range_points(points, m, n, fov):
""" extract horizontal in-range points """
return np.logical_and(np.arctan2(n,m) > (-fov[1] * np.pi / 180), \
np.arctan2(n,m) < (-fov[0] * np.pi / 180))
def in_v_range_points(points, m, n, fov):
""" extract vertical in-range points """
return np.logical_and(np.arctan2(n,m) < (fov[1] * np.pi / 180), \
np.arctan2(n,m) > (fov[0] * np.pi / 180))
def fov_setting(points, x, y, z, dist, h_fov, v_fov):
""" filter points based on h,v FOV """
if h_fov[1] == 180 and h_fov[0] == -180 and v_fov[1] == 2.0 and v_fov[0] == -24.9:
return points
if h_fov[1] == 180 and h_fov[0] == -180:
return points[in_v_range_points(points, dist, z, v_fov)]
elif v_fov[1] == 2.0 and v_fov[0] == -24.9:
return points[in_h_range_points(points, x, y, h_fov)]
else:
h_points = in_h_range_points(points, x, y, h_fov)
v_points = in_v_range_points(points, dist, z, v_fov)
return points[np.logical_and(h_points, v_points)]
def in_range_points(points, size):
""" extract in-range points """
return np.logical_and(points > 0, points < size)
def velo_points_filter(points, v_fov, h_fov):
""" extract points corresponding to FOV setting """
# Projecting to 2D
x = points[:, 0]
y = points[:, 1]
z = points[:, 2]
dist = np.sqrt(x ** 2 + y ** 2 + z ** 2)
if h_fov[0] < -90:
h_fov = (-90,) + h_fov[1:]
if h_fov[1] > 90:
h_fov = h_fov[:1] + (90,)
x_lim = fov_setting(x, x, y, z, dist, h_fov, v_fov)[:,None]
y_lim = fov_setting(y, x, y, z, dist, h_fov, v_fov)[:,None]
z_lim = fov_setting(z, x, y, z, dist, h_fov, v_fov)[:,None]
# Stack arrays in sequence horizontally
xyz_ = np.hstack((x_lim, y_lim, z_lim))
xyz_ = xyz_.T
# stack (1,n) arrays filled with the number 1
one_mat = np.full((1, xyz_.shape[1]), 1)
xyz_ = np.concatenate((xyz_, one_mat),axis = 0)
# need dist info for points color
dist_lim = fov_setting(dist, x, y, z, dist, h_fov, v_fov)
color = depth_color(dist_lim, 0, 70)
return xyz_, color
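To get a feel for what depth_color returns, here is a standalone check (the function is repeated in a copy-based variant so the snippet runs on its own; note that velo_points_filter calls it with max_d=70):

```python
import numpy as np

def depth_color(val, min_d=0, max_d=120):
    """Map distance (m) to an HSV hue in [0, 120]: near = red, far = blue."""
    val = np.clip(val, 0, max_d)  # copy-based clip, so the input is not modified
    return (((val - min_d) / (max_d - min_d)) * 120).astype(np.uint8)

dist = np.array([0.0, 35.0, 70.0, 200.0])  # distances in metres (made-up sample)
hue = depth_color(dist, 0, 70)             # same max_d=70 as in velo_points_filter
print(hue)  # [  0  60 120 120]
```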
Here are the paths to the two calibration files we will use:
v2c_filepath = './calib_velo_to_cam.txt'
c2c_filepath = './calib_cam_to_cam.txt'
Next, we define functions to read these parameters:
def calib_velo2cam(filepath):
"""
get Rotation(R : 3x3), Translation(T : 3x1) matrix info
using R,T matrix, we can convert velodyne coordinates to camera coordinates
"""
with open(filepath, "r") as f:
file = f.readlines()
for line in file:
(key, val) = line.split(':',1)
if key == 'R':
R = np.fromstring(val, sep=' ')
R = R.reshape(3, 3)
if key == 'T':
T = np.fromstring(val, sep=' ')
T = T.reshape(3, 1)
return R, T
def calib_cam2cam(filepath, mode):
"""
If your image is 'rectified image' :
get only Projection(P : 3x4) matrix is enough
but if your image is 'distorted image'(not rectified image) :
you need undistortion step using distortion coefficients(5 : D)
in this code, I'll get P matrix since I'm using rectified image
"""
with open(filepath, "r") as f:
file = f.readlines()
for line in file:
(key, val) = line.split(':',1)
if key == ('P_rect_' + mode):
P_ = np.fromstring(val, sep=' ')
P_ = P_.reshape(3, 4)
# erase 4th column ([0,0,0])
P_ = P_[:3, :3]
return P_
def velo3d_2_camera2d_points(points, v_fov, h_fov, vc_path, cc_path, mode='02'):
""" print velodyne 3D points corresponding to camera 2D image """
# R_vc = Rotation matrix ( velodyne -> camera )
# T_vc = Translation matrix ( velodyne -> camera )
R_vc, T_vc = calib_velo2cam(vc_path)
# P_ = Projection matrix ( camera coordinates 3d points -> image plane 2d points )
P_ = calib_cam2cam(cc_path, mode)
"""
xyz_v - 3D velodyne points corresponding to h, v FOV in the velodyne coordinates
c_ - color value(HSV's Hue) corresponding to distance(m)
[x_1 , x_2 , .. ]
xyz_v = [y_1 , y_2 , .. ]
[z_1 , z_2 , .. ]
[ 1 , 1 , .. ]
"""
xyz_v, c_ = velo_points_filter(points, v_fov, h_fov)
"""
RT_ - rotation matrix & translation matrix
( velodyne coordinates -> camera coordinates )
[r_11 , r_12 , r_13 , t_x ]
RT_ = [r_21 , r_22 , r_23 , t_y ]
[r_31 , r_32 , r_33 , t_z ]
"""
RT_ = np.concatenate((R_vc, T_vc),axis = 1)
# convert velodyne coordinates(X_v, Y_v, Z_v) to camera coordinates(X_c, Y_c, Z_c)
for i in range(xyz_v.shape[1]):
xyz_v[:3,i] = np.matmul(RT_, xyz_v[:,i])
"""
xyz_c - 3D velodyne points corresponding to h, v FOV in the camera coordinates
[x_1 , x_2 , .. ]
xyz_c = [y_1 , y_2 , .. ]
[z_1 , z_2 , .. ]
"""
xyz_c = np.delete(xyz_v, 3, axis=0)
# convert camera coordinates(X_c, Y_c, Z_c) image(pixel) coordinates(x,y)
for i in range(xyz_c.shape[1]):
xyz_c[:,i] = np.matmul(P_, xyz_c[:,i])
"""
xy_i - 3D velodyne points corresponding to h, v FOV in the image(pixel) coordinates before scale adjustment
ans - 3D velodyne points corresponding to h, v FOV in the image(pixel) coordinates
[s_1*x_1 , s_2*x_2 , .. ]
xy_i = [s_1*y_1 , s_2*y_2 , .. ] ans = [x_1 , x_2 , .. ]
[ s_1 , s_2 , .. ] [y_1 , y_2 , .. ]
"""
    xy_i = xyz_c / xyz_c[2]           # perspective division by the depth row
    ans = np.delete(xy_i, 2, axis=0)
"""
width = 1242
height = 375
w_range = in_range_points(ans[0], width)
h_range = in_range_points(ans[1], height)
ans_x = ans[0][np.logical_and(w_range,h_range)][:,None].T
ans_y = ans[1][np.logical_and(w_range,h_range)][:,None].T
c_ = c_[np.logical_and(w_range,h_range)]
ans = np.vstack((ans_x, ans_y))
"""
return ans, c_
Finally, we project the point cloud onto the image to obtain the fused result:
def print_projection_cv2(points, color, image):
""" project converted velodyne points into camera image """
hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
for i in range(points.shape[1]):
cv2.circle(hsv_image, (np.int32(points[0][i]),np.int32(points[1][i])),2, (int(color[i]),255,255),-1)
return cv2.cvtColor(hsv_image, cv2.COLOR_HSV2BGR)
def print_projection_plt(points, color, image):
""" project converted velodyne points into camera image """
hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
for i in range(points.shape[1]):
cv2.circle(hsv_image, (np.int32(points[0][i]),np.int32(points[1][i])),2, (int(color[i]),255,255),-1)
return cv2.cvtColor(hsv_image, cv2.COLOR_HSV2RGB)
image_type = 'color' # 'grayscale' or 'color' image
mode = '00' if image_type == 'grayscale' else '02' # image_00 = 'grayscale image' , image_02 = 'color image'
image = cv2.imread(os.path.join(base, 'image_' + mode + '/data/0000000000.png'))
ans, c_ = velo3d_2_camera2d_points(velo_points, v_fov=(-24.9, 2.0), h_fov=(-45,45), \
vc_path=v2c_filepath, cc_path=c2c_filepath, mode=mode)
image = print_projection_plt(points=ans, color=c_, image=image)
# display result image
plt.subplots(1,1, figsize = (13,3) )
plt.title("Velodyne points to camera image Result")
plt.imshow(image)
And with that, we're done!
If you're interested, read on for the underlying theory.
6. Theory
Let a point in the LiDAR point cloud have coordinates $(X, Y, Z)$ in the LiDAR frame, and let $(u, v)$ be the pixel coordinates it maps to in the image. The mapping consists of the following steps:

1. Coordinate transformation: convert the LiDAR-frame point $(X, Y, Z)$ to the camera frame $(X_c, Y_c, Z_c)$:

$$
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
= \mathbf{T}_{\text{lidar\_to\_camera}}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
$$

where $\mathbf{T}_{\text{lidar\_to\_camera}}$ is the LiDAR-to-camera transformation matrix, consisting of a rotation $\mathbf{R}$ and a translation $\mathbf{t}$.

2. Intrinsic projection: project the camera-frame point $(X_c, Y_c, Z_c)$ onto the image plane to obtain the pixel coordinates $(u, v)$:

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \mathbf{K} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}
$$

where $\mathbf{K}$ is the camera intrinsic matrix, defined as:

$$
\mathbf{K} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$

- $f_x$, $f_y$: the camera focal lengths, in pixels.
- $c_x$, $c_y$: the principal point (optical center), usually near the image center.

3. Final formula: the complete mapping from a LiDAR point $(X, Y, Z)$ to image pixel coordinates $(u, v)$ is:

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \mathbf{K} \left( \mathbf{R} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + \mathbf{t} \right)
$$

where the scale factor $s = Z_c$ is divided out to obtain $u$ and $v$.
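The final formula can be sketched in a few lines of NumPy (K, R, and t below are placeholder values, not the actual KITTI calibration):

```python
import numpy as np

# Placeholder intrinsics and extrinsics for illustration only
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                    # LiDAR -> camera rotation (identity stand-in)
t = np.zeros(3)                  # LiDAR -> camera translation (zero stand-in)

X = np.array([2.0, -0.5, 20.0])  # a LiDAR point, 20 m deep along the camera axis

p = K @ (R @ X + t)              # s * [u, v, 1]^T
u, v = p[0] / p[2], p[1] / p[2]  # divide out the scale factor s = Z_c
print(round(u, 4), round(v, 4))  # 681.75 154.8625
```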