【Beginner's Hands-On Tutorial 1.30】Step by Step with Open3D (13): Object Detection and Distance Measurement with Stereo Vision (YOLOv8, Python Code)


1. Abstract

In this article, we use binocular stereo vision for per-pixel depth estimation and combine it with YOLOv8 for object detection and distance measurement. Using the block matching (BM) and semi-global block matching (SGBM) algorithms, we show how stereo computer-vision techniques can accurately recover depth information: block matching efficiently establishes correspondences between the two stereo camera images, while SGBM refines the disparity-map estimation.

The algorithm consists of the following steps:

  1. Read the left and right images
  2. Compute the disparity map
  3. Build the depth map
  4. Run YOLOv8 object detection
  5. Display each object's distance and bounding box

The pipeline function below ties these steps together:

def pipeline(left_image, right_image, object_class):
    """
    Performs a pipeline of operations on stereo images to obtain a colored disparity map, RGB frame, and colored depth map.

    Input:
    - left_image: Left stereo image (RGB format)
    - right_image: Right stereo image (RGB format)
    - object_class: List of object classes of interest for bounding box retrieval

    Output:
    - disparity_map_colored: Colored disparity map (RGB format)
    - frame_rgb: RGB frame
    - depth_map_colored: Colored depth map (RGB format)
    """
    global focal_length

    # Calculate the disparity map
    disparity_map = compute_disparity(left_image, right_image, num_disparities=90, block_size=5, window_size=5,
                                      matcher="stereo_sgbm", show_disparity=False)

    # Calculate the depth map
    depth_map = calculate_depth_map(disparity_map, baseline, focal_length, show_depth_map=False)

    # Get bounding box coordinates for specified object classes
    bbox_coordinates = get_bounding_box_center_frame(left_image, model, names, object_class, show_output=False)

    # Calculate colored disparity map, RGB frame, and colored depth map
    disparity_map_colored, frame_rgb, depth_map_colored = calculate_distance(bbox_coordinates, left_image, depth_map, disparity_map, show_output=False)

    return disparity_map_colored, frame_rgb, depth_map_colored
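
A single stereo pair can then be processed in one call. A usage sketch (baseline, focal_length, model, and names are module-level variables that are set up in the complete script at the end of this post):

disparity_map_colored, frame_rgb, depth_map_colored = pipeline(
    left_image, right_image, object_class=['car', 'bicycle', 'person'])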

2. Dataset

This project uses the KITTI dataset, specifically its stereo camera images. The dataset provides a comprehensive 3D object detection benchmark with 7,481 training images and 7,518 test images, accompanied by corresponding point clouds, for a total of 80,256 annotated objects. In addition, the inference stage of the project uses KITTI's synchronized and rectified raw data, which further improves the accuracy and reliability of the results. This rich dataset allows a thorough exploration and evaluation of the proposed approach and makes the project's results robust.

3. A Simple Stereo-Vision Application

We will now estimate object distances on the KITTI dataset. To do so, we first need to understand the camera configuration on the autonomous vehicle. Fortunately, KITTI provides a very detailed description of the camera positions and the baseline between them.

When downloading the dataset for 3D object detection, we get the following files:

  • Left images
  • Right images
  • Calibration data
  • Labels

3.1 Extracting the Projection Matrices

Below is an example of the calibration data. Since we work with the left and right color images, we need to extract P2 and P3 to obtain the projection matrices.

P0: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 0.000000000000e+00 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P1: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.875744000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P2: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 4.485728000000e+01 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.163791000000e-01 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.745884000000e-03
P3: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.395242000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.199936000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.729905000000e-03
R0_rect: 9.999239000000e-01 9.837760000000e-03 -7.445048000000e-03 -9.869795000000e-03 9.999421000000e-01 -4.278459000000e-03 7.402527000000e-03 4.351614000000e-03 9.999631000000e-01
Tr_velo_to_cam: 7.533745000000e-03 -9.999714000000e-01 -6.166020000000e-04 -4.069766000000e-03 1.480249000000e-02 7.280733000000e-04 -9.998902000000e-01 -7.631618000000e-02 9.998621000000e-01 7.523790000000e-03 1.480755000000e-02 -2.717806000000e-01
Tr_imu_to_velo: 9.999976000000e-01 7.553071000000e-04 -2.035826000000e-03 -8.086759000000e-01 -7.854027000000e-04 9.998898000000e-01 -1.482298000000e-02 3.195559000000e-01 2.024406000000e-03 1.482454000000e-02 9.998881000000e-01 -7.997231000000e-01
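
A few lines of parsing are enough to pull these matrices out of the file. Below is a minimal sketch for P2/P3 only (the get_calibration_parameters function in the complete code at the end of this post handles all five entries); parse_projection_matrix is a hypothetical helper introduced here for illustration:

import numpy as np

def parse_projection_matrix(calib_text, key):
    # Find the line starting with e.g. 'P2:' and reshape its 12 values into a 3x4 matrix
    for line in calib_text.splitlines():
        if line.startswith(key + ':'):
            return np.array(line.split(':', 1)[1].split(), dtype=np.float32).reshape(3, 4)
    raise KeyError(f"{key} not found in calibration file")

# p_left  = parse_projection_matrix(calibration_file, 'P2')  # calibration_file: file contents as a string
# p_right = parse_projection_matrix(calibration_file, 'P3')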

Note that we obtain both the left and right projection matrices:

Left projection matrix
[[     721.54           0      609.56      44.857]
 [          0      721.54      172.85     0.21638]
 [          0           0           1   0.0027459]]

Right projection matrix
[[     721.54           0      609.56     -339.52]
 [          0      721.54      172.85      2.1999]
 [          0           0           1   0.0027299]]

R0 rectification matrix
[[    0.99992   0.0098378   -0.007445]
 [ -0.0098698     0.99994  -0.0042785]
 [  0.0074025   0.0043516     0.99996]]

Velodyne-to-camera matrix
[[  0.0075337    -0.99997  -0.0006166  -0.0040698]
 [   0.014802  0.00072807    -0.99989   -0.076316]
 [    0.99986   0.0075238    0.014808    -0.27178]]

IMU-to-Velodyne matrix
[[          1  0.00075531  -0.0020358    -0.80868]
 [ -0.0007854     0.99989   -0.014823     0.31956]
 [  0.0020244    0.014825     0.99989    -0.79972]]


Next, we need to extract the intrinsic and extrinsic matrices from our projection matrices. Recall the following formula:

P = K [R | t]

where K is the 3×3 intrinsic (camera) matrix and [R | t] contains the extrinsic rotation and translation.
We will use OpenCV's decomposeProjectionMatrix function:

def decompose_projection_matrix(projection_matrix):
    """
    Decompose a projection matrix into a camera matrix, rotation matrix, and translation vector.

    Args:
        projection_matrix (numpy.ndarray): 3x4 projection matrix.

    Returns:
        tuple: Tuple containing the decomposed components:
            - camera_matrix (numpy.ndarray): Camera matrix. [ fx   0   cx ]
                                                            [  0  fy   cy ]
                                                            [  0   0    1 ]
            - rotation_matrix (numpy.ndarray): Rotation matrix.
            - translation_vector (numpy.ndarray): Translation vector.
    """

    # Decompose the projection matrix
    camera_matrix, rotation_matrix, translation_vector, _, _, _, _ = cv2.decomposeProjectionMatrix(projection_matrix)
    # Normalize the homogeneous translation vector so that its fourth component is 1
    translation_vector = translation_vector/translation_vector[3]
    # Return the decomposed components
    return camera_matrix, rotation_matrix, translation_vector
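
Applying it to both projection matrices, exactly as done in the complete script at the end of this post:

# p_left and p_right hold the P2/P3 matrices parsed from the calibration file
camera_matrix_left, rotation_matrix_left, translation_vector_left = decompose_projection_matrix(p_left)
camera_matrix_right, rotation_matrix_right, translation_vector_right = decompose_projection_matrix(p_right)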



We can observe that the rotation matrix is an identity matrix. We extract the focal length from the intrinsic matrix and compute the baseline from the translation vectors, as follows:

    # Extract the focal length and baseline
    focal_length_x = camera_matrix_right[0, 0]
    focal_length_y = camera_matrix_right[1, 1]
    baseline = abs(translation_vector_left[0] - translation_vector_right[0])

Output:

Focal length (x-direction): 721.5377

Focal length (y-direction): 721.5377

Baseline: [    0.53271]


Note that the baseline is 0.53271 m, which matches the camera configuration on the autonomous vehicle.
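
As a sanity check, the baseline also falls straight out of the projection matrices without decomposing them: in KITTI the fourth column of P stores f·t, so dividing the difference of the two x-entries by the focal length gives the baseline. A quick sketch using the P2/P3 values shown above:

fx = 721.5377
baseline = (44.85728 - (-339.5242)) / fx  # (P2[0, 3] - P3[0, 3]) / fx
print(f"baseline = {baseline:.5f} m")     # ~0.53272 m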

3.2 Extracting the Labels

We get one label file per image pair. Below is an example label.txt; each object has 15 values.

Car 0.00 0 -1.56 564.62 174.59 616.43 224.74 1.61 1.66 3.20 -0.69 1.69 25.01 -1.59
Car 0.00 0 1.71 481.59 180.09 512.55 202.42 1.40 1.51 3.70 -7.43 1.88 47.55 1.55
Car 0.00 0 1.64 542.05 175.55 565.27 193.79 1.46 1.66 4.05 -4.71 1.71 60.52 1.56
Cyclist 0.00 0 1.89 330.60 176.09 355.61 213.60 1.72 0.50 1.95 -12.63 1.88 34.09 1.54
DontCare -1 -1 -10 753.33 164.32 798.00 186.74 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 738.50 171.32 753.27 184.42 -1 -1 -1 -1000 -1000 -1000 -10
#Values    Name      Description
----------------------------------------------------------------------------
   1    type         Describes the type of object: 'Car', 'Van', 'Truck',
                     'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
                     'Misc' or 'DontCare'
   1    truncated    Float from 0 (non-truncated) to 1 (truncated), where
                     truncated refers to the object leaving image boundaries
   1    occluded     Integer (0,1,2,3) indicating occlusion state:
                     0 = fully visible, 1 = partly occluded
                     2 = largely occluded, 3 = unknown
   1    alpha        Observation angle of object, ranging [-pi..pi]
   4    bbox         2D bounding box of object in the image (0-based index):
                     contains left, top, right, bottom pixel coordinates
   3    dimensions   3D object dimensions: height, width, length (in meters)
   3    location     3D object location x,y,z in camera coordinates (in meters)
   1    rotation_y   Rotation ry around Y-axis in camera coordinates [-pi..pi]
   1    score        Only for results: Float, indicating confidence in
                     detection, needed for p/r curves, higher is better.

Note that the DontCare label marks regions in which objects have not been annotated, for example because they are too far away from the laser scanner. We need to extract each object's bounding-box coordinates and distance from the labels; we will compare our results against this ground truth.

def ground_truth_bbox(labels_file, object_class):
    lines = labels_file.split('\n')  # Split the input string into lines
    bounding_boxes = []

    for line in lines:
        line = line.strip()  # Remove leading/trailing whitespace

        if line:
            data = line.split()

            # Extract the relevant values for each object
            obj_type = data[0]
            truncated = float(data[1])
            occluded = int(data[2])
            alpha = float(data[3])
            bbox = [float(val) for val in data[4:8]]
            dimensions = [float(val) for val in data[8:11]]
            location = [float(val) for val in data[11:14]]
            distance = float(data[13])  # z-coordinate of the location, i.e. the forward distance
            rotation_y = float(data[14])

            if obj_type in object_class:  # Check if obj_type is one of the desired classes
                # Append the bounding box and distance to the list
                bounding_boxes.append((bbox, distance))

                # Print the 3D bounding box information
                print("Object Type:", obj_type)
                print("Truncated:", truncated)
                print("Occluded:", occluded)
                print("Alpha:", alpha)
                print("Bounding Box:", bbox)
                print("Dimensions:", dimensions)
                print("Location:", location)
                print("True Distance:", distance)
                print("Rotation Y:", rotation_y)
                print("------------------------")
    return bounding_boxes

Output:

Object Type: Car
Truncated: 0.0
Occluded: 0
Alpha: 1.71
Bounding Box: [481.59, 180.09, 512.55, 202.42]
Dimensions: [1.4, 1.51, 3.7]
Location: [-7.43, 1.88, 47.55]
True Distance: 47.55
Rotation Y: 1.55

3.3 Displaying the Left and Right Images

We first display the left and right images from the KITTI dataset. Note that the 54 cm baseline is large compared to a stereo camera such as the OAK-D, so the disparities are larger, which in turn allows a more accurate depth map.
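
This follows from the depth formula z = f·B/d: differentiating with respect to the disparity d gives |Δz| ≈ z²·Δd / (f·B), so for a fixed disparity error Δd (typically a fraction of a pixel), a larger baseline B reduces the depth error at any given range.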

[Figure: the left and right images of a KITTI stereo pair.]

3.4 Computing the Disparity Map

Next, we compute the disparity using OpenCV's block matching (StereoBM_create) and semi-global block matching (StereoSGBM_create) functions. A few parameters need fine-tuning: num_disparities, block_size, and window_size.

def compute_disparity(left_img, right_img, num_disparities=6 * 16, block_size=11, window_size=6, matcher="stereo_sgbm", show_disparity=True):
    """
    Compute the disparity map for a given stereo image pair.

    Args:
        left_img (numpy.ndarray): Left image of the stereo pair.
        right_img (numpy.ndarray): Right image of the stereo pair.
        num_disparities (int): Maximum disparity minus minimum disparity.
        block_size (int): Size of the block window. It must be an odd number.
        window_size (int): Size of the disparity smoothness window.
        matcher (str): Matcher algorithm to use ("stereo_bm" or "stereo_sgbm").
        show_disparity (bool): Whether to display the disparity map using matplotlib.

    Returns:
        numpy.ndarray: The computed disparity map.
    """
    if matcher == "stereo_bm":
        # Create a stereo BM matcher
        matcher = cv2.StereoBM_create(numDisparities=num_disparities, blockSize=block_size)
    elif matcher == "stereo_sgbm":
        # Create a stereo SGBM matcher
        matcher = cv2.StereoSGBM_create(
            minDisparity=0, numDisparities=num_disparities, blockSize=block_size,
            P1=8 * 3 * window_size ** 2, P2=32 * 3 * window_size ** 2,
            mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
        )

    # Convert the images to grayscale
    left_gray = cv2.cvtColor(left_img, cv2.COLOR_BGR2GRAY)
    right_gray = cv2.cvtColor(right_img, cv2.COLOR_BGR2GRAY)

    # Compute the disparity map (OpenCV returns fixed-point disparities scaled by 16)
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16

    if show_disparity:
        # Display the disparity map using matplotlib
        plt.imshow(disparity, cmap="CMRmap_r")  # alternative colormap: cividis
        plt.title('Disparity map with SGBM', fontsize=12)
        plt.show()

    return disparity

Observe that the disparity map from block matching is noisier than the one from SGBM. From now on, we will use the SGBM algorithm.
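
To reproduce the comparison, compute_disparity can be called twice, switching only the matcher argument (a usage sketch; num_disparities is set to 96 here because StereoBM requires a multiple of 16):

disparity_bm = compute_disparity(left_image, right_image, num_disparities=96, block_size=5,
                                 window_size=5, matcher="stereo_bm", show_disparity=True)
disparity_sgbm = compute_disparity(left_image, right_image, num_disparities=96, block_size=5,
                                   window_size=5, matcher="stereo_sgbm", show_disparity=True)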

[Figure: disparity maps computed with block matching and SGBM.]
Note the white band on the left side of the image: for those pixels there are no features in the right image that the left image can be matched against. Its width is driven mainly by the num_disparities search range, with block_size also contributing at the border.
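
If the band is a problem for downstream processing, one simple option (not part of the original code) is to crop the first num_disparities columns, where no match is possible:

num_disparities = 90
valid_disparity = disparity_map[:, num_disparities:]  # drop the band with no valid matches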

3.5 Computing the Depth Map

Next, we use the disparity map to produce a depth map, using the following formula:

depth = focal_length × baseline / disparity

For example, with a focal length of about 721.54 px and a baseline of about 0.5327 m, a disparity of 30 px corresponds to a depth of roughly 721.54 × 0.5327 / 30 ≈ 12.8 m.

def calculate_depth_map(disparity, baseline, focal_length, show_depth_map=True):
    """
    Calculate the depth map from a given disparity map, baseline, and focal length.

    Args:
        disparity (numpy.ndarray): Disparity map.
        baseline (float): Baseline between the cameras.
        focal_length (float): Focal length of the camera.

    Returns:
        numpy.ndarray: Depth map.
    """

    # Replace all 0 and -1 disparities with a small minimum value (to avoid division by zero or negatives)
    disparity[disparity == 0] = 0.1
    disparity[disparity == -1] = 0.1

    # Initialize the depth map to match the size of the disparity map
    depth_map = np.ones(disparity.shape, np.single)

    # Calculate the depths: depth = focal_length * baseline / disparity
    depth_map[:] = focal_length * baseline / disparity[:]

    if show_depth_map:
        # Display the depth map using matplotlib
        plt.imshow(depth_map, cmap="cividis")  # alternative colormap: CMRmap_r
        plt.show()

    return depth_map

The output is a depth map giving the depth of every pixel.
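
Since the depth map is just a NumPy array in meters, any pixel can be queried directly (the same check appears in the main script of the complete code):

print("Depth at (row=200, col=600):", depth_map[200, 600], "m")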

The depth map is a per-pixel depth estimate. However, we are interested in the depth, or distance, of specific obstacles such as cars or pedestrians. We will therefore use an object detection algorithm to segment the depth map and retrieve distances only for those objects.

3.6 Object Detection

The next step is to detect objects in the frame using YOLOv8; we need the coordinates of the bounding boxes.

def get_bounding_box_center_frame(frame, model, names, object_class, show_output=True):

    bbox_coordinates = []
    frame_copy = frame.copy()

    # Perform object detection on the input frame using the specified model
    results = model(frame)

    # Iterate over the results of object detection
    for result in results:

        # Iterate over each bounding box detected in the result
        for r in result.boxes.data.tolist():
            # Extract the coordinates, score, and class ID from the bounding box
            x1, y1, x2, y2, score, class_id = r
            x1 = int(x1)
            x2 = int(x2)
            y1 = int(y1)
            y2 = int(y2)

            # Get the class name based on the class ID
            class_name = names.get(class_id)

            # Check if the class name matches the specified object_class and the detection score is above a threshold
            if class_name in object_class and score > 0.5:
                bbox_coordinates.append([x1, y1, x2, y2])

                # Draw the bounding box on the frame
                cv2.rectangle(frame_copy, (x1, y1), (x2, y2), (0, 255, 0), 2)

    if show_output:
        # Convert the frame from BGR to RGB
        frame_rgb = cv2.cvtColor(frame_copy, cv2.COLOR_BGR2RGB)
        # Show the output frame with bounding boxes
        cv2.imshow("Output", frame_rgb)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    # Return the list of bounding box coordinates
    return bbox_coordinates

We decided to detect only cars, people, and bicycles.
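
For reference, this is how the detector is instantiated and called (a usage sketch mirroring the main script; weights_path points to the yolov8m.pt weights file):

model = YOLO(weights_path)
names = model.names
bbox_coordinates = get_bounding_box_center_frame(left_image, model, names,
                                                 object_class=['car', 'bicycle', 'person'],
                                                 show_output=True)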

[Figure: YOLOv8 detections of cars, people, and bicycles on the left image.]

3.7 Computing Distances with Object Detection

Now that we have both the bounding-box coordinates for each frame and the depth map, we can combine them to extract the depth of each obstacle.

We obtain an obstacle's distance by indexing the depth map at the center of its bounding box.

def calculate_distance(bbox_coordinates, frame, depth_map, disparity_map, show_output=True):
    frame_copy = frame.copy()

    # Normalize the disparity map to [0, 255]
    disparity_map_normalized = cv2.normalize(disparity_map, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)

    # Apply a colorful colormap to the disparity map
    colormap = cv2.COLORMAP_JET
    disparity_map_colored = cv2.applyColorMap(disparity_map_normalized, colormap)

    # Normalize the depth map to [0, 255]
    depth_map_normalized = cv2.normalize(depth_map, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)

    # Apply a colorful colormap to the depth map
    colormap = cv2.COLORMAP_BONE
    depth_map_colored = cv2.applyColorMap(depth_map_normalized, colormap)

    for bbox_coor in bbox_coordinates:
        x1, y1, x2, y2 = bbox_coor
        center_x = (x1 + x2) // 2
        center_y = (y1 + y2) // 2

        distance = depth_map[center_y][center_x]
        print("Calculated distance:", distance)

        # Convert distance to string
        distance_str = f"{distance:.2f} m"

        # Draw bounding box on the frame
        cv2.rectangle(frame_copy, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Draw bounding box on the disparity map
        cv2.rectangle(disparity_map_colored, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Draw bounding box on the depth map
        cv2.rectangle(depth_map_colored, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Calculate the text size
        text_size, _ = cv2.getTextSize(distance_str, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2)

        # Calculate the position for placing the text
        text_x = center_x - text_size[0] // 2
        text_y = y1 - 10  # Place the text slightly above the bounding box

        # Calculate the rectangle coordinates
        rect_x1 = text_x - 5
        rect_y1 = text_y - text_size[1] - 5
        rect_x2 = text_x + text_size[0] + 5
        rect_y2 = text_y + 5

        # Draw white rectangle behind the text
        cv2.rectangle(frame_copy, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(frame_copy, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)

        # Draw white rectangle behind the text
        cv2.rectangle(disparity_map_colored, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(disparity_map_colored, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)

        # Draw white rectangle behind the text
        cv2.rectangle(depth_map_colored, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(depth_map_colored, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)



    if show_output:
        # Convert frame from BGR to RGB
        #frame_rgb = cv2.cvtColor(frame_copy, cv2.COLOR_BGR2RGB)

        # Show the output frame with bounding boxes
        cv2.imshow("Output disparity map", disparity_map_colored)
        cv2.imshow("Output frame", frame_copy)
        cv2.imshow("Output depth map", depth_map_colored)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    return disparity_map_colored, frame_copy, depth_map_colored

We draw the bounding boxes, with each object's distance from the camera, in meters, displayed on top.
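
Sampling a single center pixel is sensitive to disparity noise and to the box center landing on the background instead of the object. A more robust variant (an assumption on our part, not part of the original pipeline) is to take the median depth over a small patch around the center:

import numpy as np

def patch_median_depth(depth_map, center_x, center_y, half=5):
    # Median depth over a (2*half+1)^2 patch around the box center, clipped to the image
    h, w = depth_map.shape[:2]
    y0, y1 = max(0, center_y - half), min(h, center_y + half + 1)
    x0, x1 = max(0, center_x - half), min(w, center_x + half + 1)
    return float(np.median(depth_map[y0:y1, x0:x1]))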

[Figure: output frame with bounding boxes and per-object distances in meters.]

Complete code:

import numpy as np
import cv2
import matplotlib.pyplot as plt
from ultralytics import YOLO
import os


def display_image_pair(index, left_image_folder, right_image_folder, show_picture=True):
    """
    Load and display a pair of left and right images from the specified folders.

    Args:
        index (int): Index of the image pair to display.
        left_image_folder (str): Path to the folder containing the left images.
        right_image_folder (str): Path to the folder containing the right images.
        show_picture (bool): Whether to display the image pair using matplotlib.
    """
    # Load the left image
    left_image_path = os.path.join(left_image_folder, os.listdir(left_image_folder)[index])
    left_image = cv2.cvtColor(cv2.imread(left_image_path), cv2.COLOR_BGR2RGB)

    # Load the right image
    right_image_path = os.path.join(right_image_folder, os.listdir(right_image_folder)[index])
    right_image = cv2.cvtColor(cv2.imread(right_image_path), cv2.COLOR_BGR2RGB)

    if show_picture:
        # Create subplots and display the image pair
        f, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
        ax1.imshow(left_image)
        ax1.set_title('Image Left', fontsize=15)
        ax2.imshow(right_image)
        ax2.set_title('Image Right', fontsize=15)

        # Show the plot
        plt.show()

    # Return the left and right images
    return left_image, right_image



#--- Function to compute and display disparity
def compute_disparity(left_img, right_img, num_disparities=6 * 16, block_size=11, window_size=6, matcher="stereo_sgbm", show_disparity=True):
    """
    Compute the disparity map for a given stereo image pair.

    Args:
        left_img (numpy.ndarray): Left image of the stereo pair.
        right_img (numpy.ndarray): Right image of the stereo pair.
        num_disparities (int): Maximum disparity minus minimum disparity.
        block_size (int): Size of the block window. It must be an odd number.
        window_size (int): Size of the disparity smoothness window.
        matcher (str): Matcher algorithm to use ("stereo_bm" or "stereo_sgbm").
        show_disparity (bool): Whether to display the disparity map using matplotlib.

    Returns:
        numpy.ndarray: The computed disparity map.
    """
    if matcher == "stereo_bm":
        # Create a Stereo BM matcher
        matcher = cv2.StereoBM_create(numDisparities=num_disparities, blockSize=block_size)
    elif matcher == "stereo_sgbm":
        # Create a Stereo SGBM matcher
        matcher = cv2.StereoSGBM_create(
            minDisparity=0, numDisparities=num_disparities, blockSize=block_size,
            P1=8 * 3 * window_size ** 2, P2=32 * 3 * window_size ** 2,
            mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
        )

    # Convert the images to grayscale
    left_gray = cv2.cvtColor(left_img, cv2.COLOR_BGR2GRAY)
    right_gray = cv2.cvtColor(right_img, cv2.COLOR_BGR2GRAY)

    # Compute the disparity map
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16


    if show_disparity:
        # Display the disparity map using matplotlib
        plt.imshow(disparity, cmap="CMRmap_r") #CMRmap_r # cividis
        plt.title('Disparity map with SGBM', fontsize=12)
        plt.show()

    return disparity

def display_text_file(index, folder_path):
    """
    Display the contents of a text file based on the specified index.

    Args:
        index (int): Index of the text file to display.
        folder_path (str): Path to the folder containing the text files.

    Returns:
        str: Contents of the text file.
    """
    # Get the list of text files in the folder
    txt_files = [f for f in os.listdir(folder_path) if f.endswith('.txt')]

    if index >= 0 and index < len(txt_files):
        # Get the file path of the text file based on the index
        file_path = os.path.join(folder_path, txt_files[index])

        # Open the file and read its contents
        with open(file_path, 'r') as file:
            contents = file.read()

        # Display the contents of the text file
        print(f"Text file at index {index}:\n{contents}")

        # Return the contents of the text file
        return contents
    else:
        print("Invalid index.")
        return None


#--- Function to retrieve calibration parameters
def get_calibration_parameters(file_contents):
    """
    Retrieve calibration parameters from the contents of a calibration file.

    Args:
        file_contents (str): Contents of the calibration file.

    Returns:
        tuple: Tuple containing the calibration parameters:
            - p_left (numpy.ndarray): Projection matrix for the left camera.
            - p_right (numpy.ndarray): Projection matrix for the right camera.
            - p_ro_rect (numpy.ndarray): Rectification matrix.
            - p_velo_to_cam (numpy.ndarray): Transformation matrix from velodyne to camera coordinates.
            - p_imu_to_velo (numpy.ndarray): Transformation matrix from IMU to velodyne coordinates.
    """
    fin = file_contents.split('\n')
    for line in fin:
        if line[:2] == 'P2':
            p_left = np.array(line[4:].strip().split(" ")).astype('float32').reshape(3, -1)
        elif line[:2] == 'P3':
            p_right = np.array(line[4:].strip().split(" ")).astype('float32').reshape(3, -1)
        elif line[:7] == 'R0_rect':
            p_ro_rect = np.array(line[9:].strip().split(" ")).astype('float32').reshape(3, -1)
        elif line[:14] == 'Tr_velo_to_cam':
            p_velo_to_cam = np.array(line[16:].strip().split(" ")).astype('float32').reshape(3, -1)
        elif line[:14] == 'Tr_imu_to_velo':
            p_imu_to_velo = np.array(line[16:].strip().split(" ")).astype('float32').reshape(3, -1)
    return p_left, p_right, p_ro_rect, p_velo_to_cam, p_imu_to_velo

def decompose_projection_matrix(projection_matrix):
    """
    Decompose a projection matrix into camera matrix, rotation matrix, and translation vector.

    Args:
        projection_matrix (numpy.ndarray): 3x4 projection matrix.

    Returns:
        tuple: Tuple containing the decomposed components:
            - camera_matrix (numpy.ndarray): Camera matrix. [ fx   0   cx ]
                                                            [  0  fy   cy ]
                                                            [  0   0    1 ]
            - rotation_matrix (numpy.ndarray): Rotation matrix.
            - translation_vector (numpy.ndarray): Translation vector.
    """

    # Decompose the projection matrix
    camera_matrix, rotation_matrix, translation_vector, _, _, _, _ = cv2.decomposeProjectionMatrix(projection_matrix)
    translation_vector = translation_vector/translation_vector[3]
    # Return the decomposed components
    return camera_matrix, rotation_matrix, translation_vector


def calculate_depth_map(disparity, baseline, focal_length, show_depth_map=True):
    """
    Calculates the depth map from a given disparity map, baseline, and focal length.

    Args:
        disparity (numpy.ndarray): Disparity map.
        baseline (float): Baseline between the cameras.
        focal_length (float): Focal length of the camera.

    Returns:
        numpy.ndarray: Depth map.
    """

    # Replace all instances of 0 and -1 disparity with a small minimum value (to avoid div by 0 or negatives)
    disparity[disparity == 0] = 0.1
    disparity[disparity == -1] = 0.1

    # Initialize the depth map to match the size of the disparity map
    depth_map = np.ones(disparity.shape, np.single)

    # Calculate the depths
    depth_map[:] = focal_length * baseline / disparity[:]

    if show_depth_map:
        # Display the depth map using matplotlib
        plt.imshow(depth_map, cmap="cividis") #CMRmap_r # cividis
        plt.title('Depth map', fontsize=12)
        plt.show()

    return depth_map




def save_disparity_maps(left_image_folder, right_image_folder, output_folder):
    # Get the list of image files in the left image folder
    left_image_files = os.listdir(left_image_folder)

    # Iterate over the image files
    for image_file in left_image_files:
        # Construct the paths for the left and right images
        left_image_path = os.path.join(left_image_folder, image_file)
        right_image_path = os.path.join(right_image_folder, image_file)

        # Read the left and right images
        left_image = cv2.imread(left_image_path)
        right_image = cv2.imread(right_image_path)

        # Calculate the disparity map
        disparity_map = compute_disparity(left_image, right_image, num_disparities=90, block_size=5, window_size=5,
                                          matcher="stereo_sgbm", show_disparity=False)

        # Normalize the disparity map to [0, 255]
        disparity_map_normalized = cv2.normalize(disparity_map, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)

        # Apply a colorful colormap to the disparity map
        colormap = cv2.COLORMAP_JET
        disparity_map_colored = cv2.applyColorMap(disparity_map_normalized, colormap)

        # Construct the output file path
        output_file = os.path.join(output_folder, image_file)

        # Save the disparity map as an image
        cv2.imwrite(output_file, disparity_map_colored)

        print(f"Disparity map saved: {output_file}")



def save_depth_maps(left_image_folder, right_image_folder, output_folder, baseline, focal_length):
    # Get the list of image files in the left image folder
    left_image_files = os.listdir(left_image_folder)

    # Iterate over the image files
    for image_file in left_image_files:
        # Construct the paths for the left and right images
        left_image_path = os.path.join(left_image_folder, image_file)
        right_image_path = os.path.join(right_image_folder, image_file)

        # Read the left and right images
        left_image = cv2.imread(left_image_path)
        right_image = cv2.imread(right_image_path)

        # Calculate the disparity map
        disparity_map = compute_disparity(left_image, right_image, num_disparities=90, block_size=5, window_size=5,
                                          matcher="stereo_sgbm", show_disparity=False)

        # Calculate the depth map
        depth_map = calculate_depth_map(disparity_map, baseline, focal_length, show_depth_map=False)

        # # Normalize the depth map to [0, 255]
        # depth_map_normalized = cv2.normalize(depth_map, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
        #
        # # Apply a colorful colormap to the depth map
        # colormap = cv2.COLORMAP_BONE
        # depth_map_colored = cv2.applyColorMap(depth_map_normalized, colormap)

        # Construct the output file path
        output_file = os.path.join(output_folder, image_file)

        # Save the depth map as an image
        cv2.imwrite(output_file, depth_map)

        print(f"Depth map saved: {output_file}")




def get_bounding_box_center_frame(frame, model, names, object_class, show_output=True):

    bbox_coordinates = []
    frame_copy = frame.copy()

    # Perform object detection on the input frame using the specified model
    results = model(frame)

    # Iterate over the results of object detection
    for result in results:

        # Iterate over each bounding box detected in the result
        for r in result.boxes.data.tolist():
            # Extract the coordinates, score, and class ID from the bounding box
            x1, y1, x2, y2, score, class_id = r
            x1 = int(x1)
            x2 = int(x2)
            y1 = int(y1)
            y2 = int(y2)

            # Get the class name based on the class ID
            class_name = names.get(class_id)


            # Check if the class name matches the specified object_class and the detection score is above a threshold
            if class_name in object_class  and score > 0.5:
                bbox_coordinates.append([x1, y1, x2, y2])

                # Draw bounding box on the frame
                cv2.rectangle(frame_copy, (x1, y1), (x2, y2), (0, 255, 0), 2)


    if show_output:
        # Convert frame from BGR to RGB
        frame_rgb = cv2.cvtColor(frame_copy, cv2.COLOR_BGR2RGB)
        # Show the output frame with bounding boxes
        cv2.imshow("Output", frame_rgb)
        cv2.waitKey(0)
        cv2.destroyAllWindows()


    # Return the list of bounding box coordinates
    return bbox_coordinates


def calculate_distance(bbox_coordinates, frame, depth_map, disparity_map, show_output=True):
    frame_copy = frame.copy()

    # Normalize the disparity map to [0, 255]
    disparity_map_normalized = cv2.normalize(disparity_map, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)

    # Apply a colorful colormap to the disparity map
    colormap = cv2.COLORMAP_JET
    disparity_map_colored = cv2.applyColorMap(disparity_map_normalized, colormap)

    # Normalize the depth map to [0, 255]
    depth_map_normalized = cv2.normalize(depth_map, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)

    # Apply a colorful colormap to the depth map
    colormap = cv2.COLORMAP_BONE
    depth_map_colored = cv2.applyColorMap(depth_map_normalized, colormap)

    for bbox_coor in bbox_coordinates:
        x1, y1, x2, y2 = bbox_coor
        center_x = (x1 + x2) // 2
        center_y = (y1 + y2) // 2

        distance = depth_map[center_y][center_x]
        print("Calculated distance:", distance)

        # Convert distance to string
        distance_str = f"{distance:.2f} m"

        # Draw bounding box on the frame
        cv2.rectangle(frame_copy, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Draw bounding box on the disparity map
        cv2.rectangle(disparity_map_colored, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Draw bounding box on the depth map
        cv2.rectangle(depth_map_colored, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Calculate the text size
        text_size, _ = cv2.getTextSize(distance_str, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2)

        # Calculate the position for placing the text
        text_x = center_x - text_size[0] // 2
        text_y = y1 - 10  # Place the text slightly above the bounding box

        # Calculate the rectangle coordinates
        rect_x1 = text_x - 5
        rect_y1 = text_y - text_size[1] - 5
        rect_x2 = text_x + text_size[0] + 5
        rect_y2 = text_y + 5

        # Draw white rectangle behind the text
        cv2.rectangle(frame_copy, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(frame_copy, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)

        # Draw white rectangle behind the text
        cv2.rectangle(disparity_map_colored, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(disparity_map_colored, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)

        # Draw white rectangle behind the text
        cv2.rectangle(depth_map_colored, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(depth_map_colored, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)



    if show_output:
        # Convert frame from BGR to RGB
        #frame_rgb = cv2.cvtColor(frame_copy, cv2.COLOR_BGR2RGB)

        # Show the output frame with bounding boxes
        cv2.imshow("Output disparity map", disparity_map_colored)
        cv2.imshow("Output frame", frame_copy)
        cv2.imshow("Output depth map", depth_map_colored)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    return disparity_map_colored, frame_copy, depth_map_colored




def ground_truth_bbox(labels_file, object_class):
    lines = labels_file.split('\n')  # Split the input string into lines
    bounding_boxes = []

    for line in lines:
        line = line.strip()  # Remove leading/trailing whitespace

        if line:
            data = line.split()

            # Extract relevant values for each object
            obj_type = data[0]
            #print(obj_type)
            truncated = float(data[1])
            occluded = int(data[2])
            alpha = float(data[3])
            bbox = [float(val) for val in data[4:8]]
            dimensions = [float(val) for val in data[8:11]]
            location = [float(val) for val in data[11:14]]
            distance = float(data[13])  # z-coordinate of the location, i.e. the forward distance
            rotation_y = float(data[14])

            if obj_type in object_class:  # Check if obj_type is in the desired classes
                # Append bounding box dimensions and distance to the list
                bounding_boxes.append((bbox, distance))

                # Print the 3D bounding box information
                print("Object Type:", obj_type)
                print("Truncated:", truncated)
                print("Occluded:", occluded)
                print("Alpha:", alpha)
                print("Bounding Box:", bbox)
                print("Dimensions:", dimensions)
                print("Location:", location)
                print("True Distance:", distance)
                print("Rotation Y:", rotation_y)
                print("------------------------")
    return bounding_boxes


def display_ground_truth(frame, bounding_boxes, show_output=True):
    frame_copy = frame.copy()
    for bbox, distance in bounding_boxes:
        # Extract bounding box coordinates
        x1, y1, x2, y2 = map(int, bbox)
        center_x = (x1 + x2) // 2

        # Convert distance to string
        distance_str = f"{distance:.2f} m"

        # Draw bounding box on the frame
        cv2.rectangle(frame_copy, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Calculate the text size
        text_size, _ = cv2.getTextSize(distance_str, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2)

        # Calculate the position for placing the text
        text_x = center_x - text_size[0] // 2
        text_y = y1 - 10  # Place the text slightly above the bounding box

        # Calculate the rectangle coordinates
        rect_x1 = text_x - 5
        rect_y1 = text_y - text_size[1] - 5
        rect_x2 = text_x + text_size[0] + 5
        rect_y2 = text_y + 5

        # Draw white rectangle behind the text
        cv2.rectangle(frame_copy, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(frame_copy, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2, cv2.LINE_AA)

    if show_output:
        # Convert frame from BGR to RGB
        image_rgb = cv2.cvtColor(frame_copy, cv2.COLOR_BGR2RGB)
        # Display the image with bounding boxes
        cv2.imshow("Bounding Boxes", image_rgb)
        cv2.waitKey(0)
        cv2.destroyAllWindows()


def pipeline(left_image, right_image, object_class):
    """
    Performs a pipeline of operations on stereo images to obtain a colored disparity map, RGB frame, and colored depth map.

    Input:
    - left_image: Left stereo image (RGB format)
    - right_image: Right stereo image (RGB format)
    - object_class: List of object classes of interest for bounding box retrieval

    Output:
    - disparity_map_colored: Colored disparity map (RGB format)
    - frame_rgb: RGB frame
    - depth_map_colored: Colored depth map (RGB format)
    """
    global focal_length

    # Calculate the disparity map
    disparity_map = compute_disparity(left_image, right_image, num_disparities=90, block_size=5, window_size=5,
                                      matcher="stereo_sgbm", show_disparity=False)

    # Calculate the depth map
    depth_map = calculate_depth_map(disparity_map, baseline, focal_length, show_depth_map=False)

    # Get bounding box coordinates for specified object classes
    bbox_coordinates = get_bounding_box_center_frame(left_image, model, names, object_class, show_output=False)

    # Calculate colored disparity map, RGB frame, and colored depth map
    disparity_map_colored, frame_rgb, depth_map_colored = calculate_distance(bbox_coordinates, left_image, depth_map, disparity_map, show_output=False)

    return disparity_map_colored, frame_rgb, depth_map_colored


def process_pipeline_images(left_image_folder, right_image_folder, output_folder_distance, object_class=['car', 'bicycle']):
    # Get the list of image files in the left image folder
    left_image_files = os.listdir(left_image_folder)

    # Iterate over the image files
    for image_file in left_image_files:
        # Construct the paths for the left and right images
        left_image_path = os.path.join(left_image_folder, image_file)
        right_image_path = os.path.join(right_image_folder, image_file)

        # Read the left and right images
        left_image = cv2.imread(left_image_path)
        right_image = cv2.imread(right_image_path)

        disparity_map_colored, frame_rgb, depth_map_colored = pipeline(left_image, right_image, object_class)

        # Construct the output file path
        output_file = os.path.join(output_folder_distance, image_file)

        # Save the disparity map as an image
        cv2.imwrite(output_file, disparity_map_colored)

        print(f"Disparity map saved: {output_file}")

def frames_to_video(frame_folder, output_folder, output_filename):
    """
    Converts a sequence of frames in a folder into an MP4 video and saves it in the specified output folder.

    Args:
        frame_folder (str): Path to the folder containing the frames.
        output_folder (str): Path to the output folder where the video will be saved.
        output_filename (str): Name of the output video file (including the .mp4 extension).
    """

    # Get the list of frame filenames in the folder
    frame_filenames = sorted(os.listdir(frame_folder))

    # Read the first frame to get frame size information
    first_frame_path = os.path.join(frame_folder, frame_filenames[0])
    first_frame = cv2.imread(first_frame_path)
    height, width, channels = first_frame.shape

    # Define the video codec and create a VideoWriter object
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    output_path = os.path.join(output_folder, output_filename)
    video_writer = cv2.VideoWriter(output_path, fourcc, 15, (width, height))

    # Iterate over the frame filenames and write each frame to the video
    for frame_filename in frame_filenames:
        frame_path = os.path.join(frame_folder, frame_filename)
        frame = cv2.imread(frame_path)
        video_writer.write(frame)

    # Release the VideoWriter and print completion message
    video_writer.release()
    print("Video saved successfully at:", output_path)



def concatenate_videos_vertical(video1_path, video2_path, output_path):
    # Read the input videos
    video1 = cv2.VideoCapture(video1_path)
    video2 = cv2.VideoCapture(video2_path)

    # Get video properties
    width = int(video1.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(video1.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = video1.get(cv2.CAP_PROP_FPS)

    # Create a VideoWriter object to save the concatenated video
    output_video = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height * 2))

    while True:
        # Read frames from both videos
        ret1, frame1 = video1.read()
        ret2, frame2 = video2.read()

        if not ret1 or not ret2:
            # Either video1 or video2 has ended
            break

        # Resize the frames to match the width of the output video
        frame1_resized = cv2.resize(frame1, (width, height))
        frame2_resized = cv2.resize(frame2, (width, height))

        # Concatenate the frames vertically
        concat_frame = cv2.vconcat([frame1_resized, frame2_resized])

        # Write the concatenated frame to the output video
        output_video.write(concat_frame)

    # Release resources
    video1.release()
    video2.release()
    output_video.release()

    print("Concatenation complete. Output video saved:", output_path)












if __name__ == '__main__':

    # Get the current directory
    current_directory = os.getcwd()

    # Go back to the parent directory
    parent_directory = os.path.dirname(current_directory)

    # Set input directory
    left_image_folder = os.path.join(parent_directory, 'Data', 'Left', 'image_2')
    right_image_folder = os.path.join(parent_directory, 'Data', 'Right', 'image_3')
    labels_folder = os.path.join(parent_directory, 'Data', 'Labels', 'training')
    calibration_folder = os.path.join(parent_directory, 'Data', 'Callibration', 'training', 'calib')
    output_disparity_folder = os.path.join(parent_directory, 'Data', 'Output_Disparity_1')
    output_depth_folder = os.path.join(parent_directory, 'Data', 'Output_Depth_1')


    # Choose index of image
    index = 13

    # Display the image pair and get the left and right images
    left_image, right_image = display_image_pair(index=index, left_image_folder=left_image_folder, right_image_folder=right_image_folder, show_picture=True)
    print("\nImage shape: ", left_image.shape)

    # Compute disparity map
    disparity_map = compute_disparity(left_image, right_image, num_disparities=90, block_size=5, window_size=5, matcher="stereo_sgbm", show_disparity=True) #stereo_bm   #stereo_sgbm
    print(disparity_map)
    print("\nDisparity map shape: ", disparity_map.shape)

    # Display the text file contents
    calibration_file = display_text_file(index, calibration_folder)
    label_file = display_text_file(index, labels_folder)

    # Extract calibration parameters
    p_left, p_right, p_ro_rect, p_velo_to_cam, p_imu_to_velo = get_calibration_parameters(calibration_file)
    print("\nLeft P Matrix")
    print(p_left)
    print("\nRight P Matrix")
    print(p_right)
    print("\nR0 to Rect Matrix")
    print(p_ro_rect)
    print("\nVelodyne to Camera Matrix")
    print(p_velo_to_cam)
    print("\nIMU to Velodyne Matrix")
    print(p_imu_to_velo)

    # Decompose the projection matrix
    camera_matrix_left, rotation_matrix_left, translation_vector_left = decompose_projection_matrix(p_left)
    camera_matrix_right, rotation_matrix_right, translation_vector_right = decompose_projection_matrix(p_right)

    # Print the decomposed components
    print("\nCamera Matrix Left:")
    print(camera_matrix_left)
    print("\nRotation Matrix Left:")
    print(rotation_matrix_left)
    print("\nTranslation Vector Left:")
    print(translation_vector_left)

    print("\nCamera Matrix Right:")
    print(camera_matrix_right)
    print("\nRotation Matrix Right:")
    print(rotation_matrix_right)
    print("\nTranslation Vector Right:")
    print(translation_vector_right)

    # Extract the focal length and baseline
    focal_length_x = camera_matrix_right[0, 0]
    focal_length_y = camera_matrix_right[1, 1]
    baseline = abs(translation_vector_left[0] - translation_vector_right[0])

    # Print the extracted values
    print("\nFocal Length (x-direction):", focal_length_x)
    print("\nFocal Length (y-direction):", focal_length_y)
    print("\nBaseline:", baseline, "\n")

    depth_map = calculate_depth_map(disparity=disparity_map, baseline=baseline, focal_length=focal_length_x, show_depth_map=True)
    print("Depth Map: ", depth_map)
    print("\nDepth map shape: ",depth_map.shape)
    print("\nDepth map distance: ", depth_map[200,600], " m")

    #save_disparity_maps(left_image_folder, right_image_folder, output_disparity_folder)

    #save_depth_maps(left_image_folder, right_image_folder, output_depth_folder, baseline, focal_length_x)

    # Instantiate model
    weights_path = os.path.join(parent_directory, 'Weights', 'yolov8m.pt')
    model = YOLO(weights_path)
    names = model.names
    print(names)

    # Process the frame to get bounding box centers
    bbox_coordinates = get_bounding_box_center_frame(left_image, model, names, object_class=['car', 'bicycle'], show_output=True)

    # Calculate distance from camera to objects
    disparity_map_distance, frame_distance, depth_map_distance = calculate_distance(bbox_coordinates, left_image, depth_map, disparity_map,show_output=True)

    ###-------COMPARE WITH GROUND TRUTH

    # Get data from labels.txt
    bounding_boxes_ground_truth = ground_truth_bbox(label_file, object_class=['Car', 'Cyclist', 'Pedestrian'])

    # Display ground truth image with distance
    display_ground_truth(left_image, bounding_boxes_ground_truth, show_output=True)





    ###-------------PIPELINE ON SINGLE IMAGE
    focal_length = focal_length_x
    index = 7

    # Load the left image
    left_image_path = os.path.join(left_image_folder, os.listdir(left_image_folder)[index])
    left_image = cv2.cvtColor(cv2.imread(left_image_path), cv2.COLOR_BGR2RGB)
    print(left_image)

    # Load the right image
    right_image_path = os.path.join(right_image_folder, os.listdir(right_image_folder)[index])
    right_image = cv2.cvtColor(cv2.imread(right_image_path), cv2.COLOR_BGR2RGB)


    disparity_map_colored, frame_rgb, depth_map_colored = pipeline(left_image, right_image, object_class=['car', 'bicycle'])

    # # Show the output frame with bounding boxes
    # frame_rgb = cv2.cvtColor(frame_rgb, cv2.COLOR_BGR2RGB)
    # cv2.imshow("Output disparity map", disparity_map_colored)
    # cv2.imshow("Output frame", frame_rgb)
    # cv2.imshow("Output depth map", depth_map_colored)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()





    ### ------------ PIPELINE ON FOLDER ----------------------------- ###
    # Set input and output directory
    left_image_folder = os.path.join(parent_directory, 'Data', 'Left_4')
    right_image_folder = os.path.join(parent_directory, 'Data', 'Right_4')
    output_folder_distance = os.path.join(parent_directory, 'Data', 'Output_Disparity_Distance_4')
    output_video_folder = os.path.join(parent_directory, 'Data', 'Output_Video')

    # ##--- Process all images in folder and save disparity map
    # process_pipeline_images(left_image_folder, right_image_folder, output_folder_distance, object_class=['car', 'bicycle', 'person'])
    #
    # ### ---- Convert frames to video
    # frames_to_video(output_folder_distance, output_video_folder, 'output_disparity_4.mp4')
    # frames_to_video(left_image_folder, output_video_folder, 'output_4.mp4')


    ### - Concat videos
    # video_1 = output_video_folder = os.path.join(parent_directory, 'Data', 'Output_Video', 'output_1.mp4')
    # video_2 = output_video_folder = os.path.join(parent_directory, 'Data', 'Output_Video', 'output_disparity_1.mp4')
    # output_video_folder = os.path.join(parent_directory, 'Data', 'Output_Video', 'concat_1.mp4')
    # concatenate_videos_vertical(video_1, video_2, output_video_folder)
    #
    #