【Beginner's Hands-On Tutorial 1.30】Step by Step with Open3D (13): Object Detection and Distance Measurement with Stereo Vision (YOLOv8, Python Code)


1. Abstract

In this article, we use binocular stereo vision for per-pixel depth estimation and combine it with YOLOv8 for object detection and distance measurement. Using the block matching (BM) and semi-global block matching (SGBM) algorithms, we show how stereo computer-vision techniques can accurately recover depth information: block matching efficiently establishes correspondences between the two stereo camera images, while SGBM refines the disparity-map estimation.

The algorithm consists of the following steps:

  1. Read the left and right images
  2. Compute the disparity map
  3. Build the depth map
  4. Run YOLOv8 object detection
  5. Display each object's distance and bounding box

The pipeline function below ties these steps together:

def pipeline(left_image, right_image, object_class):
    """
    Performs a pipeline of operations on stereo images to obtain a colored disparity map, RGB frame, and colored depth map.

    Input:
    - left_image: Left stereo image (RGB format)
    - right_image: Right stereo image (RGB format)
    - object_class: List of object classes of interest for bounding box retrieval

    Output:
    - disparity_map_colored: Colored disparity map (RGB format)
    - frame_rgb: RGB frame
    - depth_map_colored: Colored depth map (RGB format)
    """
    global focal_length

    # Calculate the disparity map
    disparity_map = compute_disparity(left_image, right_image, num_disparities=90, block_size=5, window_size=5,
                                      matcher="stereo_sgbm", show_disparity=False)

    # Calculate the depth map
    depth_map = calculate_depth_map(disparity_map, baseline, focal_length, show_depth_map=False)

    # Get bounding box coordinates for specified object classes
    bbox_coordinates = get_bounding_box_center_frame(left_image, model, names, object_class, show_output=False)

    # Calculate colored disparity map, RGB frame, and colored depth map
    disparity_map_colored, frame_rgb, depth_map_colored = calculate_distance(bbox_coordinates, left_image, depth_map, disparity_map, show_output=False)

    return disparity_map_colored, frame_rgb, depth_map_colored
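
A single stereo pair can then be processed in one call. A usage sketch (baseline, focal_length, model, and names are module-level variables that are set up in the complete script at the end of this post):

disparity_map_colored, frame_rgb, depth_map_colored = pipeline(
    left_image, right_image, object_class=['car', 'bicycle', 'person'])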

2. Dataset

This project uses the KITTI dataset, specifically its stereo camera images. The dataset provides a comprehensive 3D object detection benchmark with 7,481 training images and 7,518 test images, accompanied by corresponding point clouds, for a total of 80,256 annotated objects. In addition, the inference stage of the project uses KITTI's synchronized and rectified raw data, which further improves the accuracy and reliability of the results. This rich dataset allows a thorough exploration and evaluation of the proposed approach and makes the project's results robust.

3. A Simple Stereo-Vision Application

We will now estimate object distances on the KITTI dataset. To do so, we first need to understand the camera configuration on the autonomous vehicle. Fortunately, KITTI provides a very detailed description of the camera positions and the baseline between them.

When downloading the dataset for 3D object detection, we get the following files:

  • Left images
  • Right images
  • Calibration data
  • Labels

3.1 Extracting the Projection Matrices

Below is an example of the calibration data. Since we work with the left and right color images, we need to extract P2 and P3 to obtain the projection matrices.

P0: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 0.000000000000e+00 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P1: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.875744000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P2: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 4.485728000000e+01 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.163791000000e-01 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.745884000000e-03
P3: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.395242000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.199936000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.729905000000e-03
R0_rect: 9.999239000000e-01 9.837760000000e-03 -7.445048000000e-03 -9.869795000000e-03 9.999421000000e-01 -4.278459000000e-03 7.402527000000e-03 4.351614000000e-03 9.999631000000e-01
Tr_velo_to_cam: 7.533745000000e-03 -9.999714000000e-01 -6.166020000000e-04 -4.069766000000e-03 1.480249000000e-02 7.280733000000e-04 -9.998902000000e-01 -7.631618000000e-02 9.998621000000e-01 7.523790000000e-03 1.480755000000e-02 -2.717806000000e-01
Tr_imu_to_velo: 9.999976000000e-01 7.553071000000e-04 -2.035826000000e-03 -8.086759000000e-01 -7.854027000000e-04 9.998898000000e-01 -1.482298000000e-02 3.195559000000e-01 2.024406000000e-03 1.482454000000e-02 9.998881000000e-01 -7.997231000000e-01
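
A few lines of parsing are enough to pull these matrices out of the file. Below is a minimal sketch for P2/P3 only (the get_calibration_parameters function in the complete code at the end of this post handles all five entries); parse_projection_matrix is a hypothetical helper introduced here for illustration:

import numpy as np

def parse_projection_matrix(calib_text, key):
    # Find the line starting with e.g. 'P2:' and reshape its 12 values into a 3x4 matrix
    for line in calib_text.splitlines():
        if line.startswith(key + ':'):
            return np.array(line.split(':', 1)[1].split(), dtype=np.float32).reshape(3, 4)
    raise KeyError(f"{key} not found in calibration file")

# p_left  = parse_projection_matrix(calibration_file, 'P2')  # calibration_file: file contents as a string
# p_right = parse_projection_matrix(calibration_file, 'P3')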

Note that we obtain both the left and right projection matrices:

Left projection matrix
[[     721.54           0      609.56      44.857]
 [          0      721.54      172.85     0.21638]
 [          0           0           1   0.0027459]]

Right projection matrix
[[     721.54           0      609.56     -339.52]
 [          0      721.54      172.85      2.1999]
 [          0           0           1   0.0027299]]

R0 rectification matrix
[[    0.99992   0.0098378   -0.007445]
 [ -0.0098698     0.99994  -0.0042785]
 [  0.0074025   0.0043516     0.99996]]

Velodyne-to-camera matrix
[[  0.0075337    -0.99997  -0.0006166  -0.0040698]
 [   0.014802  0.00072807    -0.99989   -0.076316]
 [    0.99986   0.0075238    0.014808    -0.27178]]

IMU-to-Velodyne matrix
[[          1  0.00075531  -0.0020358    -0.80868]
 [ -0.0007854     0.99989   -0.014823     0.31956]
 [  0.0020244    0.014825     0.99989    -0.79972]]


Next, we need to extract the intrinsic and extrinsic matrices from our projection matrices. Recall the following formula:

P = K [R | t]

where K is the 3×3 intrinsic (camera) matrix and [R | t] contains the extrinsic rotation and translation.
We will use OpenCV's decomposeProjectionMatrix function:

def decompose_projection_matrix(projection_matrix):
    """
    Decompose a projection matrix into a camera matrix, rotation matrix, and translation vector.

    Args:
        projection_matrix (numpy.ndarray): 3x4 projection matrix.

    Returns:
        tuple: Tuple containing the decomposed components:
            - camera_matrix (numpy.ndarray): Camera matrix. [ fx   0   cx ]
                                                            [  0  fy   cy ]
                                                            [  0   0    1 ]
            - rotation_matrix (numpy.ndarray): Rotation matrix.
            - translation_vector (numpy.ndarray): Translation vector.
    """

    # Decompose the projection matrix
    camera_matrix, rotation_matrix, translation_vector, _, _, _, _ = cv2.decomposeProjectionMatrix(projection_matrix)
    # Normalize the homogeneous translation vector so that its fourth component is 1
    translation_vector = translation_vector/translation_vector[3]
    # Return the decomposed components
    return camera_matrix, rotation_matrix, translation_vector
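
Applying it to both projection matrices, exactly as done in the complete script at the end of this post:

# p_left and p_right hold the P2/P3 matrices parsed from the calibration file
camera_matrix_left, rotation_matrix_left, translation_vector_left = decompose_projection_matrix(p_left)
camera_matrix_right, rotation_matrix_right, translation_vector_right = decompose_projection_matrix(p_right)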



We can observe that the rotation matrix is an identity matrix. We extract the focal length from the intrinsic matrix and compute the baseline from the translation vectors, as follows:

    # Extract the focal length and baseline
    focal_length_x = camera_matrix_right[0, 0]
    focal_length_y = camera_matrix_right[1, 1]
    baseline = abs(translation_vector_left[0] - translation_vector_right[0])

Output:

Focal length (x-direction): 721.5377

Focal length (y-direction): 721.5377

Baseline: [    0.53271]


Note that the baseline is 0.53271 m, which matches the camera configuration on the autonomous vehicle.
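
As a sanity check, the baseline also falls straight out of the projection matrices without decomposing them: in KITTI the fourth column of P stores f·t, so dividing the difference of the two x-entries by the focal length gives the baseline. A quick sketch using the P2/P3 values shown above:

fx = 721.5377
baseline = (44.85728 - (-339.5242)) / fx  # (P2[0, 3] - P3[0, 3]) / fx
print(f"baseline = {baseline:.5f} m")     # ~0.53272 m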

3.2 Extracting the Labels

We get one label file per image pair. Below is an example label.txt; each object has 15 values.

Car 0.00 0 -1.56 564.62 174.59 616.43 224.74 1.61 1.66 3.20 -0.69 1.69 25.01 -1.59
Car 0.00 0 1.71 481.59 180.09 512.55 202.42 1.40 1.51 3.70 -7.43 1.88 47.55 1.55
Car 0.00 0 1.64 542.05 175.55 565.27 193.79 1.46 1.66 4.05 -4.71 1.71 60.52 1.56
Cyclist 0.00 0 1.89 330.60 176.09 355.61 213.60 1.72 0.50 1.95 -12.63 1.88 34.09 1.54
DontCare -1 -1 -10 753.33 164.32 798.00 186.74 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 738.50 171.32 753.27 184.42 -1 -1 -1 -1000 -1000 -1000 -10
#Values    Name      Description
----------------------------------------------------------------------------
   1    type         Describes the type of object: 'Car', 'Van', 'Truck',
                     'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
                     'Misc' or 'DontCare'
   1    truncated    Float from 0 (non-truncated) to 1 (truncated), where
                     truncated refers to the object leaving image boundaries
   1    occluded     Integer (0,1,2,3) indicating occlusion state:
                     0 = fully visible, 1 = partly occluded
                     2 = largely occluded, 3 = unknown
   1    alpha        Observation angle of object, ranging [-pi..pi]
   4    bbox         2D bounding box of object in the image (0-based index):
                     contains left, top, right, bottom pixel coordinates
   3    dimensions   3D object dimensions: height, width, length (in meters)
   3    location     3D object location x,y,z in camera coordinates (in meters)
   1    rotation_y   Rotation ry around Y-axis in camera coordinates [-pi..pi]
   1    score        Only for results: Float, indicating confidence in
                     detection, needed for p/r curves, higher is better.

Note that the DontCare label marks regions in which objects have not been annotated, for example because they are too far away from the laser scanner. We need to extract each object's bounding-box coordinates and distance from the labels; we will compare our results against this ground truth.

def ground_truth_bbox(labels_file, object_class):
    lines = labels_file.split('\n')  # Split the input string into lines
    bounding_boxes = []

    for line in lines:
        line = line.strip()  # Remove leading/trailing whitespace

        if line:
            data = line.split()

            # Extract the relevant values for each object
            obj_type = data[0]
            truncated = float(data[1])
            occluded = int(data[2])
            alpha = float(data[3])
            bbox = [float(val) for val in data[4:8]]
            dimensions = [float(val) for val in data[8:11]]
            location = [float(val) for val in data[11:14]]
            distance = float(data[13])  # z-coordinate of the location, i.e. the forward distance
            rotation_y = float(data[14])

            if obj_type in object_class:  # Check if obj_type is one of the desired classes
                # Append the bounding box and distance to the list
                bounding_boxes.append((bbox, distance))

                # Print the 3D bounding box information
                print("Object Type:", obj_type)
                print("Truncated:", truncated)
                print("Occluded:", occluded)
                print("Alpha:", alpha)
                print("Bounding Box:", bbox)
                print("Dimensions:", dimensions)
                print("Location:", location)
                print("True Distance:", distance)
                print("Rotation Y:", rotation_y)
                print("------------------------")
    return bounding_boxes

Output:

Object Type: Car
Truncated: 0.0
Occluded: 0
Alpha: 1.71
Bounding Box: [481.59, 180.09, 512.55, 202.42]
Dimensions: [1.4, 1.51, 3.7]
Location: [-7.43, 1.88, 47.55]
True Distance: 47.55
Rotation Y: 1.55

3.3 Displaying the Left and Right Images

We first display the left and right images from the KITTI dataset. Note that the 54 cm baseline is large compared to a stereo camera such as the OAK-D, so the disparities are larger, which in turn allows a more accurate depth map.
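
This follows from the depth formula z = f·B/d: differentiating with respect to the disparity d gives |Δz| ≈ z²·Δd / (f·B), so for a fixed disparity error Δd (typically a fraction of a pixel), a larger baseline B reduces the depth error at any given range.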

[Figure: the left and right images of a KITTI stereo pair.]

3.4 Computing the Disparity Map

Next, we compute the disparity using OpenCV's block matching (StereoBM_create) and semi-global block matching (StereoSGBM_create) functions. A few parameters need fine-tuning: num_disparities, block_size, and window_size.

def compute_disparity(left_img, right_img, num_disparities=6 * 16, block_size=11, window_size=6, matcher="stereo_sgbm", show_disparity=True):
    """
    Compute the disparity map for a given stereo image pair.

    Args:
        left_img (numpy.ndarray): Left image of the stereo pair.
        right_img (numpy.ndarray): Right image of the stereo pair.
        num_disparities (int): Maximum disparity minus minimum disparity.
        block_size (int): Size of the block window. It must be an odd number.
        window_size (int): Size of the disparity smoothness window.
        matcher (str): Matcher algorithm to use ("stereo_bm" or "stereo_sgbm").
        show_disparity (bool): Whether to display the disparity map using matplotlib.

    Returns:
        numpy.ndarray: The computed disparity map.
    """
    if matcher == "stereo_bm":
        # Create a stereo BM matcher
        matcher = cv2.StereoBM_create(numDisparities=num_disparities, blockSize=block_size)
    elif matcher == "stereo_sgbm":
        # Create a stereo SGBM matcher
        matcher = cv2.StereoSGBM_create(
            minDisparity=0, numDisparities=num_disparities, blockSize=block_size,
            P1=8 * 3 * window_size ** 2, P2=32 * 3 * window_size ** 2,
            mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
        )

    # Convert the images to grayscale
    left_gray = cv2.cvtColor(left_img, cv2.COLOR_BGR2GRAY)
    right_gray = cv2.cvtColor(right_img, cv2.COLOR_BGR2GRAY)

    # Compute the disparity map (OpenCV returns fixed-point disparities scaled by 16)
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16

    if show_disparity:
        # Display the disparity map using matplotlib
        plt.imshow(disparity, cmap="CMRmap_r")  # alternative colormap: cividis
        plt.title('Disparity map with SGBM', fontsize=12)
        plt.show()

    return disparity

Observe that the disparity map from block matching is noisier than the one from SGBM. From now on, we will use the SGBM algorithm.
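
To reproduce the comparison, compute_disparity can be called twice, switching only the matcher argument (a usage sketch; num_disparities is set to 96 here because StereoBM requires a multiple of 16):

disparity_bm = compute_disparity(left_image, right_image, num_disparities=96, block_size=5,
                                 window_size=5, matcher="stereo_bm", show_disparity=True)
disparity_sgbm = compute_disparity(left_image, right_image, num_disparities=96, block_size=5,
                                   window_size=5, matcher="stereo_sgbm", show_disparity=True)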

[Figure: disparity maps computed with block matching and SGBM.]
Note the white band on the left side of the image: for those pixels there are no features in the right image that the left image can be matched against. Its width is driven mainly by the num_disparities search range, with block_size also contributing at the border.
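
If the band is a problem for downstream processing, one simple option (not part of the original code) is to crop the first num_disparities columns, where no match is possible:

num_disparities = 90
valid_disparity = disparity_map[:, num_disparities:]  # drop the band with no valid matches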

3.5 Computing the Depth Map

Next, we use the disparity map to produce a depth map, using the following formula:

depth = focal_length × baseline / disparity

For example, with a focal length of about 721.54 px and a baseline of about 0.5327 m, a disparity of 30 px corresponds to a depth of roughly 721.54 × 0.5327 / 30 ≈ 12.8 m.

def calculate_depth_map(disparity, baseline, focal_length, show_depth_map=True):
    """
    Calculate the depth map from a given disparity map, baseline, and focal length.

    Args:
        disparity (numpy.ndarray): Disparity map.
        baseline (float): Baseline between the cameras.
        focal_length (float): Focal length of the camera.

    Returns:
        numpy.ndarray: Depth map.
    """

    # Replace all 0 and -1 disparities with a small minimum value (to avoid division by zero or negatives)
    disparity[disparity == 0] = 0.1
    disparity[disparity == -1] = 0.1

    # Initialize the depth map to match the size of the disparity map
    depth_map = np.ones(disparity.shape, np.single)

    # Calculate the depths: depth = focal_length * baseline / disparity
    depth_map[:] = focal_length * baseline / disparity[:]

    if show_depth_map:
        # Display the depth map using matplotlib
        plt.imshow(depth_map, cmap="cividis")  # alternative colormap: CMRmap_r
        plt.show()

    return depth_map

The output is a depth map giving the depth of every pixel.
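
Since the depth map is just a NumPy array in meters, any pixel can be queried directly (the same check appears in the main script of the complete code):

print("Depth at (row=200, col=600):", depth_map[200, 600], "m")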

The depth map is a per-pixel depth estimate. However, we are interested in the depth, or distance, of specific obstacles such as cars or pedestrians. We will therefore use an object detection algorithm to segment the depth map and retrieve distances only for those objects.

3.6 Object Detection

The next step is to detect objects in the frame using YOLOv8; we need the coordinates of the bounding boxes.

def get_bounding_box_center_frame(frame, model, names, object_class, show_output=True):

    bbox_coordinates = []
    frame_copy = frame.copy()

    # Perform object detection on the input frame using the specified model
    results = model(frame)

    # Iterate over the results of object detection
    for result in results:

        # Iterate over each bounding box detected in the result
        for r in result.boxes.data.tolist():
            # Extract the coordinates, score, and class ID from the bounding box
            x1, y1, x2, y2, score, class_id = r
            x1 = int(x1)
            x2 = int(x2)
            y1 = int(y1)
            y2 = int(y2)

            # Get the class name based on the class ID
            class_name = names.get(class_id)

            # Check if the class name matches the specified object_class and the detection score is above a threshold
            if class_name in object_class and score > 0.5:
                bbox_coordinates.append([x1, y1, x2, y2])

                # Draw the bounding box on the frame
                cv2.rectangle(frame_copy, (x1, y1), (x2, y2), (0, 255, 0), 2)

    if show_output:
        # Convert the frame from BGR to RGB
        frame_rgb = cv2.cvtColor(frame_copy, cv2.COLOR_BGR2RGB)
        # Show the output frame with bounding boxes
        cv2.imshow("Output", frame_rgb)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    # Return the list of bounding box coordinates
    return bbox_coordinates

We decided to detect only cars, people, and bicycles.
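
For reference, this is how the detector is instantiated and called (a usage sketch mirroring the main script; weights_path points to the yolov8m.pt weights file):

model = YOLO(weights_path)
names = model.names
bbox_coordinates = get_bounding_box_center_frame(left_image, model, names,
                                                 object_class=['car', 'bicycle', 'person'],
                                                 show_output=True)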

[Figure: YOLOv8 detections of cars, people, and bicycles on the left image.]

3.7 Computing Distances with Object Detection

Now that we have both the bounding-box coordinates for each frame and the depth map, we can combine them to extract the depth of each obstacle.

We obtain an obstacle's distance by indexing the depth map at the center of its bounding box.

def calculate_distance(bbox_coordinates, frame, depth_map, disparity_map, show_output=True):
    frame_copy = frame.copy()

    # Normalize the disparity map to [0, 255]
    disparity_map_normalized = cv2.normalize(disparity_map, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)

    # Apply a colorful colormap to the disparity map
    colormap = cv2.COLORMAP_JET
    disparity_map_colored = cv2.applyColorMap(disparity_map_normalized, colormap)

    # Normalize the depth map to [0, 255]
    depth_map_normalized = cv2.normalize(depth_map, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)

    # Apply a colorful colormap to the depth map
    colormap = cv2.COLORMAP_BONE
    depth_map_colored = cv2.applyColorMap(depth_map_normalized, colormap)

    for bbox_coor in bbox_coordinates:
        x1, y1, x2, y2 = bbox_coor
        center_x = (x1 + x2) // 2
        center_y = (y1 + y2) // 2

        distance = depth_map[center_y][center_x]
        print("Calculated distance:", distance)

        # Convert distance to string
        distance_str = f"{distance:.2f} m"

        # Draw bounding box on the frame
        cv2.rectangle(frame_copy, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Draw bounding box on the disparity map
        cv2.rectangle(disparity_map_colored, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Draw bounding box on the depth map
        cv2.rectangle(depth_map_colored, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Calculate the text size
        text_size, _ = cv2.getTextSize(distance_str, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2)

        # Calculate the position for placing the text
        text_x = center_x - text_size[0] // 2
        text_y = y1 - 10  # Place the text slightly above the bounding box

        # Calculate the rectangle coordinates
        rect_x1 = text_x - 5
        rect_y1 = text_y - text_size[1] - 5
        rect_x2 = text_x + text_size[0] + 5
        rect_y2 = text_y + 5

        # Draw white rectangle behind the text
        cv2.rectangle(frame_copy, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(frame_copy, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)

        # Draw white rectangle behind the text
        cv2.rectangle(disparity_map_colored, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(disparity_map_colored, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)

        # Draw white rectangle behind the text
        cv2.rectangle(depth_map_colored, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(depth_map_colored, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)



    if show_output:
        # Convert frame from BGR to RGB
        #frame_rgb = cv2.cvtColor(frame_copy, cv2.COLOR_BGR2RGB)

        # Show the output frame with bounding boxes
        cv2.imshow("Output disparity map", disparity_map_colored)
        cv2.imshow("Output frame", frame_copy)
        cv2.imshow("Output depth map", depth_map_colored)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    return disparity_map_colored, frame_copy, depth_map_colored

We draw the bounding boxes, with each object's distance from the camera, in meters, displayed on top.
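
Sampling a single center pixel is sensitive to disparity noise and to the box center landing on the background instead of the object. A more robust variant (an assumption on our part, not part of the original pipeline) is to take the median depth over a small patch around the center:

import numpy as np

def patch_median_depth(depth_map, center_x, center_y, half=5):
    # Median depth over a (2*half+1)^2 patch around the box center, clipped to the image
    h, w = depth_map.shape[:2]
    y0, y1 = max(0, center_y - half), min(h, center_y + half + 1)
    x0, x1 = max(0, center_x - half), min(w, center_x + half + 1)
    return float(np.median(depth_map[y0:y1, x0:x1]))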

[Figure: output frame with bounding boxes and per-object distances in meters.]

Complete code:

import numpy as np
import cv2
import matplotlib.pyplot as plt
from ultralytics import YOLO
import os


def display_image_pair(index, left_image_folder, right_image_folder, show_picture=True):
    """
    Load and display a pair of left and right images from the specified folders.

    Args:
        index (int): Index of the image pair to display.
        left_image_folder (str): Path to the folder containing the left images.
        right_image_folder (str): Path to the folder containing the right images.
        show_picture (bool): Whether to display the image pair using matplotlib.
    """
    # Load the left image
    left_image_path = os.path.join(left_image_folder, os.listdir(left_image_folder)[index])
    left_image = cv2.cvtColor(cv2.imread(left_image_path), cv2.COLOR_BGR2RGB)

    # Load the right image
    right_image_path = os.path.join(right_image_folder, os.listdir(right_image_folder)[index])
    right_image = cv2.cvtColor(cv2.imread(right_image_path), cv2.COLOR_BGR2RGB)

    if show_picture:
        # Create subplots and display the image pair
        f, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
        ax1.imshow(left_image)
        ax1.set_title('Image Left', fontsize=15)
        ax2.imshow(right_image)
        ax2.set_title('Image Right', fontsize=15)

        # Show the plot
        plt.show()

    # Return the left and right images
    return left_image, right_image



#--- Function to compute and display disparity
def compute_disparity(left_img, right_img, num_disparities=6 * 16, block_size=11, window_size=6, matcher="stereo_sgbm", show_disparity=True):
    """
    Compute the disparity map for a given stereo image pair.

    Args:
        left_img (numpy.ndarray): Left image of the stereo pair.
        right_img (numpy.ndarray): Right image of the stereo pair.
        num_disparities (int): Maximum disparity minus minimum disparity.
        block_size (int): Size of the block window. It must be an odd number.
        window_size (int): Size of the disparity smoothness window.
        matcher (str): Matcher algorithm to use ("stereo_bm" or "stereo_sgbm").
        show_disparity (bool): Whether to display the disparity map using matplotlib.

    Returns:
        numpy.ndarray: The computed disparity map.
    """
    if matcher == "stereo_bm":
        # Create a Stereo BM matcher
        matcher = cv2.StereoBM_create(numDisparities=num_disparities, blockSize=block_size)
    elif matcher == "stereo_sgbm":
        # Create a Stereo SGBM matcher
        matcher = cv2.StereoSGBM_create(
            minDisparity=0, numDisparities=num_disparities, blockSize=block_size,
            P1=8 * 3 * window_size ** 2, P2=32 * 3 * window_size ** 2,
            mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
        )

    # Convert the images to grayscale
    left_gray = cv2.cvtColor(left_img, cv2.COLOR_BGR2GRAY)
    right_gray = cv2.cvtColor(right_img, cv2.COLOR_BGR2GRAY)

    # Compute the disparity map
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16


    if show_disparity:
        # Display the disparity map using matplotlib
        plt.imshow(disparity, cmap="CMRmap_r") #CMRmap_r # cividis
        plt.title('Disparity map with SGBM', fontsize=12)
        plt.show()

    return disparity

def display_text_file(index, folder_path):
    """
    Display the contents of a text file based on the specified index.

    Args:
        index (int): Index of the text file to display.
        folder_path (str): Path to the folder containing the text files.

    Returns:
        str: Contents of the text file.
    """
    # Get the list of text files in the folder
    txt_files = [f for f in os.listdir(folder_path) if f.endswith('.txt')]

    if index >= 0 and index < len(txt_files):
        # Get the file path of the text file based on the index
        file_path = os.path.join(folder_path, txt_files[index])

        # Open the file and read its contents
        with open(file_path, 'r') as file:
            contents = file.read()

        # Display the contents of the text file
        print(f"Text file at index {index}:\n{contents}")

        # Return the contents of the text file
        return contents
    else:
        print("Invalid index.")
        return None


#--- Function to retrieve calibration parameters
def get_calibration_parameters(file_contents):
    """
    Retrieve calibration parameters from the contents of a calibration file.

    Args:
        file_contents (str): Contents of the calibration file.

    Returns:
        tuple: Tuple containing the calibration parameters:
            - p_left (numpy.ndarray): Projection matrix for the left camera.
            - p_right (numpy.ndarray): Projection matrix for the right camera.
            - p_ro_rect (numpy.ndarray): Rectification matrix.
            - p_velo_to_cam (numpy.ndarray): Transformation matrix from velodyne to camera coordinates.
            - p_imu_to_velo (numpy.ndarray): Transformation matrix from IMU to velodyne coordinates.
    """
    fin = file_contents.split('\n')
    for line in fin:
        if line[:2] == 'P2':
            p_left = np.array(line[4:].strip().split(" ")).astype('float32').reshape(3, -1)
        elif line[:2] == 'P3':
            p_right = np.array(line[4:].strip().split(" ")).astype('float32').reshape(3, -1)
        elif line[:7] == 'R0_rect':
            p_ro_rect = np.array(line[9:].strip().split(" ")).astype('float32').reshape(3, -1)
        elif line[:14] == 'Tr_velo_to_cam':
            p_velo_to_cam = np.array(line[16:].strip().split(" ")).astype('float32').reshape(3, -1)
        elif line[:14] == 'Tr_imu_to_velo':
            p_imu_to_velo = np.array(line[16:].strip().split(" ")).astype('float32').reshape(3, -1)
    return p_left, p_right, p_ro_rect, p_velo_to_cam, p_imu_to_velo

def decompose_projection_matrix(projection_matrix):
    """
    Decompose a projection matrix into camera matrix, rotation matrix, and translation vector.

    Args:
        projection_matrix (numpy.ndarray): 3x4 projection matrix.

    Returns:
        tuple: Tuple containing the decomposed components:
            - camera_matrix (numpy.ndarray): Camera matrix. [ fx   0   cx ]
                                                            [  0  fy   cy ]
                                                            [  0   0    1 ]
            - rotation_matrix (numpy.ndarray): Rotation matrix.
            - translation_vector (numpy.ndarray): Translation vector.
    """

    # Decompose the projection matrix
    camera_matrix, rotation_matrix, translation_vector, _, _, _, _ = cv2.decomposeProjectionMatrix(projection_matrix)
    translation_vector = translation_vector/translation_vector[3]
    # Return the decomposed components
    return camera_matrix, rotation_matrix, translation_vector


def calculate_depth_map(disparity, baseline, focal_length, show_depth_map=True):
    """
    Calculates the depth map from a given disparity map, baseline, and focal length.

    Args:
        disparity (numpy.ndarray): Disparity map.
        baseline (float): Baseline between the cameras.
        focal_length (float): Focal length of the camera.

    Returns:
        numpy.ndarray: Depth map.
    """

    # Replace all instances of 0 and -1 disparity with a small minimum value (to avoid div by 0 or negatives)
    disparity[disparity == 0] = 0.1
    disparity[disparity == -1] = 0.1

    # Initialize the depth map to match the size of the disparity map
    depth_map = np.ones(disparity.shape, np.single)

    # Calculate the depths
    depth_map[:] = focal_length * baseline / disparity[:]

    if show_depth_map:
        # Display the depth map using matplotlib
        plt.imshow(depth_map, cmap="cividis") #CMRmap_r # cividis
        plt.title('Depth map', fontsize=12)
        plt.show()

    return depth_map




def save_disparity_maps(left_image_folder, right_image_folder, output_folder):
    # Get the list of image files in the left image folder
    left_image_files = os.listdir(left_image_folder)

    # Iterate over the image files
    for image_file in left_image_files:
        # Construct the paths for the left and right images
        left_image_path = os.path.join(left_image_folder, image_file)
        right_image_path = os.path.join(right_image_folder, image_file)

        # Read the left and right images
        left_image = cv2.imread(left_image_path)
        right_image = cv2.imread(right_image_path)

        # Calculate the disparity map
        disparity_map = compute_disparity(left_image, right_image, num_disparities=90, block_size=5, window_size=5,
                                          matcher="stereo_sgbm", show_disparity=False)

        # Normalize the disparity map to [0, 255]
        disparity_map_normalized = cv2.normalize(disparity_map, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)

        # Apply a colorful colormap to the disparity map
        colormap = cv2.COLORMAP_JET
        disparity_map_colored = cv2.applyColorMap(disparity_map_normalized, colormap)

        # Construct the output file path
        output_file = os.path.join(output_folder, image_file)

        # Save the disparity map as an image
        cv2.imwrite(output_file, disparity_map_colored)

        print(f"Disparity map saved: {output_file}")



def save_depth_maps(left_image_folder, right_image_folder, output_folder, baseline, focal_length):
    # Get the list of image files in the left image folder
    left_image_files = os.listdir(left_image_folder)

    # Iterate over the image files
    for image_file in left_image_files:
        # Construct the paths for the left and right images
        left_image_path = os.path.join(left_image_folder, image_file)
        right_image_path = os.path.join(right_image_folder, image_file)

        # Read the left and right images
        left_image = cv2.imread(left_image_path)
        right_image = cv2.imread(right_image_path)

        # Calculate the disparity map
        disparity_map = compute_disparity(left_image, right_image, num_disparities=90, block_size=5, window_size=5,
                                          matcher="stereo_sgbm", show_disparity=False)

        # Calculate the depth map
        depth_map = calculate_depth_map(disparity_map, baseline, focal_length, show_depth_map=False)

        # # Normalize the depth map to [0, 255]
        # depth_map_normalized = cv2.normalize(depth_map, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
        #
        # # Apply a colorful colormap to the depth map
        # colormap = cv2.COLORMAP_BONE
        # depth_map_colored = cv2.applyColorMap(depth_map_normalized, colormap)

        # Construct the output file path
        output_file = os.path.join(output_folder, image_file)

        # Save the depth map as an image
        cv2.imwrite(output_file, depth_map)

        print(f"Depth map saved: {output_file}")




def get_bounding_box_center_frame(frame, model, names, object_class, show_output=True):

    bbox_coordinates = []
    frame_copy = frame.copy()

    # Perform object detection on the input frame using the specified model
    results = model(frame)

    # Iterate over the results of object detection
    for result in results:

        # Iterate over each bounding box detected in the result
        for r in result.boxes.data.tolist():
            # Extract the coordinates, score, and class ID from the bounding box
            x1, y1, x2, y2, score, class_id = r
            x1 = int(x1)
            x2 = int(x2)
            y1 = int(y1)
            y2 = int(y2)

            # Get the class name based on the class ID
            class_name = names.get(class_id)


            # Check if the class name matches the specified object_class and the detection score is above a threshold
            if class_name in object_class  and score > 0.5:
                bbox_coordinates.append([x1, y1, x2, y2])

                # Draw bounding box on the frame
                cv2.rectangle(frame_copy, (x1, y1), (x2, y2), (0, 255, 0), 2)


    if show_output:
        # Convert frame from BGR to RGB
        frame_rgb = cv2.cvtColor(frame_copy, cv2.COLOR_BGR2RGB)
        # Show the output frame with bounding boxes
        cv2.imshow("Output", frame_rgb)
        cv2.waitKey(0)
        cv2.destroyAllWindows()


    # Return the list of bounding box coordinates
    return bbox_coordinates


def calculate_distance(bbox_coordinates, frame, depth_map, disparity_map, show_output=True):
    frame_copy = frame.copy()

    # Normalize the disparity map to [0, 255]
    disparity_map_normalized = cv2.normalize(disparity_map, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)

    # Apply a colorful colormap to the disparity map
    colormap = cv2.COLORMAP_JET
    disparity_map_colored = cv2.applyColorMap(disparity_map_normalized, colormap)

    # Normalize the depth map to [0, 255]
    depth_map_normalized = cv2.normalize(depth_map, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)

    # Apply a colorful colormap to the depth map
    colormap = cv2.COLORMAP_BONE
    depth_map_colored = cv2.applyColorMap(depth_map_normalized, colormap)

    for bbox_coor in bbox_coordinates:
        x1, y1, x2, y2 = bbox_coor
        center_x = (x1 + x2) // 2
        center_y = (y1 + y2) // 2

        distance = depth_map[center_y][center_x]
        print("Calculated distance:", distance)

        # Convert distance to string
        distance_str = f"{distance:.2f} m"

        # Draw bounding box on the frame
        cv2.rectangle(frame_copy, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Draw bounding box on the disparity map
        cv2.rectangle(disparity_map_colored, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Draw bounding box on the depth map
        cv2.rectangle(depth_map_colored, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Calculate the text size
        text_size, _ = cv2.getTextSize(distance_str, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2)

        # Calculate the position for placing the text
        text_x = center_x - text_size[0] // 2
        text_y = y1 - 10  # Place the text slightly above the bounding box

        # Calculate the rectangle coordinates
        rect_x1 = text_x - 5
        rect_y1 = text_y - text_size[1] - 5
        rect_x2 = text_x + text_size[0] + 5
        rect_y2 = text_y + 5

        # Draw white rectangle behind the text
        cv2.rectangle(frame_copy, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(frame_copy, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)

        # Draw white rectangle behind the text
        cv2.rectangle(disparity_map_colored, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(disparity_map_colored, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)

        # Draw white rectangle behind the text
        cv2.rectangle(depth_map_colored, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(depth_map_colored, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)



    if show_output:
        # Convert frame from BGR to RGB
        #frame_rgb = cv2.cvtColor(frame_copy, cv2.COLOR_BGR2RGB)

        # Show the output frame with bounding boxes
        cv2.imshow("Output disparity map", disparity_map_colored)
        cv2.imshow("Output frame", frame_copy)
        cv2.imshow("Output depth map", depth_map_colored)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    return disparity_map_colored, frame_copy, depth_map_colored




def ground_truth_bbox(labels_file, object_class):
    lines = labels_file.split('\n')  # Split the input string into lines
    bounding_boxes = []

    for line in lines:
        line = line.strip()  # Remove leading/trailing whitespace

        if line:
            data = line.split()

            # Extract relevant values for each object
            obj_type = data[0]
            #print(obj_type)
            truncated = float(data[1])
            occluded = int(data[2])
            alpha = float(data[3])
            bbox = [float(val) for val in data[4:8]]
            dimensions = [float(val) for val in data[8:11]]
            location = [float(val) for val in data[11:14]]
            distance = float(data[13])  # z-coordinate of the location, i.e. the forward distance
            rotation_y = float(data[14])

            if obj_type in object_class:  # Check if obj_type is in the desired classes
                # Append bounding box dimensions and distance to the list
                bounding_boxes.append((bbox, distance))

                # Print the 3D bounding box information
                print("Object Type:", obj_type)
                print("Truncated:", truncated)
                print("Occluded:", occluded)
                print("Alpha:", alpha)
                print("Bounding Box:", bbox)
                print("Dimensions:", dimensions)
                print("Location:", location)
                print("True Distance:", distance)
                print("Rotation Y:", rotation_y)
                print("------------------------")
    return bounding_boxes


def display_ground_truth(frame, bounding_boxes, show_output=True):
    frame_copy = frame.copy()
    for bbox, distance in bounding_boxes:
        # Extract bounding box coordinates
        x1, y1, x2, y2 = map(int, bbox)
        center_x = (x1 + x2) // 2

        # Convert distance to string
        distance_str = f"{distance:.2f} m"

        # Draw bounding box on the frame
        cv2.rectangle(frame_copy, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Calculate the text size
        text_size, _ = cv2.getTextSize(distance_str, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2)

        # Calculate the position for placing the text
        text_x = center_x - text_size[0] // 2
        text_y = y1 - 10  # Place the text slightly above the bounding box

        # Calculate the rectangle coordinates
        rect_x1 = text_x - 5
        rect_y1 = text_y - text_size[1] - 5
        rect_x2 = text_x + text_size[0] + 5
        rect_y2 = text_y + 5

        # Draw white rectangle behind the text
        cv2.rectangle(frame_copy, (rect_x1, rect_y1), (rect_x2, rect_y2), (255, 255, 255), cv2.FILLED)

        # Put text at the center of the bounding box
        cv2.putText(frame_copy, distance_str, (text_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2, cv2.LINE_AA)

    if show_output:
        # Convert frame from BGR to RGB
        image_rgb = cv2.cvtColor(frame_copy, cv2.COLOR_BGR2RGB)
        # Display the image with bounding boxes
        cv2.imshow("Bounding Boxes", image_rgb)
        cv2.waitKey(0)
        cv2.destroyAllWindows()


def pipeline(left_image, right_image, object_class):
    """
    Performs a pipeline of operations on stereo images to obtain a colored disparity map, RGB frame, and colored depth map.

    Input:
    - left_image: Left stereo image (RGB format)
    - right_image: Right stereo image (RGB format)
    - object_class: List of object classes of interest for bounding box retrieval

    Output:
    - disparity_map_colored: Colored disparity map (RGB format)
    - frame_rgb: RGB frame
    - depth_map_colored: Colored depth map (RGB format)
    """
    global focal_length

    # Calculate the disparity map
    disparity_map = compute_disparity(left_image, right_image, num_disparities=90, block_size=5, window_size=5,
                                      matcher="stereo_sgbm", show_disparity=False)

    # Calculate the depth map
    depth_map = calculate_depth_map(disparity_map, baseline, focal_length, show_depth_map=False)

    # Get bounding box coordinates for specified object classes
    bbox_coordinates = get_bounding_box_center_frame(left_image, model, names, object_class, show_output=False)

    # Calculate colored disparity map, RGB frame, and colored depth map
    disparity_map_colored, frame_rgb, depth_map_colored = calculate_distance(bbox_coordinates, left_image, depth_map, disparity_map, show_output=False)

    return disparity_map_colored, frame_rgb, depth_map_colored


def process_pipeline_images(left_image_folder, right_image_folder, output_folder_distance, object_class=['car', 'bicycle']):
    # Get the list of image files in the left image folder
    left_image_files = os.listdir(left_image_folder)

    # Iterate over the image files
    for image_file in left_image_files:
        # Construct the paths for the left and right images
        left_image_path = os.path.join(left_image_folder, image_file)
        right_image_path = os.path.join(right_image_folder, image_file)

        # Read the left and right images
        left_image = cv2.imread(left_image_path)
        right_image = cv2.imread(right_image_path)

        disparity_map_colored, frame_rgb, depth_map_colored = pipeline(left_image, right_image, object_class)

        # Construct the output file path
        output_file = os.path.join(output_folder_distance, image_file)

        # Save the disparity map as an image
        cv2.imwrite(output_file, disparity_map_colored)

        print(f"Disparity map saved: {output_file}")

def frames_to_video(frame_folder, output_folder, output_filename):
    """
    Converts a sequence of frames in a folder into an MP4 video and saves it in the specified output folder.

    Args:
        frame_folder (str): Path to the folder containing the frames.
        output_folder (str): Path to the output folder where the video will be saved.
        output_filename (str): Name of the output video file (including the .mp4 extension).
    """

    # Get the list of frame filenames in the folder
    frame_filenames = sorted(os.listdir(frame_folder))

    # Read the first frame to get frame size information
    first_frame_path = os.path.join(frame_folder, frame_filenames[0])
    first_frame = cv2.imread(first_frame_path)
    height, width, channels = first_frame.shape

    # Define the video codec and create a VideoWriter object
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    output_path = os.path.join(output_folder, output_filename)
    video_writer = cv2.VideoWriter(output_path, fourcc, 15, (width, height))

    # Iterate over the frame filenames and write each frame to the video
    for frame_filename in frame_filenames:
        frame_path = os.path.join(frame_folder, frame_filename)
        frame = cv2.imread(frame_path)
        video_writer.write(frame)

    # Release the VideoWriter and print completion message
    video_writer.release()
    print("Video saved successfully at:", output_path)



def concatenate_videos_vertical(video1_path, video2_path, output_path):
    # Read the input videos
    video1 = cv2.VideoCapture(video1_path)
    video2 = cv2.VideoCapture(video2_path)

    # Get video properties
    width = int(video1.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(video1.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = video1.get(cv2.CAP_PROP_FPS)

    # Create a VideoWriter object to save the concatenated video
    output_video = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height * 2))

    while True:
        # Read frames from both videos
        ret1, frame1 = video1.read()
        ret2, frame2 = video2.read()

        if not ret1 or not ret2:
            # Either video1 or video2 has ended
            break

        # Resize the frames to match the width of the output video
        frame1_resized = cv2.resize(frame1, (width, height))
        frame2_resized = cv2.resize(frame2, (width, height))

        # Concatenate the frames vertically
        concat_frame = cv2.vconcat([frame1_resized, frame2_resized])

        # Write the concatenated frame to the output video
        output_video.write(concat_frame)

    # Release resources
    video1.release()
    video2.release()
    output_video.release()

    print("Concatenation complete. Output video saved:", output_path)












if __name__ == '__main__':

    # Get the current directory
    current_directory = os.getcwd()

    # Go back to the parent directory
    parent_directory = os.path.dirname(current_directory)

    # Set input directory
    left_image_folder = os.path.join(parent_directory, 'Data', 'Left', 'image_2')
    right_image_folder = os.path.join(parent_directory, 'Data', 'Right', 'image_3')
    labels_folder = os.path.join(parent_directory, 'Data', 'Labels', 'training')
    calibration_folder = os.path.join(parent_directory, 'Data', 'Callibration', 'training', 'calib')
    output_disparity_folder = os.path.join(parent_directory, 'Data', 'Output_Disparity_1')
    output_depth_folder = os.path.join(parent_directory, 'Data', 'Output_Depth_1')


    # Choose index of image
    index = 13

    # Display the image pair and get the left and right images
    left_image, right_image = display_image_pair(index=index, left_image_folder=left_image_folder, right_image_folder=right_image_folder, show_picture=True)
    print("\nImage shape: ", left_image.shape)

    # Compute disparity map
    disparity_map = compute_disparity(left_image, right_image, num_disparities=90, block_size=5, window_size=5, matcher="stereo_sgbm", show_disparity=True) #stereo_bm   #stereo_sgbm
    print(disparity_map)
    print("\nDisparity map shape: ", disparity_map.shape)

    # Display the text file contents
    calibration_file = display_text_file(index, calibration_folder)
    label_file = display_text_file(index, labels_folder)

    # Extract calibration parameters
    p_left, p_right, p_ro_rect, p_velo_to_cam, p_imu_to_velo = get_calibration_parameters(calibration_file)
    print("\nLeft P Matrix")
    print(p_left)
    print("\nRight P Matrix")
    print(p_right)
    print("\nR0 to Rect Matrix")
    print(p_ro_rect)
    print("\nVelodyne to Camera Matrix")
    print(p_velo_to_cam)
    print("\nIMU to Velodyne Matrix")
    print(p_imu_to_velo)

    # Decompose the projection matrix
    camera_matrix_left, rotation_matrix_left, translation_vector_left = decompose_projection_matrix(p_left)
    camera_matrix_right, rotation_matrix_right, translation_vector_right = decompose_projection_matrix(p_right)

    # Print the decomposed components
    print("\nCamera Matrix Left:")
    print(camera_matrix_left)
    print("\nRotation Matrix Left:")
    print(rotation_matrix_left)
    print("\nTranslation Vector Left:")
    print(translation_vector_left)

    print("\nCamera Matrix Right:")
    print(camera_matrix_right)
    print("\nRotation Matrix Right:")
    print(rotation_matrix_right)
    print("\nTranslation Vector Right:")
    print(translation_vector_right)

    # Extract the focal length and baseline
    focal_length_x = camera_matrix_right[0, 0]
    focal_length_y = camera_matrix_right[1, 1]
    baseline = abs(translation_vector_left[0] - translation_vector_right[0])

    # Print the extracted values
    print("\nFocal Length (x-direction):", focal_length_x)
    print("\nFocal Length (y-direction):", focal_length_y)
    print("\nBaseline:", baseline, "\n")

    depth_map = calculate_depth_map(disparity=disparity_map, baseline=baseline, focal_length=focal_length_x, show_depth_map=True)
    print("Depth Map: ", depth_map)
    print("\nDepth map shape: ",depth_map.shape)
    print("\nDepth map distance: ", depth_map[200,600], " m")

    #save_disparity_maps(left_image_folder, right_image_folder, output_disparity_folder)

    #save_depth_maps(left_image_folder, right_image_folder, output_depth_folder, baseline, focal_length_x)

    # Instantiate model
    weights_path = os.path.join(parent_directory, 'Weights', 'yolov8m.pt')
    model = YOLO(weights_path)
    names = model.names
    print(names)

    # Process the frame to get bounding box centers
    bbox_coordinates = get_bounding_box_center_frame(left_image, model, names, object_class=['car', 'bicycle'], show_output=True)

    # Calculate distance from camera to objects
    disparity_map_distance, frame_distance, depth_map_distance = calculate_distance(bbox_coordinates, left_image, depth_map, disparity_map,show_output=True)

    ###-------COMPARE WITH GROUND TRUTH

    # Get data from labels.txt
    bounding_boxes_ground_truth = ground_truth_bbox(label_file, object_class=['Car', 'Cyclist', 'Pedestrian'])

    # Display ground truth image with distance
    display_ground_truth(left_image, bounding_boxes_ground_truth, show_output=True)





    ###-------------PIPELINE ON SINGLE IMAGE
    focal_length = focal_length_x
    index = 7

    # Load the left image
    left_image_path = os.path.join(left_image_folder, os.listdir(left_image_folder)[index])
    left_image = cv2.cvtColor(cv2.imread(left_image_path), cv2.COLOR_BGR2RGB)
    print(left_image)

    # Load the right image
    right_image_path = os.path.join(right_image_folder, os.listdir(right_image_folder)[index])
    right_image = cv2.cvtColor(cv2.imread(right_image_path), cv2.COLOR_BGR2RGB)


    disparity_map_colored, frame_rgb, depth_map_colored = pipeline(left_image, right_image, object_class=['car', 'bicycle'])

    # # Show the output frame with bounding boxes
    # frame_rgb = cv2.cvtColor(frame_rgb, cv2.COLOR_BGR2RGB)
    # cv2.imshow("Output disparity map", disparity_map_colored)
    # cv2.imshow("Output frame", frame_rgb)
    # cv2.imshow("Output depth map", depth_map_colored)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()





    ### ------------ PIPELINE ON FOLDER ----------------------------- ###
    # Set input and output directory
    left_image_folder = os.path.join(parent_directory, 'Data', 'Left_4')
    right_image_folder = os.path.join(parent_directory, 'Data', 'Right_4')
    output_folder_distance = os.path.join(parent_directory, 'Data', 'Output_Disparity_Distance_4')
    output_video_folder = os.path.join(parent_directory, 'Data', 'Output_Video')

    # ##--- Process all images in folder and save disparity map
    # process_pipeline_images(left_image_folder, right_image_folder, output_folder_distance, object_class=['car', 'bicycle', 'person'])
    #
    # ### ---- Convert frames to video
    # frames_to_video(output_folder_distance, output_video_folder, 'output_disparity_4.mp4')
    # frames_to_video(left_image_folder, output_video_folder, 'output_4.mp4')


    ### - Concat videos
    # video_1 = output_video_folder = os.path.join(parent_directory, 'Data', 'Output_Video', 'output_1.mp4')
    # video_2 = output_video_folder = os.path.join(parent_directory, 'Data', 'Output_Video', 'output_disparity_1.mp4')
    # output_video_folder = os.path.join(parent_directory, 'Data', 'Output_Video', 'concat_1.mp4')
    # concatenate_videos_vertical(video_1, video_2, output_video_folder)
    #
    #