Nurec Data Conversion

Summary

This text will be hidden

The video shows the effect of Nurec scene reconstruction.The image is a verification of the point cloud to image 3D mapping process. The point cloud to image mapping effect is acceptable during the verification process, but why do vehicles in the video have motion blur and wheels disappear?
Could you please help me resolve this confusion? From what aspects should I verify and solve this problem?
My thinking is as follows: Because the vehicle travels at a low speed, the issue of image point cloud asynchrony may not need to be considered.

Thank you for your post. I will reach out to a NuRec expert and have them get back to you.

It could be related to rolling-shutter camera.

  1. How is frame_timestamps_us populated for each camera in their NCore converter — one global timestamp repeated across cameras, or per-camera per-frame [exposure_start, exposure_end] with rolling-shutter correction?
  2. What shutter type / shutter_delay_us do they record for each camera?
  3. Are dynamic-vehicle cuboid tracks present in the NCore sequence? What accuracy / source (human GT vs. detector)?
  4. Is BBox3.centroid geometric center or bottom-center in their source data?
  5. Pose graph density — are intermediate ego/actor poses denser than one per image frame (i.e., < LiDAR sweep duration)?
  6. What training config + tracks_calib option did they use?

refer to ncore/tools/data_converter/pai/converter.py at main · NVIDIA/ncore · GitHub
and ncore/tools/data_converter/waymo/converter.py at main · NVIDIA/ncore · GitHub

  1. We are currently using the [exposure_start, exposure_end] mode, with shutter_delay_us set to 68ms.
  2. The current data only contains point cloud and image information, and does not include cuboids. Will adding cuboids improve the scene reproduction effect?
  3. The length of the vehicle trajectory data is the same as that of the lidar data.Below I will show some formats of pose, image, and lidar data:
sensor shape log

================pose info shape========================
poses: (143, 4, 4)
================camera info shape========================
frame_timestamps_us: [1776415177400000 1776415177468000]
frame_timestamps_us time: 1776415177400000
camera fame len: 143
read lidar middle
================lidar info shape========================
directions: (230400, 3)
distance: (1, 230400)
intensity: (1, 230400)
per_point_ts: (230400,)
model_element: (230400, 2)
lidar fame len: 143

Based on the suggestion, we removed lidar model supervision during scene reconstruction, which resulted in even worse scene reconstruction performance.(The left side shows the result of pure image scene reconstruction, and the right side shows the result of reconstruction including radar model.)


Below is the code for the camera storage process, as well as the visualization results after storage. Please point out any errors.

def store_camera_frame(dataset_root, poses_writer, store_writer,intrinsics_writer,l_calib,camera_data,timestamps_us,masks_writer):
    for camera_id, cam_frames in camera_data.items():
        #写相机内参
        masks_writer.store_camera_masks(
            camera_id=camera_id,
            mask_images={
                "ego": Image.new("L", (1920, 1080), 0)  # 创建一个全黑的 mask 图像
            },
        )
        # 获取相机标定表
        # print(print(FThetaCameraModelParameters.PolynomialType[0]))
        intrinsics_writer.store_camera_intrinsics(
            camera_id=camera_id,
            # camera_model_parameters=camera_model
            camera_model_parameters=OpenCVPinholeCameraModelParameters.from_dict(
                {"resolution": [1920, 1080], "shutter_type": "ROLLING_TOP_TO_BOTTOM",
                "principal_point": [960.312, 542.005], 
                "linear_cde": [0.0, 0.0, 0.0], "shutter_delay_us": 0, "focal_length": [1012.51, 1012.22],
                "radial_coeffs": [0, 0, 0.0, 0.0, 0.0, 0.0],
                "tangential_coeffs": [0, 0], "thin_prism_coeffs":[0.0, 0.0, 0.0, 0.0]})
        )

        #写相机外参  静态外参:相机 → rig
        static_pose = np.eye(4, dtype=np.float32)
        T_cam_lidar = npy_convert_lidar_model.get_cam_to_lidar(l_calib, camera_id)
        backlidar_rig =  np.array([
            [ 0.02433602, -0.99964417,  0.01104251,  0.022665  ],
            [ 0.99958967,  0.02452475,  0.00998521,  1.335997  ],
            [-0.01022758,  0.01129861,  0.9998839,   1.062775  ],
            [ 0,          0,          0,          1        ]
        ], dtype=np.float32)


        camera_rig = backlidar_rig @ T_cam_lidar
        camera_rig_rot = make_T(yaw_rotation(-90))@camera_rig
        translation_up =  [0, 0, 1]
        camera_rig_rot[:3, 3] += translation_up
        poses_writer.store_static_pose(
            source_frame_id=camera_id,
            target_frame_id="rig",
            pose= (camera_rig_rot)#l_camTorig,  T_cam_lidar_new
        )
 
        if cam_frames:
            camera_writer = store_writer.register_component_writer(
                CameraSensorComponent.Writer,
                component_instance_name=camera_id,
            )
            for idx, frame in cam_frames.items():
                img = frame["image"]
                ts_start = int(frame.get("timestamp_us", timestamps_us[idx] if timestamps_us is not None else time.time()*1e6))
                if idx < len(timestamps_us) - 1:
                    ts_end = ts_start + 68000 #int(timestamps_us[idx + 1])
                else:
                    ts_end = ts_start
                img_copy = jibian_fun(img)
      
                if isinstance(img_copy, np.ndarray):
                    jpeg_bytes = imageio.imwrite("<bytes>", img_copy, format="jpeg", quality=92)
                else:
                    jpeg_bytes = img_copy

                camera_writer.store_frame(
                    image_binary_data=jpeg_bytes,
                    image_format="jpeg",
                    frame_timestamps_us=np.array([ts_start, ts_end], dtype=np.uint64),
                    generic_data={},
                    generic_meta_data={},
                )
                
                if idx == 0:
                    print("================camera info shape========================")
                    l_showtime = np.array([ts_start, ts_end], dtype=np.uint64)
                    print("frame_timestamps_us:",l_showtime)
                    print("frame_timestamps_us time:",ts_start)
                    print("camera fame len:",len(cam_frames))

“does not include cuboids. Will adding cuboids improve the scene reproduction effect”
→ Please try to add cuboids

“shutter_delay_us”: 0
→ From ncore/tools/data_converter/pai/converter.py at main · NVIDIA/ncore · GitHub , PAI converter is read from camera parameters and use it.