Projecting a point cloud onto 2D images using the [Object Based Synthetic Dataset Generation] documentation

Hi,

I’ve been following the Isaac Sim Replicator tutorial to generate my dataset, and I’m currently working on projecting my point cloud onto the RGB image.

I tried two methods, but only Method 1 works for me:

Method 1:

import numpy as np

# Reshape the flat camera matrices to 4x4 (note: no transpose for the row-/column-major conversion is applied here)
cam_view_transform = data[camera_params_annot_name]["cameraViewTransform"].reshape((4, 4))
cam_projection_transform = data[camera_params_annot_name]["cameraProjection"].reshape((4, 4))

# Transform to world space (pointcloud is Nx4, homogeneous row vectors)
world_homogeneous = np.dot(pointcloud, local_to_world_tf)
# Transform to camera space
camera_homogeneous = np.dot(world_homogeneous, cam_view_transform)
# Projection transformation
clip_space = np.dot(camera_homogeneous, cam_projection_transform)
# Normalized Device Coordinates (NDC); w is kept as a column so the division broadcasts per row
ndc = clip_space / clip_space[:, 3:4]
# Map NDC to screen space
x_s = (ndc[:, 0] + 1) * screen_width / 2
y_s = (1 - ndc[:, 1]) * screen_height / 2

Method 2:

# Reshape the flat camera matrices to 4x4 (note: no transpose for the row-/column-major conversion is applied here)
cam_view_transform = data[camera_params_annot_name]["cameraViewTransform"].reshape((4, 4))
cam_projection_transform = data[camera_params_annot_name]["cameraProjection"].reshape((4, 4))

# Object local space to camera frame transform
obj_to_camera_tf = cam_view_transform @ local_to_world_tf
# Transform to camera space (pointcloud is Nx4, homogeneous row vectors)
camera_homogeneous = np.dot(pointcloud, obj_to_camera_tf)
# Projection transformation
clip_space = np.dot(camera_homogeneous, cam_projection_transform)
# Normalized Device Coordinates (NDC); w is kept as a column so the division broadcasts per row
ndc = clip_space / clip_space[:, 3:4]
# Map NDC to screen space
x_s = (ndc[:, 0] + 1) * screen_width / 2
y_s = (1 - ndc[:, 1]) * screen_height / 2

In the tutorials, obj_to_camera_tf is used for training, but I noticed it doesn’t work for direct projection. Do we always need to use the workaround (transform to world coordinates → transform to camera coordinates → apply the projection matrix) to correctly project or visualize point clouds? Is there a reason why Method 2 doesn’t work as expected?
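
One possible explanation, sketched here under the assumption that the matrices are used exactly as in the snippets above (points as row vectors, i.e. pointcloud @ matrix, with no transpose): in that convention the combined transform has to be built in the order the transforms are applied, local_to_world_tf @ cam_view_transform. The product cam_view_transform @ local_to_world_tf would instead match the column-vector convention on transposed matrices. The matrices and helper functions below are hypothetical stand-ins, not values or APIs from Isaac Sim; the sketch only checks the ordering argument numerically.

```python
import numpy as np

def translation_row(t):
    """Hypothetical helper: 4x4 translation, row-vector convention (translation in last row)."""
    m = np.eye(4)
    m[3, :3] = t
    return m

def rotation_z_row(angle_rad):
    """Hypothetical helper: 4x4 rotation about Z, row-vector convention."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    m = np.eye(4)
    m[:2, :2] = [[c, s], [-s, c]]
    return m

# Stand-ins for the real matrices (assumed values, for illustration only)
local_to_world_tf = rotation_z_row(0.3) @ translation_row([1.0, 2.0, 3.0])
cam_view_transform = rotation_z_row(-1.1) @ translation_row([0.5, -4.0, 2.0])

# Nx4 homogeneous points, one point per row
pointcloud = np.array([
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 2.0, 1.0],
])

# Method 1: two separate steps (local -> world -> camera)
two_step = pointcloud @ local_to_world_tf @ cam_view_transform

# Single combined matrix, built in application order
combined_in_order = pointcloud @ (local_to_world_tf @ cam_view_transform)

# Single combined matrix, built in the order used in Method 2
combined_swapped = pointcloud @ (cam_view_transform @ local_to_world_tf)

# Column-vector form: transposed matrices, reversed order, points as columns
column_form = (cam_view_transform.T @ local_to_world_tf.T @ pointcloud.T).T

in_order_matches = np.allclose(two_step, combined_in_order)
swapped_matches = np.allclose(two_step, combined_swapped)
column_form_matches = np.allclose(two_step, column_form)

print("in-order combined matches two-step:", in_order_matches)
print("swapped combined matches two-step:", swapped_matches)
print("column-vector form matches two-step:", column_form_matches)
```

If this is indeed the cause, the two-step approach would not be required: a single precomputed matrix should work as long as its multiplication order matches the vector convention being used.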

See here for a possibly related discussion/solution: