Sythetic data recording for BBox3D

Hi, there is BBox 3D in Synthetic Data Visualizer, but I can’t get BBox3D output directly through Synthetic Data Recorder at the moment, is that right? Will this function be added to the extension in the future?
If I want to get it, is it possible through python script?

I try to modify the code of the syntheticdata_recorder extension to implement saving the image of the 3D boundingbox from the GUI as well as the npy file, and saving the camera parameters.

It looks not bad.

2 Likes

Here’s the codes I modified, if that helps. There may be some errors and the original file needs to be backed up first.

syntheticdata_recorder.py (40.6 KB)
visualization.py (6.3 KB)
writer.py (11.4 KB)

Hi xuxu,

Awesome! Well done!
Looks really good.

Thanks for the code.

Kindly,
Liila

Hello, I am still trying to call the get_pose interface, but this is not going well. when it is calling

        for sensor in gt_sensors:
            if sensor not in ["camera", "pose"]:
                if sensor == "instanceSegmentation":
                    gt[sensor] = self.sensor_helpers[sensor](viewport, parsed=True, return_mapping=True)
                elif sensor == "boundingBox3D":
                    gt[sensor] = self.sensor_helpers[sensor](viewport, parsed=True, return_corners=True)
                else:
                    gt[sensor] = self.sensor_helpers[sensor](viewport)
                current_sensor = self.sensor_helper_lib.create_or_retrieve_sensor(viewport, self.sensor_types[sensor])
                current_sensor_state = self.sd_interface.is_sensor_initialized(current_sensor)
                sensor_state[sensor] = current_sensor_state
            else:
                gt[sensor] = self.sensor_helpers[sensor](viewport)

in syntheticdata.py, it returns error [Error] [carb.events.python] TypeError: get_pose() takes 1 positional argument but 2 were given
because it is defined as

    def get_pose(self):
        """Get pose of all objects with a semantic label.
        """
        stage = omni.usd.get_context().get_stage()
        mappings = self.generic_helper_lib.get_instance_mappings()
        pose = []
        for m in mappings:
            prim_path = m[0]
            prim = stage.GetPrimAtPath(prim_path)
            prim_tf = UsdGeom.Xformable(prim).ComputeLocalToWorldTransform(0.0)
            pose.append((str(prim_path), m[1], str(m[2]), np.array(prim_tf)))
        return pose

in the same file.
This parameter does not seem to be used, but when I remove it, an error about c++ will be returned. Do you know how to fix it?

[Error] [carb.events.python] ArgumentError: Python argument types in
    None.GetPrimAtPath(Stage, numpy.int32)
did not match C++ signature:
    GetPrimAtPath(pxrInternal_v0_20__pxrReserved__::UsdStage {lvalue}, pxrInternal_v0_20__pxrReserved__::SdfPath path)

At:
  /home/ubuntu/.local/share/ov/pkg/isaac_sim-2021.1.1/exts/omni.isaac.synthetic_utils/omni/isaac/synthetic_utils/scripts/syntheticdata.py(135): get_pose

Hi @xuxu , Can you try with the updated get_pose() below?

    def get_pose(self, viewport=None):
        """Get pose of all objects with a semantic label.
        """
        stage = omni.usd.get_context().get_stage()
        mappings = self.generic_helper_lib.get_instance_mappings()
        pose = []
        for m in mappings:
            prim_path = m[1]
            prim = stage.GetPrimAtPath(prim_path)
            prim_tf = omni.usd.get_world_transform_matrix(prim)
            pose.append((str(prim_path), m[2], str(m[3]), np.array(prim_tf)))
        return pose

Cool! It works. It just throws an error only when recording data for the first time, but it seems to have no effect.

Viewport  :  ['rgb', 'boundingBox3D', 'camera', 'pose']
2021-07-25 10:02:43 [148,065ms] [Error] [carb.python] [py stderr]: /home/ubuntu/.local/share/ov/pkg/isaac_sim-2021.1.1/kit/extscore/omni.kit.pip_archive/pip_prebundle/numpy/core/_asarray.py:136: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order, subok=True)

I can get a 4x4 transfer matrix for each object on the entire stage (though I thought it was the object in the current perspective). After a few tries I learned that this transfer matrix is between the object coordinate system and the world coordinate system. (also, the transfer matrix obtained from 3DBox)

 ['/Warehouse/SM_CardBoxA_02/SM_CardBoxA_02', 2, 'CardBox',
        array([[ 3.37917009e-01,  9.41176037e-01,  0.00000000e+00, 0.00000000e+00],
               [-9.41176037e-01,  3.37917009e-01,  0.00000000e+00, 0.00000000e+00],
               [ 0.00000000e+00,  0.00000000e+00,  1.00000000e+00, 0.00000000e+00],
               [-9.21907471e+02,  1.04652783e+03,  1.29935669e+02, 1.00000000e+00]])                                ]],
      dtype=object)

Now I want to get the point cloud data (i.e. xyz coordinate data of the points) from the saved rgb image data and depth image data. As far as I know I need the Intrinsic parameters of the camera (fx, fy, cx, cy) and then I can calculate it. But how can I get these parameters? Or is there any other way to get these point cloud data?
Thanks a lot.

I get the answer from [A few snippets which might be useful to compute camera intrinsics]

# compute focal point and center
focal_x = height * focal_length / vert_aperture
focal_y = width * focal_length / horiz_aperture
center_x = height * 0.5
center_y = width * 0.5

Now I can get the xyz coordinates of the points, if I’m right, these are in the camera coordinate system? So I need to transform them into the world coordinate system (using the camera’s transformation matrix relative to the world coordinate system) and then I can match the coordinates I got for the 3D bbox above.

1 Like

@xuxu Once you have the camera intrinsics and extrinsics (camera’s transformation matrix relative to the world coordinate system), you can use them to get xyz pointcloud in world coordinates from depth image.

The 3D bbox also outputs xyz in world coordinates

1 Like

Thank you for your reply! I am currently trying to use Isaacsim to make a dataset like SUN RGBD for Votenet. It is not easy at the beginning. A small question is that when returning the camera parameters, only ‘horizontal_aperture’ is returned, not ‘vert_aperture’. But I can read this parameter in Isaacsim to calculate focal_x and focal_y. (Sometimes they are not equal)

@xuxu I have seen the issue sometimes when the ratio of horizontal_aperture and vert_aperture is not equal to resolution_width and resolution_height as the horizontal_aperture and vert_aperture USD attributes are pre-filled with default values.

OPTION 1

Usually, that gets fixed once I set a camera as the active camera for that viewport.

import omni
viewport_handle_1 = omni.kit.viewport.get_viewport_interface().get_instance("Viewport")
viewport_window_1 = omni.kit.viewport.get_viewport_interface().get_viewport_window(viewport_handle_1)
viewport_window_1.set_active_camera("/World/Camera")

After you set the active camera using the above script and then read USD attributes, the ratio of horizontal_aperture and vert_aperture will be equal to resolution_width and resolution_height

OPTION 2

Calculate vertical_aperature as follows vert_aperture = height/width * horiz_aperture

1 Like

Hi, there may be a bug when using get_bounding_box_3d, maybe you could try if you can reproduce it? Specifically, if there is an object in the current view that is out of viewport, then an error is returned.


If no object is beyond the view limit, there is no problem. As shown in the picture. Thanks.


@xuxu I created a simple script that can be run in the script editor and it dumps out 3D bounding box output. I am not able to repro your issue with this script and a simple map (attached herewith).
bbox_test.usd (19.9 KB)
get_bounding_box_3d.py (2.1 KB)