Synthetic data recording for BBox3D

Hi, there is a BBox3D option in the Synthetic Data Visualizer, but I can’t get BBox3D output directly through the Synthetic Data Recorder at the moment, is that right? Will this feature be added to the extension in the future?
If I want to get it now, is it possible through a Python script?

I tried modifying the code of the syntheticdata_recorder extension so that it saves the 3D bounding box image shown in the GUI as well as an npy file, and also saves the camera parameters.

It looks pretty good.
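
In case it helps, the core of what I added is roughly the sketch below (not the exact code in the attached files below): once the recorder’s ground-truth dictionary has a boundingBox3D entry, write it to an npy file and dump the camera parameters next to it. The gt dictionary, the output layout, and the attribute names read from the USD camera prim are from my setup, so treat it only as a starting point.

    import json
    import os

    import numpy as np
    import omni.usd


    def save_bbox3d_and_camera(gt, camera_prim_path, out_dir, frame_id):
        """Minimal sketch: dump boundingBox3D ground truth plus camera parameters.

        `gt` is assumed to be the ground-truth dict the recorder already builds
        (e.g. gt["boundingBox3D"]); `camera_prim_path` is the active camera prim.
        """
        os.makedirs(out_dir, exist_ok=True)

        # Raw 3D bounding box ground truth -> npy file.
        np.save(os.path.join(out_dir, "bbox3d_%d.npy" % frame_id), gt["boundingBox3D"])

        # Camera parameters read from the standard USD camera attributes.
        stage = omni.usd.get_context().get_stage()
        cam = stage.GetPrimAtPath(camera_prim_path)
        params = {
            "focal_length": cam.GetAttribute("focalLength").Get(),
            "horizontal_aperture": cam.GetAttribute("horizontalAperture").Get(),
            "vertical_aperture": cam.GetAttribute("verticalAperture").Get(),
            "camera_to_world": np.array(omni.usd.get_world_transform_matrix(cam)).tolist(),
        }
        with open(os.path.join(out_dir, "camera_%d.json" % frame_id), "w") as f:
            json.dump(params, f, indent=2)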


Here are the files I modified, in case they help. There may be some errors, and the original files should be backed up first.

syntheticdata_recorder.py (40.6 KB)
visualization.py (6.3 KB)
writer.py (11.4 KB)

Hi xuxu,

Awesome! Well done!
Looks really good.

Thanks for the code.

Kindly,
Liila

Hello, I am still trying to call the get_pose interface, but it is not going well. When it reaches

        for sensor in gt_sensors:
            if sensor not in ["camera", "pose"]:
                if sensor == "instanceSegmentation":
                    gt[sensor] = self.sensor_helpers[sensor](viewport, parsed=True, return_mapping=True)
                elif sensor == "boundingBox3D":
                    gt[sensor] = self.sensor_helpers[sensor](viewport, parsed=True, return_corners=True)
                else:
                    gt[sensor] = self.sensor_helpers[sensor](viewport)
                current_sensor = self.sensor_helper_lib.create_or_retrieve_sensor(viewport, self.sensor_types[sensor])
                current_sensor_state = self.sd_interface.is_sensor_initialized(current_sensor)
                sensor_state[sensor] = current_sensor_state
            else:
                gt[sensor] = self.sensor_helpers[sensor](viewport)

in syntheticdata.py, it returns the error [Error] [carb.events.python] TypeError: get_pose() takes 1 positional argument but 2 were given,
because it is defined as

    def get_pose(self):
        """Get pose of all objects with a semantic label.
        """
        stage = omni.usd.get_context().get_stage()
        mappings = self.generic_helper_lib.get_instance_mappings()
        pose = []
        for m in mappings:
            prim_path = m[0]
            prim = stage.GetPrimAtPath(prim_path)
            prim_tf = UsdGeom.Xformable(prim).ComputeLocalToWorldTransform(0.0)
            pose.append((str(prim_path), m[1], str(m[2]), np.array(prim_tf)))
        return pose

in the same file.
The viewport parameter does not seem to be used, but when I remove it, a C++ signature error is returned instead. Do you know how to fix it?

[Error] [carb.events.python] ArgumentError: Python argument types in
    None.GetPrimAtPath(Stage, numpy.int32)
did not match C++ signature:
    GetPrimAtPath(pxrInternal_v0_20__pxrReserved__::UsdStage {lvalue}, pxrInternal_v0_20__pxrReserved__::SdfPath path)

At:
  /home/ubuntu/.local/share/ov/pkg/isaac_sim-2021.1.1/exts/omni.isaac.synthetic_utils/omni/isaac/synthetic_utils/scripts/syntheticdata.py(135): get_pose

Hi @xuxu, can you try with the updated get_pose() below?

    def get_pose(self, viewport=None):
        """Get pose of all objects with a semantic label.
        """
        stage = omni.usd.get_context().get_stage()
        mappings = self.generic_helper_lib.get_instance_mappings()
        pose = []
        for m in mappings:
            prim_path = m[1]
            prim = stage.GetPrimAtPath(prim_path)
            prim_tf = omni.usd.get_world_transform_matrix(prim)
            pose.append((str(prim_path), m[2], str(m[3]), np.array(prim_tf)))
        return pose

Cool! It works. It only throws an error the first time data is recorded, but it seems to have no effect.

Viewport  :  ['rgb', 'boundingBox3D', 'camera', 'pose']
2021-07-25 10:02:43 [148,065ms] [Error] [carb.python] [py stderr]: /home/ubuntu/.local/share/ov/pkg/isaac_sim-2021.1.1/kit/extscore/omni.kit.pip_archive/pip_prebundle/numpy/core/_asarray.py:136: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order, subok=True)

I can get a 4x4 transform matrix for each object on the entire stage (though I had expected only the objects in the current view). After a few tries I learned that this matrix transforms from the object coordinate system to the world coordinate system (the same is true of the transform matrix obtained from the 3D bounding box).

    array(['/Warehouse/SM_CardBoxA_02/SM_CardBoxA_02', 2, 'CardBox',
           array([[ 3.37917009e-01,  9.41176037e-01,  0.00000000e+00,  0.00000000e+00],
                  [-9.41176037e-01,  3.37917009e-01,  0.00000000e+00,  0.00000000e+00],
                  [ 0.00000000e+00,  0.00000000e+00,  1.00000000e+00,  0.00000000e+00],
                  [-9.21907471e+02,  1.04652783e+03,  1.29935669e+02,  1.00000000e+00]])],
          dtype=object)
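
If I read the output above correctly, the translation sits in the last row rather than the last column, i.e. these are USD-style row-vector matrices, so a point is multiplied from the left. A quick sanity check with the matrix from the printout:

    import numpy as np

    # Object-to-world transform copied from the pose output above
    # (rotation in the upper-left 3x3, translation in the last row).
    obj_to_world = np.array([
        [ 3.37917009e-01,  9.41176037e-01,  0.0, 0.0],
        [-9.41176037e-01,  3.37917009e-01,  0.0, 0.0],
        [ 0.0,             0.0,             1.0, 0.0],
        [-9.21907471e+02,  1.04652783e+03,  1.29935669e+02, 1.0],
    ])

    # A point in the object's local frame, as a homogeneous row vector.
    p_local = np.array([10.0, 0.0, 0.0, 1.0])

    # Row-vector convention: p_world = p_local @ M.
    p_world = p_local @ obj_to_world
    print(p_world[:3])  # world-space xyz of the point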

Now I want to get point cloud data (i.e. the xyz coordinates of the points) from the saved RGB and depth images. As far as I know I need the intrinsic parameters of the camera (fx, fy, cx, cy), and then I can calculate it. But how can I get these parameters? Or is there another way to get the point cloud data?
Thanks a lot.

I got the answer from [A few snippets which might be useful to compute camera intrinsics]:

# compute focal point and center
focal_x = height * focal_length / vert_aperture
focal_y = width * focal_length / horiz_aperture
center_x = height * 0.5
center_y = width * 0.5
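
With fx, fy, cx, cy known, back-projecting the depth image into camera-frame points is straightforward. Below is a small sketch, assuming the depth image stores the linear distance along the camera’s viewing axis for every pixel; note that I pair fx/cx with the image width and fy/cy with the height (the usual pinhole convention), which gives the same result here since the pixels are square.

    import numpy as np

    def depth_to_camera_points(depth, fx, fy, cx, cy):
        """Back-project a depth image (pinhole model) into camera-frame xyz points.

        `depth` is assumed to hold the linear distance along the camera's viewing
        axis per pixel (u runs along the width, v along the height).
        """
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        z = depth
        return np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # e.g. points_cam = depth_to_camera_points(depth_img, focal_x, focal_y,
    #                                          width * 0.5, height * 0.5)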

Now I can get the xyz coordinates of the points. If I’m right, these are in the camera coordinate system? So I need to transform them into the world coordinate system (using the camera’s transformation matrix relative to the world frame), and then I can match them against the 3D bbox coordinates I obtained above.
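
For that last step, this is the kind of transform I apply; treat it as a sketch, since the sign flip depends on the convention of the back-projected points. USD cameras look down -Z with +Y up, while the pinhole back-projection above gives +Z forward and +Y down, so Y and Z are flipped before applying the camera’s world transform (again with the row-vector convention).

    import numpy as np
    import omni.usd


    def camera_points_to_world(points_cam, camera_prim_path):
        """Transform camera-frame points (N x 3) into world coordinates.

        Assumes `points_cam` uses the computer-vision convention
        (+X right, +Y down, +Z forward); USD cameras look down -Z with +Y up.
        """
        stage = omni.usd.get_context().get_stage()
        cam = stage.GetPrimAtPath(camera_prim_path)
        cam_to_world = np.array(omni.usd.get_world_transform_matrix(cam))

        # Flip from the CV camera frame into the USD camera frame.
        pts = points_cam * np.array([1.0, -1.0, -1.0])

        # Homogeneous row vectors; USD matrices use the p @ M convention.
        pts_h = np.concatenate([pts, np.ones((pts.shape[0], 1))], axis=1)
        return (pts_h @ cam_to_world)[:, :3]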