Synthetic data recording for BBox3D

Hi, there is a BBox 3D option in the Synthetic Data Visualizer, but I can’t get BBox3D output directly through the Synthetic Data Recorder at the moment. Is that right? Will this function be added to the extension in the future?
If I want to get it, is it possible through a Python script?

I tried modifying the code of the syntheticdata_recorder extension to save the 3D bounding box image from the GUI, along with the npy file and the camera parameters.
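For anyone attempting the same modification, the saving step can be sketched roughly like this (a minimal sketch with hypothetical function and variable names; the actual data comes from the recorder’s ground-truth dict, not from these placeholders):

```python
import os
import numpy as np

def save_bbox3d_and_camera(out_dir, frame_id, bbox3d_data, camera_params):
    """Save a boundingBox3D array and camera parameters as .npy files.

    bbox3d_data: array returned by the bounding-box sensor helper
    camera_params: dict of camera values (e.g. pose matrix, focal length)
    """
    os.makedirs(out_dir, exist_ok=True)
    np.save(os.path.join(out_dir, f"bbox3d_{frame_id}.npy"), bbox3d_data)
    # np.save can store a dict too; reading it back requires allow_pickle=True
    np.save(os.path.join(out_dir, f"camera_{frame_id}.npy"), camera_params)
```

Loading the camera file back then needs `np.load(path, allow_pickle=True).item()` because the dict is stored as a pickled object array.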

The results look decent.


Here’s the code I modified, in case it helps. There may be some errors, so back up the original files first. (40.6 KB) (6.3 KB) (11.4 KB)

Hi xuxu,

Awesome! Well done!
Looks really good.

Thanks for the code.


Hello, I am still trying to call the get_pose interface, but it is not going well. When it is calling

        for sensor in gt_sensors:
            if sensor not in ["camera", "pose"]:
                if sensor == "instanceSegmentation":
                    gt[sensor] = self.sensor_helpers[sensor](viewport, parsed=True, return_mapping=True)
                elif sensor == "boundingBox3D":
                    gt[sensor] = self.sensor_helpers[sensor](viewport, parsed=True, return_corners=True)
                else:
                    gt[sensor] = self.sensor_helpers[sensor](viewport)
                current_sensor = self.sensor_helper_lib.create_or_retrieve_sensor(viewport, self.sensor_types[sensor])
                current_sensor_state = self.sd_interface.is_sensor_initialized(current_sensor)
                sensor_state[sensor] = current_sensor_state
            else:
                gt[sensor] = self.sensor_helpers[sensor](viewport)

in, it returns the error: [Error] [] TypeError: get_pose() takes 1 positional argument but 2 were given
because it is defined as

    def get_pose(self):
        """Get pose of all objects with a semantic label."""
        stage = omni.usd.get_context().get_stage()
        mappings = self.generic_helper_lib.get_instance_mappings()
        pose = []
        for m in mappings:
            prim_path = m[0]
            prim = stage.GetPrimAtPath(prim_path)
            prim_tf = UsdGeom.Xformable(prim).ComputeLocalToWorldTransform(0.0)
            pose.append((str(prim_path), m[1], str(m[2]), np.array(prim_tf)))
        return pose

in the same file.
The viewport argument does not seem to be used, but when I remove it from the call, an error about the C++ signature is returned instead. Do you know how to fix it?

[Error] [] ArgumentError: Python argument types in
    None.GetPrimAtPath(Stage, numpy.int32)
did not match C++ signature:
    GetPrimAtPath(pxrInternal_v0_20__pxrReserved__::UsdStage {lvalue}, pxrInternal_v0_20__pxrReserved__::SdfPath path)

  /home/ubuntu/.local/share/ov/pkg/isaac_sim-2021.1.1/exts/omni.isaac.synthetic_utils/omni/isaac/synthetic_utils/scripts/ get_pose

Hi @xuxu , Can you try with the updated get_pose() below?

    def get_pose(self, viewport=None):
        """Get pose of all objects with a semantic label."""
        stage = omni.usd.get_context().get_stage()
        mappings = self.generic_helper_lib.get_instance_mappings()
        pose = []
        for m in mappings:
            prim_path = m[1]
            prim = stage.GetPrimAtPath(prim_path)
            prim_tf = omni.usd.get_world_transform_matrix(prim)
            pose.append((str(prim_path), m[2], str(m[3]), np.array(prim_tf)))
        return pose

Cool! It works. It only throws an error the first time I record data, but that seems to have no effect.

Viewport  :  ['rgb', 'boundingBox3D', 'camera', 'pose']
2021-07-25 10:02:43 [148,065ms] [Error] [carb.python] [py stderr]: /home/ubuntu/.local/share/ov/pkg/isaac_sim-2021.1.1/kit/extscore/omni.kit.pip_archive/pip_prebundle/numpy/core/ VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order, subok=True)
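That VisibleDeprecationWarning comes from numpy rather than the recorder itself: each pose entry mixes a string, integers, and a 4x4 matrix, so the sequence is “ragged” from numpy’s point of view, and converting it without `dtype=object` is deprecated. A minimal reproduction and the fix:

```python
import numpy as np

# Each pose entry mixes a string, an int, a label, and a 4x4 matrix,
# so the elements have different shapes ("ragged nested sequences").
pose = [("/World/Box", 2, "CardBox", np.eye(4))]

# np.array(pose) without a dtype triggers the deprecation warning;
# passing dtype=object keeps the tuples as-is and silences it.
arr = np.array(pose, dtype=object)
print(arr.shape)  # one row per labeled prim, four fields per row
```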

I can get a 4x4 transform matrix for each object on the entire stage (though I expected only the objects visible in the current viewport). After a few tries I learned that this matrix transforms from the object coordinate system to the world coordinate system (the same applies to the matrix obtained from the 3D bounding box).

 ['/Warehouse/SM_CardBoxA_02/SM_CardBoxA_02', 2, 'CardBox',
        array([[ 3.37917009e-01,  9.41176037e-01,  0.00000000e+00,  0.00000000e+00],
               [-9.41176037e-01,  3.37917009e-01,  0.00000000e+00,  0.00000000e+00],
               [ 0.00000000e+00,  0.00000000e+00,  1.00000000e+00,  0.00000000e+00],
               [-9.21907471e+02,  1.04652783e+03,  1.29935669e+02,  1.00000000e+00]])]
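Note that USD stores transforms in row-major, row-vector convention: a point transforms as `p_world = p_obj @ M`, and the translation sits in the last row (as in the CardBox matrix above, where the last row is the world position). A small sketch, assuming `M` is the 4x4 matrix returned by get_pose:

```python
import numpy as np

def object_to_world(points_obj, M):
    """Transform Nx3 object-space points to world space.

    M is a 4x4 row-major USD transform: row vectors multiply from the
    left, so the translation occupies the last row of the matrix.
    """
    points_obj = np.asarray(points_obj, dtype=float)
    # append a homogeneous coordinate of 1 to each point -> Nx4
    homo = np.hstack([points_obj, np.ones((len(points_obj), 1))])
    return (homo @ M)[:, :3]
```

As a sanity check, transforming the object origin `(0, 0, 0)` should return exactly the translation row of the matrix.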

Now I want to get point cloud data (i.e. the xyz coordinates of the points) from the saved rgb and depth image data. As far as I know, I need the camera’s intrinsic parameters (fx, fy, cx, cy) to calculate it. But how can I get these parameters? Or is there another way to get the point cloud data?
Thanks a lot.

I got the answer from [A few snippets which might be useful to compute camera intrinsics]:

# compute focal lengths (in pixels) and the principal point
focal_x = width * focal_length / horiz_aperture
focal_y = height * focal_length / vert_aperture
center_x = width * 0.5
center_y = height * 0.5
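With those values, back-projecting the depth image to camera-space points is a direct inversion of the pinhole model (a sketch assuming `depth` holds a per-pixel z-distance in the camera frame; Isaac Sim’s exact depth and axis conventions should be checked against its docs):

```python
import numpy as np

def depth_to_points(depth, focal_x, focal_y, center_x, center_y):
    """Back-project an HxW depth image to an Nx3 camera-space point cloud."""
    h, w = depth.shape
    # pixel grid: u is the column index, v is the row index
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    # invert the pinhole projection u = fx * x / z + cx (likewise for v)
    x = (u - center_x) * z / focal_x
    y = (v - center_y) * z / focal_y
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

The pixel at the principal point maps to `(0, 0, z)`, i.e. straight along the optical axis.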

Now I can get the xyz coordinates of the points. If I’m right, these are in the camera coordinate system, so I need to transform them into the world coordinate system (using the camera’s transformation matrix relative to the world) before I can match them against the 3D bbox coordinates obtained above.
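Yes, they are in the camera frame. Moving them to world space uses the camera prim’s local-to-world matrix in the same row-vector convention as the object poses (a sketch; `cam_tf` is assumed to be the 4x4 matrix obtained for the camera prim, e.g. via omni.usd.get_world_transform_matrix):

```python
import numpy as np

def camera_to_world(points_cam, cam_tf):
    """Transform Nx3 camera-space points into world space.

    cam_tf: 4x4 row-major camera-to-world matrix (translation in the
    last row, row vectors multiply from the left).
    Note: USD cameras look down the -Z axis, so depending on how the
    depth values were defined, z may need to be negated first.
    """
    points_cam = np.asarray(points_cam, dtype=float)
    homo = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    return (homo @ cam_tf)[:, :3]
```

After this step the points and the 3D bbox corners live in the same world frame and can be compared directly.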