Poor/unknown quality of Pose Estimation Model

I’ve followed the steps outlined in the article on 3D Object Pose Estimation with Pose CNN Decoder:
https://docs.nvidia.com/isaac/isaac/packages/object_pose_estimation/doc/pose_cnn_decoder.html
While not without hiccups, I managed to train both the object detection model and the 3D Object Pose Estimation CNN Decoder model.
Both models are trained on data from Isaac Unity. The object detection model architecture is ResNet10, and it performs reasonably well; see the screenshots.

The problem starts when I run the pose estimation pipeline. The decoder output looks normal in some instances (not all of them), but:
a) The documentation doesn’t clearly say how to specify the 3D bounding box size at zero orientation or the transformation from the object center to the bounding box center. I made a few attempts to guess the correct values and entered them in the corresponding Isaac Sight widget, but the resulting 3D bounding boxes still didn’t make any sense.
b) I’m under the impression that the 3D bounding box visualization and the decoder output lag behind, meaning that inference is not performed on the current camera image received from the simulation.
Let me demonstrate the above problems with screenshots.


These two screenshots are an example of what seems to be a lag in inference/visualization.

And this image shows a case where the decoder input seems to make sense, but the 3D bounding boxes are completely out of place.
I cannot attach my .json config for the 3D pose estimation app, so I put it on Pastebin.

Is there any more information I can provide? I can share both models if needed. The 3D pose estimation model was trained for 25,000 steps.

Hi, your funny 3D bounding box comes from these values in your config:

    "object_T_box_center": [1.0, 0, 0, 0, 0, 0, 0.375],
    "box_dimensions": [0.9, 1.32, 0.25]

You have to change them to the bucket’s actual dimensions and center. As the end of the documentation says, the bounding box needs to be centered and scaled.
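For scale, here is a minimal sketch of what that could look like in the inference app JSON, assuming the viewers/Detections3Viewer component from the sample apps and a bucket of roughly 0.3 × 0.3 × 0.25 m whose frame origin sits at the base; every number here is a placeholder to replace with measurements from your own CAD model. The seven-element pose appears to be a quaternion (w, x, y, z) followed by a translation (x, y, z), so an identity rotation plus a z-offset of half the height moves the box center from the base to the geometric center:

    "config": {
      "viewers": {
        "Detections3Viewer": {
          "box_dimensions": [0.3, 0.3, 0.25],
          "object_T_box_center": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.125]
        }
      }
    }

The exact node and component names depend on your app graph, so check your own config for how the Detections3Viewer component is actually named.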

There are two types of visualization for the estimated pose:

  • A 3D bounding box, which requires specification of both the 3D bounding box size at zero orientation and the transformation from the object center to the bounding box center. Configure these parameters in the viewers/Detections3Viewer component in the inference application files.
  • A rendering of the CAD model in the scene, which requires the path to the object CAD model and the file names. These correspond to the assetroot and assets parameters, respectively, in the websight component in the inference application files (a sketch of this follows below).
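For the CAD rendering option, a similar sketch, assuming the standard websight/WebsightServer node from the sample apps; the asset root path and asset name below are made-up placeholders, not real files:

    "config": {
      "websight": {
        "WebsightServer": {
          "assetroot": "path/to/your/cad/assets",
          "assets": ["bucket"]
        }
      }
    }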

And for your delay problem, the solution is:

If you notice lag in mesh versus image rendering in Sight, increase the Change delay value slightly to synchronize the image, rendered mesh, and bounding box. To find this option, right-click the rendering window in Sight.
