Pose CNN Decoder Visualization

Hello, I’m trying to use the Pose CNN Decoder package to train a custom object (the traffic cone prefab already provided in Isaac Sim Unity3D). I successfully created both model.etlt for object detection and model.uff for pose estimation, and I’m running inference on a custom scene created with Unity3D.

The problem is in the 3D bounding box visualization: I noticed that when I move the object along the x-axis, the 3D bounding box moves along the z-axis, and so on. It looks like the pose estimation orientation is wrong.

Any advice on what could be the problem and how to solve it?

Can someone give me any suggestions? I have also tried retraining both models (object detection and pose estimation) using the dolly prefab, keeping all configuration files at their defaults. When I run inference I get these results:


What am I missing?

The mismatch in movements that you observe is because of the different coordinate systems. The coordinate system of Unity, where you are moving the object, is different from Isaac SDK’s robot coordinate system. Also, the pose output from the decoder itself is in Isaac’s camera coordinate system, which is different from Isaac’s robot coordinate system.
So X is the forward direction in Unity, which is along the Z axis in Isaac’s coordinate system; that is likely what you observed on Isaac’s side in the bounding-box visualization.
Please take a look at this documentation section on Isaac’s coordinate systems: https://docs.nvidia.com/isaac/isaac/packages/perception/doc/coord_frame.html
More information on Isaac Unity’s coordinate system is provided here: https://docs.nvidia.com/isaac/isaac/doc/simulation/unity3d.html
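To make the axis mismatch concrete, here is a minimal Python sketch of the kind of remapping involved. The function below is illustrative only: it encodes just the symptom described in this thread (motion along Unity’s X axis appearing along Isaac’s Z axis), and the true permutation and sign flips for your frame pair, including the left-handed-to-right-handed change, must be derived from the documentation linked above.

```python
def unity_to_isaac(p):
    """Illustrative axis remap between Unity and Isaac coordinates.

    Encodes only the symptom described above (Unity X showing up along
    Isaac Z). The real mapping, including any sign flips, depends on
    the specific frame pair and should be taken from the Isaac
    coordinate-frame documentation.
    """
    x, y, z = p
    return (z, y, x)

# Moving the object along Unity's X axis...
moved = unity_to_isaac((1.0, 0.0, 0.0))  # ...shows up along Isaac's Z axis
```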

For dolly, for example, the mesh is imported from Unity, so we apply a rotation transformation in detection_pose_estimation_dolly.config.json (in WebSight) to align with this change of coordinate systems.
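For illustration, such an alignment typically appears in the config as a fixed pose. The key name below is hypothetical (check the shipped detection_pose_estimation_dolly.config.json for the real one), and the 7-element array assumes Isaac’s usual [qw, qx, qy, qz, tx, ty, tz] pose layout, here with a 90° rotation about X as an example value:

```json
{
  "comment": "Hypothetical fragment -- key name and values are illustrative only",
  "object_T_mesh": [0.7071, 0.7071, 0.0, 0.0, 0.0, 0.0, 0.0]
}
```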

What prefab are you using for dolly? Can you share the path?
Also, can you try training dolly with the provided binary to begin with, and then do inference on your custom scene to verify that it works?

Hello and thanks for the reply.

For dolly I’m using the one located at ~/isaac_sim_unity3d/packages/Nvidia/Samples/Warehouse/Props/IndustrialDolly.prefab set in the PoseEstimationObjectsGroup.

I currently can’t run the provided binary for dolly training, since my hardware limitations (GTX 980) don’t allow me to open the factory_of_the_future scene, so I use the pose_cnn_decoder_training scene.

Regarding the coordinate systems: OK, for the mesh orientation I can change the rotation transformation. But for the 3D pose estimation, since running inference with my model and with the provided one (without changing anything in the config files) gives different results, should I change something about the object’s orientation during training with pose_cnn_decoder_training?

Any suggestions? Has anyone trained the Pose CNN Decoder using a custom prefab from a source scene file and resolved this orientation problem?

I managed to get access to a more powerful machine. Training dolly with the provided binary and doing inference on a custom scene works, so the problem probably depends on the scene used for training. Could you provide the source file from which you generate factory_of_the_future, so I can replace the dolly object with the one I need and make it work? Thanks.


I have tried using the dolly prefab you are using with the pose estimation CNN training scene. One modification I suggest to the scene is in the Class Label Manager: change the index value from 2 to 1, as the architecture trains the segmentation mask with values in the range 0 to 1. A screenshot of the ClassLabelManager after this change is below.

With that change, I trained it on the same scene. I recommend reducing the learning rate to 1e-4 or so after 15000 iterations so that the losses come down better.
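That schedule can be sketched as a simple step decay. Note the base rate of 1e-3 is an assumption for illustration (use whatever your training config specifies); only the drop to 1e-4 after 15000 iterations comes from the advice above.

```python
def learning_rate(step, base_lr=1e-3, drop_step=15000, dropped_lr=1e-4):
    """Step-decay schedule: hold base_lr, then drop to dropped_lr.

    base_lr=1e-3 is an assumed starting value; the drop to 1e-4 after
    15000 iterations is the suggestion from the post above.
    """
    return base_lr if step < drop_step else dropped_lr

early = learning_rate(10000)  # still at the base rate
late = learning_rate(20000)   # dropped rate after 15000 iterations
```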
I then tested inference by running the same scene with the dolly prefab added to it (un-check the Scenario Manager and drag the dolly prefab into the scene at the same location; change the layer of the IndustrialDolly GameObject to “Ignore Raycast”). I have attached a screenshot of how the scene looks with these changes.

And the inference works for me without any orientation issues. There might be larger errors at some orientations, but you can fine-tune your training accordingly and change the scene settings to improve it. Overall, if I run the same training scene with camera poses from the same setup, I see that the inference works as expected. (In the procedural_camera and procedural GameObjects in the scene, increase the frame interval to 30 or so, so that the camera and the objects around it don’t change every frame; this lets you check visually.)
I am attaching the sample inference images from this exercise.

So I advise running the same steps: first do inference on the same set as training to check that the model performs well there. There is nothing special about the provided factory_of_the_future binary; it is tailored for camera poses when attached to a robot, so it fixes the camera height and angles in that scene, and you won’t get full 360-degree views in the training samples. Also, that scene is set to ignore the wheels in the decoder output so that the model performs better under wheel randomization. Apart from that, the underlying settings are the same, so a custom scene should work as well.

Thanks for the reply, I’ll try right now following your steps and I’ll let you know.
Just another question: if I wanted to estimate the pose while changing the orientation of the object during training, what should I do? In the detect_net training scene there was an object group in procedural where we could modify the script (for max place, max object count, etc.), and there I changed the rotation of the spawned object from Quaternion.Identity to a random quaternion. In the pose_cnn_decoder_training scene there is no such object group in procedural, so what should I modify to make the object spawn with a random orientation?
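As an aside on replacing Quaternion.Identity with a random quaternion: for an unbiased random orientation you want a uniform sample over all rotations, not three independent random Euler angles. Below is a portable Python sketch of Shoemake’s method (Unity’s Random.rotationUniform should give the equivalent result natively, so this is only for reference):

```python
import math
import random

def random_quaternion(rng=random):
    """Uniformly sample a unit quaternion (Shoemake's method).

    Standard technique for unbiased random orientations; a sketch to
    port into a Unity randomizer script, not Isaac/Unity API code.
    """
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    return (
        math.sqrt(1.0 - u1) * math.sin(2.0 * math.pi * u2),  # x
        math.sqrt(1.0 - u1) * math.cos(2.0 * math.pi * u2),  # y
        math.sqrt(u1) * math.sin(2.0 * math.pi * u3),        # z
        math.sqrt(u1) * math.cos(2.0 * math.pi * u3),        # w
    )

q = random_quaternion()
norm = math.sqrt(sum(c * c for c in q))  # unit length up to rounding
```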

Yes, we can’t place it in procedural for this scene, because we can’t yet send the pose of the object with that scenario-object setting. Instead, we keep the object constant and move the camera around to capture all relative poses. Try playing around with the CameraGroup parameters, giving some deviation to the target object position, so that the object is not always at the center of the image.
Link to the video: https://drive.google.com/file/d/194qsZqkEEZgxUa7Ld2bLZt1ykt7HQ4ji/view?usp=sharing
But if you want to change the object orientation too, you can do so without using ScenarioObject, like I set up the scene above for inference. Drag the IndustrialDolly into the scene and add the TransformRandomizer, Randomness, and Randomizer scripts to that object, along with the Rigid Body script that is currently attached to the Scenario Object. I have attached the video below. Follow the steps up to saving the scene in the video and hit play; you should see the object changing orientation in place.

The scene currently available for download is instead set up for ease of use, so that users can avoid these steps by going through the Scenario Object.


Thanks for the video explanation of the object orientation. I can successfully generate the scene, but when I start a training procedure I get the message “waiting for samples: 0”, as the object set in the scene is not seen by Isaac. The scene configuration is identical to the one in your video; should I change something in the Isaac files?

Regarding the problems on training results, I followed the steps you suggested.
I set up the training scene like you said

Then I started training for 20000 steps (I don’t need optimal results for now).

For the inference I set up the scene like this

So: uncheck the Scenario Manager, drag in the dolly with the Ignore Raycast layer, etc.

Running inference on this scene gives me the same problem stated before when setting up the object for random orientation (“waiting for samples: 0”), so I ran inference directly on a built custom scene instead, but the results are still completely wrong.


Here are some loss curves from TensorBoard:

If I run inference using the provided model everything is fine.
posecnn.zip (9.6 KB)
Finally, I am uploading the JSON files for training and inference (they should be unchanged from the originals); maybe the problem is here.

Thanks again, and sorry to bother you, but I can’t figure out what the problem is.

OK, the object orientation issue was my bad; that one is solved, and now I can train with the object spawning at random orientations.

But still nothing regarding the inference results.

The drop in losses during training doesn’t look right. I have tested everything on the release branch, changing only the label value from 2 to 1, and it looks fine. Here is how the losses are supposed to come down, approximately (I reduced the learning rate to 1e-4 at 15000 iterations).

Sample encoder and decoder ground-truth cropped images, and the decoder segmentation mask output, are below:

Can you try with these app files in the training folder?
training_app_dolly_changes.zip (3.6 KB)
There are no major changes; they just visualize the ground-truth pose coming from sim as well. Increase the frame interval for procedural_camera and procedural to 10 or so, and check that the 3D pose visualized as a 3D bounding box and the image change at the same time. This ensures that the right inputs are being sent for training. From your losses, it looks like the training is not going as expected for you.