Pose CNN Decoder Inference

Hello, I’m trying to do 3D Object Pose Estimation using Pose CNN Decoder on a custom object.
I’ve used the Traffic Cone prefab already available in the Isaac Sim Unity3D package.

First, I generated the .etlt model for object detection using TLT, following the instructions, and I can successfully run inference using the DetectNetv2 package with good results.

Then I generated the .uff model for pose estimation, again following the instructions in the documentation.

Finally, I edited the .json files of the PoseCnnDecoder package to include both models and replaced the labels (default “dolly”) with the one I need. When I run inference on a custom scene from Unity3D, it doesn’t work at all. For example, in the same scene used with DetectNetv2 the 2D bounding boxes are correctly detected, whereas with PoseCnnDecoder not even the 2D detection is recognized.

What could be the problem? Thanks in advance.

OK, in case it helps anybody: my solution was changing the “min_bbox_area” value in detection_pose_estimation_cnn_inference_dolly.config.json from the default of 10000 to 100.
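For reference, the edit looks roughly like this in the config file. Note that the exact nesting and component name around the key are from my setup and may differ between Isaac SDK versions; only the `min_bbox_area` key and value are the actual fix:

```json
{
  "config": {
    "detection_filter": {
      "min_bbox_area": 100
    }
  }
}
```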

That is right. The min_bbox_area configuration is used to filter out detections by bounding-box area, depending on the use case, and the default value is set for the dolly, which is much larger than the cone. Reducing it prevents the cone detections from being filtered out. confidence_threshold is another parameter that can be reduced to overcome false negatives (of course, the downside is possibly more false positives, so it needs to be tuned to the use case and situation as well).
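To illustrate why the default threshold drops every cone detection, here is a minimal sketch of this kind of area/confidence filter. This is not Isaac SDK code; the function and field names are made up for the example, but the logic matches the filtering behavior described above:

```python
# Illustrative sketch (not Isaac SDK code): an area + confidence filter like
# the one applied to DetectNet detections before pose estimation.

def filter_detections(detections, min_bbox_area, confidence_threshold):
    """Keep detections whose bbox area and confidence clear both thresholds."""
    kept = []
    for det in detections:
        x1, y1, x2, y2 = det["bbox"]
        area = (x2 - x1) * (y2 - y1)
        if area >= min_bbox_area and det["confidence"] >= confidence_threshold:
            kept.append(det)
    return kept

# A cone detection of roughly 80x60 px (area 4800) with confidence 0.7:
cone = [{"bbox": (100, 100, 180, 160), "confidence": 0.7}]

# With the dolly default of 10000, the cone is silently filtered out;
# with 100, it passes through to the pose estimation stage.
print(len(filter_detections(cone, min_bbox_area=10000, confidence_threshold=0.6)))  # 0
print(len(filter_detections(cone, min_bbox_area=100, confidence_threshold=0.6)))    # 1
```

The same function shows the confidence_threshold trade-off: lowering it keeps more low-confidence detections (fewer false negatives, potentially more false positives).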
