FoundationPose estimation with custom object

The pose estimation explained in the isaac_ros_foundationpose quickstart has three main steps:

  1. It detects the object on the image and creates a mask using isaac_ros_rtdetr.
  2. This mask is used by FoundationPose to start iterating on the pose estimation.
  3. A final pose estimation is provided by FoundationPose.

AFAIK, the models used in step 1 are only valid for objects that fall under certain categories, e.g. those covered by SyntheticaDETR or YCB. Also, the API indicates that the pose estimation node subscribes to the /segmentation topic, which must be published by the object detection nodes.

With the above in mind, I'd appreciate it if someone could clarify the following questions:

  • How can I use isaac_ros_foundationpose on a novel, custom object that does not fall into any category of the DetectNet, RT-DETR or YOLOv8 object detection models?
  • Is it possible to exclusively use CAD data without any retraining for this custom, novel object? As stated in the documentation: FoundationPose is designed to perform pose estimation on previously unseen objects without model retraining.

Hi,

Yes, a 2D object detection model has to be trained for the 3D object detection to work with FoundationPose. isaac_ros_foundationpose expects a segmentation mask as one of its inputs. In our tutorials we use synthetica_detr for 2D object detection and convert its output into a segmentation mask using nvidia::isaac_ros::foundationpose::Detection2DToMask.

  1. You will have to train a 2D object detection model. Someone else from our team can get back to you on whether and how to do that with Isaac ROS.
  2. Both the CAD model and a 2D object detection model (or segmentation mask) are required. "Without model retraining" means without retraining the 3D object detection model, i.e. FoundationPose.
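
To illustrate the box-to-mask idea mentioned above, here is a minimal sketch in plain Python. It is not the Detection2DToMask implementation. The function name bbox_to_mask, the center-plus-size box parameters, and the 255/0 mask values are all assumptions made for illustration, and a real pipeline would more likely use an image array type than nested lists.

```python
def bbox_to_mask(height, width, center_x, center_y, size_x, size_y):
    """Rasterize a center+size bounding box into a binary mask.

    Hypothetical helper for illustration only: pixels inside the box
    are set to 255 and all others to 0 (assumed convention).
    """
    # Convert center/size to corner indices, clipped to the image bounds.
    x0 = max(0, int(center_x - size_x / 2))
    x1 = min(width, int(center_x + size_x / 2))
    y0 = max(0, int(center_y - size_y / 2))
    y1 = min(height, int(center_y + size_y / 2))

    # Start from an all-zero image and fill the box region.
    mask = [[0] * width for _ in range(height)]
    for row in range(y0, y1):
        for col in range(x0, x1):
            mask[row][col] = 255
    return mask


# A 2x2 box centered at (3, 2) in a 6-pixel-wide, 4-pixel-high image.
mask = bbox_to_mask(height=4, width=6, center_x=3.0, center_y=2.0,
                    size_x=2.0, size_y=2.0)
```

This also shows why a custom object still needs some 2D detector or segmenter: FoundationPose needs to be told where the object is in the image before it can estimate its pose.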

We don’t have any direct instructions on training new models within Isaac ROS. Instead, we defer to the instructions from TAO or other teams within NVIDIA. Once you have a trained model, we have tutorials that let you use it in most of our pipelines.
Here’s an example for DOPE:
https://nvidia-isaac-ros.github.io/concepts/pose_estimation/dope/tutorial_custom_model.html


Hello @ashwinvk,

Thank you very much for the clarification.

  1. Both the CAD model and a 2D object detection model (or segmentation mask) are required. "Without model retraining" means without retraining the 3D object detection model, i.e. FoundationPose.

I think that should be stated much more explicitly, at least in the Isaac ROS Pose Estimation overview. I had checked quite a few resources but was not able to find a clear statement on whether retraining was needed.

Here’s an example for DOPE

Thanks! Should I look into TAO to train on objects that don’t fall into the DOPE categories?