The pose estimation pipeline explained in the isaac_ros_foundationpose quickstart has the following main steps:
- It detects the object on the image and creates a mask using isaac_ros_rtdetr.
- This mask is used by FoundationPose to start iterating on the pose estimation.
- A final pose estimation is provided by FoundationPose.
AFAIK, the models used in step 1 are only valid for objects that fall under certain categories, e.g. see SyntheticaDETR or YCB. Also, the API indicates that the pose estimation node is subscribed to the /segmentation topic, which must be published by the object detection nodes.
With that in mind, I’d appreciate it if someone could clarify the following questions:
- How can I use isaac_ros_foundationpose on a novel, custom object that does not fall into any category of the DetectNet, RT-DETR or YOLOv8 object detection models?
- Is it possible to use only CAD data, without any retraining, for this custom, novel object? The documentation states: "FoundationPose is designed to perform pose estimation on previously unseen objects without model retraining."
Hi,
Yes, a 2D object detection model has to be trained for the 3D object detection to work with FoundationPose. isaac_ros_foundationpose expects a segmentation mask as one of its inputs. In our tutorials we use synthetica_detr for 2D object detection and convert the resulting detection into a segmentation mask using nvidia::isaac_ros::foundationpose::Detection2DToMask.
- You will have to train a 2D object detection model. Someone else from our team can get back to you on whether and how to do that with Isaac ROS.
- Both the CAD model and a 2D object detection model/segmentation mask are required. "Without model retraining" refers to not retraining the 3D object detection model, i.e. FoundationPose.
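To illustrate what turning a 2D detection into a mask amounts to, here is a minimal sketch. It is not the actual Detection2DToMask implementation: the function name bbox_to_mask and the center/size box parameters are assumptions made for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
def bbox_to_mask(cx, cy, size_x, size_y, height, width):
    """Return a height x width mask: 255 inside the box, 0 elsewhere.

    Hypothetical helper; the box is given by its center (cx, cy) and
    its size (size_x, size_y), all in pixels.
    """
    # Convert center/size to corner coordinates, clamped to the image bounds.
    x0 = max(int(round(cx - size_x / 2.0)), 0)
    y0 = max(int(round(cy - size_y / 2.0)), 0)
    x1 = min(int(round(cx + size_x / 2.0)), width)
    y1 = min(int(round(cy + size_y / 2.0)), height)

    # Start from an all-background mask and fill the box region.
    mask = [[0] * width for _ in range(height)]
    for y in range(y0, y1):
        row = mask[y]
        for x in range(x0, x1):
            row[x] = 255
    return mask


# Example: a 100 x 80 pixel box centered in a 640 x 480 image.
mask = bbox_to_mask(cx=320, cy=240, size_x=100, size_y=80, height=480, width=640)
```

The point is only that the mask marks where the object is in the image; any 2D detector that can localize your custom object could, in principle, feed a step like this.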
We don’t have any direct instructions on training new models within Isaac ROS. Instead, we defer to the instructions from TAO or other teams within NVIDIA. Once you have a trained model, we have tutorials that let you use it in most of our pipelines.
Here’s an example for DOPE:
https://nvidia-isaac-ros.github.io/concepts/pose_estimation/dope/tutorial_custom_model.html
Hello @ashwinvk,
Thank you very much for the clarification.
- Both the CAD model and a 2D object detection model/segmentation mask are required. "Without model retraining" refers to not retraining the 3D object detection model, i.e. FoundationPose.
I think that should be stated much more explicitly, at least in the Isaac ROS Pose Estimation overview. I checked quite a few resources but was not able to find a clear statement on whether retraining was needed.
Here’s an example for DOPE
Thanks! Should I look into TAO to train on objects that don’t fall into the DOPE categories?