Bad result with isaac_ros_foundationpose

I tried isaac_ros_foundationpose and the original FoundationPose, but given the same RGB, depth, and mask images they output different results. The original FoundationPose result (the left one) is nearly perfect, while the isaac_ros_foundationpose result (the right one) is not good.
[Images: left — original FoundationPose; right — isaac_ros_foundationpose]

I deployed isaac_ros_foundationpose and converted the models to TensorRT following the quickstart tutorial. The mask was produced by a YOLOv8 model I trained myself. The RGB and depth images came directly from rs_launch.launch.py of realsense-ros, at 640x480 resolution.
The following is the launch file I created to launch the isaac_ros_foundationpose node:

from ament_index_python.packages import get_package_share_directory
import launch
from launch.actions import DeclareLaunchArgument, IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch.conditions import IfCondition, UnlessCondition
from launch.substitutions import LaunchConfiguration
from launch_ros.actions import ComposableNodeContainer, Node
from launch_ros.descriptions import ComposableNode

MESH_FILE_NAME = '/workspaces/isaac_ros-dev/isaac_ros_assets/isaac_ros_foundationpose/carton/zhonghua.obj'
TEXTURE_MAP_NAME = '/workspaces/isaac_ros-dev/isaac_ros_assets/isaac_ros_foundationpose/carton/assets/zhonghua_color.jpg'
REFINE_MODEL_NAME = '/tmp/refine_model.onnx'
REFINE_ENGINE_NAME = '/workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/refine_trt.plan'
SCORE_MODEL_NAME = '/tmp/score_model.onnx'
SCORE_ENGINE_NAME = '/workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/score_trt.plan'

def generate_launch_description():
    """Generate launch description for testing relevant nodes."""
    launch_args = [
        DeclareLaunchArgument(
            'mesh_file_path',
            default_value=MESH_FILE_NAME,
            description='The absolute file path to the mesh file'),

        DeclareLaunchArgument(
            'texture_path',
            default_value=TEXTURE_MAP_NAME,
            description='The absolute file path to the texture map'),

        DeclareLaunchArgument(
            'refine_model_file_path',
            default_value=REFINE_MODEL_NAME,
            description='The absolute file path to the refine model'),

        DeclareLaunchArgument(
            'refine_engine_file_path',
            default_value=REFINE_ENGINE_NAME,
            description='The absolute file path to the refine trt engine'),

        DeclareLaunchArgument(
            'score_model_file_path',
            default_value=SCORE_MODEL_NAME,
            description='The absolute file path to the score model'),

        DeclareLaunchArgument(
            'score_engine_file_path',
            default_value=SCORE_ENGINE_NAME,
            description='The absolute file path to the score trt engine'),
    ]

    mesh_file_path = LaunchConfiguration('mesh_file_path')
    texture_path = LaunchConfiguration('texture_path')
    refine_model_file_path = LaunchConfiguration('refine_model_file_path')
    refine_engine_file_path = LaunchConfiguration('refine_engine_file_path')
    score_model_file_path = LaunchConfiguration('score_model_file_path')
    score_engine_file_path = LaunchConfiguration('score_engine_file_path')

    # RealSense depth is uint16 in millimeters. Convert to float32 in meters.
    convert_metric_node = ComposableNode(
        package='isaac_ros_depth_image_proc',
        plugin='nvidia::isaac_ros::depth_image_proc::ConvertMetricNode',
        remappings=[
            ('image_raw', 'camera/aligned_depth_to_color/image_raw'),
            ('image', 'depth_fp32')
        ]
    )

    foundationpose_node = ComposableNode(
        name='foundationpose',
        package='isaac_ros_foundationpose',
        plugin='nvidia::isaac_ros::foundationpose::FoundationPoseNode',
        parameters=[{
            'mesh_file_path': mesh_file_path,
            'texture_path': texture_path,

            'refine_model_file_path': refine_model_file_path,
            'refine_engine_file_path': refine_engine_file_path,
            'refine_input_tensor_names': ['input_tensor1', 'input_tensor2'],
            'refine_input_binding_names': ['input1', 'input2'],
            'refine_output_tensor_names': ['output_tensor1', 'output_tensor2'],
            'refine_output_binding_names': ['output1', 'output2'],

            'score_model_file_path': score_model_file_path,
            'score_engine_file_path': score_engine_file_path,
            'score_input_tensor_names': ['input_tensor1', 'input_tensor2'],
            'score_input_binding_names': ['input1', 'input2'],
            'score_output_tensor_names': ['output_tensor'],
            'score_output_binding_names': ['output1'],
            'refine_iterations': 1,
        }],
        remappings=[
            ('pose_estimation/depth_image', 'depth_fp32'),
            ('pose_estimation/image', 'camera/color/image_raw'),
            ('pose_estimation/camera_info', 'camera/color/camera_info'),
            ('pose_estimation/segmentation', 'mask'),
            ('pose_estimation/output', 'output')]
    )

    foundationpose_container = ComposableNodeContainer(
        name='foundationpose_container',
        namespace='foundationpose_container',
        package='rclcpp_components',
        executable='component_container_mt',
        composable_node_descriptions=[convert_metric_node, foundationpose_node],
        output='screen'
    )

    return launch.LaunchDescription(launch_args + [foundationpose_container])

In case it helps, the following is the log file of the foundationpose node.
foundationpose.log (9.2 KB)
Any help would be greatly appreciated.

Hi @woblitent

Thank you for opening this post and welcome to the Isaac ROS forum.
After reviewing your images and messages, I recommend starting by verifying that the timestamps of the mask and the other inputs are valid and match.
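For example, a quick way to verify this (a minimal sketch, not part of the pipeline; the topic names are taken from your nodes) is a small node that prints the header stamp of each input so you can compare them side by side:

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image

# Topics assumed from your launch file and segmentation node.
TOPICS = ['/mask',
          '/camera/color/image_raw',
          '/camera/aligned_depth_to_color/image_raw']

class StampChecker(Node):
    def __init__(self):
        super().__init__('stamp_checker')
        for topic in TOPICS:
            # Bind the topic name as a default arg so each callback
            # logs which source the stamp belongs to.
            self.create_subscription(
                Image, topic,
                lambda msg, t=topic: self.get_logger().info(
                    f'{t}: {msg.header.stamp.sec}.{msg.header.stamp.nanosec:09d}'),
                10)

def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(StampChecker())

if __name__ == '__main__':
    main()

If the stamps drift apart, or the mask consistently lags the RGB and depth frames, the synchronization inside the FoundationPose node may be pairing the wrong frames.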

If the issue persists, we may need your sample input and mesh file in order to replicate the problem. This will allow us to investigate and troubleshoot the issue effectively.

Thank you in advance,
Raffaello

@Raffaello Thank you for your reply.
Below is the code for my segmentation node; the camera remains stationary while it runs.

import rclpy
from ultralytics import YOLO
from rclpy.node import Node
from sensor_msgs.msg import Image, CameraInfo
from vision_msgs.msg import Detection3DArray
from cv_bridge import CvBridge
import message_filters
import numpy as np
import cv2


def mask_func(model, color_image, depth_image):
    # Remove background
    clipping_distance = 3000  # depth threshold in RealSense depth units (millimeters)
    grey_color = 153
    depth_image_3d = np.dstack(
        (depth_image, depth_image, depth_image)
    )  # depth image is 1 channel, color is 3 channels
    bg_removed = np.where(
        (depth_image_3d > clipping_distance),
        grey_color,
        color_image,
    )
    
    # Segmentation
    results = model(bg_removed, conf=0.5, verbose=False, imgsz=(480, 640), device=0)
    mask = np.zeros((480, 640), dtype=np.uint8)
    try:
        bboxes = results[0].boxes.cpu().numpy().xyxy
        masks = results[0].masks.data.cpu().numpy()
        top_id = np.argmin(bboxes[:, 1])  # keep the top-most detection
        mask = masks[top_id]
        mask[mask != 0] = 255
        mask = mask.astype(np.uint8)
    except (AttributeError, ValueError):
        # masks is None (or boxes is empty) when nothing is detected
        print('No object detected!')
    return mask


class SegmentNode(Node):
    def __init__(self):
        super().__init__('mask_publisher')

        self.model = YOLO('/workspaces/isaac_ros-dev/isaac_ros_assets/models/yolov8/v8l_seg_FT.pt', task='segment')

        self.rgb_sub = message_filters.Subscriber(self, Image, '/camera/color/image_raw')
        self.depth_sub = message_filters.Subscriber(self, Image, '/camera/aligned_depth_to_color/image_raw')
        self.camera_info_sub = message_filters.Subscriber(self, CameraInfo, '/camera/color/camera_info')

        
        self.ts = message_filters.ApproximateTimeSynchronizer([self.rgb_sub, self.depth_sub, self.camera_info_sub], 10, 0.1, allow_headerless=True)
        self.ts.registerCallback(self.mask_callback)
        
        self.mask_publisher = self.create_publisher(Image, '/mask', 10)
        self.bridge = CvBridge()

    def mask_callback(self, rgb_msg, depth_msg, camera_info_msg):

        self.K = np.array([[camera_info_msg.k[0], 0.0, camera_info_msg.k[2]],
                            [0.0, camera_info_msg.k[4], camera_info_msg.k[5]],
                            [0.0, 0.0, 1.0]])            
        
        rgb_img = self.bridge.imgmsg_to_cv2(rgb_msg, desired_encoding='bgr8')
        depth_img = self.bridge.imgmsg_to_cv2(depth_msg, desired_encoding='16UC1')
        
        mask = mask_func(self.model, rgb_img, depth_img)
        
        mask_msg = self.bridge.cv2_to_imgmsg(mask, encoding="mono8")
        mask_msg.header = rgb_msg.header

        self.mask_publisher.publish(mask_msg)
        self.get_logger().info('Publishing mask')

def main(args=None):
    rclpy.init(args=args)
    segment_node = SegmentNode()
    rclpy.spin(segment_node)
    segment_node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

This is my segmentation result, which has been overlaid on the RGB image.
[Image: segment_vis — segmentation mask overlaid on the RGB frame]
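(For reference, the overlay above is just a quick alpha blend, roughly like the sketch below; rgb_img and mask are the arrays from mask_callback in the node above.)

import cv2

# Visualization only: blend the mask over the camera frame.
overlay = rgb_img.copy()
overlay[mask == 255] = (0, 255, 0)  # paint segmented pixels green
vis = cv2.addWeighted(rgb_img, 0.6, overlay, 0.4, 0.0)
cv2.imwrite('segment_vis.jpg', vis)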

Finally, here are my mesh file and rosbag; I hope they help you identify the issue. Thank you!
Google Drive

Hi @Raffaello ,
I wanted to follow up on this issue, as it has been some time since my last update. This issue is really troubling me, and any updates you might have would be extremely helpful. Thanks!

Hi @woblitent

I apologize for the delay; I forwarded your issue to engineering, but we are busy these days.
I will keep you posted

Raffaello

@Raffaello Thank you for your reply.
I appreciate your time. I’ll look forward to any updates when you’re able. Thanks again for your attention to this issue.

Hi @woblitent

Thank you for your patience and for sharing the files. Looking at the rosbag, the pose output is unstable.

To improve accuracy and stabilize the output, you can set the refine_iterations parameter to a higher value (e.g., 3 is enough for this case).

| ROS Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| refine_iterations | int | 1 | Number of iterations applied to the refinement model. More iterations can enhance accuracy but take longer to process. |
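In your launch file this is a one-line change to the foundationpose_node parameters, for example (only the relevant fragment is shown; everything else stays as you posted it):

    foundationpose_node = ComposableNode(
        # ... name, package, plugin and remappings as in your launch file ...
        parameters=[{
            # ... all other parameters unchanged ...
            'refine_iterations': 3,
        }],
    )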

Keep me posted.

Thank you again for your time.
Raffaello

Hi @Raffaello
Thank you for your reply. I have already tried setting refine_iterations to 5 (as set here), but the results haven't improved. I'm using the code below to visualize the pose on the /output topic, but the result is not ideal.


import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image, CameraInfo
from vision_msgs.msg import Detection3DArray
from cv_bridge import CvBridge
import message_filters
import numpy as np
import cv2
import time
import trimesh

import transforms3d as tfs


def to_homo(pts):
    '''
    @pts: (N,3 or 2) homogenizes the last dimension
    '''
    assert len(pts.shape)==2, f'pts.shape: {pts.shape}'
    homo = np.concatenate((pts, np.ones((pts.shape[0],1))),axis=-1)
    return homo

def draw_posed_3d_box(K, img, ob_in_cam, bbox, line_color=(0,255,0), linewidth=2):
    '''Revised from 6pack dataset/inference_dataset_nocs.py::projection
    @bbox: (2,3) min/max
    @line_color: RGB
    '''
    min_xyz = bbox.min(axis=0)
    xmin, ymin, zmin = min_xyz
    max_xyz = bbox.max(axis=0)
    xmax, ymax, zmax = max_xyz

    def draw_line3d(start,end,img):
        pts = np.stack((start,end),axis=0).reshape(-1,3)
        pts = (ob_in_cam@to_homo(pts).T).T[:,:3]   #(2,3)
        projected = (K@pts.T).T
        uv = np.round(projected[:,:2]/projected[:,2].reshape(-1,1)).astype(int)   #(2,2)
        img = cv2.line(img, uv[0].tolist(), uv[1].tolist(), color=line_color, thickness=linewidth, lineType=cv2.LINE_AA)
        return img

    for y in [ymin,ymax]:
        for z in [zmin,zmax]:
            start = np.array([xmin,y,z])
            end = start+np.array([xmax-xmin,0,0])
            img = draw_line3d(start,end,img)

    for x in [xmin,xmax]:
        for z in [zmin,zmax]:
            start = np.array([x,ymin,z])
            end = start+np.array([0,ymax-ymin,0])
            img = draw_line3d(start,end,img)

    for x in [xmin,xmax]:
        for y in [ymin,ymax]:
            start = np.array([x,y,zmin])
            end = start+np.array([0,0,zmax-zmin])
            img = draw_line3d(start,end,img)

    return img

def draw_xyz_axis(color, ob_in_cam, scale=0.1, K=np.eye(3), thickness=3, transparency=0,is_input_rgb=False):
    '''
    @color: BGR
    '''
    if is_input_rgb:
        color = cv2.cvtColor(color,cv2.COLOR_RGB2BGR)
    xx = np.array([1,0,0,1]).astype(float)
    yy = np.array([0,1,0,1]).astype(float)
    zz = np.array([0,0,1,1]).astype(float)
    xx[:3] = xx[:3]*scale
    yy[:3] = yy[:3]*scale
    zz[:3] = zz[:3]*scale
    origin = tuple(project_3d_to_2d(np.array([0,0,0,1]), K, ob_in_cam))
    xx = tuple(project_3d_to_2d(xx, K, ob_in_cam))
    yy = tuple(project_3d_to_2d(yy, K, ob_in_cam))
    zz = tuple(project_3d_to_2d(zz, K, ob_in_cam))
    line_type = cv2.LINE_AA
    arrow_len = 0
    tmp = color.copy()
    tmp1 = tmp.copy()
    tmp1 = cv2.arrowedLine(tmp1, origin, xx, color=(0,0,255), thickness=thickness,line_type=line_type, tipLength=arrow_len)
    mask = np.linalg.norm(tmp1-tmp, axis=-1)>0
    tmp[mask] = tmp[mask]*transparency + tmp1[mask]*(1-transparency)
    tmp1 = tmp.copy()
    tmp1 = cv2.arrowedLine(tmp1, origin, yy, color=(0,255,0), thickness=thickness,line_type=line_type, tipLength=arrow_len)
    mask = np.linalg.norm(tmp1-tmp, axis=-1)>0
    tmp[mask] = tmp[mask]*transparency + tmp1[mask]*(1-transparency)
    tmp1 = tmp.copy()
    tmp1 = cv2.arrowedLine(tmp1, origin, zz, color=(255,0,0), thickness=thickness,line_type=line_type, tipLength=arrow_len)
    mask = np.linalg.norm(tmp1-tmp, axis=-1)>0
    tmp[mask] = tmp[mask]*transparency + tmp1[mask]*(1-transparency)
    tmp = tmp.astype(np.uint8)
    if is_input_rgb:
        tmp = cv2.cvtColor(tmp,cv2.COLOR_BGR2RGB)

    return tmp

def project_3d_to_2d(pt,K,ob_in_cam):
    pt = pt.reshape(4,1)
    projected = K @ ((ob_in_cam@pt)[:3,:])
    projected = projected.reshape(-1)
    projected = projected/projected[2]
    return projected.reshape(-1)[:2].round().astype(int)



class PoseVisNode(Node):

    def __init__(self):
        super().__init__('pose_visualizer')
        
        mesh_file = "/workspaces/isaac_ros-dev/isaac_ros_assets/isaac_ros_foundationpose/carton/zhonghua.obj"
        mesh = trimesh.load(mesh_file)
        self.to_origin, extents = trimesh.bounds.oriented_bounds(mesh)
        self.bbox = np.stack([-extents/2, extents/2], axis=0).reshape(2,3)
        
        self.pose_sub = message_filters.Subscriber(self,
            Detection3DArray,
            '/output')    
        self.rgb_sub = message_filters.Subscriber(self, Image, '/camera/color/image_raw')
        self.camera_info_sub = message_filters.Subscriber(self, CameraInfo, '/camera/color/camera_info')
      
        self.ts = message_filters.ApproximateTimeSynchronizer([self.rgb_sub,  self.camera_info_sub, self.pose_sub], 10, 0.1, allow_headerless=True)
        self.ts.registerCallback(self.pose_callback)
        
        self.bridge = CvBridge()


    def pose_callback(self, rgb_msg,camera_info_msg,pose_msg):
        self.K = np.array([[camera_info_msg.k[0], 0.0, camera_info_msg.k[2]],
                                [0.0, camera_info_msg.k[4], camera_info_msg.k[5]],
                                [0.0, 0.0, 1.0]])         

        rgb_img = self.bridge.imgmsg_to_cv2(rgb_msg, desired_encoding='bgr8')
        self.rgb = np.array(rgb_img)
                
        t_msg = pose_msg.detections[0].results[0].pose.pose.position
        pred_t = np.array([t_msg.x, t_msg.y, t_msg.z])
        r_msg = pose_msg.detections[0].results[0].pose.pose.orientation
        # transforms3d expects quaternions in (w, x, y, z) order,
        # while geometry_msgs stores them as (x, y, z, w)
        pred_r_quat = np.array([r_msg.w, r_msg.x, r_msg.y, r_msg.z])
        pred_r = tfs.quaternions.quat2mat(pred_r_quat)
        print('T:',pred_t)
        print('R:',pred_r)
        
        pose_matrix = np.eye(4)
        vis_t = pred_t.reshape(3, 1)
        pose_matrix[:3, :3] = pred_r
        pose_matrix[:3, 3:4] = vis_t
        
        
        center_pose = pose_matrix@np.linalg.inv(self.to_origin)
        vis = draw_posed_3d_box(self.K, img=self.rgb, ob_in_cam=center_pose, bbox=self.bbox)
        vis = draw_xyz_axis(self.rgb, ob_in_cam=center_pose, scale=0.1, K=self.K, thickness=3, transparency=0, is_input_rgb=False)
        cv2.imwrite('/workspaces/isaac_ros-dev/3.jpg', vis)

              
def main(args=None):
    rclpy.init(args=args)
    pose_vis_node = PoseVisNode()
    rclpy.spin(pose_vis_node)
    pose_vis_node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

I want to know what's causing the difference between isaac_ros_foundationpose and NVlabs FoundationPose. Thank you again for your time!

Hi @woblitent

If you are working with multiple refinement iterations, there is a known bug that will be fixed in the next release.

I am not sure about the release date yet, but I will keep you updated.

Hi @Raffaello , I apologize for my late reply.
I tried setting est_refine_iter to 1 in NVlabs FoundationPose (I believe this parameter is equivalent to refine_iterations in isaac_ros_foundationpose), and I also turned off tracking. Although the output pose is slightly unstable, it still basically corresponds to the object's true pose. In contrast, the pose output by isaac_ros_foundationpose is essentially unusable (just like the two images I posted at the beginning). Is it normal for there to be such a large difference between the results of NVlabs FoundationPose and isaac_ros_foundationpose?

Hi @Raffaello
My understanding is that the pose output by NVlabs FoundationPose is in the camera coordinate system, while the pose output by isaac_ros_foundationpose is in a different coordinate system. Is this correct? If so, could you please advise how to convert it to the camera coordinate system?
Thanks!

Hi @Raffaello, I have the same problem. Is there any update on this? I adjusted refine_iterations and it didn't help.

Hi @user109294 and @woblitent

I apologize for this late reply; I was waiting for the Isaac ROS 3.2 release. It includes several updates, in particular:

  • isaac_ros_pose_estimation: improved FoundationPose performance and accuracy with multiple refinement, particularly for symmetric objects

Release notes: Release Notes — isaac_ros_docs documentation

Please update to the latest Isaac ROS 3.2 release if you can, check whether your issue is fixed, and let me know.

Best,
Raffaello

Hi @Raffaello, I tried the new Isaac ROS 3.2, but the results are still poor. My post: 316592
I wonder whether NVIDIA's FoundationPose and the Isaac ROS version have been aligned?


Hi @user109294

Can you retry using the latest Isaac ROS 3.2? We made a few small bug fixes in recent weeks that may have resolved your issue. If you are already on this setup, please recreate the Docker container and reinstall all the software.

Let me know.
Raffaello