Orientation mismatch when applying GraspNet rotations under a neutral container prim using OpenCV optical conventions

I am placing grasp poses predicted by GraspNet into Isaac Sim.
The grasps were generated from the left camera view.
To avoid any authored transforms on camera prims, I created a neutral container at
/World/targets and I place my grasp prim under it.

My translation matches perfectly.
My orientation is wrong.

Scene setup

  • Three standard Isaac cameras under a Realsense group
    left, right, color

  • All three camera meshes show red forward along minus Z and Y to the left and X up in the viewport gizmo

  • I created /World/targets/left_target as a plain Xform and aligned it so that its axes match the left camera axes in the viewport

  • The grasp prim lives at /World/targets/grasp

Data

Intrinsics for the left camera:

fx = 634.09
fy = 566.49
cx = 640.00
cy = 360.00
f = 1.93
ax = 3.896
ay = 2.453
width = 1280
height = 720

Left and right camera translations in world:

left [0.2 0.0475 1.4]
right [0.2 -0.0475 1.4]

A sample detection output from graspnet looks like (from left camera as origin):

{
“index”: 1,
“cost”: 0.07800227635649198,
“trans_error_m”: 0.043542315417918656,
“rot_error_rad”: 1.1486653646191107,
“rot_error_deg”: 65.81367746553089,
“score”: 0.5542612671852112,
“width”: 0.09086745232343674,
“translation”: [
-0.0032448123674839735,
-0.08126597106456757,
0.2571863532066345
],
“rotation”: [
[
0.15586048364639282,
0.9777116179466248,
-0.1406681090593338
],
[
-0.7258150577545166,
0.20995239913463593,
0.6550667881965637
],
[
0.6700000166893005,
0.0,
0.7423611283302307
]
]
}

GraspNet rotations are provided in the camera optical frame
X right, Y down, Z forward

What I tried

  • Placing the grasp directly under /World/targets after rotating targets to match OpenCV optical axes
    I get translation correct at the right place in isaac sim as visualized in open3d ground truth cloud file, but orientation wrong

After that did not work out, so I tried to change the targets prim to isaac sim convention (towards -z, x top and y left)

  • Using a pure change of basis matrix C for my camera axes
    target camera convention in my scene is Z back, Y left, X up
    this mapping from OpenCV optical is
    x new equals minus y old
    y new equals minus x old
    z new equals minus z old
    so

C = [[ 0, -1, 0],
[-1, 0, 0],
[ 0, 0, -1]]

then

t_local = C @ t_cv R_local = C @ R_cv @ C.T

still see a mismatch in orientation in the viewport

What I observe

  • Translation is exactly where I expect

  • Orientation is off by a fixed rotation that I cannot reconcile
    it is not plus minus ninety about a principal axis and not a simple transpose issue

Questions for the community

  • Is my C matrix correct for mapping OpenCV optical to a camera that looks down minus Z with Y left and X up
    C as written maps ex to up, ey to left, ez to forward minus Z

  • Does SingleXFormPrim set_local_pose compose with any authored xformOps inside the referenced frame_prim.usd mesh that could explain a fixed extra rotation
    if so, should I set the pose on the container Xform instead of the child

  • For GraspNet JSON, are rotations typically stored as grasp in camera or camera to grasp
    if there is a standard, I can lock R or R transpose accordingly

  • Is there any known difference between column major interpretation of rotation matrices in Isaac Python helpers and the usual robotics convention where columns are basis vectors of the child expressed in the parent

  • Any gotcha with transform order on Xforms that I should double check
    for example an unexpected rotate then translate ordering

Ground truth from graspnet (open3d visualization):

Isaac sim transformed point:

Object of interest removed for better visualization:

You can see the realsense sensor right on top of the bin:

The entire prim structure of camera:

/World/Realsense/RSD455/Camera_Pseudo_Depth focal=1.9299999475479126 horiz_ap=3.8959999084472656 vert_ap=2.0545310974121094

/World/Realsense/RSD455/Camera_OmniVision_OV9782_Color focal=1.9299999475479126 horiz_ap=3.9000000953674316 vert_ap=2.453000068664551

/World/Realsense/RSD455/Camera_OmniVision_OV9782_Left focal=1.9299999475479126 horiz_ap=3.8959999084472656 vert_ap=2.453000068664551

/World/Realsense/RSD455/Camera_OmniVision_OV9782_Right focal=1.9299999475479126 horiz_ap=3.8959999084472656 vert_ap=2.453000068664551

The entire prim structure of targets (where I placed my grasp prim:

/World/targets type=Xform

/World/targets/grasp   type=Xform

  /World/targets/grasp/Frame   type=Xform

    /World/targets/grasp/Frame/X_line   type=Cylinder

      material=/World/targets/grasp/Frame/Looks/visual_material

    /World/targets/grasp/Frame/X_tip   type=Cone

      material=/World/targets/grasp/Frame/Looks/visual_material_1

    /World/targets/grasp/Frame/Y_line   type=Cylinder

      material=/World/targets/grasp/Frame/Looks/visual_material_2

    /World/targets/grasp/Frame/Y_tip   type=Cone

      material=/World/targets/grasp/Frame/Looks/visual_material_3

    /World/targets/grasp/Frame/Z_line   type=Cylinder

      material=/World/targets/grasp/Frame/Looks/visual_material_4

    /World/targets/grasp/Frame/Z_tip   type=Cone

      material=/World/targets/grasp/Frame/Looks/visual_material_5

Help me figure this out yall! After this, I need to change the orientation of the grasp frame as franka RMP flow needs z axis to be aligned with the ee. Will appreciate if you someone could give me idea to approach this orientation problem from OpenCV to Isaac sim then to RMPflow orientation. Let me know if you have any questions. I dont know if anyone wants to reproduce it, so i did not attach the USD files.

Hi @nishanthrajkumar1,

The issue is in the math: C @ R_cv @ C.T is a similarity transform (re-expressing a rotation operator in a new basis), but what you need is a frame composition (left-multiply by camera-to-world rotation).

GraspNet’s R has columns that are the grasp frame’s axes in camera coordinates. To express them in world coordinates, you simply left-multiply by the camera-to-world rotation — no similarity transform, no manual C matrix.

Isaac Sim’s Camera API handles the OpenCV convention conversion for you:

  import numpy as np                                                                                                                                                                                                                               
  from scipy.spatial.transform import Rotation                                                                                                                                                                                                     
  from isaacsim.sensors.camera import Camera                                                                                                                                                                                                       
  from isaacsim.core.prims import XFormPrim                                                                                                                                                                                                        
                                                                                                                                                                                                                                                   
  # Get camera pose already in OpenCV/ROS convention                                                                                                                                                                                               
  cam = Camera(prim_path="/World/Realsense/RSD455/Camera_OmniVision_OV9782_Left")                                                                                                                                                                  
  cam.initialize()                                                                                                                                                                                                                                 
  cam_pos, cam_quat_wxyz = cam.get_world_pose(camera_axes=" ros")                                                                                                                                                                                  
                                                                                                                                                                                                                                                   
  # Camera-to-world rotation (convert w,x,y,z -> x,y,z,w for scipy)                                                                                                                                                                                
  R_cam_to_world = Rotation.from_quat(                                                                                                                                                                                                             
      [cam_quat_wxyz[1], cam_quat_wxyz[2], cam_quat_wxyz[3], cam_quat_wxyz[0]]                                                                                                                                                                     
  ).as_matrix()                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                   
  # GraspNet output (already in OpenCV camera frame)                                                                                                                                                                                               
  R_cv = np.array(grasp["rotation"])                                                                                                                                                                                                               
  t_cv = np.array(grasp["translation"])                                                                                                                                                                                                            
                                                                                                                                                                                                                                                   
  # Compose: camera frame -> world frame                                                                                                                                                                                                           
  R_world = R_cam_to_world @ R_cv                                                                                                                                                                                                                  
  t_world = R_cam_to_world @ t_cv + cam_pos                                                                                                                                                                                                        
                                                                                                                                                                                                                                                   
  # Set on prim (Isaac Sim quaternions are scalar-first: w,x,y,z)                                                                                                                                                                                  
  q = Rotation.from_matrix(R_world).as _quat()  # scipy returns (x,y,z,w)                                                                                                                                                                          
  grasp_prim = XFormPrim("/World/targets/grasp")                                                                                                                                                                                                   
  grasp_prim.set_world_pose(                                                                                                                                                                                                                       
      position=t_world,                                                                                                                                                                                                                            
      orientation=np.array([q[3], q[0], q[1], q[2]])                                                                                                                                                                                               
  )                                                                                                                                                                                                                                                

Why C @ R @ C.T fails: That formula re-expresses a rotation operator (“rotate by θ around v”) in a different basis. But R_cv isn’t an operator — it’s a frame description (grasp axes in camera coords). To change the parent frame you compose
transforms: R_world = R_cam_to_world @ R_cv. Different operation, different result.

Additional tips:

  • Set the pose on /World/targets/grasp, not on the Frame child (which may have its own authored xformOps from your frame_prim.usd that would compose on top)
  • If your GraspNet fork stores R transposed (rows = axes instead of columns), use R_cv.T
  • For RMPflow (Z = approach), post-multiply by a fixed rotation mapping GraspNet’s approach axis to Z

Closing this out — please create a new topic and link back to this one if you have follow-up questions.