Insight into world to camera transform for 3D bounding box

I am hitting a bug when I attempt to project 3D bounding box points from the world frame into 2D image coordinates. It seems as though I have a translation offset (maybe between the sensor origin and where the image view is?), as seen in the images below:

My math is as follows, using the information provided in the sampling_log.yaml file produced after data generation, plus knowledge of the camera coordinate system (+Y Up, -Z Forward):

  1. I first find the camera's orientation in the world frame by using the rotation of the camera object in the world frame output in the log file, combined with a rotation into the (+Y Up, -Z Forward) camera coordinate system:

  2. I then create the 4x4 homogeneous transform using the camera coordinates in the world frame output by the log file:

  3. Then, for each point provided in the .npy file for the cuboid, I transform it into the camera frame as follows:

  4. Then I pass the points in the camera frame into the following OpenCV function: cv2.projectPoints(cuboid, r_vec, t_vec, camera_intrinsic_mat, distortion_coef), where the camera intrinsics matrix is shown in the attached image. This gives me the output seen above, and I am unsure where I might be going wrong.
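
To make the four steps above concrete, here is a rough sketch of the pipeline in plain numpy. All the numeric values (angles, translation, the world point, the intrinsics) are placeholders rather than my actual sampling_log.yaml values, and I'm assuming XYZ euler angles and the OpenCV +Z-forward convention for the final projection step (equivalent to cv2.projectPoints with zero rvec/tvec and no distortion):

```python
import numpy as np

def euler_xyz_to_mat(rpy_deg):
    """XYZ euler angles in degrees -> rotation matrix (R = Rz @ Ry @ Rx)."""
    rx, ry, rz = np.radians(rpy_deg)
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(rx), -np.sin(rx)],
                   [0, np.sin(rx),  np.cos(rx)]])
    Ry = np.array([[ np.cos(ry), 0, np.sin(ry)],
                   [0, 1, 0],
                   [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0],
                   [np.sin(rz),  np.cos(rz), 0],
                   [0, 0, 1]])
    return Rz @ Ry @ Rx

# Step 1: camera rotation in the world frame (placeholder angles).
w_R_c = euler_xyz_to_mat([0.0, 0.0, 90.0])

# Step 2: 4x4 homogeneous world-to-camera transform built from the
# camera's world position (placeholder) and the rotation above.
w_t_c = np.array([0.0, 0.0, 0.0])
H_w2c = np.eye(4)
H_w2c[:3, :3] = w_R_c.T
H_w2c[:3, 3] = -w_R_c.T @ w_t_c

# Step 3: transform one cuboid corner (placeholder, homogeneous coords)
# into the camera frame.
p_world = np.array([1.0, 2.0, 4.0, 1.0])
p_cam = H_w2c @ p_world

# Step 4: pinhole projection with placeholder intrinsics.
K = np.array([[1662.99, 0.0, 960.0],
              [0.0, 1662.99, 540.0],
              [0.0, 0.0, 1.0]])
uvw = K @ p_cam[:3]
u, v = uvw[:2] / uvw[2]
```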

Any insight would be greatly appreciated! Many thanks in advance.

Hi Noelle,

I have a couple questions -
Part 1: It appears you are trying to rotate the world frame rather than rotating the camera object to the camera (both within the same world frame). Do Rx and Ry effectively accomplish the rotation from the camera object to the camera? If so, shouldn't Rx * Ry = cobj_R_c? In that case, w_R_c = w_R_cobj * cobj_R_c rather than w_R_c = Rx * Ry * w_R_cobj. I don't have enough context on the difference between the camera object and camera rotations to make an accurate assessment of this, though.
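
To illustrate why the composition order matters, here's a quick numeric check. The two rotations below are made up for demonstration, not your actual Rx/Ry or camera pose:

```python
import numpy as np

def rot_x(deg):
    a = np.radians(deg)
    return np.array([[1, 0, 0],
                     [0, np.cos(a), -np.sin(a)],
                     [0, np.sin(a),  np.cos(a)]])

def rot_z(deg):
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0],
                     [np.sin(a),  np.cos(a), 0],
                     [0, 0, 1]])

# Made-up example rotations, NOT the actual simulator values:
w_R_cobj = rot_z(30)   # camera object's pose in the world frame
cobj_R_c = rot_x(90)   # camera-object-to-camera correction

# Composing on the right applies the correction in the camera object's
# own frame; composing on the left applies it in the world frame.
# Rotations do not commute, so these generally differ.
right = w_R_cobj @ cobj_R_c
left = cobj_R_c @ w_R_cobj
```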

Part 2: Looks good.

Part 3: Looks good. But keep in mind that each point [Px, Py, Pz] should be a 3D bounding box corner in the global reference frame. Can you ensure that is the case? Before moving on to the bounding box corners, I suggest starting with only the centroid of the bounding box (1 point instead of 8) and making sure it projects to the center of the object. This may help determine whether the issue is a bounding box offset rather than a cuboid pose issue. With that in mind, try passing cv2.projectPoints a single 3D point per cuboid rather than 8 points.
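
A minimal version of that centroid sanity check could look something like this. All values here are placeholders, and the pinhole math is equivalent to cv2.projectPoints with zero rvec/tvec and no distortion:

```python
import numpy as np

# Placeholder centroid already transformed into the camera frame
# (OpenCV convention, +Z forward), and placeholder intrinsics:
centroid_cam = np.array([0.2, -0.1, 5.0])
K = np.array([[1662.99, 0.0, 960.0],
              [0.0, 1662.99, 540.0],
              [0.0, 0.0, 1.0]])

# Pinhole projection of the single point:
uvw = K @ centroid_cam
u, v = uvw[:2] / uvw[2]

# The projected pixel should land near the object's center in the image;
# if even the centroid is offset, the pose (not the box corners) is suspect.
```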

Part 4: Looks good, but you didn’t mention the distortion coefficients. Did you set these to anything? Is the distortion negligible?

If you have a short snippet of python code with these computations, you can paste it here for me to reproduce so I can try to help you debug things.


Hi Henry! Thanks for getting back to me! Let me know if this answers your questions:

Part 1: The R_x * R_y is trying to rotate from the camera origin to the camera view, so in my understanding Rx * Ry = cobj_R_c is what I was going for. My thought process was that the coordinates provided for the camera rotation in the world view were from the camera sensor itself and not from the camera view, so this was my attempt to rotate the object into the camera view. I tried w_R_cobj * cobj_R_c too, and the result also had a constant offset.

However, I found the below lines of code in the /isaac_sim-2022.1.0/tools/composer/src/scene/asset/ file this morning:

and so, rather than doing the R_x * R_y, I am taking the euler angles I get from the log file and doing exactly what that code does (camera_rpy + [90, 0, 270]). However, this provides a result that almost seems mirrored over the x-axis of the image, seen below. But even these aren't an exact mirror, because there's a slight translation offset.

Part 3: Yes, this is the case: the bounding box corners are in the global reference frame, and I transform a single point at a time; passing in only one point provides me an identical value.

Part 4: I set all distortion coefficients to 0 because my understanding was that they were negligible.

I will follow up with a short snippet of code with the images and log file I am pulling from!

Thanks Again,

Okay! Following up with a folder of code that contains the sampling_log.yaml file, a test image, a corresponding .npy file, and the code that does the calculations. (1022.1 KB)

The main methods that do the calculations are transform_bb3d_data(bb3d_data_world_view, H_world2camera, camera_intrinsics), get_4x4_mat(t, eul), and produce_frames(data, frame_id, class_id) (which just gets the data from the .yaml file and adds the [90, 0, 270] to the retrieved rpy of the camera object). Let me know if this is sufficient!


Hi Noelle,

Thanks for sending your code along! It was helpful.

I wasn’t able to fully debug it and make it work, but I do have a suggestion. The main issue I saw is that you computed the camera_rpy rotation using np.add. I believe you were trying to add the x rotation of 90, the z rotation of 270, and the world-to-camera-object rotation of 3 floats from the dict. Was this your full implementation of part 1, plus the conversion to a rotation matrix? If so, I do not believe you can compose the rotations this way. You need to use np.matmul between the different rotation matrices instead. I attached a code sample here, but I don’t believe I have the correct Rx and Ry matrices to transform the camera to the camera object. (9.6 KB)
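
Here is the kind of problem I mean, with made-up angles (not your log values): adding euler-angle triples and then converting is generally not the same rotation as converting each triple and multiplying the matrices.

```python
import numpy as np

def euler_xyz_to_mat(rpy_deg):
    """XYZ euler angles in degrees -> rotation matrix (R = Rz @ Ry @ Rx)."""
    rx, ry, rz = np.radians(rpy_deg)
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(rx), -np.sin(rx)],
                   [0, np.sin(rx),  np.cos(rx)]])
    Ry = np.array([[ np.cos(ry), 0, np.sin(ry)],
                   [0, 1, 0],
                   [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0],
                   [np.sin(rz),  np.cos(rz), 0],
                   [0, 0, 1]])
    return Rz @ Ry @ Rx

rpy = np.array([10.0, 20.0, 30.0])     # made-up camera rotation
offset = np.array([90.0, 0.0, 270.0])  # the composer's fixed offset

# Adding the angle triples first, then converting (what np.add does):
added = euler_xyz_to_mat(np.add(rpy, offset))
# Converting each triple, then composing with a matrix product:
composed = euler_xyz_to_mat(rpy) @ euler_xyz_to_mat(offset)
# These are not the same rotation in general.
```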

Also, are you sure that the values in the camera rotation of the dict are XYZ euler angles? If they are a different representation (e.g. direction cosines, ZYX angles, ZYZ angles), it could be causing problems. I added a new function to your code that explicitly converts a set of XYZ euler angles to a rotation matrix.
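
For example (with made-up angles), the same numeric triple interpreted as XYZ versus ZYX produces different matrices, which is exactly the kind of silent mismatch to watch out for:

```python
import numpy as np

def _axis_rot(axis, deg):
    """Rotation matrix about a single axis ('x', 'y', or 'z'), in degrees."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    if axis == 'x':
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == 'y':
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

angles = (15.0, 25.0, 35.0)  # made-up triple standing in for the dict values

# XYZ order (apply x, then y, then z) vs ZYX order (apply z, then y, then x):
R_xyz = _axis_rot('z', angles[2]) @ _axis_rot('y', angles[1]) @ _axis_rot('x', angles[0])
R_zyx = _axis_rot('x', angles[0]) @ _axis_rot('y', angles[1]) @ _axis_rot('z', angles[2])
```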

Hi Henry,

Thanks for the help! I used np.add() to try to replicate what the source code (/isaac_sim-2022.1.0/tools/composer/src/scene/asset/) was doing to the angles fed in from the parameter file, in hopes that it would provide better visuals for the bounding box than manually doing the aforementioned w_R_cobj @ cobj_R_c.

I also did confirm that they were XYZ euler angles, as seen here:

I ended up scaling fx by -1 and fy by 1.3 to fix the mirror and translation, which got me the following results. It is not a perfect solution, though, because the scaling was by intuition and not from any values retrieved in code / from the simulator:

Thanks again for the help,


This is helpful in finding the issue – your Fx and Fy should always be positive, so I believe whatever the fx flip is compensating for is caused by something else.

Anyhow, did you happen to notice that your current value of Fy (1281.95) * 1.3 is roughly equal to Fx? I replaced your code with this:

camera_intrinsics = np.array([
    # [1662.99, 0, 960],
    # [0, 1281.95, 540],
    # [0,       0,   1]
    [1662.99, 0, 960],
    [0, 1662.99, 540],
    [0,       0,   1]
])
And the results look pretty good. Is it possible there was an error in your computation of Fy = 1281.95?
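
In case it helps: for square pixels, fx and fy should come out equal when computed from USD-style focal length and aperture parameters, because the vertical aperture should scale with the image aspect ratio. The parameter values below are placeholders, not ones from your log:

```python
# Hypothetical USD-style camera parameters -- placeholders, NOT values
# from the actual sampling_log.yaml:
width, height = 1920, 1080
focal_length = 24.0          # mm (placeholder)
horizontal_aperture = 27.7   # mm (placeholder)

# fx from the pinhole model:
fx = width * focal_length / horizontal_aperture

# For square pixels the vertical aperture must scale with the image
# aspect ratio, which forces fy == fx:
vertical_aperture = horizontal_aperture * height / width
fy = height * focal_length / vertical_aperture

# The fx/fy pair from the log, for comparison:
ratio = 1662.99 / 1281.95    # ~1.297, i.e. roughly the 1.3 factor you found
```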

Hi Henry,

Thanks for the quick response. I didn’t even piece together that 1.3 * fy ≈ 1662.99, but you’re completely right that it works that way. I have double-checked the calculation for fy and it should be right, but I am also just pulling those values from the provided sampling_log.yaml file, where fx, fy appear at the bottom:

I switched the rotation back to my original inclination, rotating the world axes (+X Forward, +Z Up) to the camera axes (-Z Forward, +Y Up), and that more or less fixed the mirror issue:
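
In case anyone else hits this, that fixed axis swap can be written as a single rotation matrix. This reflects my reading of the conventions above, assuming a right-handed world frame with +Y to the left; double-check the signs for your own setup:

```python
import numpy as np

# Rows are the camera axes expressed in world coordinates, assuming
# world = (+X Forward, +Y Left, +Z Up) and camera = (+X Right, +Y Up,
# -Z Forward). Verify the handedness against your own simulator setup.
c_R_w = np.array([[ 0, -1,  0],   # camera X (right)    = world -Y
                  [ 0,  0,  1],   # camera Y (up)       = world +Z
                  [-1,  0,  0]])  # camera Z (backward) = world -X

forward_w = np.array([1, 0, 0])   # world forward
up_w = np.array([0, 0, 1])        # world up
```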

Thanks for all the insight! I guess my question is whether there is something in the simulator that is setting fy to fx for the camera?


Hm - I’m not sure - I would have to go through the simulator code to understand where that comes from. But I’m glad you got things working to the degree you have so far!