World frame to pixel frame

Hi, @toni.sm I’m trying to project a 3D point in world frame to pixel frame. I assign red spheres in object above the table and take a picture from plcaed camera. However, the projected pixel result colored in red is not aligned with assigned spheres. The result is shown in
vis_kp_2
vis_kp_0
.

The code to project 3D to 2D is as following:

def get_kp_2D(kps, camera_pose, camK):
    """ transform 3D to 2D points

    Args:
        kps:

    Return:
        kps_2d:
    """
    kps_2d = []
    camK = np.concatenate((camK,np.array([[0.0,0.0,0.0]]).T),1)
  
    for k in range(len(kps)):
        kp_pose = np.append(kps[k][:3],1.0)
        # transform 3d to 2d
        camera_pose = pose_to_4x4(camera_pose)
        object_in_cam = np.dot(np.linalg.inv(camera_pose), kp_pose.T)
        z = object_in_cam[2]
        kp_in_cam = np.dot(np.linalg.pinv(pose_to_4x4(camera_pose)), kp_pose.T)
        kp_2d = 1/z * np.dot( camK, kp_in_cam)
        kp_2d = kp_2d.round().astype(np.int32)
        print("kp 2D:", kp_2d)
        kps_2d.append([kp_2d[:2]])
        
    kps_2d = np.array(kps_2d)
    return kps_2d

Hi @newuhe

Could you provide more details about the setup, e.g.: where the involved matrix come from, a minimal working example, etc…?

@toni.sm Thanks for replying. The camera_pose is from

def get_camera_pose(self):
        """Get camera pose.

        Returns:
            camera_pose: 7d array
        """
        stage = omni.usd.get_context().get_stage()
        prim = stage.GetPrimAtPath(self._viewport_window.get_active_camera())
        pose = omni.usd.get_world_transform_matrix(prim)    
        rot_mat = np.array(pose.ExtractRotationMatrix())     
        trans = np.array(pose.ExtractTranslation())
        camera_pose_matrix = np.zeros([4,4])
        camera_pose_matrix[:3,:3] = rot_mat
        camera_pose_matrix[:3,-1] = trans
        camera_pose_matrix[-1,-1] = 1
        # rotate 180 degree in x axis for convention
        camera_pose = np.dot(
            camera_pose_matrix,
            pose_to_4x4(np.array([0.0, 0.0, 0.0, 180.0 / 180.0 * np.pi, 0.0, 0.0])),
        )
        camera_pose[:3,-1] = camera_pose[:3,-1] * get_stage_units()
        return camera_pose

And the camK is from

def get_camK(self, flag="mat"):
        """Get camera intrinsics in for mat 3x3 matrix or list of size 4.

        Args:
            flag: by default return 3x3 matrix, if set to list, return size-4 list, if set to
                    raw, return raw realsense data.
        Returns:
            camera intrinsic.
        """
        camera_params = self._sd_helper.get_camera_params(self._viewport_window)
        near, far = camera_params["clipping_range"]
        width = self.config.width
        height = self.config.height
        horiz_aperture = camera_params["horizontal_aperture"]
        vert_aperture = horiz_aperture * height / width
        focal_length = camera_params["focal_length"]

        stage = omni.usd.get_context().get_stage()
        prim = stage.GetPrimAtPath(self._viewport_window.get_active_camera())
        focal_length = prim.GetAttribute("focalLength").Get()
        horizontal_aperture = prim.GetAttribute("horizontalAperture").Get()
        vertical_aperture = prim.GetAttribute("verticalAperture").Get()
        width, height = self._viewport_window.get_texture_resolution()
        fx = width * focal_length / horizontal_aperture
        fy = height * focal_length / vertical_aperture
        cx = width * 0.5
        cy = height * 0.5
        print("real camK:", np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])) 
        camK = np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])

        return camK

The kp_poses are fetched from

kp_poses = []
for i in range(4):
 kp_poses.append(self.get_object_pose("/waterproof/Sphere_0{}".format(i)))

def get_object_pose(self, prim_path):
        """Get object pose.

        Args:
            prim_path: prim path of object

        Returns:
            pose: 7-dim array of position and orientation quarternion
        """
        prim = get_prim_at_path(prim_path)
        xprim = XFormPrim(prim.GetPath())
        position, orientation = xprim.get_world_pose()
        position *= get_stage_units()
        pose = np.concatenate((position, orientation))
        return pose

The problem may be the camera has distorsion?

Hi @newuhe

Sorry for my late response…
Could you please share the whole script (the functions are class methods and the pose_to_4x4 method is not defined above) and the flatten .usd for testing?