"cameraFisheyePolynomial" generated by Replicator is incorrect with default "fisheyePolynomial" cameraModel parameters

Isaac Sim Version

[√] 4.5.0
4.2.0
4.1.0
4.0.0
2023.1.1
2023.1.0-hotfix.1
Other (please specify):

Operating System

[√] Ubuntu 22.04
Ubuntu 20.04
Windows 11
Windows 10
Other (please specify):

GPU Information

  • Model: NVIDIA GeForce RTX 3090
  • Driver Version: 550.127.08

Detailed Description

When using Replicator to create a camera with the fisheyePolynomial projection type and the parameters below:

camera_kwargs = {
    "position": (0, 0, 0),
    "rotation": (0, 0, 0),
    "projection_type": "fisheyePolynomial",
}
camera_prim = rep.create.camera(**camera_kwargs)

The generated camera_params are:

{
    "cameraAperture": [20.954999923706055, 15.290800094604492],
    "cameraApertureOffset": [0.0, 0.0],
    "cameraFisheyeLensP": [-0.0003699999942909926, -0.0007399999885819852],
    "cameraFisheyeLensS": [-0.0005799999926239252, -0.00022000000171829015, 0.0001900000061141327, -0.00019999999494757503],
    "cameraFisheyeMaxFOV": 200.0,
    "cameraFisheyeNominalHeight": 1216,
    "cameraFisheyeNominalWidth": 1936,
    "cameraFisheyeOpticalCentre": [970.9424438476562, 600.3748168945312],
    "cameraFisheyePolynomial": [0.0, 0.0024500000290572643, 0.0, 0.0, 0.0, 0.0],
    "cameraFocalLength": 24.0,
    "cameraFocusDistance": 400.0,
    "cameraFStop": 0.0,
    "cameraModel": "fisheyePolynomial",
    "cameraNearFar": [1.0, 1000000.0],
    "cameraProjection": [2.2906228624246854, 0.0, 0.0, 0.0, 0.0, 4.072218422088329, 0.0, 0.0, 0.0, 0.0, 1.000001000001e-06, -1.0, 0.0, 0.0, 1.000001000001, 0.0],
    "cameraViewTransform": [0.7068997241792948, -1.5705520830089987e-16, 0.7073137775805283, 0.0, 0.7073137775805283, 1.5696326997700512e-16, -0.7068997241792948, 0.0, 0.0, 1.0, 2.220446049250313e-16, -0.0, 137.56944419975406, -1.9699923326162707, -3.7603040814074267, 1.0],
    "metersPerSceneUnit": 1.0,
    "renderProductResolution": [1280, 720]
}
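For reference, my understanding of the forward f-theta model described by these parameters (and by the white paper linked below) is sketched here; project_ftheta is my own helper, and the convention theta = sum_i poly[i] * r**i over the pixel radius r is my assumption:

import numpy as np

def project_ftheta(points_cam: np.ndarray, poly, optical_centre) -> np.ndarray:
    """Sketch: project camera-frame points (N, 3) to pixels under the f-theta model.

    The polynomial maps the pixel distance r from the optical centre to the ray
    angle theta; the forward direction needs the inverse mapping theta -> r,
    obtained here by numerically inverting a lookup table.
    """
    cx, cy = optical_centre
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    theta = np.arccos(z / np.linalg.norm(points_cam, axis=1))  # angle to the optical axis
    phi = np.arctan2(y, x)                                     # azimuth around the axis

    r_samples = np.linspace(0.0, 1500.0, 4096)         # pixel radii covering the sensor
    theta_samples = np.polyval(poly[::-1], r_samples)  # poly[i] multiplies r**i
    r = np.interp(theta, theta_samples, r_samples)     # numeric inverse theta -> r

    u = cx + r * np.cos(phi)
    v = cy + r * np.sin(phi)
    return np.stack([u, v], axis=1)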

I attempted to unproject the distance_to_camera depth data into the camera coordinate system using the cameraFisheyePolynomial parameters, but the results are incorrect. The main projection function is shown below. I referenced the following document, the white paper of the f-theta model:

ftheta.pdf (87.0 KB)

import torch

def unproject_poly_depth(depth: torch.Tensor, intrinsics: torch.Tensor, poly: torch.Tensor) -> torch.Tensor:
    depth_batch = depth.clone()
    intrinsics_batch = intrinsics.clone()
    # check if inputs are batched
    is_batched = depth_batch.dim() == 4 or (depth_batch.dim() == 3 and depth_batch.shape[-1] != 1)
    # make sure inputs are batched
    if depth_batch.dim() == 3 and depth_batch.shape[-1] == 1:
        depth_batch = depth_batch.squeeze(dim=2)  # (H, W, 1) -> (H, W)
    if depth_batch.dim() == 2:
        depth_batch = depth_batch[None]  # (H, W) -> (1, H, W)
    if depth_batch.dim() == 4 and depth_batch.shape[-1] == 1:
        depth_batch = depth_batch.squeeze(dim=3)  # (N, H, W, 1) -> (N, H, W)
    if intrinsics_batch.dim() == 2:
        intrinsics_batch = intrinsics_batch[None]  # (3, 3) -> (1, 3, 3)

    # get image height and width
    im_height, im_width = depth_batch.shape[1:]

    # create image points in homogeneous coordinates (3, H x W)
    indices_u = torch.arange(im_width, device=depth.device, dtype=depth.dtype)
    indices_v = torch.arange(im_height, device=depth.device, dtype=depth.dtype)
    img_indices = torch.stack(torch.meshgrid([indices_u, indices_v], indexing="ij"), dim=0).reshape(2, -1)

    # Get principal point
    cx = intrinsics_batch[..., 0, 2]
    cy = intrinsics_batch[..., 1, 2]

    px = img_indices[0] - cx
    py = img_indices[1] - cy
    pd = torch.sqrt(px ** 2 + py ** 2)  # pixel distance from the principal point
    # f-theta model: the polynomial maps pixel distance to the ray angle theta
    theta = torch.zeros_like(pd)
    for i in range(poly.shape[0]):
        theta += poly[i] * pd ** i

    phi = torch.atan2(py, px)
    rx = torch.sin(theta) * torch.cos(phi)
    ry = torch.sin(theta) * torch.sin(phi)
    rz = torch.cos(theta)
    ray = torch.stack([rx, ry, rz], dim=0)  # (3, H*W)
    ray = ray / ray.norm(p=2, dim=0, keepdim=True) # ray on unit sphere
    ray = ray.unsqueeze(0)  # (3, H x W) -> (1, 3, H x W)
    # flatten depth image (N, H, W) -> (N, H x W)
    depth_batch = depth_batch.transpose_(1, 2).reshape(depth_batch.shape[0], -1).unsqueeze(2)
    depth_batch = depth_batch.expand(-1, -1, 3)
    points_xyz = ray.transpose_(1, 2) * depth_batch  # (N, H x W, 3)

    # return points in same shape as input
    if not is_batched:
        points_xyz = points_xyz.squeeze(0)

    return points_xyz
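For completeness, this is roughly how I call it with the values from the camera_params dict above (only the principal point of intrinsics is read inside unproject_poly_depth, so the focal-length entries are placeholders):

import torch

# Usage sketch with the Replicator-reported fisheye parameters.
cx, cy = 970.9424438476562, 600.3748168945312
poly = torch.tensor([0.0, 0.0024500000290572643, 0.0, 0.0, 0.0, 0.0])
intrinsics = torch.tensor([[1.0, 0.0, cx],
                           [0.0, 1.0, cy],
                           [0.0, 0.0, 1.0]])

depth = torch.rand(720, 1280)  # stand-in for one distance_to_camera frame (H, W)
points_cam = unproject_poly_depth(depth, intrinsics, poly)  # (H*W, 3)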

The wrong projection results are shown in the following image:

I also tried calibrating the same camera with tartancalib, using Replicator-generated RGB images of an AprilGrid placed in the scene. The calibration results with the omni-radtan model are as follows:

camera_model: omni
distortion_coeffs: [-0.10831953608105305, 0.43397974712468684, 3.765144569337964e-05, -3.0867713944010906e-05]
distortion_model: radtan
intrinsics: [1.8020312187152925, 756.1810601165237, 755.9053645818727, 641.4577349626691, 354.3992190535659]
resolution: [1280, 720]

Then I approximated the cameraFisheyePolynomial coefficients from the omni-radtan model parameters; the result is:

[0.00000000e+00, 3.71387357e-03, -2.27210914e-07, 1.56039459e-09, -8.32343570e-12, 1.49184713e-14]
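This fit can be reproduced roughly as follows (a sketch assuming the standard unified/omni camera model, i.e. projection onto the unit sphere with mirror parameter xi followed by radtan distortion; the angle range and polynomial degree are my own choices):

import numpy as np

# Sample ray angles, push them through the omni-radtan model to get a pixel
# radius, then least-squares fit theta as a polynomial in that radius.
xi = 1.8020312187152925
fx = 756.1810601165237
k1, k2, p1, p2 = -0.10831953608105305, 0.43397974712468684, 3.765144569337964e-05, -3.0867713944010906e-05

theta = np.linspace(0.0, np.radians(95.0), 500)   # angles to the optical axis
m = np.sin(theta) / (np.cos(theta) + xi)          # normalized radius before distortion
r2 = m ** 2
# radtan distortion evaluated along the x-axis (the p1 term vanishes for y = 0)
m_d = m * (1.0 + k1 * r2 + k2 * r2 ** 2) + p2 * (r2 + 2.0 * m ** 2)
r_pix = fx * m_d                                  # pixel distance from the optical centre

# Fit theta = sum_i c_i * r_pix**i with c_0 fixed to 0.
A = np.stack([r_pix ** i for i in range(1, 6)], axis=1)
coeffs, *_ = np.linalg.lstsq(A, theta, rcond=None)
print(np.concatenate([[0.0], coeffs]))            # 6-term layout matching cameraFisheyePolynomial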

In the end, I used these approximated coefficients to unproject the distance_to_camera depth data into the camera coordinate system, and the results are correct, as shown in the following image:

Therefore, I believe the cameraFisheyePolynomial generated by Replicator is incorrect with the default fisheyePolynomial camera model parameters. Could you please explain why this issue occurs or if there is a problem with my usage? Thank you!

Hi there,

could you share how you are accessing the depth data? Maybe you are somehow receiving the distance to the image plane instead?

Cheers

The way I get the depth data is as follows:

config = {
   ......
    "writer": "BasicWriter",
    "writer_config": {
        "output_dir": "/workspace/isaaclab/user_apps/data_apps/assets/full_warehouse/path_tracking_sdg",
        "camera_params": True,
        "rgb": True,
        "distance_to_camera": True,
        "colorize_depth": True,
    },
    ......
}

writer = rep.WriterRegistry.get(writer_type)
writer_kwargs = config["writer_config"].copy()
writer_kwargs["output_dir"] = os.path.join(writer_kwargs["output_dir"], f"group_{group_num}")
writer.initialize(**writer_kwargs)
writer.attach(render_products)
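Afterwards I read the written frames back roughly like this (a sketch; the exact output file naming of BasicWriter is an assumption and may differ between versions):

import json
import numpy as np

frame_dir = "/workspace/isaaclab/user_apps/data_apps/assets/full_warehouse/path_tracking_sdg/group_0"  # hypothetical group folder

depth = np.load(f"{frame_dir}/distance_to_camera_0000.npy")      # (H, W) distances in scene units
with open(f"{frame_dir}/camera_params_0000.json") as f:
    camera_params = json.load(f)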

I don't think I am confusing distance to image plane with distance to camera: when I used a pinhole camera, I was able to correctly project both kinds of depth back to 3D space using the appropriate unprojection method for each.