Importing OAK-D-SR-PoE camera

Hi Isaac Sim team,

We want to import the OAK-D-SR-PoE camera into Isaac Sim, but we cannot get the physical and simulated outputs to match. We used a simple setup to validate the outputs: the fraction of the image taken up by the box used in our experiments does not line up between the physical and simulated images, even after configuring the simulated camera with the intrinsic values and distortion coefficients from the real camera.

Setup for initial validation
Images are taken of a box at set distances. Each image is then analyzed and the percentage of the image taken up by the box is noted. The experiment is repeated in simulation at the same distances, and the percentage of the image taken up by the box in the physical image is compared to that in the simulated one (a sketch of this measurement is shown after the list below). The idea is to validate that the camera parameters in the simulation roughly match those of the real camera; we are not looking for millimeter precision.

  • The distances tested are 30, 40, 50, 60, 70, 80 and 100cm, measured from the front of the camera to the box surface.
  • The camera is also placed on a box, lifting the middle of the camera lens to 10.5cm.
  • The box is kept roughly in the middle of the frame to the best of our ability.
  • The camera intrinsic parameters from the physical camera are used in simulation.
  • The camera distortion coefficients from the physical camera are used in simulation.
  • The box is initially offset by box_thickness/2 in simulation to make sure the box surface and the camera origin start exactly on top of each other.
  • We are aware that the actual focal point could be inside the camera housing, which could mean an error of 1-2cm since we are measuring from the front of the housing. We added this distance in simulation afterwards to check; it makes such a small difference in the percentage of space the box takes up that we are confident it is not the source of the discrepancy between real and simulated images.
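
For clarity, here is a minimal sketch of how the coverage percentage can be measured (our own illustration, assuming the box stands out as the largest contour after thresholding; the file name and thresholding choice are placeholders):

import cv2

# Estimate the fraction of the image occupied by the box.
# Assumes the box is the largest high-contrast region after Otsu thresholding;
# "frame.png" is a placeholder file name.
img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
box = max(contours, key=cv2.contourArea)
coverage = cv2.contourArea(box) / (img.shape[0] * img.shape[1])
print(f"Box covers {100 * coverage:.2f}% of the image")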

Camera specifications
These are from the Luxonis page. The bold specs are the ones used in the script to create the simulated camera, together with the camera intrinsic values and distortion coefficients.

Camera Specs             Stereo pair / Color      ToF
Sensor                   OV9782 (color, PY074)    33D ToF
DFOV / HFOV / VFOV       89.5° / 80° / 55°        82.3° / 70° / 54.7°
Resolution               1MP (1280x800)           VGA (640x480)
Range / Focus            FF: 20cm - ∞             20cm - 5m
Max Framerate            120 FPS (800P)           30 FPS (VGA)
Pixel size               3µm x 3µm                7µm x 7µm
Lens size                1/4 inch                 1/3.2 inch
F-number                 2.0 ±5%                  1.45 ±5%
Effective Focal Length   2.35mm                   N/A

Camera matrix

[[805.8621215820312, 0.0, 651.9990844726562], [0.0, 805.4034423828125, 412.4564208984375], [0.0, 0.0, 1.0]]
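
For reference, this is the standard OpenCV pinhole matrix, and a quick consistency check against the spec table (our own arithmetic, using the 3µm pixel size):

K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix},
\qquad f_x \cdot \text{pixel size} \approx 805.86 \times 3\,\mu\text{m} \approx 2.42\,\text{mm}

which is within a few percent of the 2.35mm effective focal length from the spec table.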

Distortion coefficients

k1: -9.155631065368652
k2: 91.02870178222656
p1: -0.002392938593402505
p2: 0.00018919834110420197
k3: -58.8187255859375
k4: -9.20724105834961
k5: 90.6237564086914
k6: -57.472503662109375
s1: 0.0
s2: 0.0
s3: 0.0
s4: 0.0
τx: 0.0
τy: 0.0
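
For reference, these are the coefficients of OpenCV's rational polynomial distortion model, which maps a normalized point (x', y') with r^2 = x'^2 + y'^2 to:

x'' = x'\,\frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + 2 p_1 x' y' + p_2 (r^2 + 2 x'^2) + s_1 r^2 + s_2 r^4

y'' = y'\,\frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + p_1 (r^2 + 2 y'^2) + 2 p_2 x' y' + s_3 r^2 + s_4 r^4

The thin prism terms (s1-s4) and the sensor tilt terms (τx, τy) are all zero in our calibration.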

FOV
RGB FOV 71.86000061035156, Mono FOV 71.86000061035156. These were reported by the Luxonis calibration read script; we do not know what they correspond to, so we simply used the FOVs documented in the camera specifications.
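
As a cross-check (our own arithmetic, ignoring distortion), the horizontal FOV implied by the pinhole part of the calibration matrix would be

\text{HFOV} = 2 \arctan\!\left(\frac{W}{2 f_x}\right) = 2 \arctan\!\left(\frac{1280}{2 \times 805.86}\right) \approx 76.9^\circ

which matches neither the reported 71.86° nor the documented 80°, so we are unsure which model the reported value refers to.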

Script for defining camera specification
The script is a slight modification of the one in the official Isaac Sim camera documentation. We changed the division factor from 10 to 100 for camera.set_focal_length, as the documentation said the value should be 1/10 of scene units.

from omni.isaac.sensor import Camera
from omni.isaac.core import World

my_world = World(stage_units_in_meters=1.0)

camera = Camera(prim_path="/World/camera")

camera.initialize()

# Camera example
width, height = 1280, 800
camera_matrix = [[805.8621215820312, 0.0, 651.9990844726562], [0.0, 805.4034423828125, 412.4564208984375], [0.0, 0.0, 1.0]]
pixel_size = 3 * 1e-3   # pixel size in mm (3 µm, from the spec sheet)
f_stop = 2.0            # F-number from the spec sheet
focus_distance = 0.2    # in meters

((fx,_,cx),(_,fy,cy),(_,_,_)) = camera_matrix
horizontal_aperture =  pixel_size * width
vertical_aperture =  pixel_size * height
focal_length_x  = fx * pixel_size
focal_length_y  = fy * pixel_size
focal_length = (focal_length_x + focal_length_y) / 2

camera.set_focal_length(focal_length / 100.0)  # documentation example divides by 10; we changed this to 100
camera.set_focus_distance(focus_distance)  # 0.2 m, the minimum focus range from the spec sheet
camera.set_lens_aperture(f_stop * 100.0)
camera.set_horizontal_aperture(horizontal_aperture / 100.0)
camera.set_vertical_aperture(vertical_aperture / 100.0)

camera.set_clipping_range(0.05, 1.0e5)

diagonal_fov = 89.5 # from documentation of camera

camera.set_projection_type("fisheyePolynomial")

distortion_coefficients =  [ -9.155631065368652, 91.02870178222656,  -0.002392938593402505, 0.00018919834110420197,  -58.8187255859375,  -9.20724105834961, 90.6237564086914, -57.472503662109375]

camera.set_rational_polynomial_properties(width, height, cx, cy, diagonal_fov, distortion_coefficients)

Experiment results
Not including 60cm, 70cm and 80cm to keep the post shorter. As you can see, the amount of the image taken up by the carton box does not match between the real and simulated images. We have checked and double-checked that the camera parameters match but have not been able to identify what is going wrong.

[Real vs. simulated comparison images at 30cm, 40cm, 50cm and 100cm]
Camera property inside Isaac Sim
Here are the raw values for the camera in simulation after setting everything up. I have not been able to identify which parameters are going wrong here; hopefully you can help clear it up for us. We used the camera documentation to create the camera using the polynomial f-theta model.
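
For completeness, here is a minimal sketch of how the raw values can be printed from the script above (these are the standard UsdGeom Camera attribute names on the prim created by the Camera class):

# Dump the raw USD camera attributes to compare with the property panel
for attr_name in ["focalLength", "focusDistance", "fStop",
                  "horizontalAperture", "verticalAperture", "clippingRange"]:
    print(attr_name, "=", camera.prim.GetAttribute(attr_name).Get())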

Can you help us match our simulated camera to the real-world one?

Thanks!

Hoping somebody can pitch in and help out 🙏

Hi scn,

I’ve tried to reproduce the problem that you are observing, and I think the issue is likely caused by one (or both) of the following:

  • positioning the camera or the asset in the scene incorrectly;
  • using calibration parameters from the wrong camera or camera setting.

I wasn’t able to reproduce your setup exactly, as I couldn’t find the dimensions of the box that you’ve used. You’ve stated that the camera is lifted by 10.5cm, but the box appears to be rectangular rather than square, and its dimensions do not seem to be stated.

I was able to reproduce the rendering for your camera intrinsics/extrinsics, matching it against a rendering from OpenCV in which a 20cm x 10cm box is placed with its front face 40cm away from the camera. The result was a good visual match with the OpenCV rendering for the corner points of the box. This establishes the match via the Physical Sensor ↔ OpenCV calibration and OpenCV Render ↔ Isaac Sim Render chain.

This rules out serious rendering issues and suggests that either the simulated cube or the camera placement mismatched the real setup, or that wrong calibration parameters (e.g. from a camera setting different from the one used to take the real image) were used.

Please try the attached script, setting the quad_width and quad_height parameters to match your physical box (or a piece of paper attached to the box). If you can provide the measurements of the physical box that you’ve used, I can attempt to match the physical image as well. I would note that it is difficult to achieve good alignment without properly securing and aligning the sensor in the rig, and that the expected errors of the setup you’ve outlined could be on the order of 10 pixels or more.

[Please let me also get back to you regarding the fStop parameter, as I’ve disabled it for this rendering. I see a mismatch in the depth of field when it is enabled and am going to investigate. This parameter doesn’t affect the intrinsics/distortion model; it is only relevant when modelling the camera aperture size and the blurring effects of depth of field.]

Thanks!

from omni.isaac.kit import SimulationApp

simulation_app = SimulationApp({"headless": True})

import numpy as np
import omni.isaac.core.utils.numpy.rotations as rot_utils
from omni.isaac.core import World
from omni.isaac.core.objects import DynamicCuboid
from omni.isaac.sensor import Camera
from PIL import Image, ImageDraw

# Given the OpenCV camera matrix and distortion coefficients (Rational Polynomial model),
# creates a camera and a sample scene, renders an image and saves it to
# camera_opencv_fisheye.png file. The asset is also saved to camera_opencv_fisheye.usd file.
width, height = 1280, 800
camera_matrix = [[805.8621215820312, 0.0, 651.9990844726562], [0.0, 805.4034423828125, 412.4564208984375], [0.0, 0.0, 1.0]]
distortion_coefficients = [ -9.155631065368652, 91.02870178222656,  -0.002392938593402505, 0.00018919834110420197,  -58.8187255859375,  -9.20724105834961, 90.6237564086914, -57.472503662109375]
# distortion_coefficients = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Camera sensor size and optical path parameters. These parameters are not part of the
# OpenCV camera model, but they are necessary to simulate the depth of field effect.
#
# To disable the depth of field effect, set the f_stop to 0.0. This is useful for debugging.
pixel_size = 3  # in microns, 3 microns is common
f_stop = 0.0  # 2.0   # f-number, the ratio of the lens focal length to the diameter of the entrance pupil
focus_distance = 0.2  # in meters, the distance from the camera to the object plane
diagonal_fov = 89.5   # in degrees, the diagonal field of view to be rendered

# quad, 0.2 x 0.1 m, 40cm away from the camera
quad_width = 0.2 
quad_height = 0.1
quad_distance = 0.4

# Create a world, add a quad_width x quad_height x 0.002 meter cube, a ground plane, and a camera
world = World(stage_units_in_meters=1.0)
world.scene.add_default_ground_plane()

cube_1 = world.scene.add(
    DynamicCuboid(
        prim_path="/new_quad_1",
        name="quad_1",
        position=np.array([0, 0, 0.001]),
        scale=np.array([quad_height, quad_width, 0.002]),
        size=1.0,
        color=np.array([255, 0, 0]),
    )
)

camera = Camera(
    prim_path="/World/camera",
    position=np.array([0.0, 0.0, quad_distance + 0.002]),  # quad_distance away from the side of the cube
    frequency=30,
    resolution=(width, height),
    orientation=rot_utils.euler_angles_to_quats(np.array([0, 90, 0]), degrees=True),
)

# Setup the scene and render a frame
world.reset()
camera.initialize()

# Calculate the focal length and aperture size from the camera matrix
((fx, _, cx), (_, fy, cy), (_, _, _)) = camera_matrix
horizontal_aperture = pixel_size * 1e-3 * width
vertical_aperture = pixel_size * 1e-3 * height
focal_length_x = fx * pixel_size * 1e-3
focal_length_y = fy * pixel_size * 1e-3
focal_length = (focal_length_x + focal_length_y) / 2  # in mm

# Set the camera parameters, note the unit conversion between Isaac Sim sensor and Kit
camera.set_focal_length(focal_length / 10.0)
camera.set_focus_distance(focus_distance)
camera.set_lens_aperture(f_stop * 100.0)  # disabled via f_stop = 0.0 above, see the fStop note
camera.set_horizontal_aperture(horizontal_aperture / 10.0)
camera.set_vertical_aperture(vertical_aperture / 10.0)

camera.set_clipping_range(0.05, 1.0e5)

# Set the distortion coefficients
camera.set_projection_type("fisheyePolynomial")
camera.set_rational_polynomial_properties(width, height, cx, cy, diagonal_fov, distortion_coefficients)

# Get the rendered frame and save it to a file
for i in range(100):
    world.step(render=True)
camera.get_current_frame()
img = Image.fromarray(camera.get_rgba()[:, :, :3])


# Optional step, draw the 3D points to the image plane using OpenCV's rational distortion model
def draw_points_opencv(points3d):
    try:
        # To install, run python.sh -m pip install opencv-python
        import cv2

        rvecs, tvecs = np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.0])
        points, jac = cv2.projectPoints(
            np.expand_dims(points3d, 1), rvecs, tvecs, np.array(camera_matrix), np.array(distortion_coefficients)
        )
        draw = ImageDraw.Draw(img)
        for pt in points:
            x, y = pt[0]
            print("Drawing point at: ", x, y)
            draw.ellipse((x - 4, y - 4, x + 4, y + 4), fill="orange", outline="orange")
    except ImportError:
        print("OpenCV is not installed, skipping OpenCV overlay")
        print("To install OpenCV, run: python.sh -m pip install opencv-python")


# Draw the 3D points to the image plane: a quad_width x quad_height quad, quad_distance away from the camera
draw_points_opencv(points3d=np.array([[quad_width / 2, quad_height / 2, quad_distance], 
                                      [-quad_width / 2, quad_height / 2, quad_distance], 
                                      [quad_width / 2, -quad_height / 2, quad_distance], 
                                      [-quad_width / 2, -quad_height / 2, quad_distance]]))

print("Saving the rendered image to: camera_opencv.png")
img.save("camera_opencv.png")

print("Saving the asset to camera_opencv.usd")
world.scene.stage.Export("camera_opencv.usd")

simulation_app.close()
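
For reference, the script is meant to be run headless with Isaac Sim's bundled Python interpreter from the install directory (the file name below is just whatever you save the script as):

./python.sh camera_opencv_test.py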

Hi @dchichkov,

Thanks for taking a deep dive into this issue. I believe you are correct that it is one, if not both, of the issues we are facing. You are right that the box is rectangular; its dimensions are 27.5cm x 21.5cm, which is a bit bigger than the one you tested with.
If I understand correctly, you have compared the OpenCV output to the Isaac Sim output with matching camera parameter settings and you are seeing a match there, is that right?

We are not looking for good alignment at the moment but want to make sure we are on the right path before we commit to a more precise setup.

I will try to use the attached script and report back. Looking forward to your comments regarding the fStop parameter.
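
Concretely, if I read the script right, the only change needed for our box face should be (using the 27.5cm x 21.5cm dimensions above):

quad_width = 0.275   # 27.5 cm box face width
quad_height = 0.215  # 21.5 cm box face height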

Hi @dchichkov,

Just tried your code snippet and compared the real images to the simulated ones, and the results are really good! They are so good that the deviations in x and y coordinates between the simulated and real camera are all less than 1%.

For that reason I am marking your response as the solution, thank you!

