I want to set up a stereo camera so I can simulate the disparity. I think in an older version there was a disparity sensor but I can’t seem to find it now.
What I have done so far is set up two identical cameras, a left and a right. I have offset the right camera and then cross-linked the two by giving their prim names.
Is this all I need to do to get a disparity sensor?
And if so, how do I display the results so I can check the output prior to creating the simulated data?
I was hoping that the disparity ground truth could be generated much the same way we can generate a semantic segmentation mask, rather than as an estimate. Are there plans to add ground-truth disparity outputs?
If setting the cameras as ‘stereo’ doesn’t provide a disparity output, can you explain the purpose of cross-linking them? What is the difference between cross-linking and having two unlinked cameras if I need to calculate the disparity separately anyway? Especially since I can already get the distance to the camera as a separate output.
In the case of a real stereo camera, stereo is most often used to estimate depth, and the depth estimation pipeline follows these steps (a short code sketch of these steps follows the list):
1. Stereo input [2 x RGB images, with lens/camera distortions]
2. Rectification [2 x RGB images]
3. Stereo matching [Disparity image]
4. Depth estimation [Depth image]
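For illustration, here is a minimal sketch of the matching and depth steps using OpenCV. The image files, baseline, and focal length are placeholder values for this example, not anything produced by Isaac Sim:

```python
import cv2
import numpy as np

# Placeholder inputs: rectified left/right images and example camera parameters.
left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)
baseline_m = 0.12          # stereo baseline in meters (example value)
focal_length_px = 600.0    # focal length in pixels (example value)

# Stereo matching: semi-global block matching gives a disparity estimate.
matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,    # must be divisible by 16
    blockSize=5,
)
# StereoSGBM returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# Depth estimation: depth = (baseline * focal_length) / disparity.
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = baseline_m * focal_length_px / disparity[valid]
```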
For a simulated camera sensor, like the ones supported in Isaac Sim, the following outputs are normally available:
- Rectified camera output: RGB and Depth
- Simulated camera output [RGB, with lens/camera distortions]
In the case of a stereo camera, two sets of such images are available. Normally, this is sufficient to re-create the images at every step of the stereo-rectification-disparity-depth pipeline. For example:
- A noisy depth or disparity image can be generated by passing the simulated camera output [RGB, with lens/camera distortions] through the regular pipeline above.
- A disparity ground-truth image can be generated from the simulated depth image by applying disparity = (baseline * focal_length) / depth (a short conversion sketch follows this list).
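As a minimal sketch of that last conversion, assuming a depth image in meters (e.g. from a distance_to_image_plane annotator) and placeholder values for baseline and focal length:

```python
import numpy as np

def depth_to_disparity(depth_m, baseline_m, focal_length_px):
    """Convert a ground-truth depth image (meters) to a disparity image (pixels).

    disparity = (baseline * focal_length) / depth
    Pixels with zero/invalid depth are left at 0 disparity.
    """
    depth_m = np.asarray(depth_m, dtype=np.float32)
    disparity = np.zeros_like(depth_m)
    valid = depth_m > 0
    disparity[valid] = (baseline_m * focal_length_px) / depth_m[valid]
    return disparity

# Example call with placeholder values (not from Isaac Sim):
# disparity_gt = depth_to_disparity(depth_image, baseline_m=0.12, focal_length_px=600.0)
```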
Do you think it’s correct to generate the synthetic ground truth data by creating 3 cameras?
- A pair of cameras for grabbing stereo RGB images (L and R)
- One more camera right at the midpoint (in between the two cameras mentioned above) for grabbing depth data
As far as I understand, the depth calculated from the disparity formula that you’ve provided is relative to the “midpoint camera” coordinate system, not to the L or R camera.
Yes, setting up a third depth-annotator camera at the midpoint should work. Be aware that there are two types of depth annotators, distance_to_camera and distance_to_image_plane, of which you should probably use the latter.
When setting up the third camera, make sure its optical axis is aligned with the stereo cameras and its intrinsic parameters (focal length, etc.) are the same. A minimal setup sketch follows.
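Here is a minimal sketch of such a three-camera setup using the omni.replicator.core API. The camera positions, resolution, and focal length are placeholder values, and exact parameter names may differ between Isaac Sim versions:

```python
import omni.replicator.core as rep

# Placeholder stereo geometry: left/right cameras 12 cm apart, midpoint between them.
# All three cameras share the same orientation and intrinsics (focal length, etc.).
CAM_KWARGS = dict(rotation=(0.0, 0.0, 0.0), focal_length=24.0)
left_cam = rep.create.camera(position=(-0.06, 0.0, 1.0), **CAM_KWARGS)
right_cam = rep.create.camera(position=(0.06, 0.0, 1.0), **CAM_KWARGS)
mid_cam = rep.create.camera(position=(0.0, 0.0, 1.0), **CAM_KWARGS)

RESOLUTION = (1280, 720)
left_rp = rep.create.render_product(left_cam, RESOLUTION)
right_rp = rep.create.render_product(right_cam, RESOLUTION)
mid_rp = rep.create.render_product(mid_cam, RESOLUTION)

# RGB for the stereo pair, distance_to_image_plane depth for the midpoint camera.
rgb_left = rep.AnnotatorRegistry.get_annotator("rgb")
rgb_right = rep.AnnotatorRegistry.get_annotator("rgb")
depth_mid = rep.AnnotatorRegistry.get_annotator("distance_to_image_plane")
rgb_left.attach(left_rp)
rgb_right.attach(right_rp)
depth_mid.attach(mid_rp)

# Render one frame and read back the data, e.g. to feed the depth-to-disparity
# conversion shown earlier in this thread.
rep.orchestrator.step()
left_image = rgb_left.get_data()
right_image = rgb_right.get_data()
depth_image = depth_mid.get_data()
```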