Disparity Help

We have stereo cameras (IMX185) and are using Argus to retrieve frames, OpenCV to remap them, and the VisionWorks nvxSemiGlobalMatchingNode to calculate disparity.

This works great when the scene is stationary, but as soon as there is movement the disparity fails. If you stand in front of the cameras and move side to side you can see yourself flashing black and white. If I’m not mistaken, this sort of flashing is caused by a time delta between the left and right captures. However, the timestamps for both Argus captures (using the method in the syncSensor sample) are identical.

Here is a video of our output:

In the beginning you can see one of my coworkers moving through the frame in the background; then I move the camera fixture side to side, and then up and down, which causes the disparity to completely break down.

Does anyone have any ideas about what is going on? Is it simply a camera sync issue, or could it be something else? If it is just a camera sync issue, is it not possible to do disparity with software sync alone?


hello Atrer,

I move the camera fixture side to side, and then up and down which causes the disparity to completely break down.

it’s hard to tell what exactly the issue is during 00:03-00:06.
Were you moving the whole device to create a continuous scene change?
I suggest you also share another video recording (without the OpenCV calculation) to show what you did.

Good idea, I’ll work on that!

I used OpenCV to create a grid view with the following:

  • Top left - left camera feed
  • Top right - right camera feed
  • Bottom left - cv::absdiff of the left and right feeds after cv::remap
  • Bottom right - disparity output
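
To show why the absdiff panel is a useful desync indicator, here is a toy pure-Python version of the check. In the real pipeline cv::absdiff does this per pixel on the remapped frames; the tiny "frames" and helper names below are made up for illustration.

```python
# Toy illustration of why the absdiff panel exposes left/right desync.
# Real code would call cv::absdiff on the remapped frames; here two tiny
# grayscale "frames" are plain lists of lists.

def absdiff(frame_a, frame_b):
    """Per-pixel absolute difference, like cv::absdiff."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

def desync_score(diff):
    """Mean absolute difference; near zero for a well-synced static scene."""
    total = sum(sum(row) for row in diff)
    count = sum(len(row) for row in diff)
    return total / count

# A vertical edge captured by the left camera...
left = [[0, 0, 255, 255]] * 4
# ...and the same edge one pixel over on the right (one camera lagging).
right = [[0, 0, 0, 255]] * 4

diff = absdiff(left, right)
print(desync_score(diff))  # a large score flags motion between the captures
```

With a moving subject, even a one-pixel lag between streams lights up the diff panel, which matches what the video shows.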

Here is the video. I enlisted a co-worker to help demonstrate the problem.

The images are obviously not sync’d at all, and you can see one stream lagging behind the other. Is there any way to improve that sync? As far as Argus is concerned, the timestamps from CaptureMetadata are identical down to the nanosecond. Do we need to move forward on setting up hardware sync for our cameras?


It occurs to me that, since we can perceive the desync, my first step should be to hook a monitor up and run the syncSensor sample to figure out whether this issue occurs before or after Argus. I’ll do that tomorrow.

If I can still perceive a time delta between the frames then I know I need to look at the ISP or camera itself.

If the frames seem to be in sync I know I have some sort of buffering issue between when I pull the NvBuffers and when I trigger the visionworks graph.

hello Atrer,

you should have both left/right cameras synchronized for correct results. You may enable the hardware sync pin to sync both sensor frames.
However, there’s also a software approach; please check the syncSensor sample.
That example uses multiple sensors in a single session and duplicates a single capture request to both cameras, so the capture results should be close enough.
In addition, you may also check getSensorTimestamp() to compare the timestamps.
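
The timestamp comparison can be sketched like this (a Python sketch with made-up numbers; on the device the two values would come from each capture's getSensorTimestamp()):

```python
# Sketch of the timestamp check: after pulling a left/right pair, compare
# the two sensor timestamps against the frame period. The threshold and
# sample values are assumptions for illustration only.

FRAME_PERIOD_NS = 33_333_333  # ~30 fps

def frames_in_sync(ts_left_ns, ts_right_ns, tolerance_ns=FRAME_PERIOD_NS // 2):
    """True if the two captures started within half a frame of each other."""
    return abs(ts_left_ns - ts_right_ns) <= tolerance_ns

print(frames_in_sync(1_000_000_000, 1_000_000_500))  # True: sub-microsecond skew
print(frames_in_sync(1_000_000_000, 1_050_000_000))  # False: ~1.5 frames apart
```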

I am using code from the syncSensor sample to create a synchronized capture session. The OutputStreams are going into FrameConsumers. After I pull the frames I call IImageNativeBuffer::createNvBuffer to get the dmabuf_fd for each frame and pass those through two parallel instances of NvVideoConverter to convert from YUV420M to ABGR32.

In my processing loop I pull the converted frames, copy and re-queue the MMAP buffer, and get the CUeglFrame of the copy. I then wrap the CUeglFrame.frame.pPitch[0] with a cv::cuda::GpuMat and use cv::cuda::cvtColor(CV_RGBA2GRAY) and cv::cuda::remap.

My visionworks graph is identical to the one in the stereo_matching demo with nvx_cv::createVXImageFromCVGpuMat used to get the vx_image from my remapped cv::cuda::GpuMat.

I agree that there seems to be something going on in my capture or the pipeline. As you suggested I was already pulling timestamps with getSensorTimestamp for both frames, and they are identical.

The final solution was to not use a sync’d capture session like in syncSensor, but to use two independent capture sessions and manually match the timestamps. This was suggested by JerryChang here:

It would seem that the “software sync” session does not guarantee that the frames will be at the same time, even with hardware sync.
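
In rough pseudocode, the manual matching looks like this (a Python sketch; the queue discipline, names, and tolerance are my own, not Argus API):

```python
from collections import deque

# Sketch of manual matching across two independent capture sessions: each
# stream pushes (sensor_timestamp_ns, frame) into its own queue, and a pair
# is emitted only when the head timestamps agree within a tolerance.

TOLERANCE_NS = 16_000_000  # roughly half a 30 fps frame period

def match_pairs(left_q, right_q, tolerance_ns=TOLERANCE_NS):
    """Pair up frames with matching timestamps, dropping whichever
    head frame is older when the two streams disagree."""
    pairs = []
    while left_q and right_q:
        ts_l, frame_l = left_q[0]
        ts_r, frame_r = right_q[0]
        if abs(ts_l - ts_r) <= tolerance_ns:
            pairs.append((frame_l, frame_r))
            left_q.popleft()
            right_q.popleft()
        elif ts_l < ts_r:
            left_q.popleft()   # left frame has no partner; discard it
        else:
            right_q.popleft()  # right frame has no partner; discard it
    return pairs

# The right stream starts one frame late; its missing partner is dropped.
left = deque([(0, "L0"), (33_000_000, "L1"), (66_000_000, "L2")])
right = deque([(33_400_000, "R1"), (66_300_000, "R2")])
print(match_pairs(left, right))  # [('L1', 'R1'), ('L2', 'R2')]
```

Dropping unpaired frames costs a little throughput but guarantees every pair fed to the disparity graph was captured at (nearly) the same instant.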


hello Atrer,

cool, those seem like promising depth maps.
Thanks for sharing.

Atrer, would you mind sharing some of your code for this?

There are quite a few steps involved, and our use case is rather specific; is there something you are trying to get working?

We’re using:

  1. OpenCV's fisheye camera model for stereo calibration
  2. Argus for pulling frames
  3. NvVideoConverter for scaling (and to go from DMABUF -> MMAP)
  4. CUDA for rectification
  5. Visionworks for disparity
  6. NvVideoEncoder for H265
  7. Gstreamer for streaming

Thanks for replying. Right now I’m using a StereoPI (www.stereopi.com) which is a carrier board for a Raspberry Pi Compute Module which gives two CSI connections and supports software based stereo video synchronization. The problem is that it won’t do more than about 640x480. So I’m considering the NVIDIA Jetson Nano with a carrier board that supports two CSI cameras.

My use case is to install this into cars as a dashcam and to capture stereo video to feed into an ML model for image classification, segmentation, and separately depth disparity. I may also do active learning inference on device to try to identify the most interesting video frames to label for later training.

If I were to start a project like that I would look into NVIDIA’s DeepStream SDK. We’re still using L4T r28.2 and VisionWorks, which is no longer supported (or even installed by JetPack). I think our biggest hurdle was figuring out the calibration/rectification process for our cameras. Luckily there were lots of good examples to use with OpenCV.

Once you have those values you can use NvVideoConverter to get Argus’ DMA buffers into device memory so that you can run OpenCV’s CUDA kernels on them. There’s also jetson-inference once you’re ready for that step.

So the first step would be to get a disparity example running on live video using GStreamer; it will probably be pretty bad since you won’t have any calibration. Once that’s done you can replace your pipeline’s source with an Argus app that handles the OpenCV stuff before pushing frames into an appsrc. NVIDIA has an example of that in their l4t-multimedia-samples.
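
To get a feel for what the disparity stage actually computes before wiring up the full pipeline, here is a toy pure-Python block matcher. This is a teaching sketch only; the real pipeline would use cv::StereoBM or the VisionWorks SGM node, and all names below are mine.

```python
# Toy 1-D SAD block matcher on a single rectified scanline: for each pixel,
# find how far a small patch shifted between the left and right views.
# That shift (the disparity) is inversely proportional to depth.

def row_disparity(left_row, right_row, max_disp=8, block=3):
    """Per-pixel disparity for one rectified scanline (SAD matching)."""
    half = block // 2
    width = len(left_row)
    disp = [0] * width
    for x in range(half, width - half):
        best_cost, best_d = float("inf"), 0
        for d in range(min(max_disp, x - half) + 1):
            cost = sum(abs(left_row[x + k] - right_row[x + k - d])
                       for k in range(-half, half + 1))
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp

# A bright blob at x=6..8 in the left view appears at x=4..6 in the right:
# a 2-pixel shift, which the matcher should recover around the blob.
left_row = [0] * 6 + [200, 200, 200] + [0] * 6
right_row = [0] * 4 + [200, 200, 200] + [0] * 8
print(row_disparity(left_row, right_row))
```

Running this without calibration is exactly why the first uncalibrated attempt looks bad: the matcher assumes corresponding pixels lie on the same scanline, which only holds after rectification.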

Once you’re ready for help with stuff I’d recommend making a new forum topic, feel free to PM me too.