parsing Stereo Disparity output (nvmedia_imageofst) as a disparity map

shengliang.xu · January 28, 2020, 7:47pm

Hi Nvidia,

Follow up on the post asking for help in computing stereo disparity:

https://devtalk.nvidia.com/default/topic/1070229/general/how-to-correctly-compute-stereo-disparity-nvmedia_imageofst-/

How to correctly parse the result as a disparity map?

The result from the nvmedia_imageofst sample application for computing the disparity is of int16 type, and it looks like a vector of all non-positive numbers.

What is the correct way to convert the result into a disparity map array?

Negate the negative numbers? Renormalize in some way?

Thanks.

VickNV · January 28, 2020, 8:30pm

Hi shengliang.xu,

Doesn’t the result in https://devtalk.nvidia.com/default/topic/1070229/general/how-to-correctly-compute-stereo-disparity-nvmedia_imageofst-/post/5423217/#5423217 already look good?

shengliang.xu · January 28, 2020, 8:39pm

Hi Vick,

I need the disparity map in units of pixels. The result in the other post is certainly not in the correct unit, given that all numbers are non-positive.

The ground truth of the tsukuba sample stereo pair is:

The result from nvmedia_imageofst has the correct segmentation, but the disparity values (relative grayscale) are not correct:

External Media

VickNV · January 28, 2020, 8:52pm

Is this just a matter of color mapping of imshow? Could you check https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html to reverse it? Thanks!

shengliang.xu · January 28, 2020, 11:18pm

Hi Vick,

It’s not a coloring issue.

Each value d of a disparity map with respect to the left image at (x, y) is the pixel offset of the same point of interest from the left image (at (x, y)) to the right image (at (x, y+d)). So I need the numbers in units of pixels under this semantics.

I searched a bit more on Google and found this piece of code/comment

/**
* \struct NV_OF_STEREO_DISPARITY
* Struct needed for stereo /disparity. ::NV_OF_OUTPUT_EXECUTE_PARAMS::outputBuffer will be populated
* with stereo disparity in ::NV_OF_STEREO_DISPARITY format for each ::NV_OF_INIT_PARAMS::outGridSize.
* Stereo disparity is a 16-bit value with the lowest 5 bits holding fractional value,
* followed by a 11-bit unsigned integer value.
*/
typedef struct _NV_OF_STEREO_DISPARITY
{
    uint16_t                        disparity;    /**< Horizontal displacement[in pixels] in 11.5 format. */
} NV_OF_STEREO_DISPARITY;

in this NVIDIAOpticalFlowSDK github repo:

github.com

NVIDIA/NVIDIAOpticalFlowSDK/blob/master/nvOpticalFlowCommon.h

It looks like a reasonable way of encoding the disparity in 16 bits, but after a simple experiment, I found that parsing the result from the DRIVE OFST API with this encoding does not give correct values.
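
For reference, the unsigned 11.5 coding described in the header comment above can be sketched as follows. This is only an illustration of that documented layout; the helper name `decode_11_5` is hypothetical and is not part of any NVIDIA API.

```python
def decode_11_5(raw: int) -> float:
    """Decode a 16-bit unsigned 11.5 fixed-point value into pixels.

    Hypothetical helper: the low 5 bits hold the fraction (units of 1/32 pixel)
    and the high 11 bits hold the unsigned integer part.
    """
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("expected a 16-bit unsigned value")
    integer_part = raw >> 5       # top 11 bits
    fraction_part = raw & 0x1F    # low 5 bits
    return integer_part + fraction_part / 32.0


print(decode_11_5(0x0041))  # 65 / 32
```

Splitting the value into the two bit fields gives the same result as dividing the raw value by 32.0; the split is shown only to make the layout explicit.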

I’m quite confused.

VickNV · January 29, 2020, 1:58am

I guess the values are all negative because our implementation always uses the right image as the reference frame (see NvMediaIOFSTProcessFrame). I’ll check further and get back to you here.

shengliang.xu · February 3, 2020, 5:54pm

Hi Vick,

Any update on this issue?

Thanks.

VickNV · February 4, 2020, 2:27pm

Hi shengliang.xu,

Yes, in the current version the stereo disparity values are all negative. Each value is basically a motion vector in the left direction, so you can take the absolute value of the output to get the disparity.

We will try to fix this issue. Thanks!

shengliang.xu · February 4, 2020, 6:48pm

Thank you Vick.

So the two steps to get the disparity are:

1. negate the values
2. shift each value right by 5 bits to get the top 11 bits of the value

right?

But I find the values much smaller than what I get using OpenCV, so my guess is that the resulting disparity values are scaled down in some way.

The result disparity array is scaled down from the input images by 4 (original image WxH → result array W/4 x H/4). Are the disparity values in the result array computed at the W/4 x H/4 scale (i.e. as block offsets instead of pixel offsets), so that we should scale the result values up by some factor such as 4 or 16 to get pixel offsets?

VickNV · February 5, 2020, 1:41pm

Yes, getting absolute value and shifting right by 5 should give disparity in pixels.
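
A minimal sketch of those two steps, assuming the raw output has already been loaded as an int16 NumPy array (the helper name `integer_disparity` is hypothetical):

```python
import numpy as np


def integer_disparity(raw: np.ndarray) -> np.ndarray:
    """Absolute value, then drop the 5 fractional bits (hypothetical helper)."""
    # Widen first: the magnitude of -32768 does not fit in int16.
    wide = raw.astype(np.int32)
    return np.abs(wide) >> 5


sample = np.array([-64, -33, 0, -32768], dtype=np.int16)
print(integer_disparity(sample))
```

The shift discards the fractional part; dividing by 32.0 instead would keep the sub-pixel precision.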

How did you compare to opencv? StereoBM does subpixel refinement. Could you take a look and check if opencv output is scaled? Thanks!

shengliang.xu · February 5, 2020, 6:01pm

Previously I was using pyplot to draw the grayscale disparity images. That was not correct, because pyplot seems to normalize the values by itself. Now I have saved the raw disparity numbers as images. Here are the results:

The ground truth:
External Media

The opencv result:
External Media

The DRIVE Xavier result, obtained by:

1. negating the values
2. shifting each value right by 5 bits to get the top 11 bits of the value
3. resizing the result to the size of the original image:

External Media

As you can see the values are apparently not on the correct scale.

tsukuba.disparity.png
tsukuba.disparity.opencv.png

VickNV · February 5, 2020, 7:05pm

Thank you for the information!

Please provide all the details about how you got the ground truth, how you generated the disparity map from OpenCV (with arguments), and how you generated the one from nvmedia_imageofst (I presume using the same command as at https://devtalk.nvidia.com/default/topic/1070229/general/how-to-correctly-compute-stereo-disparity-nvmedia_imageofst-/post/5423217/#5423217, right?).

shengliang.xu · February 5, 2020, 7:34pm

ground truth, it’s available on the middlebury stereo dataset website:

http://vision.middlebury.edu/stereo/eval/newEval/tsukuba/groundtruth.html

left input image:

right input image:

The opencv code:

import numpy as np
import cv2

imgL = cv2.imread('tsukuba_left.png', cv2.IMREAD_GRAYSCALE)
imgR = cv2.imread('tsukuba_right.png', cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgL, imgR)

cv2.imwrite('tsukuba.disparity.opencv.png', disparity)

The NVIDIA Xavier disparity is generated with the command from the other post you linked. The post-processing Python code is:

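import cv2
import numpy as np

# Size of the output array: the input image size divided by 4 in each dimension.
rows = 72
cols = 96

# Read the raw int16 output written by nvmedia_imageofst.
with open('tsukuba.disparity', 'rb') as fd:
    f = np.fromfile(fd, dtype=np.int16, count=rows * cols)

# Negate and drop the 5 fractional bits (divide by 32), keeping the integer part.
# Widen to int32 first so that negating -32768 cannot overflow.
f = (-f.astype(np.int32) // 32).astype(np.int16)

disparity = f.reshape((rows, cols))

# Scale the array back up to the input image size.
resized = cv2.resize(disparity, (cols * 4, rows * 4), interpolation=cv2.INTER_AREA)

cv2.imwrite('tsukuba.disparity.nvidia.png', resized)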

VickNV · February 6, 2020, 3:12pm

Thanks! We will check internally and get back to you here.

shengliang.xu · February 6, 2020, 7:56pm

Hi Vick,

I found the problem. Sorry, my bad.

OpenCV StereoBM actually also uses a fixed-point number format, in which the lower 4 bits are fractional. So the integer part of the result from the NVIDIA DRIVE Xavier indeed needs to be scaled up by 16 to be comparable with the OpenCV result (or better, the original result should be scaled down by 2). The ground truth seems to have the same binary format as the OpenCV StereoBM output. This confused me.
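
The two options can be sketched like this. The raw values are made up for illustration, and the variable names are hypothetical:

```python
import numpy as np

# Made-up raw IOFST values: non-positive, 5 fractional bits.
nv_raw = np.array([-320, -160, -48], dtype=np.int16)

# Negate in a wider type so that -32768 cannot overflow.
nv_abs = -nv_raw.astype(np.int32)

# Option 1: keep only the integer part, then scale up by 16 to reach
# the 4-fractional-bit scale used by StereoBM.
nv_as_cv_coarse = (nv_abs >> 5) * 16

# Option 2: halve the raw magnitude, which keeps 4 of the 5 fractional bits.
nv_as_cv_fine = nv_abs >> 1

print(nv_as_cv_coarse)
print(nv_as_cv_fine)
```

Option 2 preserves sub-pixel information that option 1 throws away, which is why scaling the original result down by 2 is the better choice.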

I think almost everything is clear now, but I’ll try more samples before closing this issue.

Thank you.

VickNV · February 6, 2020, 8:11pm

Thanks for sharing the information! I think you are talking about the disparity description in https://docs.opencv.org/master/d2/d6e/classcv_1_1StereoMatcher.html#a03f7087df1b2c618462eb98898841345.

disparity	Output disparity map. It has the same size as the input images. Some algorithms, like StereoBM or StereoSGBM compute 16-bit fixed-point disparity map (where each disparity value has 4 fractional bits), whereas other algorithms output 32-bit floating-point disparity map.

shengliang.xu · February 7, 2020, 2:32am

Yes, thanks.

I’ve tried dozens more samples. The results in general look promising, but the disparity computation deterministically fails at certain image sizes on Linux.

I’ll open a different post for this issue.

shengliang.xu · February 7, 2020, 2:44am

I’m summarizing all that I’ve found here to close this series:

1. The IOFST API result has size: width = ((input_img.width + 15)/16) * 4, height = ((input_img.height + 15)/16) * 4.
2. The IOFST API result is an array of int16 values, all non-positive. Each value is a fixed-point number with the low 5 bits fractional and the high 11 bits integral. Each disparity value can be decoded as -v/32.0, where v is any value in the returned array.
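
Put together, the summary can be sketched as below. The function names are hypothetical, and the size formula is assumed to use truncating integer division, as it would in C:

```python
import numpy as np


def iofst_output_size(width: int, height: int) -> tuple:
    """Output array size (width, height) for a given input image size.

    Assumes the divisions in the formula are integer divisions.
    """
    out_w = ((width + 15) // 16) * 4
    out_h = ((height + 15) // 16) * 4
    return out_w, out_h


def decode_iofst(raw: np.ndarray) -> np.ndarray:
    """Decode raw int16 IOFST values as -v / 32.0."""
    # Negate in int32 so that -32768 does not overflow.
    return (-raw.astype(np.int32)).astype(np.float32) / 32.0


print(iofst_output_size(384, 288))
print(decode_iofst(np.array([-64, -33, 0], dtype=np.int16)))
```

A 384x288 input gives a 96x72 output, which matches the rows = 72 and cols = 96 used in the post-processing script earlier in this thread.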

VickNV · February 7, 2020, 5:04pm

Thanks for the summary! There are still a few points to clarify with you.

The output surface contains 4×4 downsampled MV’s. The size of each MV is:
Stereo Disparity: 2 bytes (MVx)

Why do you divide by 16?

Yes, depending on the format:
NV OFST output - divide by 32
OPENCV output - divide by 16
Middlebury ground truth - divide by 8
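
As a rough sketch, the three divisors can be collected in one place so that all results end up in pixel units. The dictionary keys and the `to_pixels` helper are hypothetical labels, not part of any of these libraries:

```python
import numpy as np

# Divisors taken from the list above; the keys are hypothetical labels.
DIVISORS = {"nv_ofst": 32.0, "opencv": 16.0, "middlebury": 8.0}


def to_pixels(raw, source: str) -> np.ndarray:
    """Convert raw disparity values from the given source into pixels."""
    values = np.asarray(raw, dtype=np.float32)
    if source == "nv_ofst":
        # NV OFST values are non-positive in the current version.
        values = np.abs(values)
    return values / DIVISORS[source]


print(to_pixels([-320], "nv_ofst"))
print(to_pixels([160], "opencv"))
print(to_pixels([80], "middlebury"))
```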

shengliang.xu · February 7, 2020, 5:41pm

Sorry, typo, fixed. The size according to the code should be:

width: ((input_img.width + 15)/16) * 4 height: ((input_img.height + 15)/16) * 4
