Results of CUDA example 'stereoDisparity' are better when another CUDA example is running

Hi.

I ran the CUDA stereoDisparity example on
a TX2 with Linux tegra-ubuntu 4.4.38 and L4T r28.1.

The result of running stereoDisparity alone is shown below as “result 1”:
the GPU processing time is about 68 ms.

The result of running stereoDisparity while HSOpticalFlow is also running is shown as “result 2”:
the GPU processing time is about 16 ms.

Please let me know
why stereoDisparity performs about 4 times better when HSOpticalFlow is running at the same time.

Thanks in advance.

######################################
########### result 1 ################
######################################

nvidia@tegra-ubuntu:~/cuda/NVIDIA_CUDA-8.0_Samples/3_Imaging/stereoDisparity$ ./stereoDisparity
GPU Device 0: “NVIDIA Tegra X2” with compute capability 6.2

GPU device has 2 Multi-Processors, SM 6.2 compute capabilities

Loaded <./data/stereo.im0.640x533.ppm> as image 0
Loaded <./data/stereo.im1.640x533.ppm> as image 1
Launching CUDA stereoDisparityKernel()
Input Size [640x533], Kernel size [17x17], Disparities [-16:0]
GPU processing time : 68.5950 (ms)
Pixel throughput : 4.973 Mpixels/sec
GPU Checksum = 4293895789, GPU image: <output_GPU.pgm>
Computing CPU reference…
CPU Checksum = 4293895789, CPU image: <output_CPU.pgm>
nvidia@tegra-ubuntu:~/cuda/NVIDIA_CUDA-8.0_Samples/3_Imaging/stereoDisparity$

######################################
########### result 2 ################
######################################

nvidia@tegra-ubuntu:~/cuda/NVIDIA_CUDA-8.0_Samples/3_Imaging/stereoDisparity$ ./stereoDisparity
GPU Device 0: “NVIDIA Tegra X2” with compute capability 6.2

GPU device has 2 Multi-Processors, SM 6.2 compute capabilities

Loaded <./data/stereo.im0.640x533.ppm> as image 0
Loaded <./data/stereo.im1.640x533.ppm> as image 1
Launching CUDA stereoDisparityKernel()
Input Size [640x533], Kernel size [17x17], Disparities [-16:0]
GPU processing time : 16.2022 (ms)
Pixel throughput : 21.054 Mpixels/sec
GPU Checksum = 4293895789, GPU image: <output_GPU.pgm>
Computing CPU reference…
CPU Checksum = 4293895789, CPU image: <output_CPU.pgm>

Hi,

By default, the simple stereo disparity kernel performs a basic block-matching scheme.
You can find the implementation details here:
/usr/local/cuda/samples/3_Imaging/stereoDisparity/stereoDisparity_kernel.cuh
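
To illustrate what "basic block matching" means in general, here is a minimal pure-Python sketch. It is not the sample's CUDA kernel: the function name `sad_block_match`, the border clamping, and the convention of comparing the left window at x with the right window at x + d are assumptions made for illustration only.

```python
def sad_block_match(left, right, radius=1, min_disp=-2, max_disp=0):
    """Brute-force block matching using the sum of absolute differences (SAD).

    left, right: equal-sized 2D lists of grayscale integers.
    For every pixel (x, y) of `left`, each disparity d in [min_disp, max_disp]
    is tried, and the window around (x, y) in `left` is compared with the
    window around (x + d, y) in `right`. The d with the lowest cost wins.
    """
    h, w = len(left), len(left[0])

    def px(img, x, y):
        # Clamp coordinates to the image border (illustrative choice).
        x = min(max(x, 0), w - 1)
        y = min(max(y, 0), h - 1)
        return img[y][x]

    disp = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_cost, best_d = None, 0
            for d in range(min_disp, max_disp + 1):
                cost = 0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        cost += abs(px(left, x + dx, y + dy)
                                    - px(right, x + dx + d, y + dy))
                # Keep the first disparity that reaches the lowest cost.
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

With radius=8 and a disparity range of -16 to 0, this corresponds to the "Kernel size [17x17], Disparities [-16:0]" settings printed in the log, though a pure-Python loop like this is only useful for understanding the scheme, not for timing.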

HSOpticalFlow is an implementation of the Horn-Schunck method for optical flow, written in CUDA.
For a description, see the Wikipedia article on the Horn–Schunck method.

Thanks.

Did you run ./jetson_clocks.sh before these two benchmarks?

I suspect you’re running the TX2 at the “low-performance” clock rates, and the bandwidth automatically opens up a bit when you run the second process.

I would recommend running the CPU, GPU and EMC at max clock rates with the ./jetson_clocks.sh command and then benchmarking the stereo code again.
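
To make the before/after comparison less sensitive to a single run, a small helper along these lines could repeat the sample and collect the reported times. This is only a sketch: the function names are hypothetical, and it assumes the binary prints a "GPU processing time : ... (ms)" line in the format shown in the logs above.

```python
import re
import statistics
import subprocess

# Matches the timing line as printed in the logs above,
# e.g. "GPU processing time : 68.5950 (ms)".
TIME_RE = re.compile(r"GPU processing time\s*:\s*([0-9.]+)\s*\(ms\)")


def parse_gpu_ms(output):
    """Extract the reported GPU processing time in milliseconds."""
    match = TIME_RE.search(output)
    if match is None:
        raise ValueError("no 'GPU processing time' line found")
    return float(match.group(1))


def summarize(times_ms):
    """Return run count, min, median and max of a list of timings."""
    return {
        "runs": len(times_ms),
        "min": min(times_ms),
        "median": statistics.median(times_ms),
        "max": max(times_ms),
    }


def run_benchmark(binary="./stereoDisparity", runs=10):
    """Run the sample `runs` times and summarize the reported GPU times."""
    times = []
    for _ in range(runs):
        result = subprocess.run(
            [binary],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            check=True,
        )
        times.append(parse_gpu_ms(result.stdout))
    return summarize(times)
```

Calling run_benchmark() from the stereoDisparity sample directory, once before and once after running ./jetson_clocks.sh, should show whether the 68 ms figure is tied to the default clock settings.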