OpenCV CUDA convolution much slower than bare CUDA code convolution on Jetson Nano using unified memory

Hi,

Since we don’t maintain the OpenCV implementation, please check this issue with the OpenCV developers to get more information.

Based on their code, it seems that they implement convolution through cuFFT rather than cuDNN.
Depending on the use case, converting the spatial signal to the Fourier domain may not always pay off because of the transformation overhead.
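
To give a rough feel for that overhead, here is a back-of-the-envelope sketch in Python. It is not taken from the OpenCV or cuFFT code. It assumes the common textbook estimate of about 5 * N * log2(N) floating-point operations for a complex FFT of N points, padding to the next power of two, and three transforms (image, kernel, inverse). The function names and the 512x512 image size are hypothetical, and real GPU timings also depend on memory traffic and launch overhead, which this ignores.

```python
import math


def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def direct_flops(h: int, w: int, k: int) -> float:
    """Rough cost of a direct k x k convolution: one multiply and one add per tap."""
    return 2 * h * w * k * k


def fft_flops(h: int, w: int, k: int) -> float:
    """Rough cost of an FFT-based convolution (assumed model, see the text above)."""
    # Pad to the full linear-convolution size, then up to a power of two.
    ph = next_pow2(h + k - 1)
    pw = next_pow2(w + k - 1)
    n = ph * pw
    one_fft = 5 * n * math.log2(n)   # textbook estimate for one complex FFT
    pointwise = 6 * n                # one complex multiply per frequency bin
    return 3 * one_fft + pointwise   # forward(image) + forward(kernel) + inverse


if __name__ == "__main__":
    for k in (3, 7, 15, 31):
        d = direct_flops(512, 512, k)
        f = fft_flops(512, 512, k)
        cheaper = "direct" if d < f else "fft"
        print(f"kernel {k}x{k}: direct={d:.3g} fft={f:.3g} cheaper={cheaper}")
```

Under these assumptions the FFT route costs about the same regardless of kernel size, so it only wins once the kernel is large enough for the direct cost to overtake it.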

As for the slow cuDNN performance, this is a known regression in cuDNN v8:
https://forums.developer.nvidia.com/t/darknet-slower-using-jetpack-4-4-cudnn-8-0-0-cuda-10-2-than-jetpack-4-3-cudnn-7-6-3-cuda-10-0/
Our internal team is working on this. We will share the latest status once we have an update.

Thanks.
