OpenCV Cuda DFT extremely slow

In the future, please use the tools in the editor window to format your code correctly.

Looking at your output, I would say each DFT seems to be taking ~800ms, not 80ms.
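If it helps to pin down the per-call cost, here is a minimal timing sketch that reports the first call separately from the average of the later calls, since one-time initialization is normally charged to the first call. The helper name `time_first_vs_rest` and the `run_dft` callable are hypothetical, not part of OpenCV. For GPU work the callable has to block until the result is actually ready (for example by downloading the result to the host), otherwise you only measure the launch.

```python
import time


def time_first_vs_rest(fn, repeats=10):
    """Return (first_call_seconds, mean_seconds_of_later_calls) for fn.

    fn is any zero-argument callable. For GPU work it must block until
    the result is complete, otherwise only the launch is measured.
    """
    start = time.perf_counter()
    fn()  # first call: includes any one-time initialization cost
    first = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        fn()  # later calls: closer to the steady-state cost
    rest = (time.perf_counter() - start) / repeats
    return first, rest


if __name__ == "__main__":
    # Stand-in workload; replace with a hypothetical run_dft() that
    # performs one DFT and waits for it to finish.
    first, rest = time_first_vs_rest(lambda: sum(range(100_000)))
    print(f"first call: {first * 1e3:.2f} ms, later calls: {rest * 1e3:.2f} ms")
```

If the later calls are as slow as the first one, the cost is not a one-time startup effect.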

I’m not that familiar with the internals of OpenCV. It’s not maintained or supported by NVIDIA, and although it uses CUDA under the hood, this is not a CUDA programming question.

An application that calls cuModuleLoad is using the CUDA driver API. After the first run, subsequent calls to that function for the same module (presumably the case here) should be able to pull what they need from the JIT cache instead of recompiling anything. So the first thing I would check is whether your JIT cache is working correctly. There are various questions on these forums about the JIT cache. (I would also want to be certain that if/when you built your OpenCV CUDA libraries, you did not inadvertently specify the -G switch, which builds device code in debug mode with optimizations disabled, although this seems unlikely.)
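As a starting point for the JIT cache check, here is a small sketch that reports the cache-related settings the process would see. It relies on the `CUDA_CACHE_DISABLE`, `CUDA_CACHE_PATH` and `CUDA_CACHE_MAXSIZE` environment variables and assumes the Linux default location `~/.nv/ComputeCache` when no path is set (Windows uses a different default). The function name `jit_cache_report` is made up for illustration, and the script only inspects the directory; it does not prove the driver is actually hitting the cache.

```python
import os
from pathlib import Path


def jit_cache_report(env=None):
    """Summarize the CUDA JIT cache settings found in env (default: os.environ)."""
    env = os.environ if env is None else env

    # CUDA_CACHE_DISABLE=1 turns the JIT cache off entirely.
    disabled = env.get("CUDA_CACHE_DISABLE", "0") == "1"

    # Assumption: Linux default location when CUDA_CACHE_PATH is not set.
    default_path = Path.home() / ".nv" / "ComputeCache"
    path = Path(env.get("CUDA_CACHE_PATH", default_path))

    # Total size of the files currently in the cache directory, in bytes.
    size = 0
    if path.is_dir():
        size = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())

    return {
        "disabled": disabled,
        "path": str(path),
        "exists": path.is_dir(),
        "size_bytes": size,
        # Left as None when unset; the driver default varies by version.
        "max_size": env.get("CUDA_CACHE_MAXSIZE"),
    }


if __name__ == "__main__":
    for key, value in jit_cache_report().items():
        print(f"{key}: {value}")
```

If the cache is disabled, the directory is missing or not writable, or its size is already close to the configured maximum, the driver may be recompiling on every run. Running the application twice and checking whether `size_bytes` grew after the first run is a rough way to see whether anything is being cached.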
