Speed comparison between CUDA and OpenCV


I’m trying to compare the speed of a Gaussian blur implemented in CUDA against one implemented with OpenCV. I’m using Visual Studio 2015 and CUDA Toolkit 8.0.

My GPU is an NVIDIA GeForce 840M and my CPU is an Intel Core i7-4510U @ 2 GHz. I also have 16 GB of RAM and a 512 GB SSD, but I doubt this is useful information. Yes, I am running Visual Studio on a laptop. The OS is Windows 10.

I ran the comparison today, and it struck me that the OpenCV program ran in 300 milliseconds while the CUDA one ran in 1500 milliseconds.

I placed timers immediately before and after each filter. Could the difference be due to the time spent transferring data between the CPU and the GPU?

Here are the algorithms I took as reference for this comparison:


Is this a GT 840M? If so, it is a lowest-end device with DDR3 memory. I would not expect any speedup versus the CPU because I expect Gaussian blur to be limited by memory throughput, and this GPU likely has memory throughput similar to your system memory. If your measurements include the time to transfer the data between CPU and GPU, expect a slowdown. Your CPU does support boost clocks up to 3.1 GHz; I am not sure whether that would kick in when running this code.

Have you tried turning on the GPU acceleration in OpenCV? Comparing two random implementations of Gaussian blur doesn’t really allow one to draw any conclusions as to the relative performance of two pieces of hardware, or of two software platforms for that matter.


Gaussian blur requires so little computation that even on the CPU its speed is limited by memory throughput; it should be roughly as fast as a simple memcpy().
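To illustrate how little arithmetic is involved, here is a simplified sketch of one horizontal pass with a 3-tap binomial kernel [1 2 1] / 4, a common small approximation of a Gaussian. The function name `blur_row_121` and the clamped border handling are assumptions made for this example; real implementations use larger kernels and vectorized code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One horizontal pass of the 3-tap kernel [1 2 1] / 4 over a single row.
// Border pixels reuse the edge value (clamp-to-edge), an assumption made
// here for simplicity.
std::vector<std::uint8_t> blur_row_121(const std::vector<std::uint8_t>& src) {
    const std::size_t n = src.size();
    std::vector<std::uint8_t> dst(n);
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned left = src[i == 0 ? 0 : i - 1];
        const unsigned right = src[i + 1 == n ? i : i + 1];
        // Three loads, a handful of integer operations, one store per pixel.
        // The "+ 2u" rounds to nearest before the division by 4.
        dst[i] = static_cast<std::uint8_t>((left + 2u * src[i] + right + 2u) / 4u);
    }
    return dst;
}
```

Each output pixel costs only a few integer operations, and two of the three loads hit data that was just read for the previous pixel, so the loop mostly streams the row in and out of memory, much like a copy does.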

When you try to offload such computations to the GPU, you spend much more time copying data back and forth over the PCIe bus, even if you have a decent GPU with fast memory.

So, if you need to perform ONLY a Gaussian blur on this data, using a discrete GPU cannot help.