Fast GPU convolution reference


I’m looking for the fastest available 2D convolution for 32-bit floating point to benchmark some of my own code against. Does anyone have any suggestions? Benchmark results?

I noticed NPP has routines for this, and I’m sure there are MANY other implementations available. Can anyone give me some pointers?
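When benchmarking, it helps to check every fast implementation against a slow but obviously correct baseline first. Below is a minimal C++ sketch of such a baseline. It assumes row-major float images, an odd-sized square kernel, zero padding outside the image and a same-size output. `conv2d_reference` is a hypothetical name, not an NPP or SDK function. The sketch flips the kernel (true convolution). Some filtering APIs compute correlation instead, so check which convention an implementation uses before comparing outputs for asymmetric kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: direct ("naive") 2D convolution, meant only as a
// correctness baseline. Assumes row-major float data, odd ksize, zero padding
// outside the image, and an output the same size as the input.
std::vector<float> conv2d_reference(const std::vector<float>& img, int width, int height,
                                    const std::vector<float>& ker, int ksize) {
    assert(ksize % 2 == 1);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    assert(ker.size() == static_cast<std::size_t>(ksize) * ksize);

    std::vector<float> out(static_cast<std::size_t>(width) * height, 0.0f);
    const int r = ksize / 2;  // kernel radius
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int ky = -r; ky <= r; ++ky) {
                for (int kx = -r; kx <= r; ++kx) {
                    // Convolution flips the kernel: out(x,y) = sum k(kx,ky) * img(x-kx, y-ky)
                    const int ix = x - kx;
                    const int iy = y - ky;
                    if (ix < 0 || ix >= width || iy < 0 || iy >= height) continue;  // zero padding
                    acc += ker[(ky + r) * ksize + (kx + r)] * img[iy * width + ix];
                }
            }
            out[y * width + x] = acc;
        }
    }
    return out;
}
```

It costs about W·H·K² multiply-adds, so it is only meant for validating results on small inputs, not as a performance target.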


Silly question: did you try the SDK sample? How does your code stack up against it?


My codes perform convolutions in some of their intermediate steps. I do them in k-space (reciprocal space) using the cuFFT library. It is easy to implement and very efficient when the range of the convolution is large, since everything reduces to three FFTs (two forward, one for the data and one for the kernel, plus one inverse) and an element-wise product.
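The k-space approach relies on the convolution theorem: the circular convolution of two arrays equals the inverse FFT of the element-wise product of their FFTs. The C++ sketch below shows that structure on the CPU, with a plain radix-2 FFT standing in for cuFFT. It assumes power-of-two dimensions and a kernel that is already zero-padded to the image size. `fft1d`, `fft2d` and `fft_convolve2d` are hypothetical helpers. The result is circular (wrap-around) convolution. To get linear convolution, pad both arrays to at least N + K - 1 in each dimension first.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// Hypothetical helper: in-place iterative radix-2 FFT; size must be a power of two.
// inverse=true computes the unnormalized inverse transform.
void fft1d(std::vector<cd>& a, bool inverse) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {  // butterfly stages
        const double ang = 2.0 * pi / static_cast<double>(len) * (inverse ? 1.0 : -1.0);
        const cd wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd u = a[i + k];
                const cd v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

// Hypothetical helper: 2D FFT of a row-major rows x cols grid (rows, then columns).
void fft2d(std::vector<cd>& g, std::size_t rows, std::size_t cols, bool inverse) {
    std::vector<cd> line(cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) line[c] = g[r * cols + c];
        fft1d(line, inverse);
        for (std::size_t c = 0; c < cols; ++c) g[r * cols + c] = line[c];
    }
    line.assign(rows, cd(0.0));
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) line[r] = g[r * cols + c];
        fft1d(line, inverse);
        for (std::size_t r = 0; r < rows; ++r) g[r * cols + c] = line[r];
    }
}

// Circular 2D convolution: two forward FFTs, an element-wise product, one inverse FFT.
std::vector<double> fft_convolve2d(const std::vector<double>& img, const std::vector<double>& ker,
                                   std::size_t rows, std::size_t cols) {
    std::vector<cd> A(img.begin(), img.end());
    std::vector<cd> B(ker.begin(), ker.end());
    fft2d(A, rows, cols, false);  // forward FFT of the data
    fft2d(B, rows, cols, false);  // forward FFT of the (zero-padded) kernel
    for (std::size_t i = 0; i < A.size(); ++i) A[i] *= B[i];  // element-wise product
    fft2d(A, rows, cols, true);   // inverse FFT (unnormalized)
    const double scale = 1.0 / static_cast<double>(rows * cols);
    std::vector<double> out(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) out[i] = A[i].real() * scale;
    return out;
}
```

Direct convolution of an N×N image with a K×K kernel needs about N²K² multiply-adds, whereas the FFT route costs O(N² log N) regardless of K. That is why it pays off when the convolution range is large. For small kernels, a direct or separable kernel is often faster. On the GPU, `fft2d` would be replaced by cuFFT plans (for example `cufftPlan2d` with `cufftExecR2C`/`cufftExecC2R`, which are also unnormalized) and the product by a small element-wise kernel. That part is not shown here.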

Jim’s code always seems to be the fastest, IMO, so please keep us updated on your findings.