NPP 2D Convolution

I am looking for a way to do 2D convolution for dimensions up to 10,000 x 10,000.

I just came across nppiFilter_8u_C1R and have a couple of basic questions:

  1. Are there any dimension limits that should generally not be exceeded (will 10k x 10k be too big)? I am using a Tesla C2075.

  2. Is it really doing some sort of FFT/DFT convolution under the hood? Would it be better to use cuFFT and skip NPP? Has anyone done performance comparisons?

  3. If I were really dealing with a matrix of float values rather than processing image pixels, are there any suggested tricks to populate the source and kernel "on the fly"?

The matrix itself will take 100 * sizeof(data_type) MB, since 10,000 x 10,000 is 10^8 elements. To this you need to add the filter and, if FFTs are used, the additional workspace they require (about 2.6 times the size of the problem). If you use double you get around 3 GB in total. So if you do not do anything else on the card, you can fit the problem on the GPU.
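The arithmetic above can be written out as a small host-side helper. This is only a rough sketch: the struct and function names are made up for illustration, the 2.6 factor is the rule of thumb quoted in this reply rather than a measured value, and the (small) filter itself is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Rough device-memory estimate for an FFT-based 2D convolution.
// All names here are hypothetical; nothing in this block is part of NPP or cuFFT.
struct ConvMemoryEstimate {
    std::uint64_t matrix_bytes;    // the input matrix itself
    std::uint64_t overhead_bytes;  // assumed extra FFT workspace
    std::uint64_t total_bytes;     // matrix + workspace (filter not counted)
};

// fft_overhead_factor defaults to 2.6, the figure quoted in the reply above.
inline ConvMemoryEstimate estimate_conv_memory(std::uint64_t rows,
                                               std::uint64_t cols,
                                               std::uint64_t element_size,
                                               double fft_overhead_factor = 2.6) {
    assert(element_size > 0 && fft_overhead_factor >= 0.0);
    const std::uint64_t matrix_bytes = rows * cols * element_size;
    const std::uint64_t overhead_bytes = static_cast<std::uint64_t>(
        static_cast<double>(matrix_bytes) * fft_overhead_factor);
    return {matrix_bytes, overhead_bytes, matrix_bytes + overhead_bytes};
}
```

Calling estimate_conv_memory(10000, 10000, sizeof(double)) gives 0.8 GB for the matrix and roughly 2.9 GB in total, which matches the "around 3 GB" figure; with sizeof(float) both numbers halve. Comparing the total against the free memory reported by cudaMemGetInfo would tell you whether the whole problem fits in one piece.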

We’ve done lots of benchmarks, and ArrayFire’s convolutions are the fastest available. It is free for most users. Check out the features available. It also contains a big set of image and signal processing functions.