Arbitrary 2D convolution

brdavs · February 10, 2012, 3:18am

What is currently the best open-source arbitrary 2D convolution implementation in CUDA?

The kernel is non-separable.

Thanks for any suggestions.

Pascal_Obe · February 10, 2012, 9:57am

Could you post the 2D convolution formula you have in mind? I guess you could use the optimized CUBLAS library for that kind of operation. Couldn't a 2D convolution be seen as equivalent to a 2D reduction problem on an array?

Pascal
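For reference, here is the operation being asked about as a CPU sketch in plain Python. It is not CUDA, and the function names `conv2d_direct`, `im2col` and `conv2d_gemv` are hypothetical. Take an image img and a kh x kw kernel ker with centre (cy, cx) = (kh // 2, kw // 2). The "same"-size result with zero padding is out[y][x] = sum over i, j of img[y + cy - i][x + cx - j] * ker[i][j].

A non-separable kernel cannot be split into a row pass and a column pass, so every output pixel costs kh * kw multiply-adds. Pascal's BLAS idea corresponds to the standard im2col lowering. Each output pixel becomes one row of a matrix holding the samples it needs, which turns the convolution into a matrix-vector product. When several kernels are applied, it becomes a matrix-matrix product such as `cublasSgemm`. The cost is kh * kw times more memory for the lowered input.

```python
def conv2d_direct(img, ker):
    """'Same'-size 2D convolution with zero padding (any rectangular kernel).

    out[y][x] = sum_{i,j} img[y + cy - i][x + cx - j] * ker[i][j],
    with (cy, cx) = (kh // 2, kw // 2) the kernel centre.
    """
    h, w = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    cy, cx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    sy, sx = y + cy - i, x + cx - j
                    if 0 <= sy < h and 0 <= sx < w:  # zero padding outside
                        acc += img[sy][sx] * ker[i][j]
            out[y][x] = acc
    return out


def im2col(img, kh, kw):
    """One row per output pixel holding the kh*kw input samples it needs."""
    h, w = len(img), len(img[0])
    cy, cx = kh // 2, kw // 2
    rows = []
    for y in range(h):
        for x in range(w):
            row = []
            for i in range(kh):
                for j in range(kw):
                    sy, sx = y + cy - i, x + cx - j
                    inside = 0 <= sy < h and 0 <= sx < w
                    row.append(img[sy][sx] if inside else 0.0)
            rows.append(row)
    return rows


def conv2d_gemv(img, ker):
    """Same result as conv2d_direct, computed as (im2col matrix) x (flattened kernel)."""
    h, w = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    kvec = [v for krow in ker for v in krow]  # row-major, matches im2col order
    flat = [sum(a * b for a, b in zip(row, kvec)) for row in im2col(img, kh, kw)]
    return [flat[y * w:(y + 1) * w] for y in range(h)]


# Usage: a 5x6 image and a rectangular, non-separable 3x4 kernel.
img = [[(7 * (y * 6 + x)) % 11 for x in range(6)] for y in range(5)]
ker = [[-3, 2, 0, -2], [3, 1, -1, -3], [2, 0, -2, 3]]
assert conv2d_direct(img, ker) == conv2d_gemv(img, ker)
```

On a GPU, a custom kernel would build the lowered matrix and hand the product to CUBLAS. The memory blow-up makes this most attractive for small kernels, or when many kernels share one input.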

brdavs · February 10, 2012, 7:22pm

There is a great presentation on convolutions by Joe Stam:

I just don’t know whether this is the current state of the art or whether there is something else out there.

MattWarmuth · February 16, 2012, 7:17pm

When you say ‘best open source arbitrary 2D convolution implementation,’ you have to be careful. The ‘best’ arbitrary convolution solution that handles all kernel sizes will certainly be slower than one that can assume, say, that the kernel fits into shared memory. Also, at some point the number of operations pushes you toward doing the convolution in frequency space via an FFT. There is no “best … arbitrary” solution unless it does the following: looks at the problem size, checks your compute capability to determine what it can store in shared memory, possibly runs a sample FFT (after planning it) for your specific size, and compares that timing against the shared-memory and texture-based versions before deciding which path to take. I’m not saying this can’t be done, but there’s no ‘best’ solution that handles all sizes. The relative performance of the different methods will vary widely (and some may not even work) with differently sized convolutants (? a Bush-ism?).
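Matt's point about checking what fits in shared memory comes down to a little arithmetic. A tiled direct convolution that stages its input in shared memory needs the input tile for its outputs plus an apron of (kernel width - 1) x (kernel height - 1) pixels, and possibly the kernel coefficients too. The sketch below makes several assumptions: 4-byte floats, and per-block shared-memory limits of 16 KB for compute capability 1.x and 48 KB for 2.x/3.x. It also uses a fixed FFT cutoff in place of the timing runs Matt describes. `tile_smem_bytes`, `choose_path` and the cutoff of 31 are hypothetical and purely illustrative.

```python
# Per-block shared-memory limits in bytes. CUDA compute capability 1.x: 16 KB;
# 2.x and 3.x: 48 KB. Newer majors are assumed to be 48 KB in this sketch.
SHARED_BYTES_PER_BLOCK = {1: 16 * 1024, 2: 48 * 1024, 3: 48 * 1024}


def tile_smem_bytes(tile_w, tile_h, ker_w, ker_h, elem_bytes=4, cache_kernel=True):
    """Shared memory one block of a tiled direct convolution needs: the input
    tile covering tile_w x tile_h outputs plus the (ker - 1)-wide apron, and
    optionally the kernel coefficients as well."""
    in_tile = (tile_w + ker_w - 1) * (tile_h + ker_h - 1)
    coeffs = ker_w * ker_h if cache_kernel else 0
    return (in_tile + coeffs) * elem_bytes


def choose_path(ker_w, ker_h, cc_major, tile_w=16, tile_h=16, fft_min_width=31):
    """Hypothetical static dispatch returning 'fft', 'shared' or 'texture'.

    A real implementation would time the candidate paths instead of using a
    fixed fft_min_width; 31 is only a placeholder.
    """
    if max(ker_w, ker_h) >= fft_min_width:
        return "fft"
    limit = SHARED_BYTES_PER_BLOCK.get(cc_major, 48 * 1024)
    if tile_smem_bytes(tile_w, tile_h, ker_w, ker_h) <= limit:
        return "shared"
    return "texture"  # too big for shared memory: read through the texture cache


# Usage: the same 30x30 kernel with 32x32 tiles lands on different paths.
print(choose_path(30, 30, cc_major=1, tile_w=32, tile_h=32))  # texture (18484 B > 16 KB)
print(choose_path(30, 30, cc_major=2, tile_w=32, tile_h=32))  # shared (fits in 48 KB)
```

A dispatcher like this only captures the "what fits where" part of Matt's argument. The FFT-versus-direct decision still needs measurements on the actual hardware.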

I found an interesting paper that delves into this problem and draws the line at kernel sizes of 31-41 pixels as the transition between a texture-based image-space convolution and a frequency-space FFT solution. Unfortunately, the paper is 6 years old, so it’s most likely out of date (especially since it’s talking about 6000- and 7000-series chips). However, the discussion inside is still relevant.
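A crossover like the one in that paper follows from how the two costs scale. Direct convolution needs a multiply and an add per kernel tap per output pixel, so its cost grows with kernel area. The FFT route costs roughly the same for any kernel that fits in the padded transform. The rough flop model below rests on these assumptions:

- power-of-two padding to (w + kw - 1) x (h + kh - 1)
- the common ~5 N log2 N flop estimate for a complex radix-2 FFT
- three transforms (image, kernel, inverse)
- 6 flops per complex multiply

The function names are hypothetical.

```python
import math


def next_pow2(n):
    """Smallest power of two >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def direct_flops(w, h, kw, kh):
    # One multiply and one add per kernel tap per output pixel.
    return 2 * w * h * kw * kh


def fft_flops(w, h, kw, kh):
    # Pad to a power-of-two size large enough to avoid wrap-around, then
    # FFT(image) + FFT(kernel) + pointwise complex multiply + inverse FFT.
    pw, ph = next_pow2(w + kw - 1), next_pow2(h + kh - 1)
    n = pw * ph
    return 3 * 5 * n * math.log2(n) + 6 * n


def crossover_width(w, h, max_k=256):
    """Smallest square kernel width for which the FFT path needs fewer flops."""
    for k in range(1, max_k + 1):
        if fft_flops(w, h, k, k) < direct_flops(w, h, k, k):
            return k
    return None


print(crossover_width(1024, 1024))  # 26
```

For a 1024x1024 image this model puts the crossover at a 26x26 kernel. It ignores memory bandwidth, texture caching, real-to-complex transforms and precomputed kernel spectra. It only illustrates the shape of the trade-off and should not be expected to reproduce the paper's measured 31-41 pixel range.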

Here’s another paper that compares CPUs (with SSE), GPUs and some Xilinx FPGAs, but it’s also old (5000/6000 series and Spartan/Virtex II series). This one directly compares the throughput of each system in megapixels per second (MP/s) for various square kernels, but only from 2x2 to 11x11. That limit is probably due to memory somewhere, or to resource availability in the FPGA. (When you say ‘arbitrary,’ did you mean ‘arbitrary-sized square kernels’, or can the kernels be rectangular?)

pasoleatis · February 17, 2012, 11:57am

Do you plan to do it in real space or in inverse space? Short-range kernels are efficient in real space, while long-range ones are better handled in inverse space with an FFT.
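In inverse space the convolution becomes a pointwise product, by the convolution theorem: conv(img, ker) = IFFT2(FFT2(img) * FFT2(ker)). The minimal pure-Python sketch below checks this against the same wrap-around convolution evaluated directly in real space. It assumes power-of-two image dimensions and uses a textbook recursive radix-2 FFT rather than a tuned library; on the GPU, cuFFT would fill that role. The function names are hypothetical. The FFT product gives circular convolution. For the ordinary linear result, both image and kernel must first be zero-padded to at least (h + kh - 1) x (w + kw - 1).

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Unnormalised."""
    n = len(a)
    if n == 1:
        return list(a)
    sign = 1 if inverse else -1
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(m, inverse=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(r, inverse) for r in m]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def circular_conv2d_fft(img, ker):
    """Circular 2D convolution in inverse space; ker is zero-padded to img's size."""
    h, w = len(img), len(img[0])
    kpad = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(ker):
        for j, v in enumerate(row):
            kpad[i][j] = v
    fi, fk = fft2(img), fft2(kpad)
    prod = [[a * b for a, b in zip(ri, rk)] for ri, rk in zip(fi, fk)]
    res = fft2(prod, inverse=True)
    return [[(v / (h * w)).real for v in row] for row in res]  # 1/(h*w) normalisation


def circular_conv2d_direct(img, ker):
    """The same operation evaluated in real space; indices wrap around."""
    h, w = len(img), len(img[0])
    return [[sum(img[(y - i) % h][(x - j) % w] * ker[i][j]
                 for i in range(len(ker)) for j in range(len(ker[0])))
             for x in range(w)] for y in range(h)]


img = [[(3 * y + 5 * x) % 7 for x in range(8)] for y in range(4)]  # 4x8 image
ker = [[1, 2, 0], [0, -1, 3], [2, 0, 1]]                          # non-separable 3x3
via_fft = circular_conv2d_fft(img, ker)
direct = circular_conv2d_direct(img, ker)
print(max(abs(p - q) for ra, rb in zip(via_fft, direct) for p, q in zip(ra, rb)))
```

The printed difference is only floating-point round-off. The real-space version costs kh * kw operations per pixel, while the FFT version's cost does not depend on the kernel size. That is the short-range versus long-range distinction above.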
