What is the best way to implement a convolution with CUDA?

laughingrice · April 1, 2010, 11:58pm

I’m trying to figure out the best way to implement image-space convolution on the GPU. Since quite a few people have probably done this already, I was hoping for some pointers.

I’m assuming that the filter kernel size is not a multiple of 16, and I’m not sure what the practical maximum is.
Should I load the filter kernel into constant, shared, or texture memory?
It seems better to assign each thread to one output pixel (each thread applies the whole filter kernel to an image block) rather than have each thread handle one kernel pixel. That avoids needing synchronization on write, and therefore probably extra shared memory for the output.
A good option seems to be loading the filter kernel into constant memory and the image into shared memory. Is that a good idea?
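
A minimal sketch of the approach described above, assuming a square filter of odd width 2r+1 with r ≤ 8, clamp-to-edge borders, and 16×16 thread blocks. The names MAX_RADIUS, TILE, convolve2D, convolveGPU and convolveCPU are hypothetical, chosen only for illustration. The filter weights go into constant memory, each block stages its input tile plus a halo of r pixels in shared memory, and each thread writes exactly one output pixel, so no write synchronization is needed. Like many image filters it computes a correlation (the filter is not flipped), which is identical for symmetric filters.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed limits for this sketch (hypothetical values).
#define MAX_RADIUS 8   // largest supported filter radius r; filter is (2r+1) x (2r+1)
#define TILE 16        // each block computes a TILE x TILE patch of output pixels

// Filter weights in constant memory: all threads of a warp read the same
// coefficient in the same loop iteration, which the constant cache broadcasts.
__constant__ float c_filter[(2 * MAX_RADIUS + 1) * (2 * MAX_RADIUS + 1)];

// One thread per output pixel. Computes a correlation (no filter flip);
// pass a flipped filter for a strict convolution.
__global__ void convolve2D(const float* in, float* out, int width, int height, int radius)
{
    extern __shared__ float tile[];                  // (TILE+2r) x (TILE+2r) input tile with halo
    const int tileW = TILE + 2 * radius;
    const int x0 = (int)blockIdx.x * TILE - radius;  // image coordinates of tile[0]
    const int y0 = (int)blockIdx.y * TILE - radius;

    // Cooperative load: the tile is larger than the block, so threads stride over it.
    for (int ty = threadIdx.y; ty < tileW; ty += blockDim.y)
        for (int tx = threadIdx.x; tx < tileW; tx += blockDim.x) {
            int gx = min(max(x0 + tx, 0), width - 1);    // clamp-to-edge border handling
            int gy = min(max(y0 + ty, 0), height - 1);
            tile[ty * tileW + tx] = in[gy * width + gx];
        }
    __syncthreads();

    const int x = (int)(blockIdx.x * TILE + threadIdx.x);
    const int y = (int)(blockIdx.y * TILE + threadIdx.y);
    if (x >= width || y >= height) return;           // safe: no __syncthreads() after this

    const int fw = 2 * radius + 1;
    float sum = 0.0f;
    for (int fy = 0; fy < fw; ++fy)
        for (int fx = 0; fx < fw; ++fx)
            sum += c_filter[fy * fw + fx] * tile[(threadIdx.y + fy) * tileW + threadIdx.x + fx];
    out[y * width + x] = sum;                         // each output written by exactly one thread
}

// Hypothetical host helper: uploads image and filter, launches the kernel, copies back.
bool convolveGPU(const std::vector<float>& img, std::vector<float>& result,
                 int width, int height, const std::vector<float>& filter, int radius)
{
    if (radius < 0 || radius > MAX_RADIUS) return false;
    const size_t fw = 2 * radius + 1;
    if (filter.size() != fw * fw || img.size() != (size_t)width * height) return false;
    const size_t bytes = img.size() * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, img.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_filter, filter.data(), filter.size() * sizeof(float));

    dim3 block(TILE, TILE);                           // must be TILE x TILE for the indexing above
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    size_t smem = (TILE + 2 * radius) * (TILE + 2 * radius) * sizeof(float);
    convolve2D<<<grid, block, smem>>>(dIn, dOut, width, height, radius);

    cudaError_t err = cudaGetLastError();             // launch configuration errors
    result.resize(img.size());
    if (err == cudaSuccess)                           // blocking copy also waits for the kernel
        err = cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return err == cudaSuccess;
}

// CPU reference with the same border handling, for checking the GPU result.
void convolveCPU(const std::vector<float>& img, std::vector<float>& result,
                 int width, int height, const std::vector<float>& filter, int radius)
{
    assert(radius >= 0);
    const int fw = 2 * radius + 1;
    result.assign(img.size(), 0.0f);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int fy = 0; fy < fw; ++fy)
                for (int fx = 0; fx < fw; ++fx) {
                    int gx = std::min(std::max(x + fx - radius, 0), width - 1);
                    int gy = std::min(std::max(y + fy - radius, 0), height - 1);
                    sum += filter[fy * fw + fx] * img[gy * width + gx];
                }
            result[y * width + x] = sum;
        }
}
```

To use it, fill a std::vector<float> with a row-major image and a (2r+1)² filter, call convolveGPU, and compare against convolveCPU. The shared tile has (16 + 2r)² elements, so the halo overhead grows with the filter radius; with r = 8 the tile is 32×32 floats (4 KB) per block.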

Thanks

JaredHoberock · April 2, 2010, 1:46am

Joe Stam gave a good talk on this subject at last year’s GTC.

Streaming video: http://nvidia.fullviewmedia.com/GPU2009/1002-california-1401.html
Talk slides: http://www.nvidia.com/content/GTC/documents/1401_GTC09.pdf
