3D texture based separable convolution extension of SDK example

fna · March 18, 2010, 5:47pm

Hi, I haven’t seen this posted, so I thought I would share it. It’s a simple extension of the 2D texture-based separable convolution sample found in the SDK, and it might be a good exercise for someone who is new to CUDA. I chose textures for ease of programming and readability; I am aware that using shared memory could be faster.
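The attached code isn't reproduced here, so the following is only a rough sketch of how one pass of a texture-based 3D separable convolution might look. All names (`convolveAxisTex`, `makeTexture`, `convolveAxisHost`) are hypothetical, the radius of 2 is an assumption, and it uses texture objects rather than the texture references the 2010-era SDK sample used (texture references were removed in CUDA 12). The same kernel serves all three axes through the `(dx, dy, dz)` direction argument.

```cuda
#include <cuda_runtime.h>
#include <vector>
#include <cassert>

// Assumed filter radius; like the SDK sample, fixed at compile time.
#define KERNEL_RADIUS 2
#define KERNEL_LENGTH (2 * KERNEL_RADIUS + 1)

__constant__ float c_Kernel[KERNEL_LENGTH];

// One pass along the axis (dx, dy, dz), e.g. (1,0,0) for x.
// Each thread owns one (x, y) column and loops over z (cc 1.x GPUs have no 3D grids).
// Neighbours come from a 3D texture with point sampling and clamp-to-edge addressing,
// so the filter taps need no explicit bounds checks.
__global__ void convolveAxisTex(float* out, cudaTextureObject_t tex,
                                int w, int h, int d, int dx, int dy, int dz)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    for (int z = 0; z < d; ++z) {
        float sum = 0.0f;
        for (int k = -KERNEL_RADIUS; k <= KERNEL_RADIUS; ++k) {
            // +0.5f samples the texel centre with unnormalized coordinates.
            sum += tex3D<float>(tex, x + k * dx + 0.5f, y + k * dy + 0.5f, z + k * dz + 0.5f)
                 * c_Kernel[KERNEL_RADIUS - k];   // flipped: true convolution
        }
        out[((size_t)z * h + y) * w + x] = sum;
    }
}

// Wraps a float cudaArray in a texture object (point filtering, clamp addressing).
cudaTextureObject_t makeTexture(cudaArray_t arr)
{
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = td.addressMode[1] = td.addressMode[2] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// Host driver for a single pass (error checking omitted for brevity).
std::vector<float> convolveAxisHost(const std::vector<float>& in, const float* kernel,
                                    int w, int h, int d, int dx, int dy, int dz)
{
    cudaChannelFormatDesc ch = cudaCreateChannelDesc<float>();
    cudaExtent ext = make_cudaExtent(w, h, d);   // in elements, since a cudaArray is involved
    cudaArray_t arr;
    cudaMalloc3DArray(&arr, &ch, ext);

    cudaMemcpy3DParms p = {};
    p.srcPtr = make_cudaPitchedPtr((void*)in.data(), w * sizeof(float), w, h);
    p.dstArray = arr;
    p.extent = ext;
    p.kind = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&p);

    cudaMemcpyToSymbol(c_Kernel, kernel, KERNEL_LENGTH * sizeof(float));
    cudaTextureObject_t tex = makeTexture(arr);

    const size_t n = (size_t)w * h * d;
    float* d_out;
    cudaMalloc(&d_out, n * sizeof(float));
    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    convolveAxisTex<<<grid, block>>>(d_out, tex, w, h, d, dx, dy, dz);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(d_out);
    return out;
}
```

To chain the x, y and z passes, the output of one pass has to be copied back into the cudaArray (for example with `cudaMemcpy3D` and `cudaMemcpyDeviceToDevice`) before the next pass, because the texture itself is read-only.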

Currently it reads the convolution kernels from a Matlab .mat file, so you may need to change where it looks for the Matlab header/library files on your computer. Alternatively, it is easy to comment that part out and use a random kernel, as in the SDK.
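One way to make the Matlab dependency optional is a compile-time switch instead of commenting code out. The sketch below is an assumption, not the attached code: `loadKernel`, the `USE_MATLAB` macro and the variable names are hypothetical. When built with `-DUSE_MATLAB` (and linked against Matlab's `mat` and `mx` libraries, whose paths are machine-specific) it reads the taps with the MAT-file API; otherwise it falls back to small random integer taps, similar in spirit to the SDK sample.

```cuda
#include <cstdlib>
#include <vector>
#include <cassert>
#ifdef USE_MATLAB
#include "mat.h"   // Matlab MAT-file API; include/library paths depend on the install
#endif

// Hypothetical helper: returns `length` filter taps, read from variable `varName`
// in `matFile` when built with -DUSE_MATLAB, otherwise generated randomly.
std::vector<float> loadKernel(const char* matFile, const char* varName, int length)
{
    std::vector<float> k(length);
#ifdef USE_MATLAB
    MATFile* mf = matOpen(matFile, "r");
    if (mf) {
        mxArray* a = matGetVariable(mf, varName);
        if (a && mxIsDouble(a) && (int)mxGetNumberOfElements(a) == length) {
            const double* p = mxGetPr(a);          // Matlab stores doubles
            for (int i = 0; i < length; ++i) k[i] = (float)p[i];
            mxDestroyArray(a);
            matClose(mf);
            return k;
        }
        if (a) mxDestroyArray(a);
        matClose(mf);
    }
#else
    (void)matFile;
    (void)varName;
#endif
    // Fallback (or failed read): random integer taps in [0, 15].
    for (int i = 0; i < length; ++i) k[i] = (float)(rand() % 16);
    return k;
}
```

The returned taps can then be uploaded to constant memory with `cudaMemcpyToSymbol`, exactly as a randomly generated kernel would be.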

I have not tried to optimize the block size, thread count, or occupancy; it’s mainly a proof of principle. I see speedups of ~180x over comparable single-threaded CPU code on my GTX 260 with an image size of 2048x512x64 (67,108,864 voxels).
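The CPU code used for the ~180x comparison isn't shown, so this is only a guess at what a single-threaded baseline could look like; it is also handy for checking GPU results. The names are hypothetical. It uses clamp-to-edge borders so that it matches `cudaAddressModeClamp` on the GPU, and applies each kernel as a true (flipped) convolution, as the SDK sample does.

```cuda
#include <vector>
#include <cstddef>
#include <cassert>

// Single-threaded CPU reference for one separable pass along (dx, dy, dz).
// `kernel` has 2*r+1 taps; borders are clamped to the nearest edge voxel.
std::vector<float> convolveAxisCPU(const std::vector<float>& in,
                                   const std::vector<float>& kernel,
                                   int w, int h, int d, int dx, int dy, int dz)
{
    const int r = (int)kernel.size() / 2;
    auto clampi = [](int v, int n) { return v < 0 ? 0 : (v >= n ? n - 1 : v); };
    std::vector<float> out(in.size());
    for (int z = 0; z < d; ++z)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                float sum = 0.0f;
                for (int k = -r; k <= r; ++k) {
                    const int xs = clampi(x + k * dx, w);
                    const int ys = clampi(y + k * dy, h);
                    const int zs = clampi(z + k * dz, d);
                    sum += in[((size_t)zs * h + ys) * w + xs] * kernel[r - k];
                }
                out[((size_t)z * h + y) * w + x] = sum;
            }
    return out;
}

// Full 3D separable convolution: x pass, then y pass, then z pass.
std::vector<float> separable3dCPU(const std::vector<float>& in,
                                  const std::vector<float>& kx,
                                  const std::vector<float>& ky,
                                  const std::vector<float>& kz,
                                  int w, int h, int d)
{
    std::vector<float> t = convolveAxisCPU(in, kx, w, h, d, 1, 0, 0);
    t = convolveAxisCPU(t, ky, w, h, d, 0, 1, 0);
    return convolveAxisCPU(t, kz, w, h, d, 0, 0, 1);
}

// Voxel count of the 2048x512x64 test volume: 2048 * 512 = 1,048,576 per slice.
constexpr size_t kVoxels = (size_t)2048 * 512 * 64;
static_assert(kVoxels == 67108864, "2048 x 512 x 64 = 67,108,864 voxels");
```

Because each pass costs (2r+1) multiply-adds per voxel, a full separable 3D filter does 3(2r+1) per voxel instead of (2r+1)^3 for a direct 3D convolution.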
seperableConvolutionTexture3d.zip (227 KB)

wanderine · April 6, 2010, 1:12pm

I tried your code and it works, but for me about 50% of the time is spent copying the filter response back to the texture between the convolutions, since it is not possible to write to texture memory. I implemented a separable 3D convolution with shared memory, and it is about twice as fast. With Fermi it should be even faster, since more blocks can run at the same time (48 KB of shared memory instead of 16 KB, so perhaps 3 times faster?). I think the performance of the texture-based version will stay about the same, since it is memory bound and Fermi will not increase global memory bandwidth by that much.
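The shared-memory implementation mentioned here wasn't posted, so the following is only a sketch of the general idea for the x pass; the names, the radius of 2 and the 64-voxel tile are assumptions. Each block loads one tile of a row plus its halo into shared memory, then every thread convolves from the tile. Input and output are ordinary device pointers, so the three passes can ping-pong between two linear buffers with no device-to-array copy in between. The y and z passes follow the same pattern with strided tile loads.

```cuda
#include <cuda_runtime.h>
#include <vector>
#include <cassert>

#define ROW_RADIUS 2
#define ROW_TILE_W 64   // also the block size

__constant__ float c_RowKernel[2 * ROW_RADIUS + 1];

// x pass: blockIdx.y indexes a whole row (all y and z), blockIdx.x a tile within it.
// h*d rows must fit in gridDim.y (65535), e.g. 512*64 = 32768 does.
__global__ void convolveRowsShared(float* out, const float* in, int w)
{
    __shared__ float tile[ROW_TILE_W + 2 * ROW_RADIUS];
    const size_t rowBase = (size_t)blockIdx.y * w;
    const int x0 = blockIdx.x * ROW_TILE_W;
    // Cooperative load of tile + halo; tile[i] holds voxel x0 + i - ROW_RADIUS, clamped.
    for (int i = threadIdx.x; i < ROW_TILE_W + 2 * ROW_RADIUS; i += blockDim.x) {
        const int xs = min(max(x0 + i - ROW_RADIUS, 0), w - 1);
        tile[i] = in[rowBase + xs];
    }
    __syncthreads();
    const int x = x0 + threadIdx.x;
    if (x >= w) return;
    float sum = 0.0f;
    for (int k = -ROW_RADIUS; k <= ROW_RADIUS; ++k)
        sum += tile[threadIdx.x + ROW_RADIUS + k] * c_RowKernel[ROW_RADIUS - k];
    out[rowBase + x] = sum;
}

// Host driver for one x pass (error checking omitted for brevity).
std::vector<float> rowsPassHost(const std::vector<float>& in, const float* kernel,
                                int w, int h, int d)
{
    const size_t n = in.size();
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_RowKernel, kernel, (2 * ROW_RADIUS + 1) * sizeof(float));
    dim3 grid((w + ROW_TILE_W - 1) / ROW_TILE_W, h * d);
    convolveRowsShared<<<grid, ROW_TILE_W>>>(d_out, d_in, w);
    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

Each input voxel is read from global memory roughly once per block (plus the halo) instead of once per tap, which is where a shared-memory version can gain over uncached or poorly cached reads.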
