Hi, I haven’t seen this posted, so I thought I would share it. It’s a simple 3D extension of the 2D texture-based separable convolution sample found in the SDK. It could be a good exercise for someone who is new to CUDA. I chose textures for ease of programming and readability; I am aware that using shared memory could be faster.
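For anyone who wants to see the idea without the texture plumbing, here is a minimal single-threaded CPU sketch of a separable 3D convolution: three 1D passes, one per axis. It is not the code from the attachment. The `Volume` struct, the function names, the x-fastest memory layout, and the clamp-to-edge border handling are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: x is the fastest-varying index,
// so voxel (x, y, z) lives at data[(z * ny + y) * nx + x].
struct Volume {
    int nx, ny, nz;
    std::vector<float> data;
};

static int clampIndex(int v, int hi) {
    return v < 0 ? 0 : (v > hi ? hi : v);
}

// Convolve along one axis (0 = x, 1 = y, 2 = z) with an odd-length 1D kernel.
// Borders are clamped to the nearest edge voxel (an assumption of this sketch).
Volume convolveAxis(const Volume& in, const std::vector<float>& k, int axis) {
    assert(k.size() % 2 == 1);
    const int r = static_cast<int>(k.size()) / 2;
    Volume out{in.nx, in.ny, in.nz, std::vector<float>(in.data.size())};
    for (int z = 0; z < in.nz; ++z) {
        for (int y = 0; y < in.ny; ++y) {
            for (int x = 0; x < in.nx; ++x) {
                float sum = 0.0f;
                for (int j = -r; j <= r; ++j) {
                    int xx = x, yy = y, zz = z;
                    if (axis == 0) xx = clampIndex(x - j, in.nx - 1);
                    if (axis == 1) yy = clampIndex(y - j, in.ny - 1);
                    if (axis == 2) zz = clampIndex(z - j, in.nz - 1);
                    const std::size_t src =
                        (static_cast<std::size_t>(zz) * in.ny + yy) * in.nx + xx;
                    sum += in.data[src] * k[r + j];
                }
                const std::size_t dst =
                    (static_cast<std::size_t>(z) * in.ny + y) * in.nx + x;
                out.data[dst] = sum;
            }
        }
    }
    return out;
}

// Full separable 3D convolution: x pass, then y pass, then z pass.
Volume separableConvolve3D(const Volume& in,
                           const std::vector<float>& kx,
                           const std::vector<float>& ky,
                           const std::vector<float>& kz) {
    return convolveAxis(convolveAxis(convolveAxis(in, kx, 0), ky, 1), kz, 2);
}
```

The point of separability is cost: with kernel radius r, each voxel needs 3(2r+1) multiply-adds over the three passes instead of (2r+1)^3 for a full 3D kernel.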
Currently it reads the convolution kernels from a Matlab .mat file, so you may need to change where it looks for the Matlab header and library files on your computer. Alternatively, it is easy to comment that part out and use a random kernel, as in the SDK.
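If you go the random-kernel route, something along these lines would do as a stand-in for the .mat loading. This is only a sketch: `makeRandomKernel` is a hypothetical helper, not a function from the attachment or the SDK, and normalizing the weights to sum to 1 is my own choice so the output stays in the same range as the input.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: builds a 1D kernel of length 2 * radius + 1
// with random positive weights, normalized so the weights sum to 1.
std::vector<float> makeRandomKernel(int radius, unsigned seed) {
    assert(radius >= 0);
    std::mt19937 rng(seed);
    // Lower bound above zero keeps the sum strictly positive.
    std::uniform_real_distribution<float> dist(0.1f, 1.0f);
    std::vector<float> kernel(2 * radius + 1);
    float sum = 0.0f;
    for (float& w : kernel) {
        w = dist(rng);
        sum += w;
    }
    for (float& w : kernel) {
        w /= sum;
    }
    return kernel;
}
```

Call it once per axis (for example with different seeds) to get the three 1D kernels, then copy them to the device the same way the loaded kernels were copied.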
I have not tried to optimize the block size, thread count, or occupancy; it is mainly a proof of principle. I see speedups of ~180x over comparable single-threaded CPU code on my GTX 260 with an image size of 2048x512x64 (67,108,864 voxels).
seperableConvolutionTexture3d.zip (227 KB)