When I launch the kernel with a three-dimensional grid, I get this message:
Cuda error: GPU Kernel execution failed in file ‘xxxx’ in line nn : invalid configuration argument.
If I use only 2 dimensions (dimGrid(nx,ny)), it works.
The docs seem to indicate that the grid can have at most 2 dimensions, but they don’t state it explicitly. The fact that it’s a dim3 seems to imply that it can handle 3 dimensions.
Is this a limitation of the 0.8 beta, or will it remain a hard limit in future versions?
The grid is 2D only, but the thread block is 3D, so you can process volumes as a grid of rectangular blocks. Addressing does get a bit tricky, however, if your volume has a depth > 512. Two options:
a) Tile the depth dimension across the other two grid dimensions; this is analogous to the “flat 3D textures” approach of “traditional” GPGPU.
b) Loop over depth in the kernel.
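A minimal sketch of both options, written as plain C++ that emulates the CUDA index arithmetic so it can be checked on the host without a GPU. The `Dim3` struct stands in for CUDA’s `dim3`, and `launch2D`, `visitTiled`, `visitLoop` and the `tilesX` parameter (how many depth slices are laid side by side per grid row) are hypothetical names chosen for illustration. Each function records how often every voxel is touched, so you can confirm the mapping covers the volume exactly once.

```cpp
#include <cassert>
#include <vector>

// Host-side stand-in for CUDA's dim3 (emulation only, not CUDA code).
struct Dim3 { unsigned x, y, z; };

// Emulates launching a 2D grid of 2D blocks; body(gx, gy) gets the thread's
// global coordinate, i.e. blockIdx * blockDim + threadIdx in x and y.
template <typename F>
void launch2D(Dim3 grid, Dim3 block, F body) {
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < block.y; ++ty)
                for (unsigned tx = 0; tx < block.x; ++tx)
                    body(bx * block.x + tx, by * block.y + ty);
}

unsigned ceilDiv(unsigned a, unsigned b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

// Option a): tile the nz depth slices across the 2D grid, tilesX slices per
// row ("flat 3D texture" layout). Returns the visit count of each voxel.
std::vector<int> visitTiled(unsigned nx, unsigned ny, unsigned nz,
                            unsigned tilesX, Dim3 block) {
    assert(nx > 0 && ny > 0 && tilesX > 0);
    std::vector<int> visits(nx * ny * nz, 0);
    unsigned tilesY = ceilDiv(nz, tilesX);
    unsigned flatW = nx * tilesX, flatH = ny * tilesY;
    Dim3 grid{ceilDiv(flatW, block.x), ceilDiv(flatH, block.y), 1};
    launch2D(grid, block, [&](unsigned gx, unsigned gy) {
        if (gx >= flatW || gy >= flatH) return;     // padding threads
        unsigned x = gx % nx, y = gy % ny;          // position inside the slice
        unsigned z = (gy / ny) * tilesX + gx / nx;  // which slice (tile)
        if (z >= nz) return;                        // unused tiles in last row
        ++visits[(z * ny + y) * nx + x];
    });
    return visits;
}

// Option b): the 2D grid covers one nx*ny slice; each thread loops over depth.
std::vector<int> visitLoop(unsigned nx, unsigned ny, unsigned nz, Dim3 block) {
    std::vector<int> visits(nx * ny * nz, 0);
    Dim3 grid{ceilDiv(nx, block.x), ceilDiv(ny, block.y), 1};
    launch2D(grid, block, [&](unsigned x, unsigned y) {
        if (x >= nx || y >= ny) return;             // padding threads
        for (unsigned z = 0; z < nz; ++z)
            ++visits[(z * ny + y) * nx + x];
    });
    return visits;
}
```

The tiled variant keeps one voxel per thread but pays for it with the division/modulo bookkeeping; the looping variant has simpler addressing but gives each thread nz voxels of serial work.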
My 2c:
Both options can be quite a pain with respect to addressing when you need to push, say, a 3x3x3 convolution kernel through the volume. You should consider redefining what one kernel invocation means: not (kernel = one volume element to process) but rather (kernel = one volume slice). That way you aggregate more work per kernel invocation, which may give you the opportunity to share intermediate results. Always a good idea: check whether the convolution kernel is separable.
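To make the separability point concrete, here is a host-side C++ sketch (not CUDA) that compares a direct 3x3x3 convolution with the same filter applied as three 1D passes along x, y and z. It assumes the filter factors as w(i,j,k) = a[i]*b[j]*c[k], uses zero padding at the borders, and computes correlation (no kernel flip); `Volume`, `directSlice`, `pass1D` and `separable` are hypothetical names. The direct version is written one output slice at a time to mirror the “one invocation = one slice” suggestion.

```cpp
#include <cassert>
#include <vector>

// Dense nx*ny*nz volume, x fastest (hypothetical helper type).
struct Volume {
    int nx, ny, nz;
    std::vector<double> v;
    Volume(int nx_, int ny_, int nz_)
        : nx(nx_), ny(ny_), nz(nz_), v(nx_ * ny_ * nz_, 0.0) {}
    // Assumed boundary rule: zero outside the volume.
    double at(int x, int y, int z) const {
        if (x < 0 || y < 0 || z < 0 || x >= nx || y >= ny || z >= nz) return 0.0;
        return v[(z * ny + y) * nx + x];
    }
    double& ref(int x, int y, int z) { return v[(z * ny + y) * nx + x]; }
};

// Direct 3x3x3 correlation for one output slice z: 27 multiply-adds per voxel.
void directSlice(const Volume& in, Volume& out, int z,
                 const double a[3], const double b[3], const double c[3]) {
    for (int y = 0; y < in.ny; ++y)
        for (int x = 0; x < in.nx; ++x) {
            double s = 0.0;
            for (int k = -1; k <= 1; ++k)
                for (int j = -1; j <= 1; ++j)
                    for (int i = -1; i <= 1; ++i)
                        s += a[i + 1] * b[j + 1] * c[k + 1] * in.at(x + i, y + j, z + k);
            out.ref(x, y, z) = s;
        }
}

// One 3-tap 1D pass along axis 0 (x), 1 (y) or 2 (z).
Volume pass1D(const Volume& in, const double w[3], int axis) {
    assert(axis >= 0 && axis <= 2);
    Volume out(in.nx, in.ny, in.nz);
    for (int z = 0; z < in.nz; ++z)
        for (int y = 0; y < in.ny; ++y)
            for (int x = 0; x < in.nx; ++x) {
                double s = 0.0;
                for (int t = -1; t <= 1; ++t) {
                    int dx = axis == 0 ? t : 0, dy = axis == 1 ? t : 0, dz = axis == 2 ? t : 0;
                    s += w[t + 1] * in.at(x + dx, y + dy, z + dz);
                }
                out.ref(x, y, z) = s;
            }
    return out;
}

// Separable version: three 1D passes, 9 multiply-adds per voxel instead of 27.
Volume separable(const Volume& in, const double a[3], const double b[3], const double c[3]) {
    return pass1D(pass1D(pass1D(in, a, 0), b, 1), c, 2);
}
```

With zero padding the two paths are mathematically identical, and with small integer-valued data and weights (for example a Sobel-like a = {1, 2, 1}, b = {1, 0, -1}, c = {1, 2, 1}) they also agree exactly in floating point.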
Yeah, I haven’t started on it yet, but we’ve got various codes that require 3-D convolutions, which is part of why I’m interested in this. For the simpler code I’ve been working on to date, I’ve been processing one slice of the volume at a time, and that has worked well so far.