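// Flatten a 2D grid of 1D blocks into one linear thread index.
const int idx = (blockIdx.y*blockDim.x*gridDim.x)+blockIdx.x*blockDim.x+threadIdx.x;
// Unflatten idx into (i, j, k), with i varying fastest: idx = i + x_dim*(j + y_dim*k).
const int k = idx/(ginfo.x_dim*ginfo.y_dim);
const int j = (idx - k * ginfo.x_dim * ginfo.y_dim) / ginfo.x_dim;
const int i = (idx - k * ginfo.x_dim * ginfo.y_dim - j * ginfo.x_dim);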
That is a lot of integer operations, but if the kernel does enough work per thread, their cost should become irrelevant.
One easy improvement would be to store 1.0f/(width*height) and 1.0f/(width) in constant memory, which replaces the 2 divisions and 1 multiplication with 2 multiplications…
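A rough sketch of the reciprocal idea, for what it's worth. The struct `GridRecip` and the helpers `make_grid_recip`, `div_by_recip` and `unflatten` are hypothetical names, not something from the posts above. It assumes indices below 2^24 so they convert to float exactly. Because a float multiply by 1/d can land one off from the true integer quotient, the sketch adds an exact remainder check, which costs a few extra integer operations beyond the two multiplications mentioned above.

```cpp
// Compiles as plain C++ or as CUDA; HD expands to nothing outside nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper struct: grid extents plus precomputed reciprocals.
// In a CUDA build an instance could be kept in __constant__ memory.
struct GridRecip {
    int x_dim, y_dim;
    float inv_x;   // 1.0f / x_dim
    float inv_xy;  // 1.0f / (x_dim * y_dim)
};

// Host-side setup: compute the reciprocals once.
inline GridRecip make_grid_recip(int x_dim, int y_dim) {
    GridRecip g;
    g.x_dim = x_dim;
    g.y_dim = y_dim;
    g.inv_x = 1.0f / (float)x_dim;
    g.inv_xy = 1.0f / ((float)x_dim * (float)y_dim);
    return g;
}

// Quotient n / d for 0 <= n < 2^24 and d > 0, using a multiply by inv_d ~ 1/d.
// The float estimate may be off by one, so it is corrected with an exact
// integer remainder check.
HD inline int div_by_recip(int n, int d, float inv_d) {
    int q = (int)((float)n * inv_d);
    const int r = n - q * d;
    if (r < 0) --q;
    else if (r >= d) ++q;
    return q;
}

// Split a linear index into (i, j, k) with i varying fastest.
HD inline void unflatten(int idx, const GridRecip& g, int& i, int& j, int& k) {
    const int plane = g.x_dim * g.y_dim;
    k = div_by_recip(idx, plane, g.inv_xy);
    const int rem = idx - k * plane;
    j = div_by_recip(rem, g.x_dim, g.inv_x);
    i = rem - j * g.x_dim;
}
```

Whether this actually beats plain integer division depends on the hardware and compiler, so it is worth timing both before committing to it.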
What is the best and simplest way of using blockIdx and threadIdx to get indices for a 2D or 3D grid?
The SDK example fluidsGL has a workable structure for a 512x512 2D grid. I find it esthetically inelegant, but it works, it’s already debugged, and it has a convenient emulation-mode option.
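Separately from the SDK sample, here is a minimal sketch of one common 2D pattern: 2D blocks in a 2D grid, one thread per element, with a bounds check for sizes that are not a multiple of the block size. It is not taken from fluidsGL. The helper names `global_coord`, `flat_index_2d` and `blocks_for` and the kernel `fill_indices` are made up for illustration, and the 16x16 block size is just an assumption.

```cpp
// Compiles as plain C++ or as CUDA; HD expands to nothing outside nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Global coordinate along one axis from block index, block size, thread index.
HD inline int global_coord(int block, int block_dim, int thread) {
    return block * block_dim + thread;
}

// Row-major linear index of element (x, y), or -1 if it lies outside the domain.
HD inline int flat_index_2d(int x, int y, int width, int height) {
    if (x < 0 || y < 0 || x >= width || y >= height) return -1;
    return y * width + x;
}

// Number of blocks needed to cover n elements (ceiling division).
inline int blocks_for(int n, int block_dim) {
    return (n + block_dim - 1) / block_dim;
}

#ifdef __CUDACC__
// Hypothetical kernel: each in-range thread writes its own linear index.
__global__ void fill_indices(int* out, int width, int height) {
    const int x = global_coord(blockIdx.x, blockDim.x, threadIdx.x);
    const int y = global_coord(blockIdx.y, blockDim.y, threadIdx.y);
    const int idx = flat_index_2d(x, y, width, height);
    if (idx >= 0) out[idx] = idx;
}
// Launch sketch for a 512x512 grid with 16x16 blocks (d_out is device memory):
//   dim3 block(16, 16);
//   dim3 grid(blocks_for(512, block.x), blocks_for(512, block.y));
//   fill_indices<<<grid, block>>>(d_out, 512, 512);
#endif
```

A 3D version would add a z coordinate in the same way where the hardware supports a third grid dimension. Otherwise the third index has to be folded into the linear index and recovered by division, as in the snippet at the top of the thread.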