I have written a 3D lattice Boltzmann fluid flow solver but am having problems with the grid and block dimensions. This is my first attempt at a 3D problem, so I decided to keep things simple in this instance.
The code for setting the grid/block dims is:
dim3 Db = dim3(nx,1,1);
dim3 Dg = dim3(ny,nz,1);
where nx, ny and nz are the domain dimensions in x, y and z respectively. The kernel is then called with something like:
kernel<<<Dg, Db>>>(arguments);
The x, y and z coordinates are then recovered within the kernel with:
int x = threadIdx.x;
int y = blockIdx.x;
int z = blockIdx.y;
Now I am aware that this definitely isn't the most efficient setup; a better alternative would be to ensure each block consists of a multiple of 32 threads. The problem I have is that when nx > 128 (threads per block > 128) the program completes almost instantaneously with no errors or crashes; however, the results are not at all what is expected.
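For reference, the multiple-of-32 alternative mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: the names THREADS_PER_BLOCK, makeGrid and writeCellIndex are hypothetical, the linear memory layout (x fastest, then y, then z) is assumed rather than taken from my solver, and it uses a three-dimensional grid, which needs a device of compute capability 2.0 or later.

```cuda
#include <cuda_runtime.h>

// Hypothetical fixed block size: a multiple of the 32-thread warp size.
constexpr int THREADS_PER_BLOCK = 128;

// Grid for an nx * ny * nz domain. The x direction is split across as many
// blocks as needed; y and z still map to one block index each.
// Assumes a device that supports 3D grids (compute capability >= 2.0).
inline dim3 makeGrid(int nx, int ny, int nz)
{
    int blocksInX = (nx + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK; // round up
    return dim3(blocksInX, ny, nz);
}

// Placeholder kernel (not the real simulation kernel): each thread writes the
// linear index of its own cell, so the coordinate mapping can be inspected.
__global__ void writeCellIndex(int *cellIndex, int nx, int ny, int nz)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y;
    int z = blockIdx.z;

    // The last block in x extends past the domain when nx is not a multiple
    // of the block size, so those threads must do nothing.
    if (x >= nx) return;

    int idx = (z * ny + y) * nx + x; // assumed layout: x fastest, then y, then z
    cellIndex[idx] = idx;
}
```

The launch would then be something like writeCellIndex<<<makeGrid(nx, ny, nz), dim3(THREADS_PER_BLOCK, 1, 1)>>>(d_cellIndex, nx, ny, nz); where d_cellIndex is a hypothetical device array of nx*ny*nz ints. With this arrangement the number of threads per block no longer depends on nx.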
The program actually consists of two kernels: one to initialise the data and one to execute the simulation. Interestingly, it appears as though the initialisation kernel executes correctly, as the results match the initialisation values, as though the simulation kernel was never applied. This behaviour disappears completely with nx < 128 (threads per block < 128).
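Since the program finishes almost instantly and the output looks as though the simulation kernel never ran, one thing worth checking is whether the launch itself is being rejected. A kernel launch does not report a failure on its own; the runtime has to be queried afterwards. Below is a minimal sketch of such a check, assuming the CUDA runtime API. The helper names cudaOk, checkKernel and maxThreadsFor are hypothetical and are not part of my code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the CUDA error string and returns false on failure.
inline bool cudaOk(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical helper: call immediately after a kernel launch.
inline bool checkKernel(const char *name)
{
    // Errors detected at launch time, such as an invalid configuration or
    // too many resources requested for the chosen block size.
    if (!cudaOk(cudaGetLastError(), name)) return false;
    // Errors raised while the kernel was executing.
    return cudaOk(cudaDeviceSynchronize(), name);
}

// Hypothetical helper: largest block size this particular kernel can be
// launched with on the current device, or -1 if the query fails.
template <typename Kernel>
int maxThreadsFor(Kernel k)
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, k) != cudaSuccess) return -1;
    return attr.maxThreadsPerBlock;
}
```

Usage would be to call checkKernel("simulation kernel"); on the line after kernel<<<Dg, Db>>>(arguments); and to compare nx against maxThreadsFor(kernel). The per-kernel limit can be lower than the device-wide maximum, because it depends on the resources the kernel uses as well as on the device.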
Have any of you encountered this sort of behaviour before?