I'm writing a CUDA program for a device with compute capability 2.0. I defined a 3D grid, grid_dim, and a 3D block, block_dim(8,8,8),
and launch the kernel as kernel<<<grid_dim, block_dim, shared_memory>>>(arguments);
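For reference, here is a minimal sketch of a 3D launch in which every thread derives its global (x, y, z) coordinate from blockIdx, blockDim and threadIdx. One thing worth ruling out is an index computation that ignores blockIdx.z: blocks in different z-layers then write to the same elements, and the rest of the array is never touched. The kernel fill3d, the helper flat_index and the 16 x 16 x 16 volume are hypothetical, and the launch omits the dynamic shared-memory argument used above. Because launches are asynchronous, checking cudaGetLastError right after the launch is what surfaces an invalid launch configuration that would otherwise go unnoticed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: flatten (x, y, z) into a linear index for an nx*ny*nz array.
__host__ __device__ inline int flat_index(int x, int y, int z, int nx, int ny) {
    return (z * ny + y) * nx + x;
}

// Hypothetical kernel: each thread marks one element of a 3D volume.
// z must include blockIdx.z, otherwise every z-layer of blocks maps
// onto the same elements.
__global__ void fill3d(float *data, int nx, int ny, int nz) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < nx && y < ny && z < nz)
        data[flat_index(x, y, z, nx, ny)] = 1.0f;
}

int main() {
    const int nx = 16, ny = 16, nz = 16;   // assumed volume size
    const int n = nx * ny * nz;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    dim3 block_dim(8, 8, 8);                                   // 512 threads per block
    dim3 grid_dim((nx + 7) / 8, (ny + 7) / 8, (nz + 7) / 8);   // 2 x 2 x 2 blocks
    fill3d<<<grid_dim, block_dim>>>(d_data, nx, ny, nz);

    // Launch-configuration errors show up in cudaGetLastError,
    // errors during execution show up at the synchronization point.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    float *h_data = new float[n];
    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    int untouched = 0;
    for (int i = 0; i < n; ++i)
        if (h_data[i] != 1.0f) ++untouched;   // elements no thread wrote
    printf("untouched elements: %d\n", untouched);

    delete[] h_data;
    cudaFree(d_data);
    return untouched == 0 ? 0 : 1;
}
```

If the second z-layer of blocks never runs, the untouched count equals half the volume rather than zero.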
But the results show that only the variables belonging to the first z-slice of the grid (grid_dim.z=1) are computed, while those in the second z-slice (grid_dim.z=2) stay unchanged.
I thought devices of compute capability 2.0 and 2.1 support 3D grids, but that does not seem to be the case in my example. Does anyone have an idea what's going on?
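To narrow this down, it may help to check what the toolkit and the device actually report. As far as I know, compute capability 2.x devices allow gridDim.z up to 65535, but 3D grids also need CUDA 4.0 or later; with an older toolkit the grid's z dimension is limited to 1. A block of 8 x 8 x 8 = 512 threads is within the 2.x limits (1024 threads per block, at most 64 in z). The sketch below prints the runtime and driver versions and the device limits, then checks a launch shape against them. launch_fits is a hypothetical helper, device 0 is assumed, and grid_dim(2, 2, 2) stands in for the real grid, which is not shown in the post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if grid/block fit the per-dimension limits and the
// threads-per-block limit reported by the device. Registers and shared memory
// usage are NOT checked.
bool launch_fits(const cudaDeviceProp &p, dim3 grid, dim3 block) {
    const unsigned g[3] = {grid.x, grid.y, grid.z};
    const unsigned b[3] = {block.x, block.y, block.z};
    for (int i = 0; i < 3; ++i) {
        if (g[i] > (unsigned)p.maxGridSize[i]) return false;
        if (b[i] > (unsigned)p.maxThreadsDim[i]) return false;
    }
    return b[0] * b[1] * b[2] <= (unsigned)p.maxThreadsPerBlock;
}

int main() {
    int runtime = 0, driver = 0;
    cudaRuntimeGetVersion(&runtime);   // encoded as 1000*major + 10*minor, e.g. 4000 = CUDA 4.0
    cudaDriverGetVersion(&driver);

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {   // assumes device 0
        printf("could not query device 0\n");
        return 1;
    }
    printf("runtime %d, driver %d, %s (compute capability %d.%d)\n",
           runtime, driver, prop.name, prop.major, prop.minor);
    printf("maxGridSize   = (%d, %d, %d)\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    printf("maxThreadsDim = (%d, %d, %d), maxThreadsPerBlock = %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2],
           prop.maxThreadsPerBlock);

    dim3 grid_dim(2, 2, 2);    // hypothetical: substitute the real grid here
    dim3 block_dim(8, 8, 8);
    printf("launch fits device limits: %s\n",
           launch_fits(prop, grid_dim, block_dim) ? "yes" : "no");
    if (runtime < 4000 && grid_dim.z > 1)
        printf("runtime predates CUDA 4.0; gridDim.z > 1 may not be supported\n");
    return 0;
}
```

If maxGridSize[2] comes back as 1, or the runtime version is below 4000, the toolkit rather than the kernel is the first thing to look at.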
I look forward to your replies. Thanks in advance!