3D Thread ID

The code is as follows. THETA is a constant, and the kernel simply sets every element of outImage to 1. The expected output is all 1s, but I get NaN instead.

I think there is an error in the 3D thread ID computation, but I can't see where. Could you tell me?
dim3 dimGrid(iRotated / dimBlock.x, iRotated / dimBlock.y, THETA / dimBlock.z);

rotationD<<<dimGrid, dimBlock>>>(imageRotatedDevice, iRotated, iRotated);

__global__ void rotationD(float* outImage, int widthR, int heightR)
{
    int ix = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    int iy = blockIdx.y * BLOCK_SIZE + threadIdx.y;
    int iz = blockIdx.z * BLOCK_SIZE + threadIdx.z;

    int idx = ix + iy * widthR + iz * widthR * heightR; // linear index into the 3D volume

    outImage[idx] = 1.0f;
}


The z component of dimGrid must be 1 on current hardware. Blocks can be 3D, but grids must be 2D. Your kernel probably aborts with an error when you try to run it in this configuration.

Thank you very much. How can I fix it? Would I have to loop over z? I would really prefer not to use a loop.

Here's what I do to handle 3D data:

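// Host side: 1D blocks on a 2D grid large enough to cover every voxel
const dim3 dimBlock(192);
int dim = ceil(sqrt((float)(DATA_W * DATA_D * DATA_H) / 192.0f));
const dim3 dimGrid(dim, dim);
convolve<<<dimGrid, dimBlock, 0>>>(d_Result);

// Inside the kernel: flatten the 2D grid to a linear index, then recover x, y, z
const int idx = (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x;
if (idx >= DATA_W * DATA_H * DATA_D) return; // the grid is rounded up, so skip surplus threads
const int z = idx / (DATA_W * DATA_H);
const int y = (idx - z * DATA_W * DATA_H) / DATA_W;
const int x = idx - z * DATA_W * DATA_H - y * DATA_W;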

There's a bit more overhead, but it works.

Obviously 192 should be a constant somewhere.
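
For anyone who wants to convince themselves that the index arithmetic above covers the whole volume, here is a rough host-side sketch in plain C++ (no GPU needed). It replays the same flatten/unflatten math for every (blockIdx.y, blockIdx.x, threadIdx.x) combination the launch would create. The volume size 16 x 12 x 10 and the names THREADS_PER_BLOCK, Coord, unflatten, gridEdge and everyVoxelVisitedOnce are made up for this illustration and are not from the code posted above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical volume dimensions, chosen only for this illustration.
constexpr int DATA_W = 16, DATA_H = 12, DATA_D = 10;
constexpr int THREADS_PER_BLOCK = 192;  // the block size used in the post

struct Coord { int x, y, z; };

// Same arithmetic as in the kernel: linear index -> (x, y, z).
Coord unflatten(int idx) {
    const int z = idx / (DATA_W * DATA_H);
    const int y = (idx - z * DATA_W * DATA_H) / DATA_W;
    const int x = idx - z * DATA_W * DATA_H - y * DATA_W;
    return {x, y, z};
}

// Edge length of the square 2D grid, computed as in the post.
int gridEdge() {
    const int total = DATA_W * DATA_H * DATA_D;
    return static_cast<int>(
        std::ceil(std::sqrt(static_cast<float>(total) / THREADS_PER_BLOCK)));
}

// Emulates every thread of the launch on the CPU and counts how often each
// voxel is reached. Returns true if every voxel is reached exactly once.
bool everyVoxelVisitedOnce() {
    const int total = DATA_W * DATA_H * DATA_D;
    const int dim = gridEdge();
    std::vector<int> hits(total, 0);
    for (int by = 0; by < dim; ++by) {
        for (int bx = 0; bx < dim; ++bx) {
            for (int tx = 0; tx < THREADS_PER_BLOCK; ++tx) {
                // blockIdx.y * blockDim.x * gridDim.x + blockIdx.x * blockDim.x + threadIdx.x
                const int idx = by * THREADS_PER_BLOCK * dim + bx * THREADS_PER_BLOCK + tx;
                if (idx >= total) continue;  // surplus threads from rounding the grid up
                const Coord c = unflatten(idx);
                assert(c.x < DATA_W && c.y < DATA_H && c.z < DATA_D);
                ++hits[c.x + c.y * DATA_W + c.z * DATA_W * DATA_H];
            }
        }
    }
    for (int h : hits) {
        if (h != 1) return false;
    }
    return true;
}
```

With these example dimensions the grid is 4 x 4 blocks, i.e. 3072 threads for 1920 voxels, so some threads fall past the end of the data. That is why the emulation skips any idx beyond the total, and a real kernel would want an equivalent check before touching memory.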