idz question

I am new to CUDA. I know the grid is 2D and a thread block can be 3D.
I want to write a device function to compute the result:

idx = blockIdx.x * blockDim.x + threadIdx.x;
idy = blockIdx.y * blockDim.y + threadIdx.y;
idz = ?
result[index] = a[idx] + b[idy] + c[idz];

index is the flattened 1D index into the resulting 3D data; it is computed from idx, idy and idz.
idx, idy and idz are derived from threadIdx.

Is there any easy way to do it without breaking down the y-z relationship into 1D?

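unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
unsigned int idy = blockIdx.y * blockDim.y + threadIdx.y;
// The grid has no z dimension, so each thread strides over z in steps of blockDim.z.
for (unsigned int idz = threadIdx.z; idz < num_z; idz += blockDim.z)
{
  // index must be recomputed from idx, idy and idz on every iteration.
  result[index] = a[idx] + b[idy] + c[idz];
}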
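For anyone who wants to try the loop above end to end, here is a minimal sketch of a complete program. Several things in it are assumptions and not from the posts: the kernel name add3d, the helper flat_index, the x-fastest memory layout of result, the test sizes, and the 8x8x4 block shape. It also assumes the x and y extents are exact multiples of the block size, so that gridDim.x * blockDim.x and gridDim.y * blockDim.y give the extents.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: flatten (x, y, z) into a 1D index, x varying fastest.
// The layout is an assumption; the thread does not say how result is stored.
__host__ __device__ inline unsigned int flat_index(unsigned int x, unsigned int y, unsigned int z,
                                                   unsigned int num_x, unsigned int num_y)
{
    return (z * num_y + y) * num_x + x;
}

__global__ void add3d(const float *a, const float *b, const float *c,
                      float *result, unsigned int num_z)
{
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int idy = blockIdx.y * blockDim.y + threadIdx.y;
    // Assumes the launch covers x and y exactly (no partial blocks).
    unsigned int num_x = gridDim.x * blockDim.x;
    unsigned int num_y = gridDim.y * blockDim.y;

    // The grid is 2D, so each thread walks z in steps of blockDim.z.
    for (unsigned int idz = threadIdx.z; idz < num_z; idz += blockDim.z) {
        result[flat_index(idx, idy, idz, num_x, num_y)] = a[idx] + b[idy] + c[idz];
    }
}

int main()
{
    const unsigned int num_x = 32, num_y = 16, num_z = 10;  // arbitrary test sizes
    const size_t total = (size_t)num_x * num_y * num_z;

    std::vector<float> a(num_x), b(num_y), c(num_z), result(total);
    for (unsigned int i = 0; i < num_x; ++i) a[i] = (float)i;
    for (unsigned int j = 0; j < num_y; ++j) b[j] = 100.0f * j;
    for (unsigned int k = 0; k < num_z; ++k) c[k] = 10000.0f * k;

    float *d_a, *d_b, *d_c, *d_result;
    cudaMalloc((void **)&d_a, num_x * sizeof(float));
    cudaMalloc((void **)&d_b, num_y * sizeof(float));
    cudaMalloc((void **)&d_c, num_z * sizeof(float));
    cudaMalloc((void **)&d_result, total * sizeof(float));
    cudaMemcpy(d_a, a.data(), num_x * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), num_y * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_c, c.data(), num_z * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(8, 8, 4);                          // 256 threads per block
    dim3 grid(num_x / block.x, num_y / block.y);  // 2D grid, exact division assumed
    add3d<<<grid, block>>>(d_a, d_b, d_c, d_result, num_z);

    // cudaMemcpy waits for the kernel and reports any earlier launch error.
    cudaError_t err = cudaMemcpy(result.data(), d_result, total * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Compare every element against the same sum computed on the host.
    size_t mismatches = 0;
    for (unsigned int k = 0; k < num_z; ++k)
        for (unsigned int j = 0; j < num_y; ++j)
            for (unsigned int i = 0; i < num_x; ++i)
                if (result[flat_index(i, j, k, num_x, num_y)] != a[i] + b[j] + c[k])
                    ++mismatches;
    printf("mismatches: %zu\n", mismatches);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c); cudaFree(d_result);
    return mismatches == 0 ? 0 : 1;
}
```

On hardware and toolkit versions that support 3D grids, idz could instead be computed from blockIdx.z the same way as idx and idy; the loop form is kept here because it is what the thread discusses.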

Is num_z the total number of elements in the z direction?
If I understand the code right, each thread handles [idx, idy, every idz with idz % blockDim.z == threadIdx.z] in one call.
That’s very insightful.

Many Thanks!!

A second thought on shared memory…
Since every element of c is accessed within one thread block, should I load the whole of c into shared memory if shared memory is big enough?

Yeah, num_z is the total number of elements in the z direction, just as gridDim.x * blockDim.x is the total for the x direction.
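One caveat on that relation: gridDim.x * blockDim.x equals the x extent only when the block size divides it evenly. A hedged sketch of the common alternative is to pass the extents in, round the grid up, and guard inside the kernel. The names div_up and add3d_guarded, the x-fastest layout, the sizes, and the use of managed memory (cudaMallocManaged, which needs a device and toolkit that support it) are all assumptions for illustration, not something from the posts.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks of size `block` needed to cover n elements.
__host__ __device__ inline unsigned int div_up(unsigned int n, unsigned int block)
{
    return (n + block - 1) / block;
}

__global__ void add3d_guarded(const float *a, const float *b, const float *c, float *result,
                              unsigned int num_x, unsigned int num_y, unsigned int num_z)
{
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int idy = blockIdx.y * blockDim.y + threadIdx.y;
    // Threads in the rounded-up part of the grid have nothing to do.
    if (idx >= num_x || idy >= num_y) return;

    for (unsigned int idz = threadIdx.z; idz < num_z; idz += blockDim.z) {
        // Assumed layout: x varies fastest, then y, then z.
        result[(idz * num_y + idy) * num_x + idx] = a[idx] + b[idy] + c[idz];
    }
}

int main()
{
    const unsigned int num_x = 100, num_y = 50, num_z = 7;  // not multiples of 16
    const size_t total = (size_t)num_x * num_y * num_z;

    float *a, *b, *c, *result;
    cudaMallocManaged(&a, num_x * sizeof(float));
    cudaMallocManaged(&b, num_y * sizeof(float));
    cudaMallocManaged(&c, num_z * sizeof(float));
    cudaMallocManaged(&result, total * sizeof(float));
    for (unsigned int i = 0; i < num_x; ++i) a[i] = (float)i;
    for (unsigned int j = 0; j < num_y; ++j) b[j] = 100.0f * j;
    for (unsigned int k = 0; k < num_z; ++k) c[k] = 10000.0f * k;

    dim3 block(16, 16, 1);
    dim3 grid(div_up(num_x, block.x), div_up(num_y, block.y));  // rounded up
    add3d_guarded<<<grid, block>>>(a, b, c, result, num_x, num_y, num_z);

    // Managed memory must not be read on the host until the kernel has finished.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Compare every element against the same sum computed on the host.
    size_t mismatches = 0;
    for (unsigned int k = 0; k < num_z; ++k)
        for (unsigned int j = 0; j < num_y; ++j)
            for (unsigned int i = 0; i < num_x; ++i)
                if (result[((size_t)k * num_y + j) * num_x + i] != a[i] + b[j] + c[k])
                    ++mismatches;
    printf("mismatches: %zu\n", mismatches);

    cudaFree(a); cudaFree(b); cudaFree(c); cudaFree(result);
    return mismatches == 0 ? 0 : 1;
}
```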

Since each thread uses each value in c only once, there is no point in loading c into shared memory. The combination of idx, idy and idz is different for every thread.

See if my post here is helpful: http://forums.nvidia.com/index.php?showtopic=68489#