3D threads index: how do you index 3D threads?

I am trying to launch a kernel with 3D threads. I have an array of N elements, where each element is a 2D array, and I want to process this array with CUDA. If I launch a 2D kernel, I need a for loop of N iterations to process each element of the array. I want to eliminate that loop by using 3D threads.

2D threads are launched and indexed in the following manner:

Launching:

[codebox]

dim3 dimBlock(16, 16, 1);

dim3 dimGrid(cuiDivUp(sizex, dimBlock.x), cuiDivUp(sizey, dimBlock.y), 1);

Kernel<<< dimGrid, dimBlock >>>();

[/codebox]

Indexing:

[codebox]

unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;

unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;

[/codebox]
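The launch arithmetic above can be checked on the host. This is only a minimal sketch: the post does not show the definition of `cuiDivUp`, so it is assumed here to be a ceiling division, and `div_up` and `launched_threads` are hypothetical stand-ins introduced for illustration.

```cpp
// Hypothetical stand-in for cuiDivUp, ASSUMED to be a ceiling division
// (the original helper's definition is not shown in the post).
unsigned int div_up(unsigned int a, unsigned int b) {
    return (a + b - 1) / b;
}

// Number of threads launched along one axis for `size` elements with
// `block` threads per block. The result is always >= size.
unsigned int launched_threads(unsigned int size, unsigned int block) {
    return div_up(size, block) * block;
}
```

Because the launched thread count can exceed the data size (for example, 100 elements with 16 threads per block gives 7 blocks and 112 threads), the kernel would normally guard its work with a check such as `if (x < sizex && y < sizey)`.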

Since grids can only have 2D blocks, I cannot make the following:

[codebox]

dim3 dimBlock(8, 8, 4);

dim3 dimGrid(cuiDivUp(sizex, dimBlock.x), cuiDivUp(sizey, dimBlock.y), cuiDivUp(sizez, dimBlock.z));

Kernel<<< dimGrid, dimBlock >>>();

[/codebox]

What is the right procedure for indexing 3D threads?

Please see this thread: [url]http://forums.nvidia.com/index.php?showtop...rt=#entry591025[/url]

“Since grids can only have 2D blocks, I cannot make the following:”

You mean that grids can only be two-dimensional. I would fold the third dimension into one dimension of the 2D grid and treat the grid as if it were 3D…

Off the top of my head (I don't have paper to check against right now):

dim3 grid(d_x, d_y*d_z);


[Diagram: the grid drawn as a rectangle, with its axes labeled Y and Z]

=>

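unsigned int block_idx_x = blockIdx.x;
unsigned int block_idx_y = blockIdx.y % d_y;  // y position within one z-slab
unsigned int block_idx_z = blockIdx.y / d_y;  // which z-slab this block belongs to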

Or something like that :)
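The folding suggested above can be sanity-checked on the host before it goes into a kernel. The sketch below only emulates the arithmetic; `Block3`, `fold_y`, `unfold`, and `global_index` are hypothetical names introduced for illustration, and the grid is assumed to be launched as `dim3 grid(d_x, d_y*d_z)` as in the post.

```cpp
// Logical 3D block index recovered from a 2D grid.
struct Block3 { unsigned int x, y, z; };

// Where a logical block (by, bz) lands in the y dimension of a grid
// launched as dim3 grid(d_x, d_y * d_z): z-slabs are stacked along y.
unsigned int fold_y(unsigned int by, unsigned int bz, unsigned int d_y) {
    return bz * d_y + by;
}

// Decode the values a kernel would read from blockIdx.x and blockIdx.y
// back into a logical 3D block index.
Block3 unfold(unsigned int block_idx_x, unsigned int block_idx_y,
              unsigned int d_y) {
    return Block3{ block_idx_x, block_idx_y % d_y, block_idx_y / d_y };
}

// Global coordinate along one axis, same formula as the 2D indexing:
// block index * block size + thread index within the block.
unsigned int global_index(unsigned int block, unsigned int block_dim,
                          unsigned int thread) {
    return block * block_dim + thread;
}
```

Inside the kernel, the same decoding would give `z = block_idx_z * blockDim.z + threadIdx.z`, with `x` and `y` computed likewise from `block_idx_x` and `block_idx_y`, so a block shape such as `dimBlock(8, 8, 4)` can still be used on a 2D grid.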

EDIT: LSChien's thread seems to be very thorough on the subject.

Many thanks, both of you.