Is this the correct way to compute an x index and a combined y + z index?

I need to have something like:

dim3 grid(66212, 50000, 39);
dim3 blocks(32, 4, 1);

Here the x index should span 32 x 66212 values and the y index should span 4 x 50000 x 39 values.

Is this the correct way of getting that?

unsigned int x = threadIdx.x + blockDim.x * blockIdx.x;

unsigned int blockId = blockIdx.z * gridDim.y + blockIdx.y;
unsigned int y = blockId * blockDim.y + threadIdx.y;

If I understand your question correctly, it should be as simple as

unsigned int x = threadIdx.x + blockDim.x * blockIdx.x;
unsigned int y = threadIdx.y + blockDim.y * blockIdx.y;

Check out slide 7 at http://harmanani.github.io/classes/csc447/Notes/Lecture15.pdf

No, you did not understand the question.

I do not want separate x, y, and z indices. I want a single x index and a combined y+z index, because y alone does not give me enough index range.

In my example:

dim3 grid(66212, 50000, 39);
dim3 blocks(32, 4, 1);

If I do:

unsigned int x = threadIdx.x + blockDim.x * blockIdx.x;
unsigned int y = threadIdx.y + blockDim.y * blockIdx.y;

x then spans 32 x 66212 values, which is what I want, but y spans only 4 x 50000 = 200,000 values. I need more than that, which is why I want a y index that spans 4 x 50000 x 39 = 7,800,000 values.
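The arithmetic above can be checked on the host. What follows is a minimal C++ sketch, not CUDA device code: flat_y, y_extent, and check_thread_configuration are hypothetical helper names, and the CUDA built-in variables are passed in as ordinary parameters. It assumes the z dimension of the grid is used only to extend y, as described in the question.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side mirror of the device expression
//   (blockIdx.z * gridDim.y + blockIdx.y) * blockDim.y + threadIdx.y
// The CUDA built-ins are passed as plain parameters so this runs on the CPU.
constexpr std::uint64_t flat_y(std::uint64_t thread_y, std::uint64_t block_y,
                               std::uint64_t block_z, std::uint64_t block_dim_y,
                               std::uint64_t grid_dim_y) {
    return (block_z * grid_dim_y + block_y) * block_dim_y + thread_y;
}

// Number of distinct y values: blockDim.y * gridDim.y * gridDim.z.
constexpr std::uint64_t y_extent(std::uint64_t block_dim_y,
                                 std::uint64_t grid_dim_y,
                                 std::uint64_t grid_dim_z) {
    return block_dim_y * grid_dim_y * grid_dim_z;
}

// Checks for the configuration in this thread:
//   grid(66212, 50000, 39), blocks(32, 4, 1)
void check_thread_configuration() {
    assert(y_extent(4, 50000, 1) == 200000u);    // y alone
    assert(y_extent(4, 50000, 39) == 7800000u);  // y extended by z
    assert(flat_y(0, 0, 0, 4, 50000) == 0u);     // first thread in y
    assert(flat_y(3, 49999, 38, 4, 50000) == 7799999u);  // last thread in y
    // Stepping from z = 0 to z = 1 continues without a gap.
    assert(flat_y(3, 49999, 0, 4, 50000) + 1 == flat_y(0, 0, 1, 4, 50000));
}
```

With these numbers the combined index runs from 0 to 7,799,999, which still fits in an unsigned int, so the 64-bit type in the sketch is only a precaution.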

[s]Here’s a recent question that seems similar:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

it may give you some ideas.[/s]

That is the installation guide. Is it the wrong link?

Yes, sorry.

https://devtalk.nvidia.com/default/topic/1069121/cuda-programming-and-performance/3d-grid-and-3d-block-thread-indexing-for-2-nested-for-loops/

OK, so I guess the first option you gave there is what I am looking for:

int idy = threadIdx.y + blockIdx.y * blockDim.y + blockIdx.z * gridDim.y * blockDim.y;

But isn’t it equivalent to the following?

int blockId = blockIdx.z * gridDim.y + blockIdx.y;
int idy = blockId * blockDim.y + threadIdx.y;

I didn’t mean to suggest that anything you have is wrong. I haven’t really studied your question.

So are they both correct?

They look the same to me.

Thanks

However, they do not look the same to me: yours has blockDim.y in two terms, so I guess it is over-counted.
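For what it is worth, distributing blockDim.y over the combined block index gives

(blockIdx.z * gridDim.y + blockIdx.y) * blockDim.y + threadIdx.y
= blockIdx.z * gridDim.y * blockDim.y + blockIdx.y * blockDim.y + threadIdx.y

so the two forms appear to be the same expression, and blockDim.y shows up twice only because it multiplies both block terms. Below is a minimal host-side C++ sketch that compares them exhaustively on a small grid. The names idy_expanded, idy_factored, forms_agree, and check_forms are hypothetical, and the CUDA built-in variables are passed in as parameters.

```cpp
#include <cassert>

// Form from the linked thread:
//   threadIdx.y + blockIdx.y*blockDim.y + blockIdx.z*gridDim.y*blockDim.y
constexpr unsigned idy_expanded(unsigned ty, unsigned by, unsigned bz,
                                unsigned bdy, unsigned gdy) {
    return ty + by * bdy + bz * gdy * bdy;
}

// Form from the question:
//   blockId = blockIdx.z*gridDim.y + blockIdx.y;
//   idy     = blockId*blockDim.y + threadIdx.y;
constexpr unsigned idy_factored(unsigned ty, unsigned by, unsigned bz,
                                unsigned bdy, unsigned gdy) {
    return (bz * gdy + by) * bdy + ty;
}

// Compare both forms for every (threadIdx.y, blockIdx.y, blockIdx.z)
// in a grid with blockDim.y = bdy, gridDim.y = gdy, gridDim.z = gdz.
bool forms_agree(unsigned bdy, unsigned gdy, unsigned gdz) {
    for (unsigned bz = 0; bz < gdz; ++bz)
        for (unsigned by = 0; by < gdy; ++by)
            for (unsigned ty = 0; ty < bdy; ++ty)
                if (idy_expanded(ty, by, bz, bdy, gdy) !=
                    idy_factored(ty, by, bz, bdy, gdy))
                    return false;
    return true;
}

void check_forms() {
    // Scaled-down grid with the same block shape as in this thread.
    assert(forms_agree(4, 50, 39));
    // Last thread of the full configuration: grid y = 50000, grid z = 39.
    assert(idy_expanded(3, 49999, 38, 4, 50000) == 7799999u);
    assert(idy_factored(3, 49999, 38, 4, 50000) == 7799999u);
}
```

The sketch only exercises the arithmetic. It does not launch a kernel, and the same expressions would use the real built-in variables inside device code.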