Hello,

I am a complete beginner with CUDA, and I will soon start my first tests using PyCUDA on a Jetson Nano.

I need your help to clarify an important point: how to calculate the thread index.

My tests will be very simple. Mainly, I will do calculations on a monochrome image (that is to say, a matrix), probably stored as a 1-dimensional array (maybe as a 2-dimensional matrix).

From what I have understood (maybe):

The Jetson Nano has 128 CUDA cores in a single GPU.

Maximum number of threads per multiprocessor: 2048

That is to say, the maximum number of threads will be 2048.

Then I have to choose:

The number of blocks (in one or two dimensions)

The number of threads per block (the warp size is 32, so I must choose 32, 64, 128, etc. threads per block)
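
To make this concrete, here is a small plain-Python sketch (no GPU involved) of how I think the number of blocks would follow from the image size once the threads per block are chosen. The function name `blocks_needed` and the 640 x 480 image size are only my own examples, not part of the CUDA or PyCUDA API.

```python
def blocks_needed(n_elements, threads_per_block):
    """Smallest number of blocks whose threads cover n_elements (ceiling division)."""
    if n_elements < 0 or threads_per_block <= 0:
        raise ValueError("need n_elements >= 0 and threads_per_block > 0")
    return (n_elements + threads_per_block - 1) // threads_per_block


# Hypothetical example: a 640 x 480 monochrome image stored as a flat array.
n_pixels = 640 * 480
threads_per_block = 128  # a multiple of the warp size (32)
n_blocks = blocks_needed(n_pixels, threads_per_block)
print(n_blocks)  # 2400
```

If the image size is not a multiple of the block size, the last block would contain some threads with no pixel to process.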

As I will first work on a one-dimensional array (the image), I have to calculate idx.

If I choose 1 block and N threads per block:

unsigned int idx = threadIdx.x;

If I choose M blocks in one dimension (M, 1) and N threads per block:

unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
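
To check this formula without a GPU, I wrote a plain-Python sketch that loops over every block and thread of a 1D launch and computes the same idx. The helper name `global_indices_1d` and the sizes (10 elements, 4 threads per block) are just my own assumptions for the example.

```python
def global_indices_1d(grid_dim_x, block_dim_x):
    """Emulate idx = threadIdx.x + blockIdx.x * blockDim.x for each thread."""
    indices = []
    for block_idx_x in range(grid_dim_x):
        for thread_idx_x in range(block_dim_x):
            indices.append(thread_idx_x + block_idx_x * block_dim_x)
    return indices


n = 10           # hypothetical array length
block_dim_x = 4  # threads per block (tiny on purpose, to keep the output short)
grid_dim_x = (n + block_dim_x - 1) // block_dim_x  # 3 blocks

all_idx = global_indices_1d(grid_dim_x, block_dim_x)
in_range = [i for i in all_idx if i < n]  # what a bounds check would keep

print(all_idx)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(in_range)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

If this is right, each element gets exactly one idx, but the last block produces indices 10 and 11 that fall outside the array, so I suppose the kernel needs a check such as `if (idx < n)` before reading or writing.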

If I want to work with a 2-dimensional grid:

unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;

unsigned int idy = threadIdx.y + blockIdx.y * blockDim.y;
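
Here is the same kind of plain-Python check for the 2D case. It emulates idx and idy for every thread, then maps each pair to a position in a flat array. The row-major formula `idy * width + idx` is my assumption about how the image is stored, and the name `global_coords_2d` and the sizes are only examples.

```python
def global_coords_2d(grid_dim, block_dim):
    """Emulate (idx, idy) for each thread; grid_dim and block_dim are (x, y) tuples."""
    coords = []
    for block_idx_y in range(grid_dim[1]):
        for block_idx_x in range(grid_dim[0]):
            for thread_idx_y in range(block_dim[1]):
                for thread_idx_x in range(block_dim[0]):
                    idx = thread_idx_x + block_idx_x * block_dim[0]
                    idy = thread_idx_y + block_idx_y * block_dim[1]
                    coords.append((idx, idy))
    return coords


width, height = 5, 3  # hypothetical tiny image
block_dim = (4, 2)    # threads per block in x and y
grid_dim = ((width + block_dim[0] - 1) // block_dim[0],
            (height + block_dim[1] - 1) // block_dim[1])  # (2, 2)

coords = global_coords_2d(grid_dim, block_dim)

# Keep only threads inside the image, then flatten (assumed row-major layout).
offsets = sorted(idy * width + idx
                 for idx, idy in coords
                 if idx < width and idy < height)

print(len(coords))                            # 32
print(offsets == list(range(width * height))) # True
```

So with this layout every pixel would be visited exactly once, provided the threads with idx >= width or idy >= height are skipped.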

Are these things correct or not?

Many thanks for your help.

Alain