Hello,
I am a complete beginner with CUDA, and I will soon start my first tests using PyCUDA on a Jetson Nano.
I need your help to clarify an important point: how to calculate the thread index.
My tests will be very simple. Mainly, I will do calculations on a monochrome image (that is to say, a matrix), probably stored as a 1-dimensional array (maybe a 2-dimensional one).
From what I have understood (maybe):
The Jetson Nano has a single GPU with 128 CUDA cores.
Maximum number of threads per multiprocessor: 2048
That is to say, I assume at most 2048 threads can be resident on the GPU at the same time.
Then, I have to choose:
The number of blocks (in one or two dimensions)
The number of threads per block (the warp size is 32, so I should choose a multiple of 32: 32, 64, 128, etc.)
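For the number of blocks, my understanding is that it is usually derived from the array size with a ceiling division, so that the blocks together cover every element. Here is a minimal plain-Python sketch of that arithmetic (no GPU needed; the helper name blocks_needed and the 640 x 480 image size are only assumptions for illustration):

```python
def blocks_needed(n_elements, threads_per_block):
    """Ceiling division: smallest block count whose threads cover n_elements."""
    return (n_elements + threads_per_block - 1) // threads_per_block


# Hypothetical example: a 640 x 480 monochrome image flattened to 1D.
n_pixels = 640 * 480       # 307200 elements
threads_per_block = 128    # a multiple of the warp size (32)

n_blocks = blocks_needed(n_pixels, threads_per_block)
print(n_blocks)            # 2400
```

When the array size is not a multiple of the block size, this rounds up, so the last block contains a few threads with nothing to do.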
As I will first work on a one-dimensional array (the image), I have to calculate idx.
If I choose 1 block and N threads per block:
unsigned int idx = threadIdx.x;
If I choose M blocks in one dimension (M, 1) and N threads per block:
unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
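To check the one-dimensional formula before running anything on the GPU, I wrote a small CPU-only emulation in plain Python. It is only a sketch: the function name global_indices_1d, the array length n and the launch sizes are assumptions of mine, and the loops simply stand in for blockIdx.x and threadIdx.x.

```python
def global_indices_1d(num_blocks, threads_per_block):
    """Emulate idx = threadIdx.x + blockIdx.x * blockDim.x on the CPU."""
    indices = []
    for block_idx_x in range(num_blocks):               # stands in for blockIdx.x
        for thread_idx_x in range(threads_per_block):   # stands in for threadIdx.x
            indices.append(thread_idx_x + block_idx_x * threads_per_block)
    return indices


n = 10  # hypothetical array length, deliberately not a multiple of the block size
idx_list = global_indices_1d(num_blocks=3, threads_per_block=4)

# 3 blocks of 4 threads give 12 indices for 10 elements, so a kernel would
# need a guard such as `if (idx < n)` to keep the last two threads in bounds.
in_range = [i for i in idx_list if i < n]
```

If this is right, each thread gets a unique index, and the surplus threads in the last block have to be filtered out with a bounds check.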
If I want to work with a 2-dimensional grid:
unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
unsigned int idy = threadIdx.y + blockIdx.y * blockDim.y;
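For the 2-dimensional case, I assume the image is still stored row by row in one flat buffer, so (idx, idy) would be turned into a single offset with idy * width + idx. The plain-Python sketch below emulates this on the CPU; the function name pixel_offsets_2d, the tuple layout (x, y) and the small image size are assumptions for illustration only.

```python
def pixel_offsets_2d(grid_dim, block_dim, width, height):
    """Emulate a 2D launch; grid_dim and block_dim are (x, y) tuples.

    Returns the row-major offsets touched by the threads that pass the guard.
    """
    offsets = []
    for by in range(grid_dim[1]):                # stands in for blockIdx.y
        for bx in range(grid_dim[0]):            # stands in for blockIdx.x
            for ty in range(block_dim[1]):       # stands in for threadIdx.y
                for tx in range(block_dim[0]):   # stands in for threadIdx.x
                    idx = tx + bx * block_dim[0]   # column
                    idy = ty + by * block_dim[1]   # row
                    if idx < width and idy < height:   # bounds guard
                        offsets.append(idy * width + idx)
    return offsets


# Hypothetical 6 x 5 image covered by a 2 x 2 grid of 4 x 4 blocks (8 x 8 threads).
offsets = pixel_offsets_2d(grid_dim=(2, 2), block_dim=(4, 4), width=6, height=5)
```

With these numbers, 64 threads are launched but only the 30 that fall inside the image pass the guard, and each pixel is visited exactly once.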
Are these things correct or not?
Many thanks for your help.
Alain