Calculate Thread index with Grid dim and Block dim

easybob · July 19, 2019, 12:32pm

Hello,

I am a complete newbie with CUDA and will soon start my first tests using pycuda on a Jetson Nano.

I need your help to clarify an important point: how to calculate the thread index.

My tests will be very simple. I will mainly do calculations on a monochrome image (that is to say, a matrix), probably stored as a 1-dimensional array (maybe as a 2-dimensional matrix).

From what I have understood (maybe):

The Jetson Nano has 128 CUDA cores on a single GPU.

Maximum number of threads per multiprocessor: 2048

That is to say, the maximum number of threads will be 2048.

Then I have to choose:
Number of blocks (one or two dimensions)
Number of threads per block (the warp size is 32, so I must choose 32, 64, 128, etc. threads per block)

As I will first work on a one-dimensional array (the image), I have to calculate idx.

If I choose 1 block and N threads per block:
unsigned int idx = threadIdx.x;

If I choose M blocks in one dimension (M,1) and N threads per block:
unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;

If I want to work with a 2-dimensional grid:
unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
unsigned int idy = threadIdx.y + blockIdx.y * blockDim.y;
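
The formulas above can be checked without a GPU. The following is a minimal pure-Python sketch that loops over every (block, thread) pair and collects the global indices the formulas would produce. The function names (`global_indices_1d`, `global_indices_2d`, `flat_offset`) are made up for illustration and are not part of CUDA or pycuda. The row-major offset `idy * width + idx` is an assumption about how the image is laid out in memory.

```python
def global_indices_1d(grid_dim_x, block_dim_x):
    """Emulate idx = threadIdx.x + blockIdx.x * blockDim.x for a 1D launch."""
    return [
        thread_x + block_x * block_dim_x
        for block_x in range(grid_dim_x)      # plays the role of blockIdx.x
        for thread_x in range(block_dim_x)    # plays the role of threadIdx.x
    ]


def global_indices_2d(grid_dim, block_dim):
    """Emulate (idx, idy) for a 2D launch; grid_dim and block_dim are (x, y)."""
    coords = []
    for block_y in range(grid_dim[1]):
        for block_x in range(grid_dim[0]):
            for thread_y in range(block_dim[1]):
                for thread_x in range(block_dim[0]):
                    idx = thread_x + block_x * block_dim[0]
                    idy = thread_y + block_y * block_dim[1]
                    coords.append((idx, idy))
    return coords


def flat_offset(idx, idy, width):
    """Row-major offset of pixel (idx, idy) in an image stored as a 1D array."""
    return idy * width + idx


# 2 blocks of 4 threads cover indices 0..7 exactly once.
print(global_indices_1d(2, 4))  # [0, 1, 2, 3, 4, 5, 6, 7]

# A 2x2 grid of 2x2 blocks covers a 4x4 image exactly once.
coords = global_indices_2d((2, 2), (2, 2))
print(len(coords), len(set(coords)))  # 16 16
```

With a 2D launch, `flat_offset(idx, idy, width)` is the usual way to turn the pair back into a position in a 1D buffer, assuming the image is stored row by row.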

Are these correct or not?

Many thanks for your help.

Alain

Robert_Crovella · July 19, 2019, 2:07pm

Mostly correct.

“Number of threads per block (warp size is 32 so I should choose 32, 64, 128 etc. threads per block)”

Yes

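You can launch 2048 threads in total, but you can also launch more than 2048, and it will work fine: 2048 is the limit on threads resident on a multiprocessor at one time, not a limit on the size of the grid.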
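
Since the grid may hold more threads than the hardware runs at once, a common pattern is to launch one thread per element, round the block count up, and guard the kernel body with `if (idx < n)`. Here is a minimal sketch of the host-side arithmetic in plain Python. The helper names `blocks_for` and `active_indices` and the 640 x 480 image size are assumptions chosen for illustration.

```python
def blocks_for(n_elements, threads_per_block):
    """Smallest block count with blocks * threads_per_block >= n_elements."""
    return (n_elements + threads_per_block - 1) // threads_per_block


def active_indices(n_elements, threads_per_block):
    """Global indices that pass the kernel-side guard `if (idx < n_elements)`."""
    launched = blocks_for(n_elements, threads_per_block) * threads_per_block
    return [idx for idx in range(launched) if idx < n_elements]


# A 640 x 480 monochrome image stored as a 1D array (illustrative size).
n = 640 * 480
print(blocks_for(n, 128))     # 2400 blocks of 128 threads

# When the size is not a multiple of the block size, the last block is
# only partly used and the guard discards the surplus threads.
print(blocks_for(1000, 128))              # 8 blocks, i.e. 1024 threads launched
print(len(active_indices(1000, 128)))     # 1000 threads do real work
```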

easybob · July 19, 2019, 2:19pm

Hello Robert,

Many thanks for your reply.

CUDA is really interesting. I am used to thinking “CPU”, and “GPU” thinking is quite different, but it will bring me brand new opportunities.

I think I will be back very often to ask newbie questions!

Have a nice day.

Alain

easybob · July 19, 2019, 7:49pm

I’m back!

My first pycuda program works great. Champagne!

Robert_Crovella · July 19, 2019, 8:02pm

Profit!

jetsonnvidia · March 3, 2020, 12:37pm

If we want to run the Nano as efficiently as possible, should we aim to run 2048 threads exactly (i.e. 2 blocks of 1024 threads each)?

ryork · March 3, 2020, 11:00pm

Run timing tests to see how the performance varies. They should provide you with an answer. Preferably the test should be with your program or one similar to it because the answer will likely depend on the nature of your problem and what its limiting factors are.
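
A timing sweep of this kind could be sketched as follows. This is only a skeleton under stated assumptions: `launch` is a hypothetical callable that runs your kernel with the given block size and waits for the GPU to finish before returning, and `fake_launch` is a CPU stand-in so the sketch runs without a GPU. Wall-clock timing from the host is a rough measure.

```python
import time


def time_block_sizes(launch, block_sizes, repeats=10):
    """Return {block_size: best wall-clock seconds over `repeats` runs}."""
    results = {}
    for block_size in block_sizes:
        launch(block_size)  # warm-up run, not timed
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            launch(block_size)
            best = min(best, time.perf_counter() - start)
        results[block_size] = best
    return results


def fake_launch(block_size):
    """Hypothetical stand-in workload; replace with a real, synchronized kernel call."""
    sum(range(10_000))


timings = time_block_sizes(fake_launch, [32, 64, 128, 256, 512, 1024], repeats=3)
for block_size, seconds in timings.items():
    print(block_size, f"{seconds:.6f}")
```

Taking the best of several runs reduces noise from other activity on the system. Whichever block size wins depends on the kernel being measured.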

jetsonnvidia · March 5, 2020, 5:36pm

Okay, I’ll do some tests.

I assumed 1024 was best since that was what the official sample code used; however, the specs say 2048 threads are possible.
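
The two numbers measure different things: 1024 is the per-block limit, while 2048 is how many threads one multiprocessor can keep resident. A rough sketch of how many threads stay resident for a given block size is shown below. It assumes the limits of the Nano's Maxwell-class GPU (64 warps and 32 blocks per multiprocessor) and deliberately ignores register and shared-memory usage, which can lower the real figure. The function name `resident_threads` is made up for illustration.

```python
WARP_SIZE = 32
MAX_THREADS_PER_BLOCK = 1024   # per-block limit
MAX_WARPS_PER_SM = 64          # 64 warps * 32 threads = 2048 threads per multiprocessor
MAX_BLOCKS_PER_SM = 32         # assumed resident-block limit for the Nano's GPU


def resident_threads(block_size):
    """Upper bound on threads resident on one multiprocessor for this block size.

    Ignores register and shared-memory limits, so the real value can be lower.
    """
    if not 1 <= block_size <= MAX_THREADS_PER_BLOCK:
        raise ValueError("block size must be between 1 and 1024")
    # Blocks are allocated in whole warps, so round the warp count up.
    warps_per_block = (block_size + WARP_SIZE - 1) // WARP_SIZE
    blocks = min(MAX_WARPS_PER_SM // warps_per_block, MAX_BLOCKS_PER_SM)
    return blocks * block_size


for size in (32, 64, 128, 256, 512, 768, 1024):
    print(size, resident_threads(size))
# 32 1024
# 64 2048
# 128 2048
# 256 2048
# 512 2048
# 768 1536
# 1024 2048
```

Under these assumptions, several block sizes besides 1024 can fill the multiprocessor, which is one reason timing tests are the more reliable guide.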
