Max Concurrent Threads

Baboshka · April 30, 2024, 9:36pm

Hi All,

I am new to CUDA programming (I have been using C++ for a few months) and am using it to solve some mathematical equations.
I do small atomic calculations in the kernel calls.

I have an RTX 3060 Ti GPU. My question is: what is the maximum number of concurrent threads that I can start on this GPU?

I have been looking on the internet, and everyone explains blocks and grids, but what are the real numbers for specific hardware?

Is there a small program that shows those numbers?
I found some tools that show cores = 38 x 128 = 4864.
Some others show Grid = 304 x 128.

I will be very thankful for your answer

Regards

Robert_Crovella · April 30, 2024, 11:13pm

The max concurrent threads is the maximum threads per SM times the number of SMs in your GPU. The number of SMs in your GPU can be discovered perhaps by a google search (seems to be 38), or by running the deviceQuery sample code.

The maximum threads per SM I think is also reported in deviceQuery and can be gotten from a table in the programming guide. (Your 3060Ti device is a cc8.6 GPU, which can also be discovered from deviceQuery)

38x1536 = 58,368 max concurrent threads

So a good “minimum” kernel launch config to aim for might be 114 blocks of 512 threads each (1536 / 512 = 3 blocks per SM, times 38 SMs).
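
The arithmetic above can be written down as two small helpers. This is only a sketch: the function names `maxResidentThreads` and `blocksToFillGpu` are made up for illustration, and the second one ignores register, shared-memory and blocks-per-SM limits, which can reduce the real figure.

```cpp
// Upper bound on the number of threads resident on the whole GPU at once:
// number of SMs times the maximum resident threads per SM.
constexpr int maxResidentThreads(int smCount, int maxThreadsPerSM) {
    return smCount * maxThreadsPerSM;
}

// Number of blocks of `threadsPerBlock` threads needed to fill every SM,
// assuming the per-SM thread limit is the only constraint (hypothetical helper).
constexpr int blocksToFillGpu(int smCount, int maxThreadsPerSM, int threadsPerBlock) {
    return smCount * (maxThreadsPerSM / threadsPerBlock);
}
```

With the numbers from this thread, `maxResidentThreads(38, 1536)` is 58368 and `blocksToFillGpu(38, 1536, 512)` is 114.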

Baboshka · May 1, 2024, 4:30pm

Thanks @Robert_Crovella for your quick answer.
As I said, I am watching videos and reading as much as I can to understand more about GPU programming.

Honestly, I had read this 1536 somewhere, but I thought 38 x 1536 = 58,368 was a silly number (I am used to 2, 4, 8 … 256, 512 …), so I assumed that 1536 could be wrong.

Now I have printed the cudaDeviceProp:
GPUEngine Prop:

Property	Value
multiProcessorCount	38
maxBlocksPerMultiProcessor	16
maxThreadsPerBlock	1024
maxThreadsPerMultiProcessor	1536
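
The properties in the table above are fields of `cudaDeviceProp`, which the CUDA runtime fills in through `cudaGetDeviceProperties`. A minimal host-side sketch that prints them might look like the following. It assumes the GPU of interest is device 0 and a CUDA 11 or newer toolkit (needed for `maxBlocksPerMultiProcessor`); the output layout is purely illustrative.

```cpp
// Host-only sketch: build with nvcc, or with a C++ compiler plus the CUDA
// include path and -lcudart.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int device = 0;  // assumption: the GPU of interest is device 0
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("name\t%s\n", prop.name);
    std::printf("compute capability\t%d.%d\n", prop.major, prop.minor);
    std::printf("multiProcessorCount\t%d\n", prop.multiProcessorCount);
    std::printf("maxBlocksPerMultiProcessor\t%d\n", prop.maxBlocksPerMultiProcessor);
    std::printf("maxThreadsPerBlock\t%d\n", prop.maxThreadsPerBlock);
    std::printf("maxThreadsPerMultiProcessor\t%d\n", prop.maxThreadsPerMultiProcessor);

    // Upper bound on concurrently resident threads: SMs times threads per SM.
    long long maxResident = static_cast<long long>(prop.multiProcessorCount) *
                            prop.maxThreadsPerMultiProcessor;
    std::printf("max resident threads\t%lld\n", maxResident);
    return 0;
}
```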

This means I can go up to:
Max blocks: 38 x 16 = 608
Max threads: 38 x 1536 = 58,368 (as you mentioned)

and I can make some combinations like:

456 x 128 (I like this)
228 x 256
114 x 512 (as you mentioned)

Does the combination make a difference?

Best Regards

Robert_Crovella · May 1, 2024, 4:38pm

It can’t be answered independent of your actual kernel code. For many kernel code designs, such variation will make little difference in performance, in my experience. However it is possible based on a specific kernel design that the number of threads per block is an important choice, perhaps even for correctness.
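
One part of the question can be checked without the kernel: whether a given block size can reach the 1536-thread-per-SM limit at all, given the 16-blocks-per-SM and 1024-threads-per-block limits printed earlier in the thread. The sketch below uses a hypothetical `SmLimits` struct and `residentThreadsPerSM` helper, and it deliberately ignores register and shared-memory usage, which depend on the actual kernel.

```cpp
// Per-SM limits as reported by cudaDeviceProp (illustrative container).
struct SmLimits {
    int maxThreadsPerSM;     // maxThreadsPerMultiProcessor
    int maxBlocksPerSM;      // maxBlocksPerMultiProcessor
    int maxThreadsPerBlock;  // maxThreadsPerBlock
};

// Threads that can be resident on one SM for a given block size, considering
// only the thread-count and block-count limits (not registers or shared memory).
// Returns 0 for a block size that cannot be launched at all.
constexpr int residentThreadsPerSM(SmLimits lim, int threadsPerBlock) {
    if (threadsPerBlock <= 0 || threadsPerBlock > lim.maxThreadsPerBlock) {
        return 0;
    }
    // Whole blocks that fit under the per-SM thread limit.
    int blocksByThreads = lim.maxThreadsPerSM / threadsPerBlock;
    // The per-SM block-count limit may cap this further.
    int blocks = blocksByThreads < lim.maxBlocksPerSM ? blocksByThreads
                                                      : lim.maxBlocksPerSM;
    return blocks * threadsPerBlock;
}
```

With the limits from this thread (1536, 16, 1024), block sizes of 128, 256 and 512 all give 1536 resident threads per SM. A block size of 64 gives only 1024, because the 16-block cap is reached first, and a block size of 1024 also gives 1024, because only one such block fits per SM. Register and shared-memory usage of the real kernel can lower these numbers further.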
