Hello,
I'm a Japanese student.
I have a question about "threads" and "blocks".
How should I choose the number of threads and blocks to get the best performance?
My GPU is a GeForce GTX 650.
Sorry, I'm not so good at English.
There is no single rule for the best block size - it depends on your CUDA kernels: their register usage, their shared memory usage, and how many threads are needed for good occupancy.
I would suggest you review some of the introductory CUDA webinars recorded at GTC, and also the CUDACasts on YouTube.
There is also a good explanation in the CUDA Programming Guide on www.docs.nvidia.com.
Finally, once you understand the factors, there is the CUDA Occupancy Calculator (referenced in the manual), which will help you see the impact of each of these elements for your situation.
Good Luck
I recommend reading the book CUDA by Example. It is a little old, but it will give you a good idea of how to write CUDA programs. The CUDA Programming Guide is also a very good document and not too hard to read.
I'm a bit confused as well, but does the exact block size matter much for calculations with millions of elements? Whether you use 128, 256, 512, or 1024 threads per block, isn't the important thing simply that you keep the maximum number of threads per SM occupied?
An SM can run more than one block, and it can have at most 1536 active threads for cc 2.0 and 2048 for cc 3.5. In the case of a Fermi card (cc 2.0), with 768 threads per block you could have 2 blocks executing, for a total of 1536 threads per SM, while with 1024 threads per block only one block fits, so only 1024 threads are active on the SM and the practical occupancy is lower.
The optimal number of threads per block very much depends on the kernel you are running. In the kernels I am developing, a large number of threads is often not optimal because of high register count or shared memory usage. In my case, 64 or 128 threads per block is often a good value for Fermi GPUs. I don't have any Kepler GPUs at my disposal, so this might change with different hardware.