Is fewer threads better?

Hi,
I am working on an image processing algorithm, and to make my life easier I have implemented the kernels with 1 thread/pixel, but now I am wondering will I ever been able to max out the number of threads? I’ve calculated that the maximum no. of threads I can create is 6553565535512=2198956147200 (that is a grid of 65535x65535, and block size of 512). It seems like this is more than enough to accommodate my images, which are in the ~10 megapixel range. But is there an overhead to creating such a large amount of threads? I could tile my images, and use 1 thread / NxN pixels, but it would be a lot of work to change the code, without a justification that it would improve performance.
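For reference, a minimal 1-thread-per-pixel kernel sketch (the kernel name, the invert operation, and an 8-bit grayscale layout are all just placeholder assumptions, not from the original post):

```cuda
// Hypothetical 1-thread-per-pixel kernel: one thread handles exactly one pixel.
__global__ void invertPixels(unsigned char *img, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)              // guard threads past the image edge
        img[y * width + x] = 255 - img[y * width + x];
}

// Host side: round the grid up so every pixel gets a thread.
dim3 block(16, 16);                           // 256 threads = 8 full warps
dim3 grid((width  + block.x - 1) / block.x,
          (height + block.y - 1) / block.y);
invertPixels<<<grid, block>>>(d_img, width, height);
```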
Any advice is welcome.

I think the answer is: generally, fewer threads is worse!

Having fewer threads would also make it harder to have coalesced (contiguous) reads and writes to/from global arrays, and you definitely want those for performance.
It is also desirable to have 1 or more complete warps per block, i.e. 32, 64, 96, … threads per block.

NB if your image is say 2800×3500 and you decide on 16×16 threads per block, that is a grid of 175×219 (3500/16 = 218.75, rounded up), and the last 175 blocks will be slightly underutilised, but that doesn't matter.
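The rounding up in that example is the usual ceiling division; a quick sketch of the arithmetic, using the numbers from the example:

```cuda
// Grid size for a 2800 x 3500 image with 16 x 16 thread blocks.
dim3 block(16, 16);
dim3 grid((2800 + 15) / 16,    // = 175 (exact fit)
          (3500 + 15) / 16);   // = 219 (218.75 rounded up)
// The 175 blocks in the last grid row cover only 12 of their 16 pixel rows,
// which is why the kernel needs a bounds check such as
// if (x < width && y < height).
```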

Sorry for saying some stuff you already knew.
kbam

Sorry, the internet glitched and my reply was sent twice.

In CUDA, active threads have no switching overhead (unlike threads on a CPU) so, in general, more is better because they can hide the latency of global memory. I do recall seeing some posts that measured a small launch overhead proportional to the number of blocks, but I can’t locate them now.

Note that the number of threads can have other subtle timing effects. You should, when possible, design your kernels to work with any block size and benchmark all reasonable multiples of 32.
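One common way to write a kernel that works with any block size, so it can be benchmarked at every multiple of 32, is a grid-stride loop. A sketch, with the kernel name and the scale operation as hypothetical placeholders:

```cuda
__global__ void scalePixels(float *img, int n, float s)
{
    // Each thread strides over the image, so any <<<grid, block>>> launch
    // configuration covers all n pixels regardless of block size.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += blockDim.x * gridDim.x)
        img[i] *= s;
}

// Benchmark loop: try every block size that is a multiple of the 32-thread warp.
for (int threads = 32; threads <= 512; threads += 32) {
    int blocks = (n + threads - 1) / threads;
    scalePixels<<<blocks, threads>>>(d_img, n, 2.0f);
    // time each configuration here, e.g. with cudaEvent timers
}
```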
