number of blocks and threads

L.Allen · August 18, 2010, 6:58am

I am wondering what's the difference between the following launch configurations:

kernel<<<1,256>>>(…)
kernel<<<2,128>>>(…)

Suppose the basic unit is the thread and I want to start 512 threads.

They can be arranged as

1 block x 512 threads
2 blocks x 256 threads
4 blocks x 128 threads

Does anyone know what the difference between them is?
I am confused…
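To make the three layouts concrete, here is a minimal CUDA sketch, assuming a current CUDA toolkit and a device that allows at least 512 threads per block. The kernel `record_global_index` and the helper `covers_all` are hypothetical names. Each thread computes its global index as `blockIdx.x * blockDim.x + threadIdx.x`, so 1 x 512, 2 x 256 and 4 x 128 all cover indices 0 to 511 exactly once. The total work is the same; only the grouping of threads into blocks changes.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread writes its own global index into out[global index].
// blockIdx, blockDim and threadIdx are CUDA built-in variables.
__global__ void record_global_index(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
}

// Hypothetical helper: launches blocks x threadsPerBlock threads and returns
// true if every slot 0 .. blocks*threadsPerBlock-1 holds its own index.
bool covers_all(int blocks, int threadsPerBlock)
{
    const int n = blocks * threadsPerBlock;
    const size_t bytes = n * sizeof(int);
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return false;

    // Fill every byte with 0xFF (each int becomes -1) so unwritten slots are detectable.
    cudaMemset(d_out, 0xFF, bytes);
    record_global_index<<<blocks, threadsPerBlock>>>(d_out);

    std::vector<int> h(n);
    bool ok = cudaGetLastError() == cudaSuccess &&   // launch-configuration errors
              cudaMemcpy(h.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);
    if (!ok) return false;

    for (int i = 0; i < n; ++i)
        if (h[i] != i) return false;
    return true;
}
```

Called from a host `main()`, `covers_all(1, 512)`, `covers_all(2, 256)` and `covers_all(4, 128)` are each expected to return `true` on a working device.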

Preetha · August 18, 2010, 7:57am

kernel<<<1,256>>>(…)
kernel<<<2,128>>>(…)
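In effect, 256 threads will be created and started in both cases.
The difference is that in the first case all 256 threads execute on the same multiprocessor, because a block always runs on a single multiprocessor. In the second case the two blocks of 128 threads can be scheduled on two different multiprocessors. On an otherwise idle GPU they usually are, but the hardware block scheduler is free to place both blocks on the same one.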
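To see where blocks actually run, a thread can read the PTX special register `%smid` through inline assembly. The sketch below records, for every thread, the id of the multiprocessor it ran on; `current_smid`, `record_smid`, `smid_per_thread` and `blocks_stay_on_one_sm` are hypothetical names. All threads of one block should report the same id, because a block is never split across multiprocessors, while which id each block gets is up to the hardware scheduler and can vary between runs. The PTX documentation describes `%smid` as volatile (its value can change if a thread is preempted and rescheduled) and not guaranteed to be numbered contiguously, so treat the values as labels for diagnostics only.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Reads the PTX special register %smid: the id of the multiprocessor (SM)
// the calling thread is running on at the moment of the read.
__device__ unsigned int current_smid()
{
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// Each thread stores its SM id at its global thread index.
__global__ void record_smid(unsigned int *smids)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    smids[i] = current_smid();
}

// Hypothetical helper: returns one SM id per thread, or an empty vector on error.
std::vector<unsigned int> smid_per_thread(int blocks, int threadsPerBlock)
{
    const size_t n = static_cast<size_t>(blocks) * threadsPerBlock;
    std::vector<unsigned int> h(n);
    unsigned int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(unsigned int)) != cudaSuccess) return {};
    record_smid<<<blocks, threadsPerBlock>>>(d);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(h.data(), d, n * sizeof(unsigned int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);
    return ok ? h : std::vector<unsigned int>{};
}

// True if, within every block, all threads reported the same SM id.
bool blocks_stay_on_one_sm(const std::vector<unsigned int> &smids,
                           int blocks, int threadsPerBlock)
{
    if (smids.size() != static_cast<size_t>(blocks) * threadsPerBlock) return false;
    for (int b = 0; b < blocks; ++b)
        for (int t = 1; t < threadsPerBlock; ++t)
            if (smids[b * threadsPerBlock + t] != smids[b * threadsPerBlock])
                return false;
    return true;
}
```

Printing the first id of each block for `smid_per_thread(2, 128)` shows which multiprocessors the scheduler chose on that particular run.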

L.Allen · August 18, 2010, 8:02am

So that means blocks are allocated to GPU processors.

If I have 5 blocks, then 5 processors would be started.

Thanks for your reply~

Preetha · August 18, 2010, 8:16am

I hope by GPU processor you mean a multiprocessor within a single GPU, not separate GPUs.
GPU → multiprocessors → CUDA cores. This is the structure.
If you launch kernel<<<1,256>>>(…), you are not using all the GPU cores for the work: every multiprocessor except one will sit idle… :(
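A common way to keep every multiprocessor busy is to size the grid from the number of multiprocessors and let each thread walk the data with a grid-stride loop. In this minimal sketch `scale` and `scale_on_all_sms` are hypothetical names, and 4 blocks per multiprocessor is an arbitrary assumption; on newer toolkits `cudaOccupancyMaxActiveBlocksPerMultiprocessor` can compute a kernel-specific value instead.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Grid-stride loop: thread i handles elements i, i + stride, i + 2*stride, ...
// so the kernel is correct for any grid size, including grids smaller than n.
__global__ void scale(float *x, float a, int n)
{
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        x[i] *= a;
}

// Hypothetical helper: multiplies every element of h by a on the GPU,
// launching several blocks per multiprocessor instead of a single block.
bool scale_on_all_sms(std::vector<float> &h, float a)
{
    int device = 0, numSMs = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return false;
    if (cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, device) != cudaSuccess)
        return false;

    const int threadsPerBlock = 256;
    const int blocksPerSM = 4;                 // assumption: tune per kernel and GPU
    const int blocks = blocksPerSM * numSMs;   // enough blocks that every SM can get work
    const int n = static_cast<int>(h.size());
    const size_t bytes = h.size() * sizeof(float);

    float *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scale<<<blocks, threadsPerBlock>>>(d, a, n);
        ok = cudaGetLastError() == cudaSuccess;   // launch-configuration errors
    }
    if (ok)
        ok = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);
    return ok;
}
```

Because the loop decouples the grid size from `n`, the grid size only decides how much of the GPU is kept busy, not whether every element gets processed.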

L.Allen · August 18, 2010, 9:10am

Ok, I got it.

Thanks for your reply ~
