number of blocks and threads

I am wondering what the difference is between the following launch configurations:

kernel<<<1,256>>>(…)
kernel<<<2,128>>>(…)

Suppose the basic unit is a thread, and I want to start 512 threads.

They could be arranged as

1 block x 512 threads
2 blocks x 256 threads
4 blocks x 128 threads

Does anyone know what the difference between them is?
I am confused …

kernel<<<1,256>>>(…)
kernel<<<2,128>>>(…)
In both cases 256 threads are created and started in total.
The difference is that in the first case all 256 threads run on the same multiprocessor, because a block is never split across multiprocessors. In the second case the two blocks of 128 threads each can be scheduled on two different multiprocessors, although the hardware is also free to place both blocks on the same one.
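
To see this on real hardware, here is a minimal sketch that launches the same kernel with both configurations and reports which multiprocessor each block ran on. The names `record_sm`, `current_sm`, `total_threads` and `run` are made up for this illustration. The SM id is read from the PTX special register `%smid` through inline assembly; treat the value as informational only, since the scheduler decides the placement and it may differ between runs and devices.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Read the id of the multiprocessor (SM) the calling thread is running on.
__device__ unsigned int current_sm() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Each thread stores its SM id in the slot given by its global index.
__global__ void record_sm(unsigned int *sm_of_thread) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    sm_of_thread[i] = current_sm();
}

// Total number of threads started by a <<<blocks, threads_per_block>>> launch.
static int total_threads(int blocks, int threads_per_block) {
    return blocks * threads_per_block;
}

// Launch with the given configuration and print the SM used by each block.
static void run(int blocks, int threads_per_block) {
    const int n = total_threads(blocks, threads_per_block);
    unsigned int *d_sm = nullptr;
    if (cudaMalloc(&d_sm, n * sizeof(unsigned int)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return;
    }

    record_sm<<<blocks, threads_per_block>>>(d_sm);

    // cudaMemcpy waits for the kernel to finish before copying.
    std::vector<unsigned int> h_sm(n);
    cudaError_t err = cudaMemcpy(h_sm.data(), d_sm, n * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_sm);
    if (err != cudaSuccess) {
        printf("launch or copy failed: %s\n", cudaGetErrorString(err));
        return;
    }

    printf("<<<%d,%d>>> started %d threads\n", blocks, threads_per_block, n);
    for (int b = 0; b < blocks; ++b) {
        // All threads of one block share an SM, so the first thread is enough.
        printf("  block %d ran on SM %u\n", b, h_sm[b * threads_per_block]);
    }
}

int main() {
    run(1, 256);
    run(2, 128);
    return 0;
}
```

Both launches start the same number of threads; only the grouping into blocks, and therefore the possible spread over multiprocessors, changes.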

So that means blocks are assigned to GPU processors.

If I have 5 blocks, then 5 processors would be started.

Thanks for your reply~

I hope that by "GPU processor" you mean a multiprocessor within a single GPU, not separate GPUs.
The structure is: GPU -> multiprocessors -> CUDA cores.
If you launch kernel<<<1,256>>>(…) then you are not using all the GPU cores for the work. A single block runs on one multiprocessor, so all the other multiprocessors will sit idle… :(
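
As a rough sketch of how to check this on your own card, the runtime can report the number of multiprocessors, which you can compare with the number of blocks you launch. `blocks_for` is a hypothetical helper, and the 512 elements and 128 threads per block are just the numbers from the question, not a recommendation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: blocks needed to cover n threads of work,
// rounding up when n is not a multiple of threads_per_block.
static int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("could not query device 0\n");
        return 1;
    }
    printf("%s: %d multiprocessors, at most %d threads per block\n",
           prop.name, prop.multiProcessorCount, prop.maxThreadsPerBlock);

    const int n = 512;                  // threads wanted (from the question)
    const int threads_per_block = 128;  // assumed block size
    const int blocks = blocks_for(n, threads_per_block);
    printf("%d threads -> %d blocks of %d threads\n",
           n, blocks, threads_per_block);

    // With fewer blocks than multiprocessors, some multiprocessors get no work.
    if (blocks < prop.multiProcessorCount) {
        printf("only %d of %d multiprocessors can be busy\n",
               blocks, prop.multiProcessorCount);
    }
    return 0;
}
```

In practice you would usually launch many more blocks than there are multiprocessors, so that every multiprocessor has work to do.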

Ok, I got it.

Thanks for your reply ~