I know that the maximum number of blocks that can run concurrently on a multiprocessor is 8.
So on an 8800 GTX, which has 16 multiprocessors, up to 128 blocks can run concurrently.
I just want to know the limit on the number of blocks we can create in a launch (I suppose there is one).
I also have a second question.
Consider two cases (on an 8800 GTX):
1 - we run an application on 128 blocks with 1 thread per block
2 - we run the same application on 4 blocks with 32 threads per block
In my mind, the second case will run faster because each block fills a whole warp. But I am not sure and I need an explanation. Maybe it depends on something else …
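To make my reasoning concrete, here is a rough sketch of how I count warps in the two cases. It is plain host-side C++, not a kernel, and it assumes a warp size of 32 and that a warp never spans two blocks. The names `kWarpSize`, `LaunchStats` and `countWarps` are made up for this illustration and are not part of any CUDA API.

```cpp
// Assumption: 32 threads per warp, as I understand it for the 8800 GTX.
constexpr int kWarpSize = 32;

// Hypothetical helper type: what one launch configuration costs in warps.
struct LaunchStats {
    int totalThreads;   // blocks * threadsPerBlock
    int totalWarps;     // warps the hardware has to schedule
    int activeLanePct;  // share of warp lanes that hold a real thread (rounded down)
};

// Hypothetical helper: count warps for a given grid and block size.
LaunchStats countWarps(int blocks, int threadsPerBlock) {
    // Assumption: warps do not span blocks, so each block is rounded up
    // to a whole number of warps.
    int warpsPerBlock = (threadsPerBlock + kWarpSize - 1) / kWarpSize;
    int totalWarps = blocks * warpsPerBlock;
    int totalThreads = blocks * threadsPerBlock;
    int activeLanePct = 100 * totalThreads / (totalWarps * kWarpSize);
    return {totalThreads, totalWarps, activeLanePct};
}
```

If this counting is right, both cases launch 128 threads, but `countWarps(128, 1)` needs 128 warps with only 1 lane out of 32 in use (about 3%), while `countWarps(4, 32)` needs just 4 completely full warps. What I cannot tell from this is how the difference translates into actual run time.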
Thanks in advance for your answers.