In my program, I plan to have some threads call return earlier than others.
But what happens to those “returned” threads? If all threads in a half-warp have called return, will that half-warp still be scheduled to do some “empty work” (since CUDA uses a Single Instruction, Multiple Threads (SIMT) execution model)? And what about the case where all threads in one of the blocks call return? Will that block still be scheduled to do some “empty work”?
I’m asking because my program has two possible designs: launch fewer threads and distribute the work among them (hard to code), or launch more threads and have each one call return once its job is done (easier to code, but what about performance?).
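As a rough illustration of the two designs in the question, here is a minimal sketch assuming a simple element-wise job (scaling an array). The kernel names `scaleEarlyReturn` and `scaleGridStride` and the host helper `runBoth` are hypothetical. The first kernel launches one thread per element and lets surplus threads return; the second launches a fixed, smaller grid and uses a grid-stride loop so each thread handles several elements.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Design A: one thread per element; threads past the end simply return.
__global__ void scaleEarlyReturn(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;              // threads with no work exit immediately
    data[i] *= factor;
}

// Design B: a fixed, smaller grid; each thread walks the array with a
// stride equal to the total number of launched threads (grid-stride loop).
__global__ void scaleGridStride(float *data, int n, float factor)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= factor;
}

// Hypothetical helper: fills n ones, runs A (x2) then B (x3), returns the sum.
float runBoth(int n)
{
    float *d = nullptr;
    cudaMallocManaged(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) d[i] = 1.0f;

    const int threads = 256;
    // Design A: round up so every element gets a thread.
    scaleEarlyReturn<<<(n + threads - 1) / threads, threads>>>(d, n, 2.0f);
    // Design B: deliberately only 2 blocks, regardless of n.
    scaleGridStride<<<2, threads>>>(d, n, 3.0f);
    cudaDeviceSynchronize();         // both launches share the default stream

    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += d[i];
    cudaFree(d);
    return sum;
}

int main()
{
    printf("%.1f\n", runBoth(1000));   // prints 6000.0 (each element: 1 * 2 * 3)
    return 0;
}
```

With the grid-stride version, "managing work between threads" reduces to the loop stride, so it is often less code than it sounds; the early-return version only leaves idle threads in the last, partially filled block.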
Threads are removed from the scheduling pool at the granularity of warps, and resources (registers, shared memory, etc.) are reclaimed at the granularity of blocks. Calling return effectively disables a thread, and once all threads in a warp are disabled, the warp is removed from the scheduling pool and stops executing instructions. However, its registers, along with the block's shared memory, stay allocated until all warps in the block have finished.
If some threads in a warp call return while other threads in that warp are still active, the warp keeps issuing instructions for the active threads, and the lanes of the disabled threads sit idle; that is the ‘empty work’. If all threads in a warp call return, the warp does no extra work. The same applies when all threads in a block call return: once that happens, the resources used by that block are reclaimed and another block can use them.
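To make the warp- and block-granularity point concrete, here is a minimal sketch of three early-return patterns that all do the same amount of useful work (512 atomic increments standing in for real work). The kernel names and the `runPattern` helper are hypothetical, and the launch assumes 256 threads per block, i.e. 8 warps on current NVIDIA GPUs where `warpSize` is 32. What differs is which threads return: interleaved lanes keep every warp alive, while warp- or block-aligned conditions let whole warps or blocks retire.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pattern 1: odd lanes return. Every warp still has 16 active lanes, so no
// warp retires early; the returned lanes sit idle while the warp keeps issuing.
__global__ void interleavedExit(int *count)
{
    if (threadIdx.x % 2 == 1) return;
    atomicAdd(count, 1);             // stand-in for real work
}

// Pattern 2: odd-numbered warps return as a whole. Those warps stop issuing
// instructions, though their registers stay allocated until the block ends.
__global__ void warpAlignedExit(int *count)
{
    if ((threadIdx.x / warpSize) % 2 == 1) return;
    atomicAdd(count, 1);
}

// Pattern 3: a block-uniform condition. Blocks past activeBlocks finish
// immediately, so their resources can be handed to other blocks.
__global__ void blockUniformExit(int *count, int activeBlocks)
{
    if (blockIdx.x >= activeBlocks) return;
    atomicAdd(count, 1);
}

// Hypothetical helper: runs one pattern and returns how many threads did work.
int runPattern(int which)
{
    int *count = nullptr;
    cudaMallocManaged(&count, sizeof(int));
    *count = 0;

    const int blocks = 4, threads = 256;
    if (which == 1)      interleavedExit<<<blocks, threads>>>(count);
    else if (which == 2) warpAlignedExit<<<blocks, threads>>>(count);
    else                 blockUniformExit<<<blocks, threads>>>(count, blocks / 2);
    cudaDeviceSynchronize();

    int result = *count;
    cudaFree(count);
    return result;
}

int main()
{
    // Same useful work in all three; only warp/block occupancy differs.
    printf("%d %d %d\n", runPattern(1), runPattern(2), runPattern(3));   // 512 512 512
    return 0;
}
```

In practice this suggests making the early-return condition uniform across a warp (or a block) where the data allows it. None of these kernels call `__syncthreads()` after the return; the CUDA Programming Guide only allows `__syncthreads()` in conditional code when the condition evaluates identically across the whole block, so a barrier after a non-uniform return should be avoided.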
Thanks for the reply; that helps me a lot :-)