Simple Question, please answer!

If I’m working with only one block (dimGrid(1,1,1)), all threads of this block will use only one multiprocessor. Is that right? A simple yes or no is enough!

I can’t understand it in the programming guide.


Yes, I believe that is right.

(Also means that the other multiprocessors can be busy processing other kernels and updating the display.)

A block can’t be split across multiprocessors because each block can only access the 16 KB of shared memory on a single multiprocessor.
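As a minimal sketch of why that matters (kernel name and sizes are illustrative, not from this thread): all threads of a block cooperate through `__shared__` memory, which is physically on one multiprocessor, so the whole block must run there. Launching with `dimGrid(1,1,1)` therefore keeps everything on one multiprocessor:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// All 256 threads of the single block reduce the input through
// shared memory, which lives on one multiprocessor.
__global__ void sumKernel(const float *in, float *out, int n)
{
    __shared__ float partial[256];   // per-block shared memory
    int tid = threadIdx.x;

    float s = 0.0f;
    for (int i = tid; i < n; i += blockDim.x)
        s += in[i];
    partial[tid] = s;
    __syncthreads();

    // Tree reduction within the block
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        *out = partial[0];
}

int main()
{
    const int n = 1024;
    float h_in[n], h_out;
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    dim3 dimGrid(1, 1, 1);     // one block -> runs on one multiprocessor
    dim3 dimBlock(256, 1, 1);
    sumKernel<<<dimGrid, dimBlock>>>(d_in, d_out, n);

    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", h_out);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

With only one block, all the other multiprocessors sit idle for the duration of the kernel.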


Yes, that makes sense. Thank you very much for your answer!

I thought the GPU could only execute one kernel at a time and that you needed a second graphics card for the display…

Unfortunately, that is not the case. Only a single kernel or the display driver can access the GPU at a time; there is no hardware-level timesharing or anything like that. The driver queues everything and can overlap certain operations (certain classes of memory copies with executing kernels, for example).

The first part is correct, but the second part is not. You can run CUDA kernels on a GPU which is driving an active display. In that case, the driver maintains a watchdog timer and will kill any kernel that takes too long (usually in the 5-10 second range, depending on the OS).
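You can query both of these properties for your own card with `cudaGetDeviceProperties`. A minimal sketch (device 0 assumed):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Number of multiprocessors on the device
    printf("multiprocessors: %d\n", prop.multiProcessorCount);

    // Whether the display watchdog (run-time limit on kernels) is active
    printf("watchdog enabled: %s\n",
           prop.kernelExecTimeoutEnabled ? "yes" : "no");
    return 0;
}
```

If `kernelExecTimeoutEnabled` reports yes, the GPU is driving a display and long-running kernels will be killed by the watchdog.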