I am running a parallel application on my Quadro K600.
The NVIDIA data sheet says that the card has 192 CUDA cores.
However, as soon as I ask for more than <<<1,60>>> threads, I get "launch timed out and was terminated": with 61 threads the program crashes.
I understand from a previous question on the forum that the card also has to drive the display, but does it really need 192 - 60 = 132 cores for that?
Also, does anybody know how many blocks these threads are distributed across?
Currently shipping GPUs do not allow setting aside cores to service the GUI while executing compute kernels. When a compute kernel launches, it grabs all the cores. As a result, GUI updates are blocked, which causes the watchdog timer to kick in if this lasts for more than about two seconds.
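Since the timeout is triggered by a single kernel that keeps the GPU busy for too long, a common workaround on a GPU that also drives the display is to split long-running work into several shorter launches. The sketch below assumes the Quadro K600 is device 0; the kernel `processChunk`, its dummy loop, and the chunk size `perLaunch` are hypothetical placeholders. It also reads the real `cudaDeviceProp::kernelExecTimeoutEnabled` field to report whether the watchdog is active on that device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Ceiling division: how many pieces of size `perPiece` cover `total` items.
__host__ __device__ inline int numChunks(int total, int perPiece) {
    return (total + perPiece - 1) / perPiece;
}

// Hypothetical stand-in for the real per-item work.
__global__ void processChunk(float *data, int offset, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        float v = data[offset + i];
        for (int k = 0; k < 1000; ++k) v = v * 0.999f + 0.001f;  // dummy busy work
        data[offset + i] = v;
    }
}

int main() {
    cudaDeviceProp prop;
    // Assumption: the display GPU is device 0.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    std::printf("%s: watchdog %s\n", prop.name,
                prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");

    const int total = 1 << 20;      // total items to process
    const int perLaunch = 1 << 16;  // hypothetical chunk size; keep each launch well under the timeout
    const int threads = 128;        // threads per block

    float *d = nullptr;
    cudaMalloc(&d, total * sizeof(float));
    cudaMemset(d, 0, total * sizeof(float));

    for (int c = 0; c < numChunks(total, perLaunch); ++c) {
        int offset = c * perLaunch;
        int count = (total - offset < perLaunch) ? total - offset : perLaunch;
        processChunk<<<numChunks(count, threads), threads>>>(d, offset, count);
        // Wait for each short launch before issuing the next; this also
        // surfaces a timeout or other error for the chunk that caused it.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            std::printf("chunk %d failed: %s\n", c, cudaGetErrorString(err));
            cudaFree(d);
            return 1;
        }
    }
    cudaFree(d);
    std::printf("processed %d items in %d launches\n", total, numChunks(total, perLaunch));
    return 0;
}
```

The right chunk size depends on how much work each item does; the goal is simply that no single launch runs anywhere near the watchdog limit.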
In a CUDA launch configuration such as <<<1,60>>>:
The first number is the number of blocks requested. The second number is the number of threads in each block. So this launch configuration is requesting one block of 60 threads.
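To see how the two numbers map onto threads, the sketch below has every thread write the index of the block it belongs to. The kernel `recordBlock` and helper `globalIndex` are hypothetical names; error checking is omitted for brevity. With <<<1,60>>> all 60 threads report block 0, while the same 60 threads launched as <<<2,30>>> are split across blocks 0 and 1.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-wide index of a thread from its block index, block size, and index within the block.
__host__ __device__ inline int globalIndex(int block, int blockSize, int thread) {
    return block * blockSize + thread;
}

// Each thread records which block it belongs to.
__global__ void recordBlock(int *out, int n) {
    int i = globalIndex(blockIdx.x, blockDim.x, threadIdx.x);
    if (i < n) out[i] = blockIdx.x;
}

static void run(int blocks, int threadsPerBlock) {
    const int n = blocks * threadsPerBlock;
    int *d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    recordBlock<<<blocks, threadsPerBlock>>>(d, n);
    int *h = new int[n];
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);  // also waits for the kernel
    std::printf("<<<%d,%d>>>: thread 0 is in block %d, thread %d is in block %d\n",
                blocks, threadsPerBlock, h[0], n - 1, h[n - 1]);
    delete[] h;
    cudaFree(d);
}

int main() {
    run(1, 60);  // one block of 60 threads: every thread is in block 0
    run(2, 30);  // two blocks of 30 threads: threads 30..59 are in block 1
    return 0;
}
```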
OK, I get it. I thought the number of blocks needed to match the number of physical SMs.
Quite to the contrary. For good performance you would want to target around twenty blocks per SM as a minimum.
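A rough sketch of sizing the grid from the problem size (one thread per element) and checking the result against the twenty-blocks-per-SM rule of thumb. The SM count comes from the real `cudaDeviceGetAttribute` call with `cudaDevAttrMultiProcessorCount`; the kernel `scale`, the block size of 128, and the element count are assumptions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Blocks needed so that each element gets one thread (ceiling division).
__host__ __device__ inline int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical example kernel: one thread per element.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int dev = 0;  // assumption: the GPU of interest is device 0
    int numSMs = 0;
    if (cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, dev) != cudaSuccess ||
        numSMs == 0)
        return 1;

    const int kMinBlocksPerSM = 20;  // rule of thumb from the answer above
    const int threads = 128;         // hypothetical block size
    const int n = 1 << 20;           // hypothetical problem size
    const int blocks = blocksFor(n, threads);

    std::printf("%d SM(s); %d blocks of %d threads = %.1f blocks per SM\n",
                numSMs, blocks, threads, static_cast<double>(blocks) / numSMs);
    if (blocks < kMinBlocksPerSM * numSMs)
        std::printf("warning: fewer than %d blocks per SM\n", kMinBlocksPerSM);

    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    scale<<<blocks, threads>>>(d, 2.0f, n);
    cudaError_t err = cudaDeviceSynchronize();
    std::printf("scale: %s\n", cudaGetErrorString(err));
    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

On a Kepler part each SM (SMX) contains 192 CUDA cores, so the 192 cores on the K600 data sheet correspond to a single SM; the blocks of a launch are queued and scheduled onto it as resources free up, rather than being tied one-to-one to SMs.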