Since the computation is done on the device (graphics card), and the graphics card is also used for the display,
what do we have to do so that the display won't freeze while the calculation runs?
I think the GPU itself can handle both the display and CUDA computation well.
But note that if your CUDA program uses a lot of memory, heavy display activity can make the CUDA program fail, because the display has priority over CUDA.
No. Only one kernel can execute at a time. On later cards you can overlap memory copies with kernel execution.
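A rough sketch of what that overlap looks like, using two streams (the function and array names here are made up for illustration; this assumes a card that supports async copies and page-locked host memory):

```cuda
__global__ void scale(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d[i] *= 2.0f;
}

void copyAndCompute(float *h_a, float *d_a, float *d_b, int n)
{
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Copy the next batch in stream 0 while the kernel works on
    // d_b (assumed already filled by an earlier copy) in stream 1.
    cudaMemcpyAsync(d_a, h_a, n * sizeof(float),
                    cudaMemcpyHostToDevice, s0);
    scale<<<(n + 255) / 256, 256, 0, s1>>>(d_b, n);

    cudaThreadSynchronize();  // wait for both streams to finish

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
}
```

Only one kernel runs at a time, but the copy in the other stream proceeds in parallel with it.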
One trick you can do to get around this is to split a kernel with an if/else, so that threads above a certain index take a completely different execution path. You have to be careful not to get divergence at the switch, and register usage can be a problem if one path needs far more registers than the other (the lighter path may be unnecessarily slowed), but it's doable.
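A minimal sketch of the trick (SPLIT and the two workloads are hypothetical; pick SPLIT on a block boundary so no warp straddles the branch):

```cuda
#define SPLIT 4096  // hypothetical boundary between the two "kernels"

__global__ void fused(float *a, int na, float *b, int nb)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < SPLIT) {
        // "Kernel" A: square the elements of a
        if (tid < na)
            a[tid] = a[tid] * a[tid];
    } else {
        // "Kernel" B: re-base the index, then increment b
        int j = tid - SPLIT;
        if (j < nb)
            b[j] = b[j] + 1.0f;
    }
}
```

Note that the register count for the whole launch is the maximum over both paths, which is where the slowdown of the lighter path comes from.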
If you accidentally write out of bounds, bad things may happen to the display. If you change the memory the display requires while a CUDA program is running (by doing a mode switch), you can cause problems for the CUDA program. You might also see slowdowns if you're using funky GPU-accelerated desktop effects.
You can’t. All the threads in one block are, however, guaranteed to run on the same multiprocessor.
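That same-multiprocessor guarantee is what makes shared memory and barriers usable within a block. A small illustrative example (a per-block tree reduction; assumes a block size of 256):

```cuda
__global__ void blockSum(const float *in, float *out)
{
    __shared__ float s[256];  // one slot per thread in the block
    int tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();  // safe: the whole block is on one multiprocessor

    // Halve the active threads each step, summing pairs in shared memory
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = s[0];  // one partial sum per block
}
```

No such sharing or synchronization is possible between threads of different blocks, since those may land on different multiprocessors.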
The access restrictions are defined in the Programming Guide.
Unrolling loops lets you remove the overhead of iterating: instead of evaluating the loop counter and branch at runtime, the compiler replicates the loop body at compile time. This can give significant speedups (though sometimes no difference) and can save registers. I’d google “loop unrolling” - there are plenty of results that explain the process.
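A quick sketch of what the transformation looks like (function names are made up; nvcc also accepts `#pragma unroll` on loops whose trip count is known at compile time):

```cuda
__device__ float dot4_rolled(const float *a, const float *b)
{
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i)   // counter update + branch every iteration
        sum += a[i] * b[i];
    return sum;
}

__device__ float dot4_unrolled(const float *a, const float *b)
{
    // Same result, no counter and no branch; the compiler can keep
    // the operands in registers and schedule the multiplies freely.
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```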