How do multiple CUDA programs run?


I submitted two CUDA programs, one by one, onto a GPU with 2 SMs. I launched the first CUDA program with 1 block, expecting it to run on 1 SM, and then submitted another CUDA program onto the GPU with hundreds of blocks.

How do these multiple CUDA programs run on the GPU?

On my platform, it seems that the first CUDA program runs first, and the second program waits until the first has finished before it starts. That is not what I expected; I would like them to run simultaneously.

In general they just time-slice: the kernels from different programs run sequentially. This means the program with 1 block keeps the GPU occupied until its kernel is done, and only then does the GPU switch to the other program. If you have compute capability 3.5 there is something called Hyper-Q, which in theory means that kernels from different CUDA programs can run concurrently. In practice I am not sure if this really works.
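Within a single process, though, you can get the overlap the questioner wants by launching kernels into separate non-default streams. A minimal sketch (assuming a device that supports concurrent kernel execution; the kernel name `busy` and the iteration counts are made up for illustration):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A dummy kernel that just burns cycles so overlap is observable.
__global__ void busy(float *out, int iters)
{
    float x = (float)threadIdx.x;
    for (int i = 0; i < iters; ++i)
        x = x * 1.0000001f + 0.5f;   // keep the SM busy
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

int main()
{
    float *a, *b;
    cudaMalloc(&a, 256 * sizeof(float));
    cudaMalloc(&b, 100 * 256 * sizeof(float));

    // Kernels launched into different non-default streams *may*
    // run concurrently, subject to free SM resources.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // 1 block in s1 occupies at most 1 SM; the many blocks in s2
    // can then fill the idle SM(s) instead of waiting.
    busy<<<1,   256, 0, s1>>>(a, 1 << 20);
    busy<<<100, 256, 0, s2>>>(b, 1 << 20);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

This only helps within one process: launches from two separate processes go through separate contexts, which is exactly the case that serializes as described above.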