I submitted two CUDA programs, one by one, to a GPU with 2 SMs. I launched the first program with a single block, expecting it to occupy only 1 SM, and then submitted a second program with hundreds of blocks.
How do these multiple CUDA programs run on the GPU?
On my platform, it seems the first program runs first, and the second program waits until the first one finishes before it starts. That is not what I expected; I would like them to run simultaneously.
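To make the setup concrete, here is a minimal sketch of the first program (the kernel name, the busy-loop body, and the grid/block sizes are illustrative placeholders, not my real code; the second program is identical except that it launches hundreds of blocks, e.g. `<<<512, 256>>>`):

```cuda
// program_a.cu — submitted first: a single block, kept busy for a while
// so it stays resident on (at most) one SM.
#include <cstdio>

__global__ void oneBlockKernel(float *out, int iters) {
    float v = (float)threadIdx.x;
    for (int i = 0; i < iters; ++i)        // busy work to keep the kernel long-lived
        v = v * 1.000001f + 1.0f;
    out[threadIdx.x] = v;
}

int main() {
    float *d_out = nullptr;
    cudaMalloc(&d_out, 256 * sizeof(float));

    // 1 block of 256 threads: this grid can occupy at most 1 SM.
    oneBlockKernel<<<1, 256>>>(d_out, 1 << 24);
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}
```

I run the two compiled binaries as two separate processes, one right after the other, and expected the second program's blocks to use the SM left idle by the first.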