Problem when launching many CUDA processes at the same time


I am trying to launch several CUDA processes concurrently (I know they cannot actually run concurrently).

What I observe is that with many processes (for example, 32 concurrent processes), 25 of them run correctly (they return the correct result), but the other 7 finish almost immediately and return a wrong result.

I do not expect the 32 CUDA processes to run concurrently or quickly, but why do 7 of the 32 fail to run correctly? Whenever I launch the 32 processes concurrently, it is always 7 of them that fail.

Could you please explain why?
Thanks in advance!

  1. You may be running out of memory. In the DEFAULT compute mode, the GPU lets all processes run, but their kernel activity is serialized (time-sliced, with context switches between processes). However, every process can still attempt to allocate GPU memory at the same time, and if any such allocation fails and the failure goes unhandled, the process will likely produce incorrect results.

  2. There is a limit to the number of processes that can use a GPU concurrently, which is essentially the limit on the number of concurrent contexts. AFAIK this limit is unpublished.
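
To tell these two cases apart, each process could report how much GPU memory is free and check the result of its allocation. The following is only a minimal sketch: `try_alloc` is a hypothetical helper name and the 256 MiB request is an arbitrary size chosen for illustration. Since `cudaMemGetInfo` is the first runtime call here, a failure to create a context would most likely also surface at that point.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): prints free/total device
// memory, then attempts the allocation. Returns true only on success.
bool try_alloc(void **ptr, size_t bytes) {
    size_t free_b = 0, total_b = 0;

    // First runtime call in the process, so it also initializes the context.
    cudaError_t err = cudaMemGetInfo(&free_b, &total_b);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    fprintf(stderr, "free %zu MiB of %zu MiB\n", free_b >> 20, total_b >> 20);

    err = cudaMalloc(ptr, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc(%zu bytes) failed: %s\n",
                bytes, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    void *buf = nullptr;
    // Arbitrary 256 MiB request, for illustration only.
    if (!try_alloc(&buf, (size_t)256 << 20)) return 1;
    cudaFree(buf);
    return 0;
}
```

If the failing processes print an out-of-memory error from `cudaMalloc`, point 1 is the likely cause; an error from the very first call points more toward point 2.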

If either of these is happening and you are unaware of it, you are not doing proper CUDA error checking. I always encourage people to do proper CUDA error checking, especially when having trouble with CUDA code, and preferably before asking others for help.
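
As a rough illustration of what proper CUDA error checking can look like, here is a minimal sketch. The `CUDA_CHECK` macro and the `add_one` kernel are hypothetical names made up for this example, not anything from the original code. The idea is to check the return value of every runtime API call, and after each kernel launch to check both `cudaGetLastError()` (launch errors) and `cudaDeviceSynchronize()` (errors during execution).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical convenience macro: abort with file/line and the CUDA error
// string whenever a runtime API call does not return cudaSuccess.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Toy kernel used only to have something to launch.
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;
    int *d = nullptr;

    CUDA_CHECK(cudaMalloc(&d, n * sizeof(int)));    // fails if out of memory
    CUDA_CHECK(cudaMemset(d, 0, n * sizeof(int)));

    add_one<<<(n + 255) / 256, 256>>>(d, n);
    CUDA_CHECK(cudaGetLastError());        // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    int first = -1;
    CUDA_CHECK(cudaMemcpy(&first, d, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));

    printf("first = %d\n", first);
    return 0;
}
```

With checks like these in place, a process that cannot allocate memory or create a context would exit with an explicit error message instead of finishing quickly with a wrong result.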