Hi,
I am trying to launch multiple CUDA processes concurrently (I know their kernels cannot actually run concurrently).
What I observe with many processes (for example 32 concurrent processes) is that 25 of the CUDA processes run correctly (they return the correct result), but the other 7 finish with very low latency and return wrong results.
I do not expect the 32 different CUDA processes to run concurrently or quickly, but why do 7 out of the 32 processes not run correctly? And whenever I launch the 32 processes concurrently, it is always 7 processes that fail.
Could you please explain why?
Thanks in advance!
-
You may be running out of memory. The GPU concurrency model in DEFAULT compute mode is to allow all processes to run, but their kernel activity must be serialized/time-sliced/context-switched. However, all processes may attempt to allocate GPU memory at the same time, and if any such allocation fails, that process will likely produce incorrect results.
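As illustration, here is a minimal sketch (the buffer size and the surrounding program are hypothetical) of why an unchecked allocation failure produces exactly this symptom: under memory pressure cudaMalloc fails, the process never does any real GPU work, and it exits quickly with garbage output unless the return code is checked:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    float *d_buf = nullptr;
    // Under memory pressure from many concurrent processes, this call can fail.
    cudaError_t err = cudaMalloc((void**)&d_buf, 256 * 1024 * 1024); // 256 MB, arbitrary
    if (err != cudaSuccess) {
        // Without this check, d_buf stays null, subsequent kernels and copies
        // fail too, and the process finishes almost instantly with wrong results.
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // ... launch kernels, copy results back, etc. ...
    cudaFree(d_buf);
    return 0;
}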
-
There is also a limit to the number of concurrent processes that can run on a GPU, which is essentially the limit on the number of concurrent contexts. AFAIK this limit is unpublished.
https://devtalk.nvidia.com/default/topic/1030080/is-there-a-maximum-number-of-contexts-per-gpu-encoded-into-the-driver-/
If any of this is happening and you are unaware of it, it means you are not doing proper CUDA error checking. I always encourage people to do proper CUDA error checking, especially when having trouble with CUDA code, and preferably before asking others for help.
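For reference, a minimal sketch of one common error-checking convention (the CUDA_CHECK macro name is just an example, not part of the CUDA API): wrap every runtime call, and after each kernel launch check both cudaGetLastError() and cudaDeviceSynchronize():

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Report any CUDA runtime failure immediately, with file and line.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error %s at %s:%d\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

__global__ void dummy() {}

int main() {
    float *d_buf = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&d_buf, 1 << 20));
    dummy<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());      // catches launch errors
    CUDA_CHECK(cudaDeviceSynchronize()); // catches asynchronous execution errors
    CUDA_CHECK(cudaFree(d_buf));
    return 0;
}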