I was getting great mileage out of the MPS (Multi-Process Service) feature in recent CUDA versions, first on a machine with a V100 and then on another with a couple of RTX cards. However, when I try to replicate that success on other boxes, I cannot start any CUDA jobs at all while MPS is running.
The script I use to start MPS is simple:
And the script I use to stop it is:
echo quit | nvidia-cuda-mps-control
When I start MPS and then try to run a job, I see nvidia-cuda-mps consuming an entire CPU core, and then the error message “cudaGetDeviceCount failed unknown error” is printed to the screen for every CUDA program I try to run. This is not the first box to give me this problem, but I am not sure where it comes from or why I’ve had such good results elsewhere. Can anyone point out what I am doing wrong?
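One place that may help narrow this down is the MPS log output. Below is a minimal sketch, assuming the default MPS log directory (/var/log/nvidia-mps, overridable through CUDA_MPS_LOG_DIRECTORY) and the standard control.log and server.log file names; show_mps_logs is a hypothetical helper name, not part of the CUDA tooling.

```shell
#!/bin/sh
# Hypothetical helper: print the last lines of the MPS control and server logs.
# Assumes the default log directory unless CUDA_MPS_LOG_DIRECTORY is set
# (it should match whatever the MPS control daemon was started with).
show_mps_logs() {
    log_dir="${CUDA_MPS_LOG_DIRECTORY:-/var/log/nvidia-mps}"
    for f in control.log server.log; do
        if [ -r "$log_dir/$f" ]; then
            echo "== $log_dir/$f =="
            tail -n 20 "$log_dir/$f"
        else
            # Missing or unreadable log (e.g. no permission, or daemon never started)
            echo "== $log_dir/$f not readable =="
        fi
    done
}

show_mps_logs
```

Running it right after a failing job, with the same environment the daemon was launched from, shows whatever the server reported around the failure.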