I’m building a server/client program that allocates GPUs to remote clients, using the runtime API. The parent server process accepts network connections from clients and creates a child process to service the request. The child calls cudaSetDevice() for the appropriate device and runs the CUDA kernels.
All devices are in compute mode 0 (non-exclusive mode).
The problem is that in the children, cudaMalloc() returns the error “all CUDA-capable devices are busy or unavailable”. My guess is that the parent/server process is creating a CUDA context when it starts up, well before it forks to create a child. The child is then unable to access the device because the CUDA context it inherited belongs to the parent.
Do I need to do something like calling cudaThreadExit() in the parent before I fork() to create any children?
Thanks for any help.
I am using GTX 480 cards on
x86_64 Red Hat Enterprise Linux Client release 5.4 (Tikanga)
Nvidia driver version 256.40
The CUDA toolkit I downloaded was cudatoolkit_3.1_linux_64_rhel5.4.run