“All CUDA-capable devices are busy or unavailable” problem in a multi-process Linux application

I’m building a server/client program that allocates GPUs to remote clients, using the runtime API. The parent server process accepts network connections from clients and creates a child process to service the request. The child calls cudaSetDevice() for the appropriate device and runs the CUDA kernels.

All devices are in compute mode 0 (non-exclusive mode).

The problem is that, in the children, cudaMalloc() returns the error “all CUDA-capable devices are busy or unavailable”. My guess is that the parent/server process creates a CUDA context when it starts up, well before it forks any children. The child is then unable to access the device because the CUDA context already exists in the parent.
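
For reference, the child-side flow looks roughly like this (a simplified sketch, not the real code; device_id stands in for however the device is chosen):

    #include <cuda_runtime.h>
    #include <sys/types.h>
    #include <unistd.h>

    void service_request(int device_id)   // device_id is a placeholder
    {
        pid_t pid = fork();
        if (pid == 0)
        {
            // child: bind to the assigned device and do the work
            cudaSetDevice(device_id);

            void *d_buf = NULL;
            cudaError_t err = cudaMalloc(&d_buf, 1 << 20);
            if (err != cudaSuccess)
            {
                // fails here: "all CUDA-capable devices are busy or unavailable"
            }

            // ... launch kernels, cudaFree(d_buf) ...
            _exit(0);
        }
        // parent: keep accepting connections
    }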

Do I need to do something like calling cudaThreadExit() in the parent before I fork() to create any children?

Thanks for any help.

I am using GTX 480 cards on

x86_64 Red Hat Enterprise Linux Client release 5.4 (Tikanga)

Nvidia driver version 256.40

The CUDA toolkit I downloaded was cudatoolkit_3.1_linux_64_rhel5.4.run

You should probably call cudaThreadExit(), yes.
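
Something like this in the parent, before any fork() (just a sketch; device_id is a placeholder):

    // parent, before creating any children (sketch)
    cudaThreadExit();              // tears down the parent's CUDA context
    pid_t pid = fork();
    if (pid == 0)
    {
        cudaSetDevice(device_id);  // child builds its own context
        // ... cudaMalloc(), kernels ...
    }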

I added cudaThreadExit() to the parent, and the children are still returning the error “all CUDA-capable devices are busy or unavailable”. Can you think of anything else I could try?

Thanks

Hm, can you try with an r260 driver? At one point in time fork() didn’t really work with the CUDA driver, and I don’t remember in what version I fixed it.

Thanks, I’ll try the new 260.19.14 driver. Just one dumb question – the driver install script devdriver_3.2_linux_64_260.19.14.run has a “3.2” in the name. Does that mean that CUDA Toolkit 3.2 is required? If so, I will have to upgrade my OS, since RHEL 5.4 isn’t supported by that version of the toolkit, according to its release notes.

No, you don’t. The new driver will work with the old toolkit.

Also, the release notes are incorrect; 3.2 will work just fine on RHEL 5.x.

I installed the new driver (260.19.14, steps taken are below), and I’m still getting the “busy or unavailable” error message after the fork(). Is there anything else I could try?

  1. ctrl-alt-F1, then log into account

  2. /sbin/init 3

  3. stop nvidia-smi, which is running in daemon mode

  4. remove the Nvidia kernel module: rmmod nvidia

  5. uninstall the old Nvidia driver: nvidia-installer --uninstall

  6. install the new driver – go to where it is downloaded and type: sh devdriver_3.2_linux_64_260.19.14.run

  7. reload the new module: modprobe nvidia

  8. reboot

  9. startx

Can you post some code that repros the problem?

I created a test program that just forks to create a child, and then the child runs a CUDA kernel. The kernel ran fine, no “busy or unavailable” error. So the Nvidia driver handles fork() correctly; it must be something I’m doing in my real program, but that is too big to post.

Any further guesses on your part would be welcome. Would the use of Linux shared memory (shm_open(), librt.a, etc.) cause any sort of problem?
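
The test program was essentially the following (trimmed-down sketch):

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    __global__ void dummy(void) { }

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0)
        {
            // child: touch the device and run a trivial kernel
            cudaSetDevice(0);
            dummy<<<1, 1>>>();
            cudaError_t err = cudaThreadSynchronize();
            printf("child: %s\n", cudaGetErrorString(err));
            _exit(0);
        }
        waitpid(pid, NULL, 0);     // parent just waits; no CUDA calls
        return 0;
    }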

OK, I found the cause, but I don’t understand why it is a problem. The parent calls cudaGetDeviceCount() and cudaGetDeviceProperties(), and uses new to create several cudaDeviceProp structures, all before the fork. In my little test program, adding these calls to the parent causes the “busy or unavailable” error in the children. Can you explain?

    cudaError_t err;
    int n_devices_total_;
    struct cudaDeviceProp *device_props_[N_DEVICES_MAX_];

    // get device count
    err = cudaGetDeviceCount(&n_devices_total_);
    if (err != cudaSuccess)
    {
        // throw exception
    }
    else if (n_devices_total_ == 0)
    {
        // throw exception
    }
    else if (n_devices_total_ > N_DEVICES_MAX_)
    {
        // throw exception
    }

    // store the device properties
    for (int i = 0; i < n_devices_total_; i++)
    {
        device_props_[i] = new cudaDeviceProp();
        err = cudaGetDeviceProperties(device_props_[i], i);
        if (err != cudaSuccess)
        {
            // throw exception
        }
    }
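
If those pre-fork calls really are creating a context in the parent, one workaround I could try (untested sketch; device_id is a placeholder) is to defer every CUDA runtime call, including these queries, to the children:

    // untested sketch: no CUDA runtime calls in the parent at all
    pid_t pid = fork();
    if (pid == 0)
    {
        int n_devices = 0;
        if (cudaGetDeviceCount(&n_devices) != cudaSuccess)
        {
            // throw exception
        }

        struct cudaDeviceProp props;
        if (cudaGetDeviceProperties(&props, device_id) != cudaSuccess)
        {
            // throw exception
        }

        cudaSetDevice(device_id);
        // ... cudaMalloc(), kernels ...
        _exit(0);
    }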
