“All CUDA-capable devices are busy or unavailable” problem in a multi-process Linux application

I’m building a server/client program that allocates GPUs to remote clients, using the runtime API. The parent server process accepts network connections from clients and creates a child process to service the request. The child calls cudaSetDevice() for the appropriate device and runs the CUDA kernels.

All devices are in compute mode 0 (non-exclusive mode).

The problem is that, in the children, cudaMalloc() returns the error “all CUDA-capable devices are busy or unavailable”. My guess is that the parent/server process creates a CUDA context when it starts up, well before it forks any children. The child is then unable to access the device because the CUDA context already exists in the parent.
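
For reference, the child-side flow looks roughly like this (a simplified sketch, not the real code; device_id stands in for however the device is chosen):

    #include <cuda_runtime.h>
    #include <sys/types.h>
    #include <unistd.h>

    void service_request(int device_id)   // device_id is a placeholder
    {
        pid_t pid = fork();
        if (pid == 0)
        {
            // child: bind to the assigned device and do the work
            cudaSetDevice(device_id);

            void *d_buf = NULL;
            cudaError_t err = cudaMalloc(&d_buf, 1 << 20);
            if (err != cudaSuccess)
            {
                // fails here: "all CUDA-capable devices are busy or unavailable"
            }

            // ... launch kernels, cudaFree(d_buf) ...
            _exit(0);
        }
        // parent: keep accepting connections
    }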

Do I need to do something like calling cudaThreadExit() in the parent before I fork() to create any children?

Thanks for any help.

I am using GTX 480 cards on

x86_64 Red Hat Enterprise Linux Client release 5.4 (Tikanga)

Nvidia driver version 256.40

The CUDA toolkit I downloaded was cudatoolkit_3.1_linux_64_rhel5.4.run

You should probably call cudaThreadExit(), yes.
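
Something like this in the parent, before any fork() (just a sketch; device_id is a placeholder):

    // parent, before creating any children (sketch)
    cudaThreadExit();              // tears down the parent's CUDA context
    pid_t pid = fork();
    if (pid == 0)
    {
        cudaSetDevice(device_id);  // child builds its own context
        // ... cudaMalloc(), kernels ...
    }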

I added cudaThreadExit() to the parent, and the children are still returning the error “all CUDA-capable devices are busy or unavailable”. Can you think of anything else I could try?

Thanks

Hm, can you try with an r260 driver? At one point in time fork() didn’t really work with the CUDA driver, and I don’t remember in what version I fixed it.

Thanks, I’ll try the new 260.19.14 driver. Just one dumb question – the driver install script devdriver_3.2_linux_64_260.19.14.run has a “3.2” in the name. Does that mean that CUDA Toolkit 3.2 is required? If so, I will have to upgrade my OS, since RHEL 5.4 isn’t supported by that version of the toolkit, according to its release notes.

No, you don’t. The new driver will work with the old toolkit.

Also, the release notes are incorrect; 3.2 will work just fine on RHEL 5.x.

I installed the new driver (260.19.14, steps taken are below), and I’m still getting the “busy or unavailable” error message after the fork(). Is there anything else I could try?

  1. ctrl-alt-F1, then log into account

  2. /sbin/init 3

  3. stop nvidia-smi, which is running in daemon mode

  4. remove the Nvidia kernel module: rmmod nvidia

  5. uninstall the old Nvidia driver: nvidia-installer --uninstall

  6. install the new driver – go to where it is downloaded and type: sh devdriver_3.2_linux_64_260.19.14.run

  7. reload the new module: modprobe nvidia

  8. reboot

  9. startx

Can you post some code that repros the problem?

I created a test program that just forks to create a child, and then the child runs a CUDA kernel. The kernel ran fine, no “busy or unavailable” error. So the Nvidia driver handles fork() correctly; it must be something I’m doing in my real program, but that is too big to post.

Any further guesses on your part would be welcome. Would the use of Linux shared memory (shm_open(), librt.a, etc.) cause any sort of problem?
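
The test program was essentially the following (trimmed-down sketch):

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    __global__ void dummy(void) { }

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0)
        {
            // child: touch the device and run a trivial kernel
            cudaSetDevice(0);
            dummy<<<1, 1>>>();
            cudaError_t err = cudaThreadSynchronize();
            printf("child: %s\n", cudaGetErrorString(err));
            _exit(0);
        }
        waitpid(pid, NULL, 0);     // parent just waits; no CUDA calls
        return 0;
    }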

OK, I found the cause, but I don’t understand why it is a problem. The parent calls cudaGetDeviceCount() and cudaGetDeviceProperties(), and uses new to create several cudaDeviceProp structures, all before the fork. In my little test program, adding these calls to the parent causes the “busy or unavailable” error in the children. Can you explain?

    cudaError_t err;
    int n_devices_total_;
    struct cudaDeviceProp *device_props_[N_DEVICES_MAX_];

    // get device count
    err = cudaGetDeviceCount(&n_devices_total_);
    if (err != cudaSuccess)
    {
        // throw exception
    }
    else if (n_devices_total_ == 0)
    {
        // throw exception
    }
    else if (n_devices_total_ > N_DEVICES_MAX_)
    {
        // throw exception
    }

    // store the device properties
    for (int i = 0; i < n_devices_total_; i++)
    {
        device_props_[i] = new cudaDeviceProp();
        err = cudaGetDeviceProperties(device_props_[i], i);
        if (err != cudaSuccess)
        {
            // throw exception
        }
    }
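
If those pre-fork calls really are creating a context in the parent, one workaround I could try (untested sketch; device_id is a placeholder) is to defer every CUDA runtime call, including these queries, to the children:

    // untested sketch: no CUDA runtime calls in the parent at all
    pid_t pid = fork();
    if (pid == 0)
    {
        int n_devices = 0;
        if (cudaGetDeviceCount(&n_devices) != cudaSuccess)
        {
            // throw exception
        }

        struct cudaDeviceProp props;
        if (cudaGetDeviceProperties(&props, device_id) != cudaSuccess)
        {
            // throw exception
        }

        cudaSetDevice(device_id);
        // ... cudaMalloc(), kernels ...
        _exit(0);
    }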
