[CUDA 8.0 BUG?] Child process forked after cuInit() gets CUDA_ERROR_NOT_INITIALIZED from its own cuInit()

Hello, I observed a regression going from CUDA 7.5 to 8.0.

Once a process calls cuInit(), any child process forked after that call gets a CUDA_ERROR_NOT_INITIALIZED error from its own cuInit(). This never happened with CUDA 7.5, but CUDA 8.0 produces the error every time.
Has anyone else seen a similar problem?

Below is the code to reproduce it:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>   /* for wait() */
#include <cuda.h>

#define elog(FORMAT,...)                                \
    do {                                                \
        fprintf(stderr, FORMAT "\n", ##__VA_ARGS__);    \
        exit(1);                                        \
    } while(0)

static int child_proc(void)
{
    CUdevice    device;
    CUresult    rc;

    rc = cuInit(0);
    if (rc != CUDA_SUCCESS)
        elog("pid=%d failed on cuInit: %ld", (int)getpid(), (long)rc);

    rc = cuDeviceGet(&device, 0);
    if (rc != CUDA_SUCCESS)
        elog("cuDeviceGet failed: %ld", (long)rc);

    return 0;
}

int main(int argc, char *argv[])
{
    CUresult    rc;
    pid_t       child;
    int         status;

    /* general initialization process */
    rc = cuInit(0);
    if (rc != CUDA_SUCCESS)
        elog("parent: failed on cuInit: %ld", (long)rc);

    /* connection accept, then fork a backend process */
    child = fork();
    if (child == 0)
        return child_proc();
    else if (child > 0)
        wait(&status);
    else
        elog("failed on fork(2): %m");

    return 0;
}

Execution example:

[kaigai@ayu ~]$ ./a.out
pid=10550 failed on cuInit: 3

This shows that cuInit() succeeds in the parent process but fails in the child process (error 3 is CUDA_ERROR_NOT_INITIALIZED).
This does not mean the child can simply skip cuInit(): if I comment out the cuInit() call in the child, the following cuDeviceGet() fails as well.

This kind of CUDA usage is a very common scenario in server-type software, and it worked with the CUDA driver API on CUDA 7.5. What is the reason for this mysterious behavior?

Software versions:
CUDA installation: 8.0.44 (Linux; runfile)
NVIDIA driver: 367.55

No, this is not a usual usage pattern for CUDA.

If you’re going to fork a process, the long-standing CUDA advice has been not to establish a CUDA context before the fork.

There are many references to this in a variety of materials.

For example, consider this comment in the CUDA simpleIPC sample code:

// We can't initialize CUDA before fork() so we need to spawn a new process

This has never been proper CUDA usage, and I wouldn’t try to explain why it appeared to work on CUDA 7.5.

My program never creates a CUDA context before fork(); it only calls cuInit().
Aren’t you mixing up the two?

If this usage is really illegal, then, for example, a server process would have to launch an external program just to log the number of GPU devices at startup, or to do other trivial things.
I don’t think that is a reasonable restriction.

Did you read the comment I quoted from NVIDIA engineers?

It says

“We can’t initialize CUDA before fork()”

So you should not call cuInit() before a fork if you want to access CUDA in a process spawned by that fork.
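A minimal sketch of the "fork first, initialize CUDA later" ordering, assuming a server that pre-forks its workers. The names `run_workers` and `worker_init_gpu` and the `USE_CUDA` build switch are hypothetical. With `-DUSE_CUDA` (linked with `-lcuda`), each worker calls cuInit() and cuDeviceGet() only after it has been forked. Without that switch, the GPU step is a stub, so the fork/wait logic can be tried on a machine without a GPU. The only point it illustrates is that the parent makes no CUDA call before fork().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef USE_CUDA
#include <cuda.h>
#endif

/* Hypothetical per-worker GPU setup. With -DUSE_CUDA it initializes the
 * driver API inside the already-forked worker; otherwise it is a stub that
 * always succeeds, so only the process handling is exercised. */
static int worker_init_gpu(void)
{
#ifdef USE_CUDA
    CUdevice device;
    if (cuInit(0) != CUDA_SUCCESS)
        return -1;
    if (cuDeviceGet(&device, 0) != CUDA_SUCCESS)
        return -1;
#endif
    return 0;
}

/* Fork n workers while the parent is still CUDA-free, then reap them.
 * Returns how many workers exited with status 0. */
int run_workers(int n)
{
    int spawned = 0;
    int ok = 0;

    for (; spawned < n; spawned++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                  /* stop spawning, but still reap earlier workers */
        if (pid == 0)               /* worker: CUDA is first touched here */
            _exit(worker_init_gpu() == 0 ? 0 : 1);
    }

    for (int i = 0; i < spawned; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

The workers use `_exit()` rather than returning, so they do not flush stdio buffers inherited from the parent a second time. A caller can compare the return value with `n` to detect workers that failed to initialize.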

You don’t have to launch an external program.

You just have to spawn a process to do what you want.

Take a look at the sample code I pointed to; it does exactly that.
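This answers the "log the number of GPUs at startup" case. Below is a minimal sketch, similar in spirit to the spawn-a-process approach described above but not copied from the sample. A short-lived helper child makes the CUDA calls and sends the result back over a pipe, so the server process itself never calls cuInit() and can keep forking backends that initialize CUDA on their own. The names `count_devices_in_helper` and `query_device_count` and the `USE_CUDA` switch are hypothetical. Without `-DUSE_CUDA`, the query is a stub that returns a fixed placeholder value of 2, not a real device count.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef USE_CUDA
#include <cuda.h>
#endif

/* Hypothetical GPU query, run only inside the helper child. With -DUSE_CUDA
 * (link with -lcuda) it uses the driver API; otherwise it returns a fixed
 * placeholder so the process plumbing can be tested without a GPU. */
static int query_device_count(void)
{
#ifdef USE_CUDA
    int count = 0;
    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGetCount(&count) != CUDA_SUCCESS)
        return -1;
    return count;
#else
    return 2;   /* placeholder value, not a real device count */
#endif
}

/* Fork a helper that calls CUDA, read its answer from a pipe, and reap it.
 * The calling process never touches CUDA. Returns -1 on any error. */
int count_devices_in_helper(void)
{
    int fds[2];
    int count = -1;

    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* helper child: the only CUDA user */
        close(fds[0]);
        int n = query_device_count();
        ssize_t w = write(fds[1], &n, sizeof n);
        _exit(w == (ssize_t)sizeof n ? 0 : 1);
    }

    close(fds[1]);                      /* parent keeps only the read end */
    if (read(fds[0], &count, sizeof count) != (ssize_t)sizeof count)
        count = -1;
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        count = -1;
    return count;
}
```

A server could call `count_devices_in_helper()` once at startup to log the GPU count, then fork its backends as in the reproducer. Each backend would call cuInit() itself, because the parent never initialized CUDA.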
