Linux fork() and CUDA OOM: possible bug?

Hi all,
I had been hitting CUDA OOM errors, and today I discovered the real reason.
When the OOM occurs (on a fresh application launch), running
"lsmod | grep nvidia" shows that the module is still in use by someone,
even though the application is not running.
My application uses gnuplot to display some data in real time; I do this
with a popen:
theGnuPlotPipe = popen("gnuplot -persist", "w");
I launch the application remotely, so gnuplot runs on the server where
the GPUs are installed, while the window manager is on my own computer.

When my application terminates, the gnuplot process is still up, and it is
this process that is holding the nvidia module.

I'm perfectly aware of what goes on behind a popen(); what is not clear to me
is why the driver is being used by an application that has never touched
any CUDA-allocated memory, unless gnuplot uses the nvidia driver without my
knowing it.


Presumably, all (or part) of your process's CUDA context is copied along with the process when you fork(). There are some very old threads (CUDA 1.0/1.1 era) where people probed how forked CUDA processes behave (or fail to); I haven't seen anything more recent.

In your use case, you really don't want gnuplot holding onto your app's CUDA context at all. Try adding a call to cudaThreadExit() in the forked process, and hope that it doesn't also destroy the context in the host process :)

mmm, that means modifying the gnuplot source, in my case.


If you are willing to write Linux-specific code, you can use the flexibility of the clone() interface to implement a popen() variant that does not require a full fork():…tion-for-linux/

(I haven’t tested this, but I would be curious to know if it fixes your problem.)
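The linked article is truncated above, so here is one way such a clone()-based popen("w") variant might look: a Linux-only sketch, untested against the actual CUDA issue. The CLONE_VM | CLONE_VFORK combination shares the parent's address space until the child execs (like vfork), so the process image is never duplicated. The names popen_w_clone/pclose_clone are mine.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Arguments handed to the cloned child. */
struct spawn_args { int read_fd; int write_fd; char *const *argv; };

static int child_fn(void *p) {
    struct spawn_args *a = p;
    dup2(a->read_fd, STDIN_FILENO);   /* pipe read end becomes stdin */
    close(a->read_fd);
    close(a->write_fd);               /* child never writes to the pipe */
    execvp(a->argv[0], a->argv);      /* exec the program directly, no shell */
    _exit(127);                       /* only reached if exec failed */
}

static pid_t clone_child = -1;        /* sketch only: not reentrant */

FILE *popen_w_clone(char *const argv[]) {
    static char stack[64 * 1024];     /* child runs on this until it execs */
    int fds[2];
    if (pipe(fds) == -1) return NULL;
    struct spawn_args a = { fds[0], fds[1], argv };
    /* CLONE_VFORK suspends us until the child execs, so the shared
     * address space and the static stack are never used concurrently. */
    clone_child = clone(child_fn, stack + sizeof(stack),
                        CLONE_VM | CLONE_VFORK | SIGCHLD, &a);
    if (clone_child == -1) { close(fds[0]); close(fds[1]); return NULL; }
    close(fds[0]);                    /* parent keeps only the write end */
    return fdopen(fds[1], "w");
}

int pclose_clone(FILE *f) {
    int status;
    fclose(f);                        /* EOF tells the child to finish */
    if (waitpid(clone_child, &status, 0) == -1) return -1;
    return status;
}
```

Without CLONE_FILES the child still gets its own copy of the fd table, so the dup2/close calls in child_fn do not disturb the parent.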

If you are amenable to using something other than gnuplot, then try Python instead. pycuda, numpy and pylab play together perfectly, without any need to worry about low-level process management. It is a lot more elegant than anything else I have tried.

kalman, as you probably know, popen(…, "w") roughly consists of creating a pipe, then a fork(), some tinkering with fd's in the child process, then an exec to run the shell, with the parent ultimately returning an fdopen() of the write end of the pipe. The nvidia cleanup needs to be done in the child process (after the fork, before the exec). Modifying gnuplot may not help, since popen() runs a shell which in turn runs gnuplot, so the shell will also inherit whatever it is that's causing the problem.

If you write your own popen to handle this, the bonus is that you can exec gnuplot directly instead of going through the shell. Just don't omit the waitpid() after you finish writing to the pipe and closing it; otherwise you will leave zombies behind. You could use clone() too, as suggested, but fork/exec is POSIX and should be a bit more portable.

Another approach, simpler but possibly infeasible in your app: you could popen() the gnuplot before doing any CUDA stuff, and keep the FILE* around until you need it, so it will inherit the pre-CUDA situation. There's a more general solution hiding in there, where you fork off a helper process before any CUDA work which later performs the popen of gnuplot for you; tricky plumbing, though.