How to allocate GPUs in an environment with N GPUs and M processes

Can anyone tell me how to ensure that, on a box with two (or more) CPUs and a four-GPU Tesla unit attached, each of two running processes uses a free GPU rather than a GPU on which the other process’s kernel is already executing? Can I scan for free GPUs and ‘lock’ one for my use? I don’t want to end up with processes’ kernels being serialised while GPUs sit idle.


Interestingly, CUDA APIs do NOT support this. You just cannot figure out who is using what.

People on the forum have tried to come up with algorithms to figure out which GPU is in use (for example, by estimating the percentage of free CUDA memory), but as far as I know there is still no formal, foolproof mechanism to lock and allocate a GPU for a CUDA operation. Not sure if things have changed much with CUDA 2.0.

I went through the CUDA 2.0 docs; nothing has changed on this front :(

If NVIDIA is going to pack GPU after GPU into a Tesla box, the X2s or whatever, it has to provide a way to allocate a free GPU…

Come on… This is the most basic thing that any programmer would expect to find in an API.

I don’t think this is a big thing to implement.

When can we expect this support in the driver? Is it in the pipeline? Kindly shed some light on this. More and more people are getting annoyed by the lack of such basic support…

I wrote a Python script that scans the output of lsof /dev/nvidia*. CUDA apps open the NVIDIA device they use with the “mem” descriptor. The script returns the number of a free GPU, which is then passed to the CUDA application on the command line. To avoid race conditions, a GPU is considered “reserved” for 30 s after it is handed out, allowing time for the program to initialize and acquire the GPU before another program starts running. Jobs are run with the Sun Grid Engine.

Since X opens /dev/nvidia* with the mem descriptor too, this method only works on boxes with X disabled.
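For anyone who wants to try the lsof approach, here is a minimal Python sketch of the parsing step. It is not the script described above: the function names (busy_gpus, first_free_gpu, run_lsof) and the gpu_count parameter are made up for illustration, and it assumes device nodes named /dev/nvidia0, /dev/nvidia1 and so on, with lsof printing the descriptor in the fourth column and the device path last. The 30 s reservation is left out.

```python
import re
import subprocess

# Matches per-GPU device nodes such as /dev/nvidia0, but not /dev/nvidiactl.
_DEVICE_RE = re.compile(r"^/dev/nvidia(\d+)$")


def busy_gpus(lsof_output):
    """Return the set of GPU indices that some process holds with FD 'mem'."""
    busy = set()
    for line in lsof_output.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or malformed line
        fd, name = fields[3], fields[-1]
        match = _DEVICE_RE.match(name)
        # The header row is skipped naturally: its FD column reads "FD".
        if fd == "mem" and match:
            busy.add(int(match.group(1)))
    return busy


def first_free_gpu(lsof_output, gpu_count):
    """Return the lowest GPU index not in use, or None if all are busy."""
    busy = busy_gpus(lsof_output)
    for index in range(gpu_count):
        if index not in busy:
            return index
    return None


def run_lsof():
    """Run lsof over the NVIDIA device nodes and return its text output."""
    # The shell expands the glob; lsof exits non-zero when nothing is open,
    # so the return code is deliberately ignored here.
    result = subprocess.run("lsof /dev/nvidia*", shell=True,
                            capture_output=True, text=True)
    return result.stdout
```

A wrapper would call first_free_gpu(run_lsof(), 4) and pass the result to the CUDA application on its command line. Without some kind of reservation, two jobs started at the same moment can still be handed the same index.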

Another option is to run a job scheduler (like the Sun Grid Engine), create a number of resources GPU1, GPU2, GPU3, … and have each job request a specific GPU. Unless all your jobs take roughly equal time, though, you may end up with a pile of jobs waiting for GPU2 while all the GPU1 jobs have finished.

Neither of these solutions is ideal, but that is really all we have to work with. There have been feature requests on file with NVIDIA since CUDA 0.8 to add something to the API to solve this problem, but nothing has ever come of it.

One other simple idea people have suggested is to check the amount of free memory on the card with cuMemGetInfo to determine if it is used or not. You can try it, but race conditions can easily lead to two programs on the same GPU.
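To make the free-memory idea concrete, here is a rough Python sketch of the decision step only. It assumes you have already collected a (free, total) pair per GPU, for example from cuMemGetInfo through whatever binding you use. The function name pick_idle_gpu, the mem_info dictionary and the 0.95 threshold are all made up for illustration.

```python
def pick_idle_gpu(mem_info, threshold=0.95):
    """Pick the GPU with the largest free-memory fraction.

    mem_info maps a GPU index to a (free_bytes, total_bytes) pair.
    A GPU only counts as idle if at least `threshold` of its memory is free.
    Returns the chosen index, or None if no GPU looks idle.
    """
    best = None  # (free fraction, index) of the best candidate so far
    for index, (free_bytes, total_bytes) in sorted(mem_info.items()):
        if total_bytes <= 0:
            continue  # skip devices that reported nonsense
        fraction = free_bytes / total_bytes
        if fraction >= threshold and (best is None or fraction > best[0]):
            best = (fraction, index)
    return None if best is None else best[1]
```

The race is easy to see here: two processes that take their readings at the same moment get the same answer and both pick the same GPU, because nothing is reserved between measuring and allocating.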

You could write a small scheduler and use the GPUWorker class (search on the forum).

Thanks for the suggestions.

The feedback from NVIDIA has been that there are currently two possible options, both also suggested here: use a custom daemon process to arbitrate access, or use the /dev/nvidia*/mem info on Linux.

The obvious thing to do would be to provide this in the API and implement it in the driver. I’m told that they’ve had a number of requests for this and I’ve added my name to that list. Hopefully something will come of that, but in the meantime I’ll be investigating a simple library that implements the arbitration logic using a shared memory segment to record GPU allocation. I really need some way to kill a running kernel and re-init a GPU as well, though!
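As a starting point for that kind of arbitration, here is a small Python sketch. Note that it swaps the shared memory segment for something simpler: one lock file per GPU, held with flock, so the kernel releases the claim automatically if the owning process dies. The names acquire_gpu, release_gpu and DEFAULT_LOCK_DIR are hypothetical, it only works on Unix-like systems, and it only helps if every process on the box goes through the same lock directory.

```python
import fcntl
import os
import tempfile

# Hypothetical location for the lock files; any directory shared by all
# cooperating processes on the machine would do.
DEFAULT_LOCK_DIR = os.path.join(tempfile.gettempdir(), "gpu-locks")


def acquire_gpu(gpu_count, lock_dir=DEFAULT_LOCK_DIR):
    """Try to claim one GPU.

    Returns (index, handle) for the first GPU whose lock file could be
    locked, or None if every GPU is already claimed. The caller must keep
    the handle open for as long as it uses the GPU.
    """
    os.makedirs(lock_dir, exist_ok=True)
    for index in range(gpu_count):
        path = os.path.join(lock_dir, "gpu%d.lock" % index)
        handle = open(path, "a")
        try:
            # Non-blocking exclusive lock: fails at once if someone holds it.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            handle.close()
            continue
        return index, handle
    return None


def release_gpu(handle):
    """Give the GPU back by unlocking and closing its lock file."""
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()
```

A job would call acquire_gpu(4), pass the returned index to the CUDA application, and call release_gpu when it finishes. Unlike the lsof and free-memory approaches, testing and claiming happen in one atomic step, so two processes cannot end up with the same index. It does nothing about killing a running kernel or re-initialising a GPU.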