Several of my co-workers want to give CUDA a shot, so we’re putting together a shared machine with a card. From my previous experiences, I know that performance goes down drastically when more than one CUDA job is executing simultaneously.
Since we all keep roughly the same work schedule, it’s likely that runs will end up interfering with one another. What I want to do (and haven’t quite figured out how yet) is build a transparent way to ensure two people do not run at the same time, by preventing a second job from starting while one is in progress. The reason this should be transparent is to keep users from accidentally bypassing the prescribed way of launching a job or initializing the API.
My suggestion, if you’re sharing the machine among a small group of close-knit developers, would be to write a simple piece of code implementing a system-wide per-GPU allocation flag of some sort. You can do this any number of ways, using anything from mmap()ed shared memory with pthreads mutexes to schemes based on lock files. I’d pick something that will automatically clean up if one of your codes crashes or forgets to “release” its GPU at exit…
If you have a queueing system set up for your box, you could simply have it set an environment variable that everyone’s code reads, and each job then uses the GPU the queueing system indicated. All of these are somewhat ugly hacks that depend on cooperative programming by the people in your group, but until there’s something built into CUDA, I don’t have any better suggestions.
I forgot to mention: if you only have a single GPU in your test box, the solution is far simpler. Just install a queueing system like Grid Engine and you’ll be all set. Things only get more complex if you have multiple GPU cards (which I’m sure you’ll want before too long…).
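With a single GPU the queueing system does all the serialization for you: configure the queue with one slot and jobs simply run one at a time. A hypothetical Grid Engine submit script (the directives are standard SGE syntax; the application name is made up):

```shell
#!/bin/sh
# Submit with: qsub run_cuda.sh
# With the queue limited to one slot, jobs run strictly one at a time.
#$ -N cuda-job        # job name shown in qstat
#$ -cwd               # run from the submission directory
./my_cuda_app         # hypothetical application name
```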
Thanks for the reply. I enjoyed your talk at SIAM a couple of weeks ago. :) I guess my forum search skills aren’t as good as I had thought!
Well, I didn’t want to resort to building the API wrapper (or using a queueing engine) because this machine will be accessed directly, so I have no way to prevent someone from accidentally launching a job without the safeguards, which is likely to happen.
I will certainly follow this other thread of discussion.
My idea would be to make a wrapper around the CUDA library using an LD_PRELOAD shim that intercepts (for example) the cuInit call (take a look at how SOCKS network wrappers work).
As soon as CUDA is used in a program, the wrapper takes some system-wide lock (or communicates with a daemon that manages the system-wide locks and does accounting). Make sure your preloaded library installs an atexit() handler that removes the lock as soon as the program terminates.
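A sketch of such a shim, combining the interception idea with the file-lock approach from earlier in the thread. The lock path and helper names are invented for illustration; the only real APIs used are dlsym(RTLD_NEXT, ...) for forwarding and flock() for the lock:

```c
/* Build as a shared library and preload it in front of every CUDA job:
 *   cc -shared -fPIC -o libgpulock.so gpulock.c -ldl
 *   LD_PRELOAD=./libgpulock.so ./any_cuda_program
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

typedef int CUresult;      /* stand-in for the real typedef from cuda.h */

static int lock_fd = -1;

static void release_gpu_lock(void)
{
    if (lock_fd >= 0)
        close(lock_fd);    /* flock() lock is dropped with the fd */
}

static void acquire_gpu_lock(void)
{
    lock_fd = open("/tmp/gpu.lock", O_CREAT | O_RDWR, 0666); /* made-up path */
    if (lock_fd < 0 || flock(lock_fd, LOCK_EX | LOCK_NB) != 0) {
        fprintf(stderr, "GPU busy: another CUDA job holds the lock\n");
        exit(1);
    }
    atexit(release_gpu_lock);  /* release on any normal exit path */
}

/* Our cuInit shadows the real one because this library is preloaded.
 * It takes the lock, then forwards to the real driver entry point. */
CUresult cuInit(unsigned int flags)
{
    acquire_gpu_lock();
    CUresult (*real_cuInit)(unsigned int) =
        (CUresult (*)(unsigned int))dlsym(RTLD_NEXT, "cuInit");
    if (!real_cuInit) {
        fprintf(stderr, "could not find the real cuInit\n");
        exit(1);
    }
    return real_cuInit(flags);
}
```

Because the interception happens inside the dynamic linker, users can launch their programs however they like and still hit the lock, which is the transparency the original poster was after. Note that if the process is killed rather than exiting normally, the atexit() handler never runs, but the kernel still releases the flock() lock, so nothing is left stale.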