More than one process can access the same GPU at the same time. As long as the total memory allocated doesn’t exceed the free memory on the card, all of the apps will execute fine, just at much lower performance, since they have to share the device. So for testing and debugging purposes, everybody using the same GPU isn’t a problem. But for performance tuning or real application runs you definitely want only one process on each GPU.
cudaSetDevice is the only tool you have to manage this :( Better tools have often been requested; here’s hoping for them in CUDA 2.1.
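For reference, a minimal sketch of that one tool: pinning a process to a particular GPU with cudaSetDevice, which has to be called before anything else touches the device.

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        printf("found %d CUDA device(s)\n", count);

        /* Pin this process to GPU 1. This must happen before the
         * first allocation or kernel launch, which would otherwise
         * create a context on the default device (device 0). */
        if (cudaSetDevice(1) != cudaSuccess) {
            fprintf(stderr, "cudaSetDevice(1) failed\n");
            return 1;
        }
        return 0;
    }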
In a production environment, job queues such as Sun Grid Engine or OpenPBS (tools normally used for cluster job scheduling) could be configured to schedule jobs onto specific GPUs.
But in a programming/test environment, communication between developers is probably the best approach, since setting up a PBS job script for every run you want to debug would get tedious. One suggestion might be to leave GPU 0 as the test/debugging GPU and reserve the other three for performance testing. Add a command-line option for choosing the GPU early in development to make switching easy.
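Something along these lines, assuming the device index is passed as the first argument and GPU 0 (the agreed test/debug card) is the default:

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        /* Device index from the command line, defaulting to the
         * test/debug GPU 0. */
        int dev = (argc > 1) ? atoi(argv[1]) : 0;

        int count = 0;
        cudaGetDeviceCount(&count);
        if (dev < 0 || dev >= count) {
            fprintf(stderr, "no such device %d (have %d)\n", dev, count);
            return 1;
        }

        cudaSetDevice(dev);

        /* ... rest of the application ... */
        return 0;
    }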
I’ve considered writing a gputop program that would let users know which GPUs are currently in use (there is a hackish way to check with lsof on the /dev/nvidia* device files), but I haven’t gotten around to it.
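In the meantime, here is a rough sketch of what such a tool could report, using free memory as a crude “in use” signal. Caveat: cudaMemGetInfo only sees allocations, not running kernels, so it’s a proxy at best, and on older toolkits you may need the driver API’s cuMemGetInfo instead.

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        cudaGetDeviceCount(&count);

        for (int dev = 0; dev < count; ++dev) {
            /* Note: creates a context on each device as a side effect. */
            cudaSetDevice(dev);
            size_t free_b = 0, total_b = 0;
            if (cudaMemGetInfo(&free_b, &total_b) != cudaSuccess)
                continue;
            printf("GPU %d: %zu MB free of %zu MB\n",
                   dev, free_b >> 20, total_b >> 20);
        }
        return 0;
    }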