CUDA 4.0 multi-GPU auto-selection: how to do it?

On our GPU cluster, we use compute-exclusive mode to auto-select free GPUs when jobs run on the nodes. This works great for single-GPU applications: these codes simply omit the cudaSetDevice() call, and the runtime picks a free GPU for them.

I’m not seeing an obvious way to make this work in single-threaded CUDA 4.0 multi-GPU apps, which must call cudaSetDevice() to switch between GPUs. Is anyone aware of a way to do this? Or is my only option to always queue jobs onto entire nodes?

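There isn’t a particularly great way to do this at the moment, but it’s easy enough to do with a for loop and cudaSetDevice(): try each device in turn, force a context to be created on it, and keep the devices where that succeeds.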
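
A minimal sketch of that loop, assuming the nodes are in compute-exclusive mode so that context creation fails on a GPU another job already holds. The helper name claimFreeDevices and the count of 2 GPUs are made up for illustration. cudaFree(0) is used here as a common idiom to force context creation, since cudaSetDevice() may not create the context by itself. I have not verified the behavior on every driver version, so treat it as a starting point:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): walks all devices and returns up to
// `wanted` device indices on which this process managed to create a context.
std::vector<int> claimFreeDevices(int wanted)
{
    std::vector<int> claimed;
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        cudaGetLastError();  // clear the error state
        return claimed;
    }

    for (int dev = 0; dev < count && (int)claimed.size() < wanted; ++dev) {
        // cudaSetDevice() may only select the device; cudaFree(0) forces the
        // context to be created. In exclusive mode this is expected to fail
        // if another process already owns the GPU.
        if (cudaSetDevice(dev) == cudaSuccess && cudaFree(0) == cudaSuccess) {
            claimed.push_back(dev);
        } else {
            cudaGetLastError();  // clear the error before trying the next device
        }
    }
    return claimed;
}

int main()
{
    const int wanted = 2;  // assumption: this job needs two GPUs
    std::vector<int> devs = claimFreeDevices(wanted);

    if ((int)devs.size() < wanted) {
        fprintf(stderr, "only %d of %d GPUs were free\n", (int)devs.size(), wanted);
        return 1;  // contexts claimed so far are released when the process exits
    }

    for (size_t i = 0; i < devs.size(); ++i) {
        cudaSetDevice(devs[i]);  // switch to a GPU this process already holds
        printf("using device %d\n", devs[i]);
        // ... allocate memory and launch kernels for this GPU here ...
    }
    return 0;
}
```

Once a context exists on a device, later cudaSetDevice() calls just switch between the GPUs the process has claimed, so the rest of the single-threaded multi-GPU code can index into the returned list instead of hard-coding device numbers.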