On our GPU cluster, we use compute-exclusive mode to auto-select free GPUs when jobs run on the nodes. This works great for single-GPU applications: these codes simply omit the cudaSetDevice() call, and the runtime picks a free device for them.
I’m not seeing an obvious way to make this work in single-threaded CUDA 4.0 multi-GPU apps, which must call cudaSetDevice() to switch between GPUs. Is anyone aware of a way to do this? Or is my only option to always queue jobs onto entire nodes?