Finding an Idle GPU in a Multi-GPU System

There are a couple of issues I’m trying to solve.

A) If I want to use the full resources of a multi-GPU system, is there any way to determine dynamically which GPUs I’m already using, without explicitly tracking device IDs myself?

For example: one thread starts, picks the first available device, and executes. When a second thread starts, how can it pick the next GPU?

Ideally, I could just launch a thread and have it scheduled onto an idle device.

B) Along the same lines, can I detect which device is being used as the primary display and avoid using it?

AFAIK, you’ll need to write your own code to track GPU use and allocation. I’ve been designing something to do just this - it’s not too tough.

Create an array (one element per GPU) that tracks GPU use. An array of booleans would be enough. Provide “getGPU()” and “releaseGPU()” functions that your threads can call as needed. getGPU() marks a free device as in use and calls cudaSetDevice(); releaseGPU() calls cudaThreadExit() and clears the device’s entry. Make sure both functions are protected by a common mutex to avoid race conditions.
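To make the idea concrete, here is a minimal sketch of that bookkeeping in plain C++. It is only an illustration, not the code from my own design: the class name GpuPool and the "return -1 when nothing is free" convention are assumptions made for the example, and the CUDA calls appear only as comments so the sketch compiles without the toolkit. In a real program the device count would come from cudaGetDeviceCount().

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical helper (not part of CUDA): one flag per GPU, guarded by a mutex.
class GpuPool {
public:
    explicit GpuPool(int deviceCount)
        : inUse_(static_cast<std::size_t>(deviceCount), false) {}

    // Claims the lowest-numbered free device and returns its ID,
    // or -1 if every device is already in use.
    int getGPU() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < inUse_.size(); ++i) {
            if (!inUse_[i]) {
                inUse_[i] = true;
                // Real code would call cudaSetDevice(i) here, on the calling thread.
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Marks a previously claimed device as free again.
    void releaseGPU(int id) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(id >= 0 && static_cast<std::size_t>(id) < inUse_.size());
        // Real code would call cudaThreadExit() here (deprecated in later
        // CUDA releases in favor of cudaDeviceReset()).
        inUse_[static_cast<std::size_t>(id)] = false;
    }

private:
    std::mutex mutex_;
    std::vector<bool> inUse_;
};
```

Each worker thread would call getGPU() once at startup, do its work on the returned device, and call releaseGPU() with the same ID before exiting.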

You need to think about how to handle the case where all GPUs are in use. My design blocks the calling thread until a GPU is available, but your code may have different needs.
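One way to get the blocking behaviour is a condition variable that releaseGPU() signals. The sketch below shows that approach in plain C++; it is not my actual implementation, the name BlockingGpuPool is made up for the example, and the CUDA calls are again left as comments.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical helper (not part of CUDA): getGPU() waits until a device is free.
class BlockingGpuPool {
public:
    explicit BlockingGpuPool(int deviceCount)
        : inUse_(static_cast<std::size_t>(deviceCount), false) {}

    // Blocks the calling thread until some device is free,
    // then claims it and returns its ID.
    int getGPU() {
        std::unique_lock<std::mutex> lock(mutex_);
        int id = -1;
        // The predicate is re-checked after every wakeup, so spurious
        // wakeups and races between waiting threads are handled.
        freed_.wait(lock, [&] { id = firstFree(); return id >= 0; });
        inUse_[static_cast<std::size_t>(id)] = true;
        // Real code would call cudaSetDevice(id) here.
        return id;
    }

    // Frees the device and wakes one waiting thread, if any.
    void releaseGPU(int id) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(id >= 0 && static_cast<std::size_t>(id) < inUse_.size());
            inUse_[static_cast<std::size_t>(id)] = false;
        }
        freed_.notify_one();
    }

private:
    // Returns the lowest free device ID, or -1. Caller must hold mutex_.
    int firstFree() const {
        for (std::size_t i = 0; i < inUse_.size(); ++i) {
            if (!inUse_[i]) return static_cast<int>(i);
        }
        return -1;
    }

    std::mutex mutex_;
    std::condition_variable freed_;
    std::vector<bool> inUse_;
};
```

If blocking forever is not acceptable, the same structure works with wait_for() and a timeout, or with a non-blocking variant that returns an error value and lets the caller decide what to do.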

I’m not sure how you’d determine which device is the primary display - I’m working on a headless system. Maybe someone else can help you there.

Hi,

I filed a CUDA feature request (#298834) along these lines many moons ago, but I’m not sure where it stands. It’s undoubtedly one of hundreds of such feature requests. I’m sure the NVIDIA staff will get to this at some point when they’ve dealt with more pressing issues and more highly-requested features.

Cheers,

John Stone