How can I query the available CUDA GPU device numbers in a Linux environment? Knowing the total number of devices (cudaGetDeviceCount) is not enough; I need the CUDA API to tell me which device numbers are actually available.
We need this for our multi-user CPU-GPU cluster with several CPU compute hosts, each with 2 GPUs. We need to accommodate hybrid MPI+CUDA codes, where the MPI process on a compute host has to know which GPU device to use for CUDA computing, since the other device could be taken by another user.
As a follow-up to the above, has anyone done SGE + CUDA queue and PE configuration beyond a simple all.q? We would want to allocate SGE resources based on available GPUs, and pin/reserve a GPU device to an SGE job instance.
Once you have the count of devices, you can call cuDeviceGet() (if you’re using the driver API; check the reference for the equivalent runtime call) to get a handle to a specific device in the range [0, X-1], where X is the number returned by cuDeviceGetCount(). Once you have the device handle, you can call cuDeviceGetName() with it to get the name of the device, or cuDeviceGetProperties() to get its other properties. You can do those last few steps in a loop after you get the device count if you want the information for all of the devices in the system.
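To make that concrete, here is a minimal sketch of that enumeration loop with the driver API; the specific per-device queries (cuDeviceGetName(), cuDeviceTotalMem()) are just examples, and error handling is reduced to a single check. Link against -lcuda.

```c
// Minimal sketch: enumerate all CUDA devices with the driver API.
#include <cuda.h>
#include <stdio.h>

int main(void)
{
    if (cuInit(0) != CUDA_SUCCESS) {
        fprintf(stderr, "cuInit failed\n");
        return 1;
    }

    int count = 0;
    cuDeviceGetCount(&count);

    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        cuDeviceGet(&dev, i);              /* device handle for ordinal i */

        char name[256];
        cuDeviceGetName(name, sizeof(name), dev);

        size_t totalMem = 0;
        cuDeviceTotalMem(&totalMem, dev);  /* total global memory in bytes */

        printf("Device %d: %s, %zu MB\n", i, name, totalMem / (1024 * 1024));
    }
    return 0;
}
```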
Yeah. In 2.2 under Linux, you can use nvidia-smi to designate a GPU as supporting multiple contexts, a single context, or no contexts. You can query this in CUDART, plus we give you some convenience features to make this easy. So you have multiple GPUs and multiple MPI processes that need GPUs? No problem: set all your GPUs to single-context mode (aka compute-exclusive mode), don’t call cudaSetDevice() (or call the new function that sets the list of valid devices, cudaSetValidDevices()), and run your app. One process will grab the first GPU; the other will try the first GPU and fail because a context already exists there, then (assuming you use CUDART) it will silently retry and create a context on the next GPU. Once you’re out of GPUs, context creation will fail.
In other words, all of the problems that Seppo mentioned just go poof and disappear.
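A minimal sketch of the pattern described above, assuming every GPU on the node has been put into compute-exclusive mode with nvidia-smi; the cudaFree(0) call is just a convenient way to force the runtime to create a context, and the MPI scaffolding is only there to show the per-rank usage:

```c
// Sketch: each MPI rank lets CUDART pick a free GPU (no cudaSetDevice call),
// relying on compute-exclusive mode to fall through to the next free device.
#include <cuda_runtime.h>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Any runtime call that creates a context will do; cudaFree(0) is a
       common trigger. If every GPU already has a context, this fails. */
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "rank %d: no free GPU (%s)\n",
                rank, cudaGetErrorString(err));
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    int dev = -1;
    cudaGetDevice(&dev);   /* which physical GPU this rank ended up on */
    printf("rank %d is using GPU %d\n", rank, dev);

    MPI_Finalize();
    return 0;
}
```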
That sounds brilliant! Just what the doctor ordered. Pity I’m currently required to use CUDA on Windows, but I’m sure I’ll go back to Linux at the first opportunity.
Cool! We’ve asked and asked and you guys have finally delivered :) And in a way that will make cluster admins very happy. I’m looking forward to tearing down all the ad-hoc and poorly debugged scripts that implemented this functionality client-side.
Yes and yes. It doesn’t actually matter whether the two contexts are created from the same thread and just pushed/popped with the context migration APIs, by different threads in the same process, or by different processes; it’s just a restriction on the number of contexts that can exist on a GPU at a time.
Is similar functionality available in the driver API? I.e. if a GPU is set to a single context, will cuCtxCreate() fail on that device? I suppose I can just try the next device until I either run out or find one?
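For reference, a minimal sketch of that probing loop with the driver API, assuming (as the runtime behaviour described above suggests) that cuCtxCreate() returns an error on an exclusive-mode device that already has a context:

```c
// Sketch: walk the device list and keep the first device where context
// creation succeeds; give up if every device is busy.
#include <cuda.h>
#include <stdio.h>

int main(void)
{
    cuInit(0);

    int count = 0;
    cuDeviceGetCount(&count);

    CUcontext ctx = NULL;
    int chosen = -1;

    for (int i = 0; i < count && chosen < 0; ++i) {
        CUdevice dev;
        cuDeviceGet(&dev, i);
        /* On a compute-exclusive GPU that already has a context,
           cuCtxCreate() returns an error instead of CUDA_SUCCESS. */
        if (cuCtxCreate(&ctx, 0, dev) == CUDA_SUCCESS)
            chosen = i;
    }

    if (chosen < 0) {
        fprintf(stderr, "no free GPU found\n");
        return 1;
    }
    printf("created context on device %d\n", chosen);

    cuCtxDestroy(ctx);
    return 0;
}
```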
Since CUDA 1.1, we’ve been rolling our own solution too, and it has some other advantages. We created a wrapper library that exposes only the GPUs allocated to a given user by the scheduler/batch system; it is probably best described by quoting the relevant section from the user readme below.
CUDA Wrapper USER readme
Overview:
The CUDA wrapper library is typically implemented as a forced preload, such
that the device allocation calls to CUDA are intercepted by it for a few
different benefits. Only users requesting multiple GPUs per node really need
to be aware of its transparent operation. The wrapper library accomplishes
three things:
1. Virtualizes the physical GPU devices behind a dynamic mapping that is
   always zero-indexed. The virtual devices visible to the user map to a
   consistent set of physical devices, which accomplishes “user fencing” on
   shared systems and prevents users from accidentally trampling one another.
2. Rotates the virtual-to-physical mapping for each new process that
   requests a GPU resource. This lets large parallel tasks use common
   startup parameters and still target multiple devices: each time a new
   process asks for gpu0, the underlying physical device is shifted
   (rotated, if you will), so the next process asking for gpu0 gets the next
   allocated physical device. Please note that rotation does not occur for
   new threads within a single process, only for new processes. CAUTION:
   users accustomed to targeting gpu0, gpu1, etc. with different processes
   on systems without this wrapper must understand this feature to avoid
   trampling their own processes. For example, if you have two GPU devices
   allocated and you launch two processes, one targeted at gpu0 and the
   other at gpu1, both processes will end up on the same physical GPU! Run
   them each against gpu0 unless they are different threads within a single
   process. (See the sketch after this readme excerpt.)
3. NUMA affinity, if relevant, can be mapped between CPU cores and GPU
   devices. This has been shown to give as much as a 25% improvement in
   host-to-device memory bandwidth. This feature is transparent.
There is a link to download it on this page (search for CUDA Wrapper Library):
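To illustrate the rotation behaviour in item 2, here is a hypothetical usage sketch. It assumes the wrapper library is force-preloaded by the batch system, so every MPI rank simply targets virtual device 0 and the wrapper rotates the underlying physical GPU per process:

```c
// Hypothetical usage sketch under the CUDA wrapper library described above.
// Every rank targets virtual device 0; the preloaded wrapper is assumed to
// rotate the physical mapping per process, so ranks land on different GPUs.
#include <cuda_runtime.h>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(0);   /* always "gpu0"; the wrapper maps it to a real device */

    struct cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("rank %d: virtual device 0 -> %s\n", rank, prop.name);

    MPI_Finalize();
    return 0;
}
```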
Yeah, Jeremy’s library is a big part of why we’ve gone with exclusive mode as opposed to something else; it seems to work well and people like it. Exclusive mode gets you #1 and #2 easily enough. #3 is coming in a future driver release.
This combination gives a good solution that we have tested with SGE. With it, all you need is a termination or clean-up script for the case where a process has aborted.