CUDA API request: cudaSetDeviceLeastUsed()

We’ve recently moved our second 8800 GTX into the same server as our first GTX card, and we need a way to ensure concurrent CUDA jobs don’t use the same card.

I’d like to request a new Device Management function to make this easier:

cudaError_t cudaSetDeviceLeastUsed(int *dev);

Sets the active host thread to run device code on the CUDA device that is in use by the fewest host processes. The selected device number is returned in *dev.

This function is atomic, so if there are N available devices and N processes call cudaSetDeviceLeastUsed() at the same time, they are all guaranteed to be assigned to different devices.

For the time being, we are adding this logic to the program we run most frequently, but a global way to do this correctly for all CUDA jobs would be nice.
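
In case it’s useful to anyone else, here is a rough sketch of the kind of logic we’re adding, assuming a runtime new enough to provide cudaMemGetInfo(). It picks the device with the most free global memory as a crude proxy for “least used”; the helper name setLeastUsedDevice is ours, and unlike the proposed API this is not atomic, so two processes probing at the same moment can still land on the same card.

    #include <cuda_runtime.h>

    /* Crude user-space stand-in for the proposed call: pick the device
     * with the most free global memory.  Note that probing a device this
     * way creates a context on it, which itself consumes some memory. */
    cudaError_t setLeastUsedDevice(int *dev)
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess)
            return err;

        int best = -1;
        size_t bestFree = 0;
        for (int i = 0; i < count; i++) {
            size_t freeMem = 0, totalMem = 0;
            if (cudaSetDevice(i) != cudaSuccess)
                continue;
            if (cudaMemGetInfo(&freeMem, &totalMem) != cudaSuccess)
                continue;
            if (freeMem > bestFree) {
                bestFree = freeMem;
                best = i;
            }
        }
        if (best < 0)
            return cudaErrorNoDevice;

        err = cudaSetDevice(best);
        if (err == cudaSuccess && dev)
            *dev = best;
        return err;
    }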

Hi,

Over the last few months I’ve requested (in the NVIDIA bug/RFE system) a number of CUDA features along these lines to assist with cluster-based CUDA runs. As you can imagine, things get interesting when multiple independent CUDA jobs get scheduled onto the same node and they all want to use all of the CUDA devices :-)

In my various feature requests, I have asked for implementation of several alternatives:

  1. an exclusive-open call, such that the device will not show up as available to other jobs (see the sketch after this list)

  2. calls to determine how “busy” or active a GPU is

  3. calls to reserve GPU resources, e.g. global memory, so that when two jobs do share a device, each can reserve the amount of memory it needs up front and prevent other jobs from interfering with its operation once started.

  4. methods for setting limits akin to the Unix resource limits for CPU/memory/stack/etc., but for the GPU. Ideally something that can interact with a queueing system, for example.
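
To make item 1 concrete, here is a minimal sketch of how one might fake an exclusive open in user space today, assuming POSIX and that every job on the node cooperates by checking the same lock directory; the lock path and the helper name acquireDeviceExclusive are my own inventions, not anything NVIDIA provides.

    #include <cuda_runtime.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Claim a per-device lock file before attaching to a device.
     * open() with O_CREAT|O_EXCL is atomic, so only one cooperating
     * process can claim a given device at a time. */
    int acquireDeviceExclusive(void)
    {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess)
            return -1;

        for (int i = 0; i < count; i++) {
            char path[64];
            snprintf(path, sizeof(path), "/tmp/cuda-dev-%d.lock", i);
            int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
            if (fd < 0)
                continue;          /* device i already claimed */
            if (cudaSetDevice(i) == cudaSuccess)
                return i;          /* keep fd open for the job's lifetime */
            close(fd);
            unlink(path);
        }
        return -1;                 /* no unclaimed device left */
    }

The obvious weakness is that a crashed job leaves its lock file behind; flock()-style locks held on an open descriptor release automatically at process exit, which is part of why a real exclusive-open mode inside the driver would be so much cleaner.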

These were just my crude suggestions; I don’t know how practical any of them are, so we’ll see what the NVIDIA engineers come up with. At the very least they now know that others besides those of us at UIUC also want these things :-)

Cheers,

John Stone

Ah, those are even better suggestions. The exclusive-open call would solve our problem exactly.

I have not experimented much with multiple user processes running on the same CUDA device. Early on with CUDA 0.8 I tested this briefly and found that two processes using the same device ran much, much slower than you would expect (this was a dual-core system with a single 8800 GTX). I haven’t tried again since that time, though.

We hadn’t intended to be experimenting with multiple users sharing cards, but we quickly ran into this once a larger number of people began using the GPU clusters here at UIUC and UNC Chapel Hill… :-)

Without some sort of exclusive-access API, the only way to avoid this problem is to configure the queueing system so that users can only allocate entire nodes at a time. On the UIUC cluster that’s not really an option, though, as the same nodes contain both GPUs and FPGAs, so different people run jobs on the different accelerator devices, making things slightly more wacky…

John