Multi-user systems and multi-GPU usage

Hi,

we have a Tesla S870 in a multi-user system, and I don’t really know how access to the device is managed.

I know from the Programming Guide that it is possible to use cudaSetDevice() to choose a GPU.
Do I have to tell everybody to use a different device in order to share the resources of the S870?

Is it possible for two different users to use the same GPU?
Are there any built-in functions to control user access (does the thread scheduler manage multi-user access)?

And finally, is there some kind of self-management that makes it possible to use all four GPUs without selecting them explicitly?

Best Regards
thilo

I think it is possible to have multiple users. All of those users can make use of the four Teslas, but each GPU only by one user at a time. Let’s say user1 uses Tesla 1; user2 can then run his application on Teslas 2-4. It is not possible to run more than one kernel at once on one GPU, so you need to manage that yourself.

I would say: use one Tesla per user and tell each user he is only allowed to use his own Tesla.

Thank you for your quick answer!

That’s a pity.
Do you know whether there is some kind of stack or queue if two users try to launch kernels on the same GPU?

Unfortunately, the chapter on device management in the Programming Guide is very short.
Do you know of any other literature on this problem?

If I were you, I would just write two simple test programs and run them both at the same time on the same GPU to see what the outcome is.

More than one process can access the same GPU at the same time. As long as the total memory allocated doesn’t exceed the free memory on the card, all apps will execute perfectly fine but at a much lower performance. So for testing and debug purposes, everybody using the same GPU isn’t a problem. But for performance tuning/real application execution you definitely want one process only on each GPU.

cudaSetDevice is the only tool you have to manage this :( Better tools have often been requested; here’s hoping for them in CUDA 2.1.

In a production environment, job queues such as Sun Grid Engine or OpenPBS (tools normally used for cluster job scheduling) could be configured to schedule jobs onto GPUs.

But in a programming/test environment, communication between developers is probably the best way, as setting up a PBS job script for every execution you want to debug would get tedious. One suggestion might be to leave GPU 0 as the test/debugging GPU and keep the other three for performance testing. Add a command line option for choosing the GPU early in development to make switching easy.
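Something like the following would do for the command-line switch (just a sketch; apart from cudaGetDeviceCount, cudaSetDevice and cudaGetDeviceProperties, everything here is made up for illustration):

    /* pick_gpu.cu - select the GPU from the command line, default to GPU 0 */
    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        int dev = 0;                      /* default: the "debug" GPU 0 */
        if (argc > 1)
            dev = atoi(argv[1]);          /* e.g. ./pick_gpu 2 */

        int count = 0;
        cudaGetDeviceCount(&count);
        if (dev < 0 || dev >= count) {
            fprintf(stderr, "Requested device %d, but only %d device(s) present\n",
                    dev, count);
            return 1;
        }

        cudaError_t err = cudaSetDevice(dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                    dev, cudaGetErrorString(err));
            return 1;
        }

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Running on device %d: %s\n", dev, prop.name);

        /* all subsequent CUDA calls in this host thread now go to the
           selected device: allocate memory and launch kernels as usual */
        return 0;
    }

Compile with nvcc and run it as "./pick_gpu <device_id>".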

I’ve considered writing a gputop program that would let users know what GPUs are currently in use (there is a hackish way using lsof), but I haven’t gotten around to it.

@jordyvaneijk:
That’s for sure. I tried running a little program in parallel and it worked. But I can’t draw any conclusion about how user access is organized from a loss of speed alone.

Perhaps it’s possible to find out this information with another test program. But I think it’s a little more difficult to write such a program than to ask somebody.

@MisterAnderson42:
Thanks, I will hope for the best in new CUDA versions.

Best Regards
thilo

OK, but that is very difficult. You cannot see how much memory has already been allocated by someone else; the only way is to call the function that reports how much memory is still available on the GPU. The applications I’m developing all use a lot of memory, so it is not possible for someone else to use the GPU at the same time. How would you get around that?
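If your toolkit version has it, the runtime call cudaMemGetInfo (the driver API equivalent is cuMemGetInfo) reports the free and total memory on the current device. Keep in mind the number is only a snapshot - another process can allocate between the query and your own cudaMalloc - so the cudaMalloc must still be error-checked. A rough sketch (the 512 MB figure is just an example):

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        size_t free_bytes = 0, total_bytes = 0;
        if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) {
            fprintf(stderr, "cudaMemGetInfo failed\n");
            return 1;
        }
        printf("GPU memory: %zu MB free of %zu MB total\n",
               free_bytes >> 20, total_bytes >> 20);

        size_t needed = (size_t)512 << 20;   /* example: this job needs 512 MB */
        if (needed > free_bytes) {
            fprintf(stderr, "Not enough free memory - someone else is probably "
                            "using this GPU, try another device\n");
            return 1;
        }

        void *buf = NULL;
        if (cudaMalloc(&buf, needed) != cudaSuccess) {
            fprintf(stderr, "cudaMalloc still failed - lost the race\n");
            return 1;
        }
        /* ... run kernels ... */
        cudaFree(buf);
        return 0;
    }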

Thinking out loud -

Maybe an approach to the problem:
= Try to find out - before running the calculations - how much memory the task needs. Then you can allocate the memory all at once and use the available memory of the card as a sort of semaphore (a rough sketch follows this list). Perhaps.
= An architecture that runs CUDA code asynchronously will probably utilise the machine and cards better when the cards are used for many things at a time.
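To make the first point concrete, here is a rough sketch of the "memory as a semaphore" idea: work out the job's whole device-memory footprint up front, reserve it with a single cudaMalloc at startup and hold it for the entire run, so a second process starting on the same card fails immediately at its own reservation instead of dying halfway through. The sizes and pool layout below are invented for illustration:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        /* footprint of this (hypothetical) job: two input arrays + one output */
        size_t n      = (size_t)32 << 20;           /* 32M floats each */
        size_t needed = 3 * n * sizeof(float);

        char *pool = NULL;
        if (cudaMalloc((void **)&pool, needed) != cudaSuccess) {
            fprintf(stderr, "Could not reserve %zu MB - the GPU is busy, "
                            "try another device or come back later\n", needed >> 20);
            return 1;
        }

        /* hand out pieces of the reservation instead of calling cudaMalloc again */
        float *a   = (float *)pool;
        float *b   = a + n;
        float *out = b + n;
        (void)a; (void)b; (void)out;   /* ... kernels would use these ... */

        cudaFree(pool);
        return 0;
    }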

Also:
= There are probably some utilities out there to manage batch processing on ordinary clusters. Something that can manage batch processing on a 4-computer cluster can probably be adapted for job control on a 4-GPU “cluster”.

Thirdly:
= Is there any chance of getting each developer a cheaper CUDA card to develop on? They don’t have as much memory, but it may buy the administrator some time to get resource sharing settled by relieving pressure on the Tesla machine.

Good luck!

In principle, an app already checks the error return from every cudaMalloc, so the application will fail somewhat gracefully with an out-of-memory error message.
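For completeness, this is the kind of per-call checking meant here (the macro is our own invention, not part of CUDA):

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    #define CUDA_CHECK(call)                                                \
        do {                                                                \
            cudaError_t e = (call);                                         \
            if (e != cudaSuccess) {                                         \
                fprintf(stderr, "%s:%d: %s failed: %s\n",                   \
                        __FILE__, __LINE__, #call, cudaGetErrorString(e));  \
                exit(EXIT_FAILURE);                                         \
            }                                                               \
        } while (0)

    int main(void)
    {
        float *d_buf = NULL;
        /* if another user's job has already taken the memory, this prints
           an "out of memory" message and exits instead of crashing later */
        CUDA_CHECK(cudaMalloc((void **)&d_buf, (size_t)256 << 20));

        /* ... kernels ... */

        CUDA_CHECK(cudaFree(d_buf));
        return 0;
    }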

OK, thank you for this reply.