Torque + Maui + CUDA: how to manage the available GPUs

Magorath · December 18, 2010, 1:16pm

Hi all.

I’m currently managing a small cluster of computers through the Torque/Maui resource manager. A subset of the machines in my cluster have GPUs. Some of them have only one card, whereas others have several. What I would like to achieve is to configure Torque so that jobs requiring a GPU are launched only on machines that have one, and so that no more than one job is launched per card.

On the CUDA side, I’ve set all my cards to the exclusive compute mode, which allows only one task per card at a given time. This works fine.
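For reference, here is a minimal sketch of how the compute mode could be set per card from a script. It assumes the `-i` and `-c` options and the mode names of recent `nvidia-smi` releases (older drivers used a different syntax), and the helper name `compute_mode_command` is made up for this example. The function only builds the command; running it normally requires root.

```python
import shlex

# Mode names accepted by recent nvidia-smi releases (assumption: older
# driver versions used numeric values and a different option syntax).
ALLOWED_MODES = {"DEFAULT", "PROHIBITED", "EXCLUSIVE_PROCESS"}


def compute_mode_command(gpu_index, mode="EXCLUSIVE_PROCESS"):
    """Build (but do not run) the nvidia-smi call that sets a card's compute mode.

    Hypothetical helper for illustration only.
    """
    if gpu_index < 0:
        raise ValueError("GPU index must be non-negative")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown compute mode: {mode}")
    return ["nvidia-smi", "-i", str(gpu_index), "-c", mode]


if __name__ == "__main__":
    # Print the commands for a node with two cards instead of executing them.
    for index in range(2):
        print(shlex.join(compute_mode_command(index)))
```

Printing rather than executing keeps the sketch safe to try on a machine without GPUs; the list can be passed to `subprocess.run` once the options have been checked against the installed driver.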

Now, the problem comes with the configuration of Torque/Maui. Looking at the documentation, the solution I have tried is to use the GRES property for the computers that have GPUs, pretty much like node-locked licenses. When a user wants to run a GPU job, they submit it to the “GPU-queue” (the list of computers that have GPUs) and request as many cards as the job requires with the option “-W x=GRES:gpu@4” (without quotes) for 4 GPUs.
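To make the submission concrete, here is a small sketch of the command line a user would end up with. The queue name and the `-W x=GRES:gpu@N` option are the ones described above; the helper name `build_qsub_command` and the script name `job.sh` are placeholders invented for this example.

```python
import shlex


def build_qsub_command(script, gpus, queue="GPU-queue"):
    """Build (but do not run) the qsub call for a job that needs `gpus` cards.

    Hypothetical helper; the queue name and GRES syntax are taken from the post.
    """
    if gpus < 1:
        raise ValueError("a GPU job must request at least one card")
    return ["qsub", "-q", queue, "-W", f"x=GRES:gpu@{gpus}", script]


if __name__ == "__main__":
    # Show the command for a 4-GPU job without actually submitting it.
    print(shlex.join(build_qsub_command("job.sh", 4)))
```

This only shows how the request is expressed. It does not change how Maui accounts for the GRES on each node, which is where the problem lies.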

The problem is that this doesn’t work. The job is properly assigned to a node with GPUs, but the number of available GPUs on that node is not decremented. This means that when another job is launched, the same node is chosen again, and the job crashes because the cards do not accept more than one job.

Is this the correct way to configure Torque/Maui? If not, could you point me to some good documentation?

Thanks in advance for any help.

Magorath · December 20, 2010, 7:33am

Is no one around here using Torque/Maui?

Magorath · December 23, 2010, 8:22am

Well, could you maybe point me towards a better place to ask about such things?
