Configuring a workload manager on a cluster with NVIDIA Tesla S1070

seckaka · July 25, 2009, 5:14am

Hi guys,

I am really confused :wacko:

What we have: each node in the cluster has 4 GPUs (NVIDIA Tesla S1070) and 16 cores (four quad-core CPUs).
What we want: to create separate queues for (1) pure CPU jobs and (2) mixed GPU+CPU jobs.
Case (1) means the jobs use only the CPUs on a node.
Case (2) means the jobs use both the GPUs and the CPUs on a node.

How can we organise this with PBS or Torque?
For CPU jobs we must create a queue that has only 12 CPUs per node (the other 4 CPUs are used to drive the GPUs).
For GPU jobs we must create a queue that has 4 CPUs per node.

Is it necessary for each GPU to have its own CPU core to drive it, or can we use 4 virtual processors (VPs) on 2 cores to drive the 4 GPUs and leave the remaining 14 cores for CPU jobs?
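The two layouts above are simple arithmetic on one node's cores. The sketch below assumes (for illustration only) that the cores reserved for the GPU queue host the processes driving the GPUs and that every other core goes to the CPU-only queue. The function name `split_node` is hypothetical, and the sketch says nothing about whether two cores can actually keep four GPUs busy; that depends on how much host-side work each GPU job does.

```python
"""Static split of one node's cores between a CPU-only queue and a GPU queue.

Assumption (for illustration only): cores reserved for the GPU queue host the
processes that drive the GPUs; every other core goes to the CPU-only queue.
"""


def split_node(cores_per_node: int, gpus_per_node: int, gpu_host_cores: int) -> dict:
    """Return per-node core counts for each queue and the GPUs driven per reserved core."""
    if gpus_per_node < 1:
        raise ValueError("need at least one GPU to reserve host cores for")
    if not 1 <= gpu_host_cores <= cores_per_node:
        raise ValueError("gpu_host_cores must be between 1 and cores_per_node")
    return {
        "cpu_queue_cores": cores_per_node - gpu_host_cores,
        "gpu_queue_cores": gpu_host_cores,
        "gpus_per_host_core": gpus_per_node / gpu_host_cores,
    }


if __name__ == "__main__":
    # Node layout from the post: 16 cores and 4 GPUs (one S1070 unit) per node.
    print(split_node(16, 4, 4))  # one dedicated core per GPU
    print(split_node(16, 4, 2))  # two cores shared by four GPUs
```

With one core per GPU this gives the 12/4 split described in the post; with two reserved cores it gives 14/2, with each reserved core feeding two GPUs.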

Does anyone have any idea about configuring a workload manager on a cluster with a CPU+GPU combination for batch jobs?
I searched hard on the Internet but couldn’t find any useful information…
Any suggestions? Thanks in advance.

avidday · July 25, 2009, 10:01am

We use Sun Grid Engine rather than PBS/Torque, but our approach has been to treat the GPU as a consumable resource and use a single job queue that allocates CPU cores in the standard way you would for MPI or OpenMP jobs. That way GPU jobs simply become a subset of CPU jobs and run on the same queue. You can use the same basic scheduling templates you would use for managing floating software licenses, but with the GPU as a per-node resource rather than a global one. When a node has both a free GPU and a free CPU core, it will accept a new GPU job if one is queued. When it has only a free CPU core, it will accept new CPU jobs, and GPU jobs will sit in the queue until a node with both a free CPU core and a free GPU becomes available. When there are no GPU jobs in the queue, nodes just process CPU jobs as if the GPUs don’t exist.
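The single-queue, GPU-as-consumable policy described above can be modelled in a few lines. This is a toy first-fit pass written for illustration, not how Sun Grid Engine implements it; the `Node`/`Job` classes and the `schedule` function are hypothetical. Each node carries its own GPU count (the per-node "license"), a GPU job needs a free core and a free GPU on the same node, and a job that cannot fit stays queued while later CPU jobs are still allowed to start.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cores: int
    free_gpus: int  # per-node consumable, like a node-local license count


@dataclass
class Job:
    name: str
    cores: int
    gpus: int = 0  # 0 means an ordinary CPU job


def schedule(nodes: list[Node], queue: list[Job]) -> tuple[list[tuple[str, str]], list[Job]]:
    """One first-fit pass over a single shared queue.

    A job starts on the first node with enough free cores AND free GPUs.
    Jobs that fit nowhere stay queued, while later jobs that do fit
    (e.g. CPU-only jobs) are still allowed to start.
    """
    started: list[tuple[str, str]] = []
    waiting: list[Job] = []
    for job in queue:
        for node in nodes:
            if node.free_cores >= job.cores and node.free_gpus >= job.gpus:
                node.free_cores -= job.cores
                node.free_gpus -= job.gpus
                started.append((job.name, node.name))
                break
        else:
            waiting.append(job)  # no node had both resources free
    return started, waiting


if __name__ == "__main__":
    # Hypothetical node with 2 free cores but only 1 free GPU.
    nodes = [Node("node01", free_cores=2, free_gpus=1)]
    queue = [Job("gpu-a", cores=1, gpus=1), Job("gpu-b", cores=1, gpus=1), Job("cpu-a", cores=1)]
    started, waiting = schedule(nodes, queue)
    print(started)                    # gpu-a and cpu-a start on node01
    print([j.name for j in waiting])  # gpu-b waits for a free GPU
```

Because CPU jobs request zero GPUs, they pass the GPU check on every node, which is why the nodes behave as plain CPU nodes whenever no GPU jobs are queued.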

seckaka · July 25, 2009, 10:44am


Thank you, Mr. avidday… that cleared up half of my confusion…

Could you please help me with this as well…

I have also read on the Internet that some sort of soft lock has to be set on the GPUs to prevent misuse by users in a multi-user environment…

and that workload managers like PBS/Torque don’t take care of allocating GPUs.
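One common form of "soft lock" is a lock file per GPU that cooperating jobs create before using a device. The sketch below is an illustration of that idea, not a standard tool: the function names and the lock directory are hypothetical, and the directory should be node-local (e.g. somewhere under /tmp) because the GPUs are per node. It is "soft" because nothing stops a program that ignores the lock, and a job that crashes can leave a stale lock behind.

```python
import os
import tempfile
from typing import Optional


def claim_gpu(lock_dir: str, num_gpus: int) -> Optional[int]:
    """Claim the lowest-numbered free GPU by atomically creating its lock file."""
    for dev in range(num_gpus):
        path = os.path.join(lock_dir, f"gpu{dev}.lock")
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process can create (and therefore own) each lock.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # another job holds this GPU
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner to help clean up stale locks
        return dev
    return None  # every GPU on this node is taken


def release_gpu(lock_dir: str, dev: int) -> None:
    """Give the GPU back by deleting its lock file."""
    os.remove(os.path.join(lock_dir, f"gpu{dev}.lock"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:  # stands in for a node-local lock directory
        a = claim_gpu(lock_dir, 4)
        b = claim_gpu(lock_dir, 4)
        print(a, b)  # 0 1
        release_gpu(lock_dir, a)
        print(claim_gpu(lock_dir, 4))  # 0 again
```

A job wrapper would call `claim_gpu` before launching the CUDA program and `release_gpu` when it exits (ideally in a `finally` block so the lock is removed even on failure).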

If we run two CUDA jobs simultaneously, will they execute on two different GPUs, or will they overlap on the same one?

Is there any provision for setting the device number of the GPU on which my job runs?
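Inside a CUDA program the device is chosen with `cudaSetDevice()`. An alternative that needs no change to the program is to have the job wrapper set the `CUDA_VISIBLE_DEVICES` environment variable, which restricts the physical GPUs a process can see; inside the process the single visible GPU is then numbered 0. That variable is honored by later CUDA toolkits, and it may not be available in the toolkit version discussed in this thread. The helper `env_for_gpu` is hypothetical, and the child process below only echoes the variable in place of a real CUDA binary.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def env_for_gpu(dev: int, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment that exposes only physical GPU `dev` to CUDA."""
    env = dict(os.environ if base is None else base)  # copy; never mutate the caller's mapping
    env["CUDA_VISIBLE_DEVICES"] = str(dev)
    return env


if __name__ == "__main__":
    # Hypothetical: the index would come from the scheduler or a lock-file allocator.
    env = env_for_gpu(2)
    # Stand-in for launching the real CUDA binary: the child just echoes the variable.
    child = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    print(child.stdout.strip())  # 2
```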

Thanks in advance…

avidday · July 25, 2009, 12:58pm

We don’t have that problem because we presently have only a single GPU per node, so that isn’t something I have personal experience with. Somebody with a cluster of S1070s will have to help you there (or Massimo Fatica, who works for NVIDIA and posts here; he seems to be their compute cluster guru).

Having said that, I understand that the nvidia-smi utility can configure an S870/S1070 so that the cards go into “compute exclusive” mode, where the driver permits only a single process per physical GPU. If a user tries to run on a GPU that is already in use while it is in compute-exclusive mode, the program will fail to launch. That, combined with your scheduler and a bit of user discipline, should work in most circumstances.
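With every GPU in compute-exclusive mode, a job can simply try each device in turn and keep the first one that accepts it; in a real CUDA program that means selecting each index with `cudaSetDevice()` and treating a failed context creation as "busy". The plain-Python sketch below only models that behaviour: `ExclusiveGPU` and its `open_context`/`close_context` methods are hypothetical stand-ins for the driver, not a CUDA API.

```python
from typing import List, Optional


class ExclusiveGPU:
    """Toy stand-in for a GPU in compute-exclusive mode: at most one owning process."""

    def __init__(self, index: int) -> None:
        self.index = index
        self.owner: Optional[int] = None

    def open_context(self, pid: int) -> None:
        # Hypothetical method: models the driver refusing a second process.
        if self.owner is not None and self.owner != pid:
            raise RuntimeError(f"GPU {self.index} is busy")
        self.owner = pid

    def close_context(self, pid: int) -> None:
        if self.owner == pid:
            self.owner = None


def first_free_gpu(gpus: List[ExclusiveGPU], pid: int) -> Optional[int]:
    """Try each device in turn and keep the first one that accepts this process."""
    for gpu in gpus:
        try:
            gpu.open_context(pid)
        except RuntimeError:
            continue  # device already owned by another process
        return gpu.index
    return None


if __name__ == "__main__":
    node = [ExclusiveGPU(i) for i in range(4)]  # one S1070 unit: 4 GPUs
    print([first_free_gpu(node, pid) for pid in (101, 102, 103, 104, 105)])
    # [0, 1, 2, 3, None]: the fifth job finds every GPU busy
    node[1].close_context(102)
    print(first_free_gpu(node, 106))  # 1
```

This is why exclusive mode plus a scheduler that never places more GPU jobs on a node than it has GPUs answers the overlap question: two simultaneous jobs end up on different devices, and an extra job fails instead of sharing a GPU.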

seckaka · July 26, 2009, 4:04am

That solved most of my problems, Mr. avidday… Thank you.

I searched the Internet for the GPU compute-exclusive mode you mentioned and found this link:

https://www.wiki.ed.ac.uk/display/ecdfwiki/…-Exclusive+Mode

which is really helpful…

Mr. avidday, you made my day…
