CUDA + Torque + Maui? How to use a queuing system with GPUs?

Hi all.

First of all, I’m worried this might be the wrong place to ask for this information. If that’s the case, please point me in the right direction. ;)

Straight to the problem: we have a cluster here running Torque + Maui as the queuing system. It’s a plain CPU cluster at the moment, but it’s getting upgraded to a GPGPU cluster really soon. The idea is to place 2 cards per node.

The question: has anyone ever tried to deal with that? I found very little information on Condor, and some almost useless information on Torque. Our aim is that no one can ask for more than a single GPU per job, and that exactly 2 CPU cores (no more, no less) are assigned to each of those jobs. Also, if a user chooses to use only CPU cores, that’s up to them.

Has anybody here had to deal with something like that? Any information on GPGPU usage with Torque + Maui would be very much appreciated!

Thanks a lot in advance. ;)

man nvidia-smi and check out exclusive mode. That’s pretty much what you’re interested in for GPU management.
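
In case it helps, a minimal sketch of what that looks like on recent drivers (older nvidia-smi releases use numeric compute modes instead of the named ones):

    # set GPU 0 so that only one compute context can attach to it
    nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
    # verify the current compute mode
    nvidia-smi -q -d COMPUTE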

(why is it that people post something I respond to two minutes before I read the forums just about every time I read them at a weird hour? it makes me look like I’m F5ing constantly)

We have a GPU cluster which does what you are trying to do, although we use Sun Grid Engine for scheduling rather than Torque; the idea is the same.

In our setup, we keep nvidia-smi running in daemon mode and set each GPU to compute-exclusive, so that it will permit at most one context per GPU. We added a consumable resource to the scheduler representing the number of GPUs per node (which is either 0 or 1 in our case), and jobs which require a GPU request that resource. This makes the scheduler put jobs that need a GPU onto GPU nodes, and queue jobs when no GPUs are free until the required number becomes available. When there are no GPU jobs on the cluster, the CUDA nodes just behave like regular nodes.
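
For reference, a rough sketch of the SGE side (the node name and job script below are placeholders, and the exact complex line format depends on your SGE version):

    # add a consumable "gpu" complex; qconf -mc opens the complex list in an editor,
    # and the new line looks roughly like:
    #   gpu   gpu   INT   <=   YES   YES   0   0
    qconf -mc
    # attach it to each GPU node, e.g. set "complex_values gpu=1" (or gpu=2 for two cards)
    qconf -me gpunode01
    # GPU jobs then request the resource at submission time
    qsub -l gpu=1 myjob.sh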

This strategy has been rather successful. I can’t tell you exactly how to do it in Torque, but if you follow one of the recipes for floating software licenses, many of the ideas are the same.

These folks discuss the technical details at surprising length for a paper:

@INPROCEEDINGS{Kindratenko:2009:GCF,
author = {Volodymyr V. Kindratenko and Jeremy J. Enos and Guochun Shi and Michael T. Showerman and Galen W. Arnold and John E. Stone and James C. Phillips and Wen{-}mei Hwu},
title = {{GPU} clusters for high-performance computing},
booktitle = {Proceedings of the Workshop on Parallel Programming on Accelerator Clusters (PPAC’09)},
year = {2009},
month = aug,
pages = {1--8},
}

As dominik said, the NCSA folks have developped an interesting library called “CUDA wrapper” (which also deals with OpenCL actually). This makes it possible to allocate a specific number of GPUs to a MPI process by catching all CUDA calls so that they can change the device ID. For instance if you have 2 processes on a machine with a tesla rack, you get two processes that see their own GPU 0 and GPU1. I personnaly found it very convenient when i tested it on NCSA’s AC cluster.
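
Just to illustrate the remapping effect (this is not the CUDA wrapper itself, only a hypothetical per-rank launcher that achieves something similar via CUDA_VISIBLE_DEVICES, assuming Open MPI, which exports OMPI_COMM_WORLD_LOCAL_RANK):

    #!/bin/sh
    # launch_rank.sh - pin each local MPI rank to one physical GPU;
    # the application then sees its assigned GPU as device 0
    export CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK
    exec "$@"

You would then launch with something like “mpirun -np 2 ./launch_rank.sh ./my_cuda_app” (the binary name is just a placeholder).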

Cédric

I’d like to follow up on the original posting here: has anyone managed to employ a Torque + Maui queue system to manage GPUs successfully?

The approaches mentioned in this discussion (i.e., use of nvidia-smi and general resources/GRES) work perfectly using Moab as the scheduler, but don’t seem to work with Maui.

Apologies if this is an inappropriate place for such a query, but I’d be grateful to hear whether anyone is using Maui, rather than the commercial version (Moab), to schedule GPU usage.

“doesn’t seem to work” is a bit vague… We’d like to help, but need more information. What exactly doesn’t work? Do more GPU jobs get sent to a node than there are GPU resources? That’s a resource-scheduler problem and you’d have better luck on a Maui mailing list. Do multiple jobs end up running on the same GPU? Then the exclusive-mode setting probably isn’t sticking. You know that you need to keep nvidia-smi running in the background to keep the driver active, right? Otherwise it loses the settings.
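
For completeness, one way to keep it alive (the loop flag below is from more recent nvidia-smi builds, which may spell it differently on older releases; on drivers that support it, persistence mode avoids the background process entirely):

    # poll once a minute in the background just to hold the driver open
    nohup nvidia-smi -l 60 > /var/log/nvidia-smi.log 2>&1 &
    # or, where available, enable persistence mode instead
    nvidia-smi -pm 1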

Problem solved?

What is your opinion of SLURM as a resource manager for this purpose?

Here is one possible solution:

https://computing.llnl.gov/linux/slurm/gres.html
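
The basic shape of that setup, as a sketch (node and binary names are made up; see the linked page for the authoritative syntax):

    # slurm.conf (fragment): declare the GRES type and what each node offers
    GresTypes=gpu
    NodeName=gpunode[01-04] Gres=gpu:2 ...

    # gres.conf on each GPU node: map the GRES to the device files
    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia1

    # jobs then request GPUs explicitly
    srun --gres=gpu:1 ./my_gpu_app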