I have developed some mixed CUDA/MPI code which I hope to run on a cluster of S1070s, but I have a problem with either MPI or CUDA, and I don’t know which!
On our S1070, devices 0, 2, 3 and 4 are C1060s. When I try to run my code over the four devices, device 3 reports 0% free memory when queried with cuMemGetInfo, then allocates some variables but quickly runs out of device memory. The other three devices all report 99% free memory before allocation, allocate all variables without error, and report 42% free memory after allocation.
I also have access to a cluster of S1070s at a government research lab and have been trying to get the same code working there too, with the aim of eventually running it over the whole cluster. Here’s the strange thing: exactly the same problem occurs on their S1070s, but on theirs devices 0, 1, 2 and 3 are the C1060s (on ours it’s devices 0, 2, 3 and 4), yet the memory error always occurs on device 3 — device 3 on our S1070 and device 3 on theirs.
Any suggestions as to what is occurring and how to fix it would be welcome, because this is very frustrating.
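For reference, the sort of per-device query I mean looks roughly like this (a minimal driver API sketch with error checking omitted; the real code does more, and on older toolkits cuMemGetInfo takes unsigned int pointers rather than size_t):

#include <stdio.h>
#include <cuda.h>

/* List free/total memory for every device visible to the driver API. */
int main(void)
{
    int count = 0;

    cuInit(0);
    cuDeviceGetCount(&count);

    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        CUcontext ctx;
        char name[256];
        size_t free_mem = 0, total_mem = 0;

        cuDeviceGet(&dev, i);
        cuDeviceGetName(name, sizeof(name), dev);

        /* cuMemGetInfo reports on the device owning the current context */
        cuCtxCreate(&ctx, 0, dev);
        cuMemGetInfo(&free_mem, &total_mem);
        cuCtxDestroy(ctx);

        printf("device %d (%s): %.1f%% free (%zu of %zu bytes)\n",
               i, name, 100.0 * (double)free_mem / (double)total_mem,
               free_mem, total_mem);
    }
    return 0;
}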
As I suggested in the other thread you posted on this subject, get a sysadmin to set up nvidia-smi to keep all the GPUs you are using in compute-exclusive mode. This will ensure that you only get a single MPI process per GPU, and if you have messed up the process-to-GPU affinity (which sounds likely), it will make your program fail and give you some clues about where things are going wrong.
Compute-exclusive mode is a driver setting which ensures that no more than one CUDA process can be allocated to a given GPU under its control. If you try to run two processes on the same GPU, you will get a “no device available” style error.
As for nvidia-smi, just about everything is covered in this thread. It also prints out its own usage:
avid@cuda:~$ nvidia-smi -h
nvidia-smi [OPTION1] [OPTION2 ARG] ...
NVIDIA System Management Interface program for Tesla S870
-h,   --help                       Show usage and exit
-x,   --xml-format                 Produce XML log (to stdout by default, unless
                                   a file is specified with -f or --filename=FILE)
-l,   --loop-continuously          Probe continuously, clobbers old logfile if not
                                   printing to stdout
-t NUM, --toggle-led=NUM           Toggle LED state for Unit <NUM>
-i SEC, --interval=SEC             Probe once every <SEC> seconds if the -l option
                                   is selected (default and minimum: 1 second)
-f FILE, --filename=FILE           Specify log file name
--gpu=GPUID --compute-mode-rules=RULESET
                                   Set rules for compute programs,
                                   where GPUID is the number of the GPU (starting
                                   at zero) in the system and RULESET is one of:
                                     0: Normal mode
                                     1: Compute-exclusive mode (only one compute
                                        program per GPU allowed)
                                     2: Compute-prohibited mode (no compute
                                        programs may run on this GPU)
-g GPUID -c RULESET                (short form of the previous command)
--gpu=GPUID --show-compute-mode-rules
-g GPUID -s                        (short form of the previous command)
-L,   --list-gpus
-lsa, --list-standalone-gpus-also  Also list standalone GPUs in the system along
                                   with their temperatures. Can be used with the
                                   -l, --loop-continuously option
-lso, --list-standalone-gpus-only  Only list standalone GPUs in the system along
                                   with their temperatures. Can be used with the
                                   -l, --loop-continuously option
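So, for example, to put GPU 3 into compute-exclusive mode and then check its rule set (run as root, substitute whatever GPU numbers the driver actually reports on your system, and repeat for each GPU your MPI job uses):

nvidia-smi -g 3 -c 1
nvidia-smi -g 3 -s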
That should be enough to get you going.
It really sounds like you don’t have a handle on the process-number/CPU/GPU affinity, and something is going wrong there. Have a look at the link I posted in your other thread about using MPI_Comm_split() and colours to explicitly control the process-GPU affinity; there is a rough sketch of the idea below.
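The gist of it is something like the following (just a rough sketch using the runtime API; the hostname-hash colour, the device table and the assumption of at most four ranks per node are all things you would adapt to your own job layout):

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, local_rank, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm node_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(name, &namelen);

    /* Derive a colour from the hostname so that all ranks on the same node
       end up in the same sub-communicator (a simple hash; a production code
       would guard against collisions between different hostnames). */
    unsigned int colour = 5381;
    for (int i = 0; i < namelen; ++i)
        colour = colour * 33u + (unsigned char)name[i];
    colour &= 0x7fffffffu;  /* the colour passed to MPI must be non-negative */

    MPI_Comm_split(MPI_COMM_WORLD, (int)colour, world_rank, &node_comm);
    MPI_Comm_rank(node_comm, &local_rank);

    /* Map the local rank onto the usable GPUs explicitly rather than trusting
       whatever order the processes happen to start in. Adjust the table to
       your system's enumeration (0,2,3,4 on your machine, 0,1,2,3 at the lab). */
    int usable[] = { 0, 2, 3, 4 };
    cudaSetDevice(usable[local_rank % 4]);

    printf("world rank %d on %s -> local rank %d -> GPU %d\n",
           world_rank, name, local_rank, usable[local_rank % 4]);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

Run it with four ranks per node first and check that the printed mapping is what you expect before you let the real kernels loose.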
No idea, I am afraid. I haven’t used the driver API all that much.
I must admit that on our cluster we have a very simple 1:1 node-to-GPU relationship, and I have never really had much trouble getting MPI-CUDA hybrid codes working using pretty simple code and the runtime API. Glad to hear you got it working, though.