Large virtual memory reserved and scheduling issues

Hi all,

I'm not sure this is the right place to ask, but perhaps someone has run into this before. I have a CUDA Fortran based code that needs about 100 MB per MPI rank on the GPU (I am checking this with nvidia-smi at runtime). However, when I look at the virtual memory via top or similar, I notice that 20+ GB are reserved per rank. My understanding is that this is due to unified memory, although my understanding of that is spotty.

I can run the code without issues when I log into a compute node with a single K80 GPU and launch it directly with mpirun, with or without MPS, for up to 26 MPI ranks (with MPS) or 40 (without MPS).

The issues start when I try to schedule the job from the head node with Slurm. Because of the large reserved virtual memory, Slurm kills the jobs with out-of-memory errors.

Is there a way, when compiling the code with mpifort/pgfortran, to avoid having so much virtual memory reserved? I can't seem to find a reasonable solution on the Slurm side.

Thanks, Jan

Hi Jan,

However, when I look at the virtual memory via top or similar, I notice that 20+ GB are reserved per rank. My understanding is that this is due to unified memory, although my understanding of that is spotty.

I believe you're correct: the CUDA driver reserves virtual address space equal to the size of the CPU memory plus the total memory of all GPUs.
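For a rough sense of scale (these figures are only an illustration, not your node's actual sizes): with 8 GB of system RAM and a single 12 GB K80 device, each process would already map about 8 + 12 = 20 GB of virtual address space, regardless of how little memory the code actually touches.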

Unfortunately, I don’t see any CUDA documentation that shows how to control this behavior so I’m not sure what can be done about it. Let me do some research and see what I can find.

-Mat

Hi Jan,

I queried a few folks at NVIDIA for suggestions. Unfortunately, with K80s the virtual memory usage is inherent to how CUDA unified memory works, so it can't be changed.

Are you able to work with your site admins to see if you can increase Slurm's memory limits (/etc/security/limits.conf)?

Note that later NVIDIA GPUs, such as Pascal, do not need to reserve this virtual memory and so do not have this issue.

-Mat

Here’s one of the responses that I got back:

It sounds like Slurm is misconfigured, either on the user side or in the cluster configuration.

Using --mem or --mem-per-cpu at job launch (srun/salloc/sbatch) may alleviate the issue.

  --mem=MB            minimum amount of real memory [per node]
  --mem-per-cpu=MB    maximum amount of real memory per allocated CPU
                      required by the job.
                      --mem >= --mem-per-cpu if --mem is specified.
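For example (illustrative values and program name, not a command from this thread):

  srun -n 4 --mem=4000 ./my_mpi_app            # 4000 MB of real memory for the node
  srun -n 4 --mem-per-cpu=1000 ./my_mpi_app    # 1000 MB per allocated CPU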

It's also possible the limits are not set properly on the node in /etc/security/limits.conf [Ubuntu]. Slurm suggests that memlock and stack be set to unlimited (example file entries after the list):

  • soft memlock unlimited
  • hard memlock unlimited
  • soft stack unlimited
  • hard stack unlimited
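For reference, the corresponding lines in /etc/security/limits.conf would look roughly like this (the * domain applies them to all users; adjust that field to your site's policy):

  *   soft   memlock   unlimited
  *   hard   memlock   unlimited
  *   soft   stack     unlimited
  *   hard   stack     unlimited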

Hi Mat,

Thanks for looking into this. My Slurm script looks like this:

#!/bin/bash
#SBATCH --time=100:00:00
#SBATCH -N 1                  # one node
#SBATCH -n 4                  # four MPI tasks
#SBATCH --mem=32000           # real memory per node, in MB
#SBATCH --mem-per-cpu=1000    # real memory per allocated CPU, in MB
#SBATCH --gres=gpu:2          # two GPU devices

module load cuda
srun --mpi=pmi2 prjmh_temper_cuda_buck > ./out.log

I think that is an allowable configuration. I was not aware of limits.conf, but I have since set it to the recommended values from your post (on all nodes in the cluster).

Finally, I restarted slurmd with these environment variables set (as suggested in the Slurm FAQ):
export SLURMD_OOM_ADJ=-17
export SLURMSTEPD_OOM_ADJ=-17

When I submit a job, the issue remains the same and the job gets killed with the following messages:

slurmstepd: Step 5713.0 exceeded virtual memory limit (83806492 > 29491200), being killed
slurmstepd: *** STEP 5713.0 CANCELLED AT 2018-02-07T11:42:54 *** on compute-0-3
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: Exceeded job memory limit
slurmstepd: Step 5713.0 exceeded virtual memory limit (83806492 > 29491200), being killed
slurmstepd: Exceeded job memory limit
slurmstepd: Step 5713.0 exceeded virtual memory limit (83806492 > 29491200), being killed
slurmstepd: Exceeded job memory limit
slurmstepd: Exceeded job memory limit
srun: got SIGCONT
slurmstepd: *** JOB 5713 CANCELLED AT 2018-02-07T11:42:54 *** on compute-0-3
srun: forcing job termination
srun: error: compute-0-3: task 0: Killed
srun: error: compute-0-3: tasks 1-3: Killed

I must have some other issue. I can't find other people reporting this, so I suppose I have to go back over the full Slurm configuration.
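Since the error explicitly mentions a virtual memory limit, one thing I plan to check (just a guess on my part) is whether slurm.conf enforces a virtual memory limit on top of the requested real memory via VSizeFactor. Something along these lines, where the value is only an example:

  # slurm.conf (excerpt) -- illustrative, not our actual configuration
  # VSizeFactor sets a job's virtual memory limit as a percentage of its
  # allocated real memory; 0 disables enforcement of virtual memory limits.
  VSizeFactor=0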

Thanks, Jan