I am not sure this is a good place to ask, but perhaps someone has experienced this before. I have a CUDA Fortran code that requires about 100 MB per MPI rank on the GPU (I check this with nvidia-smi at runtime). However, when I look at the virtual memory via top or similar tools, I see that 20+ GB are reserved per rank. My understanding is that this is related to unified memory, although my understanding there is spotty.
I can run this code fine when I log into a compute node with a single K80 GPU and launch it directly with mpirun: it works with up to 26 MPI ranks with MPS and up to 40 without MPS.
The issues start when I schedule the job from the head node through Slurm. Because of the large reserved virtual memory, Slurm kills the jobs with out-of-memory errors.
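For context, here is a minimal sketch of the kind of batch script I am submitting (the partition name, executable name, and memory request are placeholders, not my actual values):

```shell
#!/bin/bash
#SBATCH --job-name=cuda_fortran_run
#SBATCH --partition=gpu        # placeholder partition name
#SBATCH --nodes=1
#SBATCH --ntasks=26            # one MPI rank per task
#SBATCH --gres=gpu:1           # single K80 on the node
#SBATCH --mem=64G              # placeholder; with 20+ GB of virtual memory
                               # reserved per rank, the job is killed OOM

srun ./my_solver               # placeholder executable name
```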
Is there a way, when compiling this code with mpifort/pgfortran, to avoid having so much virtual memory reserved? I can't seem to find a reasonable workaround on the Slurm side.