I have a CFD solver that is accelerated with OpenACC and parallelized with MPI. Without going into too much detail: on a single GPU I can run about 16 million cells per GPU without any slowdown issues. If I move to multi-GPU, I see a significant slowdown. I thought I had tracked the issue down to allocating too much private memory in a specific subroutine in my code. That fixed it for a while, but now the problem has come back.
I am very confused. If I check nvidia-smi, I see the following:
My application is bin/dew. In the per-process GPU memory usage, it shows only 3034 MB per GPU, but for some reason the total memory usage above is close to full capacity. I am really unsure what is going on.
Any advice appreciated!
Does your program use CUDA Unified Memory, i.e. do you compile with -gpu=managed?
UM won't show up as part of the program's memory usage, but it will show up in the total memory.
Also, UM can oversubscribe GPU memory, so if you use more than what's available, it will get paged back to the host. While convenient, this can cause slowdowns if the memory gets paged back and forth.
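For context, with the NVHPC compilers managed memory is enabled at compile time. A minimal sketch of the two modes, assuming nvfortran and the file/binary names from your post:

```shell
# With -gpu=managed, all allocatables live in CUDA Unified Memory and
# can oversubscribe the GPU (pages migrate to/from the host on demand).
nvfortran -acc -gpu=managed -o bin/dew src/*.f90

# Without it, only data explicitly placed in OpenACC data regions
# resides on the GPU, and nothing is silently paged back to the host.
nvfortran -acc -o bin/dew src/*.f90
```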
Thanks for your response. Yes, I use Unified Memory. I think what you described is happening. So my (easier) options are to decrease the total memory that I allocate globally or to decrease my mesh size? The problem is likely not related to private memory allocation?
So my (easier) options are to decrease the total memory that I allocate globally or to decrease my mesh size?
Difficult for me to say since I don't know your code. Though consider running the code through Nsight Systems to get a profile and better understand how memory movement is affecting your performance.
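For example, something like the following (the rank count and binary name are placeholders taken from your post; the `%q{OMPI_COMM_WORLD_RANK}` substitution assumes Open MPI):

```shell
# Profile each MPI rank, tracing CUDA, OpenACC, and MPI activity.
# The unified-memory rows in the timeline will show whether pages
# are migrating back and forth between host and device.
mpirun -np 4 nsys profile -t cuda,openacc,mpi \
    -o dew_rank%q{OMPI_COMM_WORLD_RANK} ./bin/dew
```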
The problem is likely not related to private memory allocation?
Likely, but again I don’t know for sure.
In general, I much prefer manually managing data rather than using UM for MPI codes. CUDA-aware MPI currently can't take advantage of GPUDirect communication when using UM. If possible, you may consider spending the time adding data directives, as well as host_data directives around your MPI calls. Granted, if the program doesn't do a lot of MPI communication, it may not matter, so it's up to you whether it's worth the time investment.
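A minimal sketch of what that can look like for a 1-D halo exchange. The array names, neighbor ranks, and stencil are illustrative, not from your code:

```fortran
subroutine halo_smooth(u, unew, n, left, right)
   use mpi
   implicit none
   integer, intent(in) :: n, left, right
   real(8) :: u(0:n+1), unew(0:n+1)
   integer :: i, ierr

   ! Structured data region: u stays resident on the GPU, no UM paging.
   !$acc data copy(u(0:n+1)) copyout(unew(1:n))

   ! host_data passes the *device* address of u to CUDA-aware MPI,
   ! so the halo exchange can use GPUDirect instead of staging
   ! through host memory.
   !$acc host_data use_device(u)
   call MPI_Sendrecv(u(n),   1, MPI_DOUBLE_PRECISION, right, 0, &
                     u(0),   1, MPI_DOUBLE_PRECISION, left,  0, &
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
   call MPI_Sendrecv(u(1),   1, MPI_DOUBLE_PRECISION, left,  1, &
                     u(n+1), 1, MPI_DOUBLE_PRECISION, right, 1, &
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
   !$acc end host_data

   ! Placeholder stencil update using the freshly exchanged ghost cells.
   !$acc parallel loop present(u, unew)
   do i = 1, n
      unew(i) = 0.5d0 * (u(i-1) + u(i+1))
   end do

   !$acc end data
end subroutine halo_smooth
```

The same pattern extends to 2-D/3-D exchanges with packed halo buffers; the key point is that host_data only exposes device pointers, it doesn't move any data itself.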