Memory usage values in nvidia-smi command

hnts03 · November 21, 2023, 11:19am

Hi, I’m currently studying “Unified Memory” in CUDA programming.
While following the tutorial linked below, I ran into some questions about GPU memory usage.

In the first sample code (search for grid_cuda.cu), the variables x and y are float pointers used as float arrays.
According to the code, x and y each hold N (1<<20 == 1M) elements, so array x should take 4MB and array y should take 4MB, for a total of 8MB of memory space on the host side.
If that is right, I can’t understand the memory usage that the nvidia-smi command prints. (I know the information is not exact, but nvidia-smi does simply report the current state of the GPU, doesn’t it?)
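The size estimate above can be checked with a few lines of host-side code. This is only a sketch of the arithmetic, not the tutorial's code; the helper name `array_bytes` is hypothetical.

```cuda
#include <cstddef>
#include <cstdio>

// Hypothetical helper: bytes needed for one float array of n elements.
constexpr std::size_t array_bytes(std::size_t n) { return n * sizeof(float); }

int main() {
    const std::size_t N = std::size_t{1} << 20;      // 1,048,576 elements, as in the post
    const std::size_t per_array = array_bytes(N);    // one array (x or y)
    const std::size_t total = 2 * per_array;         // x and y together
    std::printf("per array: %zu MiB, total: %zu MiB\n",
                per_array >> 20, total >> 20);
    return 0;
}
```

With a 4-byte float this reports 4 MiB per array and 8 MiB in total, which matches the estimate in the post.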

The first question is why the memory usage in the middle is printed as 522MiB while the GPU Memory Usage at the bottom-right corner is printed as 384MiB. Even if this code produces a large number of intermediate values, the input data is only 8MB. I can’t understand why the GPU uses over 380MiB to process it.

The second question is why the GPU Memory Usage value at the bottom-right corner does not change when the input data size varies. I changed the variable N to 2<<30, but that value did not change at all. Only the memory usage in the middle of the picture changed.
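For reference, 2<<30 is 2^31 elements, which is one more than the largest value a 32-bit int can hold. The thread does not show how N is declared, so the sketch below only works out the sizes in 64-bit arithmetic; the helper names `fits_in_int` and `float_array_bytes` are hypothetical.

```cuda
#include <cstdint>
#include <cstdio>
#include <limits>

// Hypothetical helper: does an element count fit in a plain int?
constexpr bool fits_in_int(std::uint64_t n) {
    return n <= static_cast<std::uint64_t>(std::numeric_limits<int>::max());
}

// Hypothetical helper: bytes for one float array, computed in 64 bits.
constexpr std::uint64_t float_array_bytes(std::uint64_t n) {
    return n * sizeof(float);
}

int main() {
    const std::uint64_t N = std::uint64_t{2} << 30;  // 2^31 elements
    const std::uint64_t per_array = float_array_bytes(N);
    std::printf("elements: %llu, fits in int: %s\n",
                static_cast<unsigned long long>(N),
                fits_in_int(N) ? "yes" : "no");
    std::printf("per array: %llu GiB, both arrays: %llu GiB\n",
                static_cast<unsigned long long>(per_array >> 30),
                static_cast<unsigned long long>((2 * per_array) >> 30));
    return 0;
}
```

If N were declared as an int (an assumption, not something stated in the thread), it would be worth checking that the allocation calls actually succeed with this value before comparing the nvidia-smi numbers.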

– environment –
OS: Ubuntu 20.04
CUDA: 12.2
GPU: RTX 4090

Robert_Crovella · November 21, 2023, 1:48pm

Running CUDA code on a GPU where the code itself requires 8MB of data space will certainly require more than 8MB of GPU memory. The extra memory goes to various overheads. The first number (522MiB) is the total GPU memory in use. Roughly speaking, that consists of the memory needed just to make the GPU active and able to accept a CUDA process, plus the memory associated with the process. So the first number is all the memory in use, and the second number (384MiB) is the memory in use that is specifically associated with the process numbered 373723.
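One way to observe the device-wide figure from inside a program is `cudaMemGetInfo`, which reports free and total device memory. The following is a minimal sketch, assuming a single GPU and using `cudaFree(0)` as a common idiom to force context creation; the helper `to_mib` is hypothetical. It gives only a device-wide total, not a breakdown of the overheads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: convert a byte count to MiB.
static double to_mib(size_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

int main() {
    // Force creation of the CUDA context before any user allocation is made.
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "context creation failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    size_t free_b = 0, total_b = 0;
    err = cudaMemGetInfo(&free_b, &total_b);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // total - free covers everything in use on the device, including
    // memory held by other processes, not just this one.
    std::printf("in use device-wide: %.1f MiB of %.1f MiB\n",
                to_mib(total_b - free_b), to_mib(total_b));
    return 0;
}
```

Because no data has been allocated yet at this point, whatever the program reports as in use reflects overhead and other users of the GPU rather than the x and y arrays.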

Regarding the second question: most allocations have a granularity, so one possible reason is that you have not exceeded a particular allocation granularity. That’s not likely the answer here. Another possibility is that the unified memory system may affect the reporting. Unified memory on Linux does not necessarily allocate all of the expected device memory at once.
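A rough way to probe the unified memory behavior described above is to sample device-wide usage at three points: after context creation, after `cudaMallocManaged`, and after a kernel has touched the data. This is only a sketch under assumptions (single GPU, no other processes changing memory use during the run); the kernel `touch` and the helper `used_mib` are hypothetical names, and the numbers will vary by driver and system.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride kernel that writes every element, so the GPU touches each page.
__global__ void touch(float *p, size_t n) {
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        p[i] = 1.0f;
}

// Device-wide used memory in MiB, or a negative value if the query fails.
static double used_mib() {
    size_t free_b = 0, total_b = 0;
    if (cudaMemGetInfo(&free_b, &total_b) != cudaSuccess) return -1.0;
    return (double)(total_b - free_b) / (1024.0 * 1024.0);
}

int main() {
    const size_t N = (size_t)1 << 20;  // same element count as in the original post
    float *x = nullptr;

    if (cudaFree(0) != cudaSuccess) return 1;  // force context creation
    std::printf("after context creation:  %.1f MiB\n", used_mib());

    if (cudaMallocManaged(&x, N * sizeof(float)) != cudaSuccess) return 1;
    std::printf("after cudaMallocManaged: %.1f MiB\n", used_mib());

    touch<<<256, 256>>>(x, N);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(x);
        return 1;
    }
    std::printf("after GPU touch:         %.1f MiB\n", used_mib());

    cudaFree(x);
    return 0;
}
```

Comparing the second and third readings gives a hint as to whether device memory is committed at allocation time or only once the GPU accesses the pages.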

hnts03 · November 21, 2023, 2:09pm

Thanks for the reply.

I have a follow-up question on each of your two answers.

On the first answer:
How can I inspect these overheads and get a breakdown of them? Can Nsight Systems and Nsight Compute identify all of them?

On the second answer:
Regarding the second possibility, is there a recommended “IDEAL” environment for running, reporting on, and profiling CUDA kernels that use Unified Memory? I’m currently setting up a container, so if there is a recommended setup, I’d like to follow it.

Robert_Crovella · November 21, 2023, 2:18pm

I’m not aware of any accounting like that, available anywhere.

I’m not aware of anything about your setup that is non-“IDEAL”.

system · December 5, 2023, 2:19pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
