Static GPU Memory Usage on NIM Server

When I run inference against my self-hosted NIM server using “nvcr.io/nim/nvidia/nv-rerankqa-mistral-4b-v3” as the base image, the GPU memory usage is fairly static (I think it corresponds to the size of the model), even though I send passages of different sizes, while the GPU compute usage fluctuates as shown in the images. I captured the GPU usage with nvtop, and when I check with the nvidia-smi command, the GPU memory usage is the same. There is no issue whatsoever with the service itself; I am just wondering whether this behavior is expected. As far as I know, the input data should be loaded into GPU memory in order to be processed by the GPU. I’m particularly interested in these details so that I can optimize the inference further.

Yes, this is expected behaviour. As sequences are processed through the model, they affect GPU utilization.

There will be a set amount of memory reserved for the model weights (plus any pre-allocated working buffers), and the variation you see shows up in compute utilization as sequences, batches, etc. of different sizes pass through. The per-request input tensors are tiny compared to the model itself, so the memory reading in nvidia-smi barely moves.
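
If you want to observe this yourself, below is a minimal monitoring sketch. It assumes the NIM container is reachable at localhost:8000 and exposes a /v1/ranking endpoint with the payload shape shown (adjust the URL, port, and fields to your deployment); it sends rerank requests with passages of increasing size and samples GPU memory and compute utilization via NVML, so you can see memory stay roughly flat while utilization moves with the input.

```python
import requests
import pynvml

NIM_URL = "http://localhost:8000/v1/ranking"   # assumed endpoint for the reranker NIM
MODEL = "nvidia/nv-rerankqa-mistral-4b-v3"

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def gpu_snapshot():
    # Memory in bytes and utilization in percent, as reported by NVML
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    return mem.used / 1024**2, util.gpu

for n_words in (50, 500, 5000):                # passages of increasing size
    passage = "lorem ipsum " * n_words
    payload = {
        "model": MODEL,
        "query": {"text": "What does the passage say?"},
        "passages": [{"text": passage}],
    }
    requests.post(NIM_URL, json=payload, timeout=60)
    used_mib, gpu_pct = gpu_snapshot()
    print(f"{n_words:>5} words -> memory used: {used_mib:.0f} MiB, GPU util: {gpu_pct}%")

pynvml.nvmlShutdown()
```

With a sketch like this you would typically see the used-memory figure dominated by the loaded model regardless of passage size, while the utilization percentage rises for the larger inputs.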