GH200: GPU HBM memory not released after restart — Page Cache accumulation via Unified Memory?

yingchu.chen · June 5, 2026, 8:23am

Environment:

Hardware: NVIDIA GH200 Grace Hopper Superchip (ARM64 / SBSA)
Driver: 580.159.03
OS: Ubuntu (ARM64)
Workload: DeepStream / GStreamer-based video analytics pipeline
Note: No official DeepStream .deb package exists for aarch64 SBSA. We extracted DeepStream from the official NVIDIA container image. We are uncertain whether this non-standard installation method contributes to the memory issue.

Symptom:
During stress testing, we observed GPU memory usage climbing abnormally. In nvidia-smi, the total memory in use exceeds the sum reported for the running processes, which indicates that memory is being held outside of any tracked process.

Further investigation showed that after restarting our application, DCGM_FI_DEV_FB_USED drops only partially. Across multiple restart cycles, the baseline gradually increases even when the channel count remains stable. A full system reboot is required to restore GPU memory to baseline.
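A minimal sketch of how the mismatch between total and per-process usage could be logged over time. The function names `sum_process_mib` and `report_unattributed` are hypothetical helpers, not part of any NVIDIA tool. It assumes `nvidia-smi` reports numeric values for `memory.used` and `used_memory` on this platform; `[N/A]` entries in the per-process list are skipped.

```shell
# Sum per-process GPU memory (MiB) from CSV lines of the form "pid, used_mib".
# Non-numeric values such as "[N/A]" are ignored.
sum_process_mib() {
    awk -F',' '{ gsub(/ /, "", $2); if ($2 ~ /^[0-9]+$/) total += $2 } END { print total + 0 }'
}

# Print total used memory, the per-process sum, and the remainder that is
# not attributed to any compute process (all in MiB), for the first GPU.
report_unattributed() {
    local total procs
    total=$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | head -n 1)
    procs=$(nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits | sum_process_mib)
    echo "$(date -Iseconds) total=${total} procs=${procs} unattributed=$(( total - procs ))"
}
```

Calling `report_unattributed` before and after each application restart would show whether the unattributed remainder is the part that keeps growing.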

This issue is not observed on machines with NVIDIA RTX 5070 (AMD64 / PCIe) running the same software version under comparable load.

Current Hypothesis:

We believe the root cause is the interaction between Linux Page Cache and GH200’s Unified Memory Architecture (NVLink-C2C):

1. Continuous video I/O from the GStreamer pipeline fills the Linux Page Cache.
2. Because CPU memory and GPU HBM on GH200 are cache-coherent and mutually addressable over NVLink-C2C, some of those cached pages end up resident in GPU HBM.
3. DCGM_FI_DEV_FB_USED and host cached memory grow in tandem.
4. Running sync && echo 3 > /proc/sys/vm/drop_caches (as root) immediately releases both, which supports this hypothesis.
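One way to test this hypothesis more directly is to look at where the Page Cache sits per NUMA node. The sketch below assumes that GH200 exposes GPU HBM to Linux as a NUMA node (which node index it gets is not known here; `numactl --hardware` should list it). `node_file_pages` is a hypothetical helper name; it only reads the standard per-node `meminfo` files in sysfs.

```shell
# Print "node<N> <FilePages in kB>" for each NUMA node under the given sysfs root.
# Page Cache pages are counted in the FilePages line of each node's meminfo file.
node_file_pages() {
    local root="${1:-/sys/devices/system/node}"
    local f
    for f in "$root"/node*/meminfo; do
        [ -r "$f" ] || continue
        # Lines look like: "Node 0 FilePages:       123456 kB"
        awk '$3 == "FilePages:" { print "node" $2, $4 }' "$f"
    done
}
```

If FilePages on the node backed by HBM grows together with DCGM_FI_DEV_FB_USED during the stress test, that would be consistent with cached pages being placed in GPU memory.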

[Attached screenshots: image-20260604-035543 (1920×853), image-20260604-035055 (682×345)]

Primary suspects: GStreamer (higher suspicion) and Triton Inference Server.

On PCIe-attached GPUs (RTX 5070), GPU VRAM is separate from the system memory that holds the Page Cache, so the same workload shows no abnormal accumulation.

Questions:

1. Is this behavior expected on GH200 unified memory platforms running heavy video I/O workloads?
2. Could extracting DeepStream from a container (due to no official .deb for aarch64 SBSA) affect CUDA/UVM memory management behavior?
3. Is there a driver-level or CUDA-level knob (e.g., cudaMemAdvise, UVM eviction hints) to prevent Page Cache pages from being pinned in GPU HBM?
4. Are there newer driver releases (post 580.159.03) that address Page Cache residency behavior on GH200?

Workaround currently in use:
sync && echo 3 > /proc/sys/vm/drop_caches
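For reference, a sketch of the same workaround wrapped so that it records the effect on both sides. `cached_kb` and `drop_and_report` are hypothetical helper names. The script must run as root, and it assumes `nvidia-smi` returns a numeric `memory.used` value for the first GPU.

```shell
# Read the "Cached" value (kB) from a meminfo-formatted file (default: /proc/meminfo).
cached_kb() {
    awk '$1 == "Cached:" { print $2 }' "${1:-/proc/meminfo}"
}

# Drop clean caches and print host Cached (kB) and GPU memory.used (MiB)
# before and after. Requires root to write to drop_caches.
drop_and_report() {
    local host_before gpu_before
    host_before=$(cached_kb)
    gpu_before=$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | head -n 1)
    sync
    echo 3 > /proc/sys/vm/drop_caches
    echo "host Cached kB: ${host_before} -> $(cached_kb)"
    echo "GPU used MiB:   ${gpu_before} -> $(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | head -n 1)"
}
```

Logging both numbers on each run makes it easier to tell how much of the GPU-side drop tracks the host cache and how much remains held elsewhere.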

Any guidance on a more permanent solution would be greatly appreciated.
