Environment:
- Hardware: NVIDIA GH200 Grace Hopper Superchip (ARM64 / SBSA)
- Driver: 580.159.03
- OS: Ubuntu (ARM64)
- Workload: DeepStream / GStreamer-based video analytics pipeline
- Note: No official DeepStream .deb package exists for aarch64 SBSA. We extracted DeepStream from the official NVIDIA container image. We are uncertain whether this non-standard installation method contributes to the memory issue.
Symptom:
During stress testing, we observed GPU memory usage climbing abnormally. In nvidia-smi, the total memory in use exceeds the sum of the per-process usage, indicating that memory is being held outside of any tracked process.
Further investigation showed that after restarting our application, DCGM_FI_DEV_FB_USED drops only partially. Across multiple restart cycles, the baseline gradually increases even when the channel count remains stable. A full system reboot is required to restore GPU memory to baseline.
This issue is not observed on machines with NVIDIA RTX 5070 (AMD64 / PCIe) running the same software version under comparable load.
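A minimal sketch of how the gap between device-level usage and per-process usage could be quantified, assuming nvidia-smi is on the PATH and that the relevant processes appear under --query-compute-apps (graphics-only clients would not be counted). The helper names are hypothetical and chosen for illustration only.

```python
import subprocess


def parse_mib_values(csv_text, column):
    """Extract one integer column from nvidia-smi csv,noheader,nounits output."""
    values = []
    for line in csv_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) <= column:
            continue  # blank or short line
        try:
            values.append(int(fields[column]))
        except ValueError:
            continue  # non-numeric entry such as "[N/A]"
    return values


def untracked_mib(gpu_csv, apps_csv):
    """Device-level used memory minus the sum of per-process used memory (MiB)."""
    device_used = sum(parse_mib_values(gpu_csv, 0))
    process_used = sum(parse_mib_values(apps_csv, 1))
    return device_used - process_used


def query(field_arg):
    """Run nvidia-smi with one --query-* argument and return its stdout."""
    cmd = ["nvidia-smi", field_arg, "--format=csv,noheader,nounits"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def report():
    """Print the current untracked amount; call this on the affected host."""
    gpu_csv = query("--query-gpu=memory.used")
    apps_csv = query("--query-compute-apps=pid,used_memory")
    print(f"untracked GPU memory: {untracked_mib(gpu_csv, apps_csv)} MiB")
```

Calling report() before and after each application restart would show whether the untracked portion is what accumulates across cycles.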
Current Hypothesis:
We believe the root cause is the interaction between Linux Page Cache and GH200’s Unified Memory Architecture (NVLink-C2C):
- Continuous video I/O from the GStreamer pipeline fills the Linux Page Cache.
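- On GH200, the CPU (LPDDR5X) and GPU (HBM) memories are physically separate but cache-coherent over NVLink-C2C, and the GPU HBM is visible to the OS as system memory. We therefore suspect that some of those cached pages end up resident in GPU HBM.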
- DCGM_FI_DEV_FB_USED and host cached memory grow in tandem.
- Running sync && echo 3 > /proc/sys/vm/drop_caches (as root) immediately releases both, which supports this hypothesis.
Primary suspects: GStreamer (higher suspicion) and Triton Inference Server.
On PCIe-attached GPUs (RTX 5070), CPU Page Cache and GPU VRAM are physically separate, so the same workload shows no abnormal accumulation.
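One way to test this hypothesis directly is to check where the page cache lives per NUMA node. The sketch below reads the standard per-node meminfo files under /sys/devices/system/node and reports FilePages (page cache) for each node. It assumes the GPU HBM appears as its own NUMA node on this system; which node ID that is must be confirmed separately (for example with numactl --hardware), and the function names are hypothetical.

```python
import glob
import re

# Matches lines such as "Node 1 FilePages:      123456 kB"
_FIELD = re.compile(r"^Node\s+(\d+)\s+(\w+):\s+(\d+)\s*kB", re.MULTILINE)


def parse_node_meminfo(text, fields=("MemTotal", "MemUsed", "FilePages")):
    """Return {node_id: {field: kB}} for the selected fields."""
    result = {}
    for node, key, kb in _FIELD.findall(text):
        if key in fields:
            result.setdefault(int(node), {})[key] = int(kb)
    return result


def read_all_nodes():
    """Read and merge the meminfo file of every NUMA node on this host."""
    merged = {}
    for path in sorted(glob.glob("/sys/devices/system/node/node*/meminfo")):
        with open(path) as handle:
            merged.update(parse_node_meminfo(handle.read()))
    return merged


def report():
    """Print per-node totals in MiB; a large FilePages value on the node
    backed by GPU HBM would be consistent with the hypothesis."""
    for node, stats in sorted(read_all_nodes().items()):
        total = stats.get("MemTotal", 0) // 1024
        used = stats.get("MemUsed", 0) // 1024
        cache = stats.get("FilePages", 0) // 1024
        print(f"node {node}: total={total} MiB used={used} MiB page_cache={cache} MiB")
```

Running report() during the stress test and again after drop_caches would show whether FilePages on the HBM-backed node moves together with DCGM_FI_DEV_FB_USED.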
Questions:
- Is this behavior expected on GH200 unified memory platforms running heavy video I/O workloads?
- Could extracting DeepStream from a container (due to no official .deb for aarch64 SBSA) affect CUDA/UVM memory management behavior?
- Is there a driver-level or CUDA-level knob (e.g., cudaMemAdvise, UVM eviction hints) to prevent Page Cache pages from being pinned in GPU HBM?
- Are there newer driver releases (post 580.159.03) that address Page Cache residency behavior on GH200?
Workaround currently in use:
sync && echo 3 > /proc/sys/vm/drop_caches
Any guidance on a more permanent solution would be greatly appreciated.

