Driver 590.48.01 regression: UMA memory not released after CUDA process exit (works on 580.126.09)

Hardware: DGX Spark (GB10, 128GB unified memory)

Kernel: 6.14.0-1015-nvidia

Broken: Driver 590.48.01, CUDA 13.1

Working: Driver 580.126.09, CUDA 13.0

Description:

After a CUDA application exits (clean shutdown, all CUDA contexts destroyed, no zombie processes), approximately 80GB of system memory remains consumed. The memory does not appear in any standard Linux accounting category (not in AnonPages, Cached, Slab, or PageTables in /proc/meminfo), but MemAvailable drops correspondingly.
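To put a number on the "missing" memory, something like the following could be run before and after the CUDA process exits. This is only a rough sketch: the set of categories in `ACCOUNTED` is an assumption chosen for illustration (the four named above plus `Buffers` and `KernelStack`), not a complete model of kernel memory accounting, and the helper names are hypothetical.

```python
import re

# Categories treated as "accounted for". This list is an assumption made for
# illustration; it is not an exhaustive breakdown of kernel memory usage.
ACCOUNTED = ("AnonPages", "Cached", "Buffers", "Slab", "PageTables", "KernelStack")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def unaccounted_kb(fields):
    """Return used memory (MemTotal - MemFree) not covered by ACCOUNTED, in kB."""
    used = fields["MemTotal"] - fields["MemFree"]
    return used - sum(fields.get(name, 0) for name in ACCOUNTED)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    gib = unaccounted_kb(info) / (1024 * 1024)
    print(f"MemAvailable: {info['MemAvailable'] / (1024 * 1024):.1f} GiB")
    print(f"Used but not in listed categories: {gib:.1f} GiB")
```

A large jump in the second figure after the process exits, with no matching growth in the listed categories, would be consistent with the behavior described here.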

Could it be that this driver is incompatible with my kernel version?

The memory can be reclaimed by any of the following:

  • sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

  • Unloading the NVIDIA kernel module (rmmod nvidia)

  • Rebooting the machine

Reverting to driver 580.126.09 resolves the issue, and memory is released immediately on process exit.

Reproduction steps:

  1. Install driver 590.48.01 on DGX Spark

  2. Run any CUDA application that allocates significant GPU memory (e.g., LLM inference with ~80GB KV cache)

  3. Exit the application cleanly (Ctrl+C, all destructors run)

  4. Run cat /proc/meminfo | grep MemAvailable and observe that the memory remains consumed

  5. Confirm that no processes are using the GPU: nvidia-smi lists no processes

  6. Run sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches' and observe that the memory is reclaimed
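The before/after measurement in the steps above could be scripted roughly as follows. This is a minimal sketch, not a definitive test harness: the command to run is whatever CUDA workload is passed on the command line, the function names are hypothetical, and the 5-second settle delay is an arbitrary assumption.

```python
import subprocess
import sys
import time


def mem_available_kb(path="/proc/meminfo"):
    """Return the MemAvailable value (in kB) from a meminfo-style file."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found in " + path)


def format_delta(before_kb, after_kb):
    """Describe how far MemAvailable fell between two readings."""
    delta_gib = (before_kb - after_kb) / (1024 * 1024)
    return f"MemAvailable dropped by {delta_gib:.1f} GiB"


def main(argv):
    if not argv:
        print("usage: memcheck.py <command> [args...]")
        return 2
    before = mem_available_kb()
    # Run the workload to completion (or until it is interrupted and exits).
    subprocess.run(argv, check=False)
    time.sleep(5)  # arbitrary settle time before the second reading
    after = mem_available_kb()
    print(format_delta(before, after))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

On an affected driver, the reported drop would be expected to stay close to the size of the workload's allocation; on a working driver it should be near zero.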

Comparison test: With the same binary, same model, and same workflow on driver 580.126.09, memory is released cleanly on process exit without any workaround.

Hello,

Thank you for the bug report and reproduction steps. We do not currently support drivers newer than version 580.126.09. We will pass this along to engineering so that it is not an issue once newer drivers are supported.