Second NIM container won't start: free memory less than desired GPU memory utilization

Hi, I ran one NIM container like this. Works ok, able to access it from Open WebUI via direct connection.

docker run -it --rm --name=nim-qwen3-dgx-spark --gpus all --shm-size=16GB -e CORS_ALLOWED_ORIGINS="*" -e NGC_API_KEY=$NGC_API_KEY -e ACCEPT_NVIDIA_TOS=1 -v "$HOME/.cache/nim:/opt/nim/.cache" -u $(id -u) -p 8000:8000 nvcr.io/nim/qwen/qwen3-32b-dgx-spark:latest

Now I want to run a second one like this:

docker run --name=nim-llama-3.1-dgx-spark --gpus all --shm-size=16GB -e GPU_MEMORY_UTILIZATION=0.4 -e CORS_ALLOWED_ORIGINS="*" -e NGC_API_KEY=$NGC_API_KEY -e ACCEPT_NVIDIA_TOS=1 -v "$HOME/.cache/nim:/opt/nim/.cache" -u $(id -u) -p 9000:8000 nvcr.io/nim/meta/llama-3.1-8b-instruct-dgx-spark

But it won’t come up. Fails with this error:

(EngineCore_0 pid=194) ERROR 10-21 01:05:31 [core.py:700] raise ValueError(
(EngineCore_0 pid=194) ERROR 10-21 01:05:31 [core.py:700] ValueError: Free memory on device (49.37/119.7 GiB) on startup is less than desired GPU memory utilization (0.5, 59.85 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
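The arithmetic behind that check can be reproduced by hand: the engine multiplies total device memory by the effective utilization fraction and refuses to start when the result exceeds the free memory it sees. A minimal sketch using the figures from the log above (the 119.7 and 49.37 GiB values come straight from the error message; the formula is an assumption based on how the error is worded):

```shell
total_gib=119.7   # total device memory (from the error message)
free_gib=49.37    # free memory at startup (from the error message)
util=0.5          # effective GPU memory utilization fraction

# Target the engine tries to reserve: total * utilization
required=$(awk -v t="$total_gib" -v u="$util" 'BEGIN { printf "%.2f", t * u }')
echo "required: ${required} GiB, free: ${free_gib} GiB"

# Startup fails whenever required > free (59.85 > 49.37 here);
# only a fraction at or below free/total (~0.41) could pass this check.
awk -v f="$free_gib" -v r="$required" 'BEGIN { exit !(r > f) }' && echo "startup would fail"
```

This also suggests why lowering the fraction alone may not be enough: if the first container (or cached memory) keeps shrinking the free number, the check can still fail.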

On a suggestion from Codex I tried -e GPU_MEMORY_UTILIZATION=0.3, but it didn't help. The DGX dashboard (shows 100 GB free) and nvidia-smi (shows 59 GB used) report different memory usage numbers (see screenshot). Why is that?

Appreciate any other ideas on how to get this working. Thanks!

Hi,

The number on the DGX Dashboard shows the amount of memory in use, not how much is free. The nvidia-smi command is not always entirely accurate due to the unified memory architecture. Please see our FAQ for more information.

Hi, the FAQ says to use free -h. But that command shows 93 GB in use, while the DGX dashboard shows much lower usage.

Why is there such a big difference? It would be good if the DGX dashboard were improved to show more detail and be more accurate.


Hi,

This is a known issue that will be fixed in the next update. For now, you can also look at the MemAvailable field in /proc/meminfo for the most accurate number.
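For anyone scripting this, a small sketch that reads MemAvailable (reported in kB, a standard /proc/meminfo field on Linux) and converts it to GiB:

```shell
# Pull the MemAvailable value (in kB) out of /proc/meminfo.
avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)

# Convert kB -> GiB (1 GiB = 1024 * 1024 kB) and print it.
awk -v kb="$avail_kb" 'BEGIN { printf "MemAvailable: %.1f GiB\n", kb / (1024 * 1024) }'
```

MemAvailable is the kernel's estimate of memory available for new workloads without swapping, which is why it differs from the "used" column of free -h (that column counts reclaimable page cache as used).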


Issue was fixed in a previous update; marking as answered.

It's not fixed yet. The DGX Dashboard is still showing inflated numbers for system memory.


You may want to clear the page cache and restart:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'


Sorry for the premature answer; the DGX Dashboard will use /proc/meminfo after the next major update.


Any planned date for the update?


We just released a software update today that addresses this issue; please make sure to update your systems.

