I cannot log in to DGX Spark; I cannot SSH into it from my MacBook either

I cannot log in to my DGX Spark, which is connected directly via KVM. I cannot SSH into it either. After I enter my credentials, it just hangs on the login screen. This started happening two nights ago while I was training some models. Please help!!!

Noel

It appears an OOM (out-of-memory) condition occurred, leaving the system with too little free memory to complete a login or accept an SSH connection. First, press the device's power button to shut it down, then boot it again. Once logged in, immediately run `docker ps` to check the running containers, then stop the container that is consuming the memory.
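
To confirm that the hang really was memory exhaustion, the kernel log from the previous boot can be searched for OOM-killer messages after rebooting. This is only a rough sketch, not a verified procedure for DGX Spark: `filter_oom` is a hypothetical helper name, `journalctl -b -1` only returns anything if the system keeps a persistent journal, and `<container-name>` is a placeholder for whatever `docker ps` reports.

```shell
# Hypothetical helper: keep only lines that look like kernel OOM-killer messages.
filter_oom() {
  grep -i -E 'out of memory|oom-kill|killed process'
}

# Usage after rebooting (run manually; may need sudo):
#   journalctl -k -b -1 | filter_oom    # kernel log of the previous boot
#   docker ps                           # list running containers
#   docker stop <container-name>        # stop the one consuming memory
```

If the filter prints nothing, either the journal is not persistent or the hang had a different cause.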

You were right! I managed to SSH into the Spark long enough to stop the running containers. Is there any way to prevent this from happening in the future?

When running containers like this, setting limits on memory and memory swap can keep a single container from exhausting the host's memory:

docker run -d \
  --name vllm \
  --gpus all \
  --memory=90g \
  --memory-swap=90g \
  --pids-limit=4096 \
  --restart unless-stopped \
  -p 8000:8000 \
  -v /path/to/models:/models \
  your-image:tag
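
Once the container is up, it may be worth checking that the limit actually took effect. The sketch below assumes the container is named `vllm` as in the command above; `bytes_to_gib` and `show_mem_limit` are hypothetical helper names introduced only for illustration. `docker inspect` reports the memory limit in bytes, with 0 meaning no limit is set.

```shell
# Convert a byte count (as reported by docker inspect) to whole GiB.
bytes_to_gib() {
  echo $(( $1 / 1073741824 ))
}

# Print a container's memory limit in GiB; 0 means no limit is set.
show_mem_limit() {
  bytes_to_gib "$(docker inspect --format '{{.HostConfig.Memory}}' "$1")"
}

# Usage (run manually on the host):
#   show_mem_limit vllm
#   docker stats --no-stream vllm   # one-shot view of current usage vs. limit
#   docker update --memory 90g --memory-swap 90g vllm   # set limits on an existing container
```

Setting `--memory-swap` to the same value as `--memory` means the container gets no swap on top of its memory limit, so a runaway process is killed inside the container instead of dragging the whole host down.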

Thanks so much!!!
