Hardware Platform : dGPU, A100 x 8
System Memory : 2TB
Ubuntu Version : 22.04
NVIDIA GPU Driver Version (valid for GPU only) : 535.54.03
Issue Type (questions, new requirements, bugs) : bugs
Hi, I have a problem deploying VSS.
After following the Quickstart guide, I found that one of the pods (nemo-rerank-ranking-deployment) keeps restarting.
The attached log points to a CUDA out-of-memory error. error_log_vss.txt (104.1 KB)
Error Code 1: Cuda Runtime (out of memory)
There is a similar issue with no updates, so I am reporting this one.
I'm not sure this is the solution, since it leads to a new issue, but the CUDA out-of-memory error does not occur with this override file (which forces GPU allocation).
GPU status before starting VSS.
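A minimal sketch for recording the GPU status before starting VSS as text, so that GPUs already holding memory from other processes stand out before the NIM pods are scheduled. It assumes `nvidia-smi` is on PATH; `parse_gpu_csv`, `query_gpus`, and `busy_gpus` are hypothetical helper names, not part of VSS.

```python
import subprocess

# Fields accepted by `nvidia-smi --query-gpu`; with `nounits`, memory values are plain MiB numbers.
QUERY_FIELDS = "index,name,memory.total,memory.used,memory.free"

def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, total, used, free = (field.strip() for field in line.split(","))
        gpus.append({
            "index": int(index),
            "name": name,
            "total_mib": int(total),
            "used_mib": int(used),
            "free_mib": int(free),
        })
    return gpus

def query_gpus():
    """Run nvidia-smi (assumed to be on PATH) and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)

def busy_gpus(gpus, min_free_mib):
    """Return indices of GPUs whose free memory is below `min_free_mib`."""
    return [g["index"] for g in gpus if g["free_mib"] < min_free_mib]
```

For example, `busy_gpus(query_gpus(), 20000)` lists GPUs that already have most of their memory taken; the 20000 MiB threshold is arbitrary and only for illustration.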
The NeMo embedding pod stays in CrashLoopBackOff, and the VSS deployment fails to start.
I have attached the full logs of the NeMo embedding pod (stuck in CrashLoopBackOff) and vss-deployment.
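A minimal sketch, assuming `kubectl` is configured for the cluster running VSS, for listing which containers are in CrashLoopBackOff and how often they have restarted. It reads the standard Pod status fields (`containerStatuses[].state.waiting.reason` and `restartCount`) from `kubectl get pods -o json`; the helper names and the default namespace are assumptions.

```python
import json
import subprocess

def crashlooping_pods(pod_list):
    """Return (pod, container, restarts) for containers waiting in CrashLoopBackOff.

    `pod_list` is the parsed JSON from `kubectl get pods -o json` (a v1 List of Pods).
    """
    hits = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                hits.append((name, cs["name"], cs.get("restartCount", 0)))
    return hits

def get_pods(namespace="default"):
    """Fetch the pod list as JSON via kubectl (namespace is an assumption)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

For each result, `kubectl logs <pod> -c <container> --previous` prints the log of the container instance that last crashed, which is where the error behind the restart should appear.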
The vss-deployment log shows the following error:
GuardRails model load execution time = 2.563 sec
2025-02-11 01:58:40,941 ERROR Failed to load VIA stream handler - Guardrails failed
After applying the override file (force GPU allocation), the NeMo embedding and vss-deployment pods keep restarting, as shown in the pod status image.
The log is the same as above.
tnx :)
OK. Let’s narrow down this problem by deploying llama-3_2-nv-rerankqa-1b-v2 and llama-3_2-nv-embedqa-1b-v2 separately.
You can refer to our llama-3_2-nv-embedqa-1b-v2 and llama-3_2-nv-rerankqa-1b-v2 pages to learn how to deploy them with Docker.
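Once the two NIMs are running on their own, a quick request to each can confirm that they load and serve. This sketch assumes the containers are published on `localhost:8000` (embedding) and `localhost:8001` (reranking), that they expose an OpenAI-style `/v1/embeddings` endpoint and a `/v1/ranking` endpoint, and that the model ids are `nvidia/llama-3.2-nv-embedqa-1b-v2` and `nvidia/llama-3.2-nv-rerankqa-1b-v2`; check each model page for the exact ids, ports, and request schema.

```python
import json
import urllib.request

# Hypothetical local ports for the two standalone NIM containers.
EMBED_URL = "http://localhost:8000/v1/embeddings"
RERANK_URL = "http://localhost:8001/v1/ranking"

def embedding_payload(texts, input_type="query"):
    # Assumed OpenAI-style body; `input_type` ("query" or "passage") is assumed for embedqa models.
    return {"model": "nvidia/llama-3.2-nv-embedqa-1b-v2", "input": texts, "input_type": input_type}

def rerank_payload(query, passages):
    # Assumed reranking body: one query and a list of passages to score against it.
    return {
        "model": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
        "query": {"text": query},
        "passages": [{"text": p} for p in passages],
    }

def post_json(url, payload, timeout=30):
    """POST `payload` as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `post_json(EMBED_URL, embedding_payload(["test query"]))` and `post_json(RERANK_URL, rerank_payload("test query", ["passage one", "passage two"]))`. If both succeed while each model has a GPU to itself, the out-of-memory error is more likely related to how the pods share GPUs in the VSS deployment than to the models themselves.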
First, I rebooted my server.
The error log above showed an error with the guardrails setting.
So I disabled guardrails (by setting the DISABLE_GUARDRAILS environment variable to true) and found that all pods ran fine. override.txt (1.4 KB)
However, I could not access the UI on port 9000.
Below is part of the vss-vss-deployment-POD-NAME log; the full log is attached. vss-deployment-log-port-error.txt (13.5 KB)
2025-02-13 08:09:49 | ERROR | stderr | INFO: Started server process [8365]
2025-02-13 08:09:49 | ERROR | stderr | INFO: Waiting for application startup.
2025-02-13 08:09:49 | ERROR | stderr | INFO: Application startup complete.
2025-02-13 08:09:49 | ERROR | stderr | INFO: Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
2025-02-13 08:09:49 | INFO | stdout | INFO: 127.0.0.1:47104 - "GET / HTTP/1.1" 200 OK
***********************************************************
VIA Server loaded
Backend is running at http://0.0.0.0:8000
Frontend is running at http://0.0.0.0:9000
Press ctrl+C to stop
***********************************************************
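In the excerpt above, Uvicorn's `INFO:` startup messages appear under `ERROR | stderr`, which suggests the log wrapper tags everything written to stderr as ERROR (an inference from this excerpt only). A minimal sketch for filtering those out so that only ERROR lines with a non-INFO payload remain; the line format is assumed from the excerpt and `real_errors` is a hypothetical helper.

```python
import re

# Assumed format, e.g. "2025-02-13 08:09:49 | ERROR | stderr | INFO: Started server process [8365]".
LINE_RE = re.compile(r"^(?P<ts>\S+ \S+) \| (?P<level>\w+) \| (?P<stream>\w+) \| (?P<msg>.*)$")

def real_errors(lines):
    """Return (timestamp, message) for ERROR lines whose payload is not an INFO/DEBUG message."""
    errors = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["level"] != "ERROR":
            continue
        if re.match(r"(INFO|DEBUG):", m["msg"]):
            continue  # stderr output that was only labelled ERROR by the wrapper
        errors.append((m["ts"], m["msg"]))
    return errors
```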
Also, ports 8000 and 9000 do not appear in my netstat output.
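Since VSS runs in Kubernetes pods, a process listening on 8000 or 9000 inside a pod lives in the pod's network namespace and would not normally show up in the host's netstat unless the pod uses host networking or a hostPort; from outside, the UI is reached through whatever Service the chart creates (`kubectl get svc` lists Services and any NodePort). A minimal sketch for testing TCP reachability of a given host and port; `port_open` and `check_ports` are hypothetical names.

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False

def check_ports(host, ports, timeout=2.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, run `check_ports("localhost", [8000, 9000])` on the host, then the same check against the node port shown by `kubectl get svc`.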