Error with Nvidia VSS blueprint - nemo-rerank-ranking-deployment

Please provide the following information when creating a topic:

  • Hardware Platform : dGPU, A100 x 8
  • System Memory : 2TB
  • Ubuntu Version : 22.04
  • NVIDIA GPU Driver Version (valid for GPU only) : 535.54.03
  • Issue Type( questions, new requirements, bugs) : bugs
  • How to reproduce the issue ? (This is for bugs. Including the command line used and other details for reproducing)
  • Requirement details (This is for new requirement. Including the logs for the pods, the description for the pods)

Hi, I have a problem with deploying VSS.
After following the Quickstart guide, I found that one of the pods keeps restarting (nemo-rerank-ranking-deployment).
The log I attached points to a CUDA out of memory error.
error_log_vss.txt (104.1 KB)

Error Code 1: Cuda Runtime (out of memory)

There is a similar issue with no updates, so I am reporting it here.

Have you made any changes to the configuration file?

1 Like

No, I haven’t made any changes.
I just followed the steps in the docs.

# Create the NGC image pull secret

sudo microk8s kubectl create secret docker-registry ngc-docker-reg-secret --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=$NGC_API_KEY

# Create the neo4j db credentials secret

sudo microk8s kubectl create secret generic graph-db-creds-secret --from-literal=username=neo4j --from-literal=password=password

# Create NGC Secret

sudo microk8s kubectl create secret generic ngc-api-key-secret --from-literal=NGC_API_KEY=$NGC_API_KEY
# Fetch the VSS Blueprint Helm Chart

sudo microk8s helm fetch https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-vss-2.1.0.tgz --username='$oauthtoken' --password=$NGC_API_KEY

# Install the Helm Chart

sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.1.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret
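For completeness, this is roughly how the restarting pod can be spotted after the install (standard kubectl commands; the exact pod name suffix will differ on each cluster):

# Check pod status (the rerank pod shows up as CrashLoopBackOff / restarting)

sudo microk8s kubectl get pods -A

# Inspect the logs of the restarting pod (substitute the actual pod name)

sudo microk8s kubectl logs -f <nemo-rerank-ranking-deployment-pod-name>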

I’m not sure this is the solution, since it leads to a new issue.
But the CUDA out of memory error does not occur with this override file
(forcing GPU allocation).

nim-llm:
  env:
  - name: NVIDIA_VISIBLE_DEVICES
    value: "0,1,2,3"
  resources:
    limits:
      nvidia.com/gpu: 0    # no limit
 
vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: VLM_MODEL_TO_USE
            value: vila-1.5 
          - name: MODEL_PATH
            value: "ngc:nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"
          - name: NVIDIA_VISIBLE_DEVICES
            value: "4,5"
  resources:
    limits:
      nvidia.com/gpu: 0    # no limit
 
 
nemo-embedding:
  applicationSpecs:
    embedding-deployment:
      containers:
        embedding-container:
          env:
          - name: NGC_API_KEY
            valueFrom:
              secretKeyRef:
                key: NGC_API_KEY
                name: ngc-api-key-secret
          - name: NVIDIA_VISIBLE_DEVICES
            value: '6'
  resources:
    limits:
      nvidia.com/gpu: 0    # no limit
 
nemo-rerank:
  applicationSpecs:
    ranking-deployment:
      containers:
        ranking-container:
          env:
          - name: NGC_API_KEY
            valueFrom:
              secretKeyRef:
                key: NGC_API_KEY
                name: ngc-api-key-secret
          - name: NVIDIA_VISIBLE_DEVICES
            value: '7'
  resources:
    limits:
      nvidia.com/gpu: 0    # no limit
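For reference, this is roughly how such an override is applied (a sketch; I’m assuming the file is saved as override.yaml):

# Upgrade (or re-install) the chart with the GPU-allocation override

sudo microk8s helm upgrade --install vss-blueprint nvidia-blueprint-vss-2.1.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f override.yaml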

I attach the log with this allocation. LGTM :)
For the additional issue, I created a new topic here.

That’s how we allocate GPU resources by default, so in theory there should be no problem: default-deployment-topology-and-models-in-use.

Could you attach the RAM usage of your device? You can attach the result of the “top” command.

Here is the result of top and htop. The system memory is 2 TB.



OK. We’ll take a further look at both of your problems.

1 Like

Hi @young2theMax, do you have other apps running on your device besides VSS? You can try to delete the VSS deployment and check the GPU memory.
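For example (using the release name from the install command earlier in this thread):

# Remove the VSS deployment

sudo microk8s helm uninstall vss-blueprint

# Confirm that GPU memory has been released

nvidia-smi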


GPU status before starting VSS.
The nemo embedding pod keeps going into CrashLoopBackOff, and the vss deployment fails to start.
I attach the full logs of nemo embedding (which keeps hitting CrashLoopBackOff) and vss-deployment.
The vss-deployment log shows this error:

GuardRails model load execution time = 2.563 sec
2025-02-11 01:58:40,941 ERROR Failed to load VIA stream handler - Guardrails failed

After applying the override file (forcing GPU allocation), nemo embedding and vss-deployment keep restarting, as shown in the pod status image below.
The logs are the same as above.
Thanks :)


vss_deployment_log.txt (148.5 KB)
nemo_emb_log.txt (33.8 KB)

OK. Let’s narrow down this problem by deploying llama-3_2-nv-rerankqa-1b-v2 and llama-3_2-nv-embedqa-1b-v2 separately.
You can refer to our llama-3_2-nv-embedqa-1b-v2 and llama-3_2-nv-rerankqa-1b-v2 pages to learn how to deploy them with docker.

Okay :)
This is how I deployed llama-3_2-nv-rerankqa-1b-v2 and llama-3_2-nv-embedqa-1b-v2 separately:

docker run -it --rm \
    --gpus "device=7" \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 9000:8000 \
    --name emb_nemo \
    nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.3.0
docker run -it --rm \
    --gpus "device=6" \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 8000:8000 \
    --name rerank_nemo \
    nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:1.3.0
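To confirm both containers are serving, a quick readiness check can be run against each one (NIM containers expose a /v1/health/ready endpoint; host ports as mapped in the commands above):

# Embedding NIM is mapped to host port 9000, rerank NIM to host port 8000

curl http://localhost:9000/v1/health/ready
curl http://localhost:8000/v1/health/ready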

I also attach the logs for both containers, which run fine.
separate_nemo_rerank.txt (108.1 KB)
separate_nemo_emb.txt (37.2 KB)

From the log you attached, have you obtained your NGC API key (obtain-ngc-api-key)?

You need to set the NGC_API_KEY first like below.

export NGC_API_KEY=<your_ngc_api_key>

Yes, I got my NGC API key and set it with the following command.

sudo microk8s kubectl create secret docker-registry ngc-docker-reg-secret --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=$NGC_API_KEY

First, I rebooted my server.
From the error log above, there was an error with the guardrails setting.
So I disabled guardrails (DISABLE_GUARDRAILS env set to true) and found that all pods ran fine.
override.txt (1.4 KB)
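The relevant part of the override presumably looks like this (a sketch based on the env var mentioned above, added to the vss container env from the earlier override file; I have not changed anything else):

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          # ...existing vss env entries from the earlier override...
          - name: DISABLE_GUARDRAILS
            value: "true"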

However, when trying to access the UI on port 9000, I could not reach it.
This is part of the vss-vss-deployment-POD-NAME log; the full log is attached.
vss-deployment-log-port-error.txt (13.5 KB)

2025-02-13 08:09:49 | ERROR | stderr | INFO:     Started server process [8365]
2025-02-13 08:09:49 | ERROR | stderr | INFO:     Waiting for application startup.
2025-02-13 08:09:49 | ERROR | stderr | INFO:     Application startup complete.
2025-02-13 08:09:49 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
2025-02-13 08:09:49 | INFO | stdout | INFO:     127.0.0.1:47104 - "GET / HTTP/1.1" 200 OK
***********************************************************
VIA Server loaded
Backend is running at http://0.0.0.0:8000
Frontend is running at http://0.0.0.0:9000
Press ctrl+C to stop
***********************************************************

Also, ports 8000 and 9000 do not show up in netstat.

You need to set the NGC API key in your environment like below first.

export NGC_API_KEY=<your_ngc_api_key>

Please refer to our launch-vss-ui to get the port number.
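For example, the externally exposed ports can be listed like this (service names depend on the chart):

# The UI is reached via the node IP and the NodePort shown here, not directly on port 9000

sudo microk8s kubectl get svc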

1 Like

Yeah, my bad :)
Still without guardrails, but it runs fine now.
Thanks a lot for your help :)

To summarize the history:

  • CUDA out of memory error
  • Pods keep restarting (nemo embedding / rerank)
    -----> force GPU allocation
  • Failed to load VIA stream handler - Guardrails failed
    -----> disable guardrails
1 Like