Error with Nvidia VSS blueprint - nemo-rerank-ranking-deployment

Please provide the following information when creating a topic:

  • Hardware Platform : dGPU, A100 x 8
  • System Memory : 2TB
  • Ubuntu Version : 22.04
  • NVIDIA GPU Driver Version (valid for GPU only) : 535.54.03
  • Issue Type( questions, new requirements, bugs) : bugs
  • How to reproduce the issue ? (This is for bugs. Including the command line used and other details for reproducing)
  • Requirement details (This is for new requirement. Including the logs for the pods, the description for the pods)

Hi, I have a problem with deploying VSS.
After following the Quickstart guide, I found that one of the pods keeps restarting (nemo-rerank-ranking-deployment).
The log I attached points to a CUDA out of memory error.
error_log_vss.txt (104.1 KB)

Error Code 1: Cuda Runtime (out of memory)

There is a similar issue with no updates, so I am reporting it here.

Have you made any changes to the configuration file?

1 Like

No, I haven’t made any changes.
I just followed the steps in the docs.

# Create the NGC image pull secret

sudo microk8s kubectl create secret docker-registry ngc-docker-reg-secret --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=$NGC_API_KEY

# Create the neo4j db credentials secret

sudo microk8s kubectl create secret generic graph-db-creds-secret --from-literal=username=neo4j --from-literal=password=password

# Create NGC Secret

sudo microk8s kubectl create secret generic ngc-api-key-secret --from-literal=NGC_API_KEY=$NGC_API_KEY
# Fetch the VSS Blueprint Helm Chart

sudo microk8s helm fetch https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-vss-2.1.0.tgz --username='$oauthtoken' --password=$NGC_API_KEY

# Install the Helm Chart

sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.1.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret
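For completeness, this is roughly how the restarting pod can be spotted after the install (standard kubectl commands; the exact pod name suffix will differ on each cluster):

# Check pod status (the rerank pod shows up as CrashLoopBackOff / restarting)

sudo microk8s kubectl get pods -A

# Inspect the logs of the restarting pod (substitute the actual pod name)

sudo microk8s kubectl logs -f <nemo-rerank-ranking-deployment-pod-name>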

I’m not sure this is the solution, since it leads to a new issue.
But the CUDA out of memory error does not occur with this override file
(forcing GPU allocation).

nim-llm:
  env:
  - name: NVIDIA_VISIBLE_DEVICES
    value: "0,1,2,3"
  resources:
    limits:
      nvidia.com/gpu: 0    # no limit
 
vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: VLM_MODEL_TO_USE
            value: vila-1.5 
          - name: MODEL_PATH
            value: "ngc:nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"
          - name: NVIDIA_VISIBLE_DEVICES
            value: "4,5"
  resources:
    limits:
      nvidia.com/gpu: 0    # no limit
 
 
nemo-embedding:
  applicationSpecs:
    embedding-deployment:
      containers:
        embedding-container:
          env:
          - name: NGC_API_KEY
            valueFrom:
              secretKeyRef:
                key: NGC_API_KEY
                name: ngc-api-key-secret
          - name: NVIDIA_VISIBLE_DEVICES
            value: '6'
  resources:
    limits:
      nvidia.com/gpu: 0    # no limit
 
nemo-rerank:
  applicationSpecs:
    ranking-deployment:
      containers:
        ranking-container:
          env:
          - name: NGC_API_KEY
            valueFrom:
              secretKeyRef:
                key: NGC_API_KEY
                name: ngc-api-key-secret
          - name: NVIDIA_VISIBLE_DEVICES
            value: '7'
  resources:
    limits:
      nvidia.com/gpu: 0    # no limit
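For reference, this is roughly how such an override is applied (a sketch; I’m assuming the file is saved as override.yaml):

# Upgrade (or re-install) the chart with the GPU-allocation override

sudo microk8s helm upgrade --install vss-blueprint nvidia-blueprint-vss-2.1.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f override.yaml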

I attach the log with this allocation. LGTM :)
For the additional issue, I created a new topic here.

That’s how we allocate GPU resources by default, so in theory there should be no problem: default-deployment-topology-and-models-in-use.

Could you attach the RAM usage of your device? You can attach the result of the “top” command.

Here is the result of top and htop. The system memory is 2 TB.



OK. We’ll take a further look at both of your problems.

1 Like

Hi @young2theMax, do you have other apps running on your device besides VSS? You can try to delete the VSS deployment and check the GPU memory.
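For example (using the release name from the install command earlier in this thread):

# Remove the VSS deployment

sudo microk8s helm uninstall vss-blueprint

# Confirm that GPU memory has been released

nvidia-smi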


GPU status before starting VSS.
The nemo embedding pod keeps going into CrashLoopBackOff, and the vss deployment fails to start.
I attach the full logs of nemo embedding (which keeps hitting CrashLoopBackOff) and vss-deployment.
The vss-deployment log shows this error:

GuardRails model load execution time = 2.563 sec
2025-02-11 01:58:40,941 ERROR Failed to load VIA stream handler - Guardrails failed

After applying the override file (forcing GPU allocation), nemo embedding and vss-deployment keep restarting, as shown in the pod status image below.
The logs are the same as above.
Thanks :)


vss_deployment_log.txt (148.5 KB)
nemo_emb_log.txt (33.8 KB)

OK. Let’s narrow down this problem by deploying llama-3_2-nv-rerankqa-1b-v2 and llama-3_2-nv-embedqa-1b-v2 separately.
You can refer to our llama-3_2-nv-embedqa-1b-v2 and llama-3_2-nv-rerankqa-1b-v2 pages to learn how to deploy them with docker.

Okay :)
This is how I deployed llama-3_2-nv-rerankqa-1b-v2 and llama-3_2-nv-embedqa-1b-v2 separately:

docker run -it --rm \
    --gpus "device=7" \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 9000:8000 \
    --name emb_nemo \
    nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.3.0
docker run -it --rm \
    --gpus "device=6" \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 8000:8000 \
    --name rerank_nemo \
    nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:1.3.0
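To confirm both containers are serving, a quick readiness check can be run against each one (NIM containers expose a /v1/health/ready endpoint; host ports as mapped in the commands above):

# Embedding NIM is mapped to host port 9000, rerank NIM to host port 8000

curl http://localhost:9000/v1/health/ready
curl http://localhost:8000/v1/health/ready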

I also attach the logs for both containers, which run fine.
separate_nemo_rerank.txt (108.1 KB)
separate_nemo_emb.txt (37.2 KB)

From the log you attached, have you obtained your NGC API key (obtain-ngc-api-key)?

You need to set the NGC_API_KEY first like below.

export NGC_API_KEY=<your_ngc_api_key>

Yes, I got my NGC API key and set it with the following command.

sudo microk8s kubectl create secret docker-registry ngc-docker-reg-secret --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=$NGC_API_KEY

First, I rebooted my server.
From the error log above, there was an error with the guardrails setting.
So I disabled guardrails (DISABLE_GUARDRAILS env set to true) and found that all pods ran fine.
override.txt (1.4 KB)
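The relevant part of the override presumably looks like this (a sketch based on the env var mentioned above, added to the vss container env from the earlier override file; I have not changed anything else):

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          # ...existing vss env entries from the earlier override...
          - name: DISABLE_GUARDRAILS
            value: "true"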

However, when trying to access the UI on port 9000, I could not reach it.
This is part of the vss-vss-deployment-POD-NAME log; the full log is attached.
vss-deployment-log-port-error.txt (13.5 KB)

2025-02-13 08:09:49 | ERROR | stderr | INFO:     Started server process [8365]
2025-02-13 08:09:49 | ERROR | stderr | INFO:     Waiting for application startup.
2025-02-13 08:09:49 | ERROR | stderr | INFO:     Application startup complete.
2025-02-13 08:09:49 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
2025-02-13 08:09:49 | INFO | stdout | INFO:     127.0.0.1:47104 - "GET / HTTP/1.1" 200 OK
***********************************************************
VIA Server loaded
Backend is running at http://0.0.0.0:8000
Frontend is running at http://0.0.0.0:9000
Press ctrl+C to stop
***********************************************************

Also, ports 8000 and 9000 do not show up in netstat.

You need to set the NGC API key in your environment like below first.

export NGC_API_KEY=<your_ngc_api_key>

Please refer to our launch-vss-ui to get the port number.
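For example, the externally exposed ports can be listed like this (service names depend on the chart):

# The UI is reached via the node IP and the NodePort shown here, not directly on port 9000

sudo microk8s kubectl get svc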

1 Like

Yeah, my bad :)
Still without guardrails, but it runs fine now.
Thanks a lot for your help :)

To summarize the history:

  • CUDA out of memory error
  • Pods keep restarting (nemo embedding / rerank)
    -----> force GPU allocation
  • Failed to load VIA stream handler - Guardrails failed
    -----> disable guardrails
1 Like