Getting an error while running the blueprint-vss demo

sudo microk8s helm fetch https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-vss-2.0.0.tgz --username='$oauthtoken' --password=$NGC_API_KEY
Error: failed to fetch https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-vss-2.0.0.tgz : 403 Forbidden

I also tried it in my NGC account and got the error below:

404 Error
Even AI can’t find this page!
If you’re logged in, this might be a permissions issue. Check with your Org or Team Admin. Otherwise, you can return to your previous screen or click the link below to explore the wide range of content from the NGC Catalog.

Can you please help me resolve this error?

Hello Basha,
Did you follow the instructions in the EA approval email regarding new NVIDIA Cloud Accounts?

“You will receive an invite titled ‘Welcome to NVIDIA NGC.’ Please accept the invitation and create a New NVIDIA Cloud Account for accessing the EA assets.”
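Separately, make sure the quotes around $oauthtoken are plain single quotes (not smart quotes pasted from a document), so the literal string $oauthtoken is passed as the username. A minimal re-check once the new NVIDIA Cloud Account is set up (assuming NGC_API_KEY holds an API key generated under that account):

export NGC_API_KEY=<your NGC API key>
sudo microk8s helm fetch https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-vss-2.0.0.tgz \
    --username='$oauthtoken' --password=$NGC_API_KEY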


Hi @aryason,
Thanks for your response.

Now I have access to the helm chart and am able to install it, but the pods are not reaching the ready state, as shown in the screenshot:

I am running this on a single node with 4 H100 GPUs.

Could this be the issue?
Update: even after waiting for some time, the pods are still not showing as Running, as below.

There are two issues here:

  • The pods that are pending are all NVIDIA containers requiring GPUs. The GPU operator may not be installed, or it may be crashing because of an incorrect driver. Can you run sudo microk8s kubectl get pod -A to check whether the GPU operator is installed and/or crashing? (See the quick check sketched after this list.)
  • The default helm chart is configured for 8 GPUs; if you did not modify the chart to use 4 GPUs, that may be why some pods remain in the pending state.
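For example, a minimal check along those lines (a sketch; the GPU operator namespace and pod names may differ on your setup):

# look for GPU operator / device plugin pods and their status
sudo microk8s kubectl get pod -A | grep -i gpu

# check how many GPUs the node advertises to Kubernetes
sudo microk8s kubectl describe node | grep -i 'nvidia.com/gpu'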

Below are the details for sudo microk8s kubectl get pod -A

I have also changed the config in the overrides.yaml file as below.

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          image:
            repository: nvcr.io/nvidia/blueprint/vss-engine
            tag: 2.0-ea # Update to override with custom VSS image
          env:
            - name: VLM_MODEL_TO_USE
              value: vila-1.5 # Or "vila-1.5" or "custom"
            # Specify path in case of VILA-1.5 and custom model. Can be either
            # a NGC resource path or a local path. For custom models this
            # must be a path to the directory containing "inference.py" and
            # "manifest.yaml" files.
            # For vila 1.5 model the value can be "ngc:nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"
            # Please make sure you have access to this model
            - name: MODEL_PATH
              value: "ngc:nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"
            - name: DISABLE_GUARDRAILS
              value: "false" # "true" to disable guardrails.
            - name: TRT_LLM_MODE
              value: "" # int4_awq (default), int8 or fp16. (for VILA only)
            - name: VLM_BATCH_SIZE
              value: "" # Default is 16. (for VILA only)
            - name: VIA_VLM_OPENAI_MODEL_DEPLOYMENT_NAME
              value: "" # Set to use a VLM exposed as a REST API with OpenAI compatible API (e.g. gpt-4o)
            - name: VIA_VLM_ENDPOINT
              value: "" # Default OpenAI API. Override to use a custom API
            - name: VIA_VLM_API_KEY
              value: "" # API key to set when calling VIA_VLM_ENDPOINT
            - name: OPENAI_API_VERSION
              value: ""
            - name: AZURE_OPENAI_API_VERSION
              value: ""
  resources:
    limits:
      nvidia.com/gpu: 1 # Set to 8 for 2 x 8H100 node deployment
  nodeSelector:
    kubernetes.io/hostname:

nim-llm:
  resources:
    limits:
      nvidia.com/gpu: 1
  nodeSelector:
    kubernetes.io/hostname:

nemo-embedding:
  resources:
    limits:
      nvidia.com/gpu: 1 # Set to 2 for 2 x 8H100 node deployment
  nodeSelector:
    kubernetes.io/hostname:

nemo-rerank:
  resources:
    limits:
      nvidia.com/gpu: 1 # Set to 2 for 2 x 8H100 node deployment
  nodeSelector:
    kubernetes.io/hostname:
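Once the overrides above are saved (assuming a file named overrides.yaml, and the same release name and image pull secret as in the install steps further down), the chart can be installed with the standard helm -f flag. This is a sketch, not the exact command from the deployment guide:

sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.0.0.tgz \
    --set global.ngcImagePullSecretName=ngc-docker-reg-secret \
    -f overrides.yaml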

Hi @aryason

Is the NVIDIA blueprint version released? If so, could we have a chance to access the latest version of VIA? We have sent a request for early access to the AI blueprint. Thanks.

Can you tell me what GPU driver you are using? If you are not using NVIDIA driver 535.161.08, could you install that version of the driver?
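For reference, one quick way to confirm the installed driver version and GPU memory on the host (standard nvidia-smi query options):

nvidia-smi --query-gpu=driver_version,name,memory.total --format=csv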

Thanks @aryason, I have installed the 535.161.08 NVIDIA driver.
I am still not sure why the installation is not completing; please check the details of the steps I followed below.
Installation check:
NVIDIA Drivers check:

sudo microk8s kubectl get pods -A

Before installation:
sudo microk8s helm list --all-namespaces


Secrets creation:
sudo microk8s kubectl get secrets

Install helm chart with default configuration.
sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.0.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret

Watching status.

Checking the status after 42 minutes, the pods are still pending.
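For reference, one way to keep an eye on the rollout while waiting (a sketch; the 10-second interval is arbitrary):

watch -n 10 'sudo microk8s kubectl get pods -A'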

Could you please help resolve this?

Hi @basha.ghouse, could you run kubectl describe pod vss-vss-deployment-7f485dcd8b-wxdkz to check the details of this pod?

Name: vss-vss-deployment-fd9df6bb6-slzn7
Namespace: default
Priority: 0
Service Account: default
Node: csctmp-xe8640-3/172.29.171.214
Start Time: Mon, 23 Dec 2024 19:01:27 +0000
Labels: app=vss-vss-deployment
app.kubernetes.io/instance=vss-blueprint
app.kubernetes.io/name=vss
generated_with=helm_builder
hb_version=1.0.0
microservice_version=0.0.1
msb_version=2.5.0
pod-template-hash=fd9df6bb6
Annotations: checksum/vss-configs-cm: 8a3bc5b52a74ba5abf15b4261a6e084ba8ca97e2742ca11d446ec09f6cdef4d5
checksum/vss-external-files-cm: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
checksum/vss-scripts-cm: 81ae92a369cf8a653a2cc7136469aebb35734cd7f678d13ba3b8a2bfe97f204a
checksum/vss-workload-cm: 08bd480e12de5c6d8e1e298b2769bd8457a92cd2b01881cd1271fc20d92256e8
cni.projectcalico.org/containerID: e0dd8928899141850b6e193b7c78eb97652a932a611970120782d5ba34aea065
cni.projectcalico.org/podIP: 10.1.253.244/32
cni.projectcalico.org/podIPs: 10.1.253.244/32
Status: Pending
IP: 10.1.253.244
IPs:
IP: 10.1.253.244
Controlled By: ReplicaSet/vss-vss-deployment-fd9df6bb6
Init Containers:
check-milvus-up:
Container ID: containerd://10f39b82018f3c13801e78e4dd3202b94dddb43a9d8c98b2cb8148716555c7ce
Image: busybox:1.28
Image ID:
Port:
Host Port:
Command:
sh
-c
until nc -z -w 2 milvus-milvus-deployment-milvus-service 19530; do echo waiting for milvus; sleep 2; done
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 23 Dec 2024 19:01:28 +0000
Finished: Mon, 23 Dec 2024 19:01:28 +0000
Ready: True
Restart Count: 0
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Environment:
Mounts:
/opt/configs from configs-volume (rw)
/opt/scripts from scripts-cm-volume (rw)
/opt/workload-config from workload-cm-volume (rw)
/secrets/graph-db-password from secret-graph-db-password-volume (ro,path="graph-db-password")
/secrets/graph-db-username from secret-graph-db-username-volume (ro,path="graph-db-username")
/secrets/ngc-api-key from secret-ngc-api-key-volume (ro,path="ngc-api-key")
/secrets/openai-api-key from secret-openai-api-key-volume (ro,path="openai-api-key")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8d8mx (ro)
check-neo4j-up:
Container ID: containerd://dc225cf0d72010657939a1bc9734ad9bf861dceab74e6b4c5148cc1ebd791420
Image: busybox:1.28
Image ID:
Port:
Host Port:
Command:
sh
-c
until nc -z -w 2 neo-4-j-service 7687; do echo waiting for neo4j; sleep 2; done
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 23 Dec 2024 19:01:29 +0000
Finished: Mon, 23 Dec 2024 19:01:29 +0000
Ready: True
Restart Count: 0
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Environment:
Mounts:
/opt/configs from configs-volume (rw)
/opt/scripts from scripts-cm-volume (rw)
/opt/workload-config from workload-cm-volume (rw)
/secrets/graph-db-password from secret-graph-db-password-volume (ro,path="graph-db-password")
/secrets/graph-db-username from secret-graph-db-username-volume (ro,path="graph-db-username")
/secrets/ngc-api-key from secret-ngc-api-key-volume (ro,path="ngc-api-key")
/secrets/openai-api-key from secret-openai-api-key-volume (ro,path="openai-api-key")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8d8mx (ro)
check-llm-up:
Container ID: containerd://e8ddf6d8d4dd32584f2ffe6a5999d460997fde4b078d392721807a0f3cd55ec8
Image: curlimages/curl:latest
Image ID: docker.io/curlimages/curl@sha256:c1fe1679c34d9784c1b0d1e5f62ac0a79fca01fb6377cdd33e90473c6f9f9a69
Port:
Host Port:
Command:
sh
-c
Args:
while ! curl -s -f -o /dev/null http://llm-nim-svc:8000/v1/health/live; do
echo "Waiting for LLM..."
sleep 2
done

State:          Running
  Started:      Mon, 23 Dec 2024 19:01:31 +0000
Ready:          False
Restart Count:  0
Limits:
  nvidia.com/gpu:  1
Requests:
  nvidia.com/gpu:  1
Environment:       <none>
Mounts:
  /opt/configs from configs-volume (rw)
  /opt/scripts from scripts-cm-volume (rw)
  /opt/workload-config from workload-cm-volume (rw)
  /secrets/graph-db-password from secret-graph-db-password-volume (ro,path="graph-db-password")
  /secrets/graph-db-username from secret-graph-db-username-volume (ro,path="graph-db-username")
  /secrets/ngc-api-key from secret-ngc-api-key-volume (ro,path="ngc-api-key")
  /secrets/openai-api-key from secret-openai-api-key-volume (ro,path="openai-api-key")
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8d8mx (ro)

Containers:
vss:
Container ID:
Image: nvcr.io/nvidia/blueprint/vss-engine:2.0-ea
Image ID:
Port: 8000/TCP
Host Port: 0/TCP
Command:
bash
/opt/scripts/start.sh
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Liveness: http-get http://:http-api/health/live delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http-api/health/ready delay=5s timeout=1s period=5s #success=1 #failure=3
Startup: http-get http://:http-api/health/ready delay=0s timeout=1s period=10s #success=1 #failure=180
Environment:
VLM_MODEL_TO_USE: openai-compat
MODEL_PATH:
DISABLE_GUARDRAILS: false
TRT_LLM_MODE:
VLM_BATCH_SIZE:
VIA_VLM_OPENAI_MODEL_DEPLOYMENT_NAME:
VIA_VLM_ENDPOINT:
VIA_VLM_API_KEY:
OPENAI_API_VERSION:
AZURE_OPENAI_API_VERSION:
Mounts:
/opt/configs from configs-volume (rw)
/opt/scripts from scripts-cm-volume (rw)
/opt/workload-config from workload-cm-volume (rw)
/secrets/graph-db-password from secret-graph-db-password-volume (ro,path="graph-db-password")
/secrets/graph-db-username from secret-graph-db-username-volume (ro,path="graph-db-username")
/secrets/ngc-api-key from secret-ngc-api-key-volume (ro,path="ngc-api-key")
/secrets/openai-api-key from secret-openai-api-key-volume (ro,path="openai-api-key")
/tmp/via-ngc-model-cache from ngc-model-cache-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8d8mx (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
ngc-model-cache-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: vss-ngc-model-cache-pvc
ReadOnly: false
workload-cm-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: vss-workload-cm
Optional: false
configs-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: vss-configs-cm
Optional: false
scripts-cm-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: vss-scripts-cm
Optional: false
secret-openai-api-key-volume:
Type: Secret (a volume populated by a Secret)
SecretName: openai-api-key-secret
Optional: false
secret-ngc-api-key-volume:
Type: Secret (a volume populated by a Secret)
SecretName: ngc-api-key-secret
Optional: false
secret-graph-db-username-volume:
Type: Secret (a volume populated by a Secret)
SecretName: graph-db-creds-secret
Optional: false
secret-graph-db-password-volume:
Type: Secret (a volume populated by a Secret)
SecretName: graph-db-creds-secret
Optional: false
kube-api-access-8d8mx:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:

It seems to be waiting for nvcr.io/nvidia/blueprint/vss-engine:2.0-ea

Yes, that's what it looks like. Could you run sudo microk8s kubectl logs <vss-vss-deployment-POD-NAME> to check the log of that pod? Could you also attach the GPU memory and CPU memory usage of your device?
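For example (a sketch; the pod name is a placeholder, and -c vss targets the main container listed in the describe output):

sudo microk8s kubectl logs <vss-vss-deployment-POD-NAME> -c vss

# GPU memory usage
nvidia-smi

# system (CPU) memory usage
free -h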

NAME READY STATUS RESTARTS AGE
etcd-etcd-deployment-6745896d58-nz5c4 1/1 Running 0 4m39s
milvus-milvus-deployment-7bfd5c795b-r9f6c 1/1 Running 0 4m39s
minio-minio-deployment-7cf966bb89-c4jpx 1/1 Running 0 4m39s
nemo-embedding-embedding-deployment-689d64765-tbd5q 0/1 ImagePullBackOff 0 4m39s
nemo-rerank-ranking-deployment-865fdd9c67-w76lz 0/1 ImagePullBackOff 0 4m39s
neo4j-neo4j-deployment-5cdf686bcb-96vxp 1/1 Running 0 4m39s
vss-blueprint-0 1/1 Running 0 4m39s
vss-vss-deployment-55fb8cf6d8-p6t9x 0/1 Pending 0 4m38s

+---------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
|        ID   ID                                                               Usage     |
|=========================================================================================|
|    1   N/A  N/A     91866      C   /opt/nim/llm/.venv/bin/python3             77712MiB |
|    2   N/A  N/A     91867      C   /opt/nim/llm/.venv/bin/python3             77718MiB |
+---------------------------------------------------------------------------------------+

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                        

91866 centific 20 0 308.1g 41.9g 2.6g S 0.0 4.2 1:11.37 pt_main_thread
91867 centific 20 0 301.1g 41.9g 2.6g R 200.3 4.2 9:28.43 pt_main_thread
90560 7474 20 0 52.1g 1.5g 22460 S 0.3 0.1 0:25.71 java
3359 root 20 0 2300708 954716 97888 S 2.7 0.1 17:09.27 kubelite
2058 root 20 0 2020812 594992 20036 S 1.0 0.1 1:30.55 k8s-dqlite
89304 root 20 0 5197820 278344 150760 S 1.0 0.0 0:13.19 milvus
89234 root 20 0 3736424 184792 48068 S 0.0 0.0 0:00.82 minio
23310 root 20 0 3670996 172808 30052 S 0.0 0.0 0:07.21 dcgm-exporter
1452 root 19 -1 173580 94912 92776 S 0.0 0.0 0:01.07 systemd-journal
2261 root 20 0 3688208 76988 54664 S 0.0 0.0 0:00.42 dockerd

kubectl logs vss-vss-deployment-55fb8cf6d8-p6t9x
Defaulted container "vss" out of: vss, check-milvus-up (init), check-neo4j-up (init), check-llm-up (init)

Name: vss-vss-deployment-55fb8cf6d8-p6t9x
Namespace: default
Priority: 0
Service Account: default
Node:
Labels: app=vss-vss-deployment
app.kubernetes.io/instance=vss-blueprint
app.kubernetes.io/name=vss
generated_with=helm_builder
hb_version=1.0.0
microservice_version=0.0.1
msb_version=2.5.0
pod-template-hash=55fb8cf6d8
Annotations: checksum/vss-configs-cm: 8a3bc5b52a74ba5abf15b4261a6e084ba8ca97e2742ca11d446ec09f6cdef4d5
checksum/vss-external-files-cm: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
checksum/vss-scripts-cm: 81ae92a369cf8a653a2cc7136469aebb35734cd7f678d13ba3b8a2bfe97f204a
checksum/vss-workload-cm: 08bd480e12de5c6d8e1e298b2769bd8457a92cd2b01881cd1271fc20d92256e8
Status: Pending
IP:
IPs:
Controlled By: ReplicaSet/vss-vss-deployment-55fb8cf6d8
Init Containers:
check-milvus-up:
Image: busybox:1.28
Port:
Host Port:
Command:
sh
-c
until nc -z -w 2 milvus-milvus-deployment-milvus-service 19530; do echo waiting for milvus; sleep 2; done
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Environment:
Mounts:
/opt/configs from configs-volume (rw)
/opt/scripts from scripts-cm-volume (rw)
/opt/workload-config from workload-cm-volume (rw)
/secrets/graph-db-password from secret-graph-db-password-volume (ro,path="graph-db-password")
/secrets/graph-db-username from secret-graph-db-username-volume (ro,path="graph-db-username")
/secrets/ngc-api-key from secret-ngc-api-key-volume (ro,path="ngc-api-key")
/secrets/openai-api-key from secret-openai-api-key-volume (ro,path="openai-api-key")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-97g2h (ro)
check-neo4j-up:
Image: busybox:1.28
Port:
Host Port:
Command:
sh
-c
until nc -z -w 2 neo-4-j-service 7687; do echo waiting for neo4j; sleep 2; done
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Environment:
Mounts:
/opt/configs from configs-volume (rw)
/opt/scripts from scripts-cm-volume (rw)
/opt/workload-config from workload-cm-volume (rw)
/secrets/graph-db-password from secret-graph-db-password-volume (ro,path="graph-db-password")
/secrets/graph-db-username from secret-graph-db-username-volume (ro,path="graph-db-username")
/secrets/ngc-api-key from secret-ngc-api-key-volume (ro,path="ngc-api-key")
/secrets/openai-api-key from secret-openai-api-key-volume (ro,path="openai-api-key")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-97g2h (ro)
check-llm-up:
Image: curlimages/curl:latest
Port:
Host Port:
Command:
sh
-c
Args:
while ! curl -s -f -o /dev/null http://llm-nim-svc:8000/v1/health/live; do
echo "Waiting for LLM..."
sleep 2
done

Limits:
  nvidia.com/gpu:  1
Requests:
  nvidia.com/gpu:  1
Environment:       <none>
Mounts:
  /opt/configs from configs-volume (rw)
  /opt/scripts from scripts-cm-volume (rw)
  /opt/workload-config from workload-cm-volume (rw)
  /secrets/graph-db-password from secret-graph-db-password-volume (ro,path="graph-db-password")
  /secrets/graph-db-username from secret-graph-db-username-volume (ro,path="graph-db-username")
  /secrets/ngc-api-key from secret-ngc-api-key-volume (ro,path="ngc-api-key")
  /secrets/openai-api-key from secret-openai-api-key-volume (ro,path="openai-api-key")
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-97g2h (ro)

Containers:
vss:
Image: nvcr.io/nvidia/blueprint/vss-engine:2.0-ea
Port: 8000/TCP
Host Port: 0/TCP
Command:
bash
/opt/scripts/start.sh
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Liveness: http-get http://:http-api/health/live delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http-api/health/ready delay=5s timeout=1s period=5s #success=1 #failure=3
Startup: http-get http://:http-api/health/ready delay=0s timeout=1s period=10s #success=1 #failure=180
Environment:
VLM_MODEL_TO_USE: vila-1.5
MODEL_PATH:
DISABLE_GUARDRAILS: false
TRT_LLM_MODE:
VLM_BATCH_SIZE:
VIA_VLM_OPENAI_MODEL_DEPLOYMENT_NAME:
VIA_VLM_ENDPOINT:
VIA_VLM_API_KEY:
OPENAI_API_VERSION:
AZURE_OPENAI_API_VERSION:
Mounts:
/opt/configs from configs-volume (rw)
/opt/scripts from scripts-cm-volume (rw)
/opt/workload-config from workload-cm-volume (rw)
/secrets/graph-db-password from secret-graph-db-password-volume (ro,path="graph-db-password")
/secrets/graph-db-username from secret-graph-db-username-volume (ro,path="graph-db-username")
/secrets/ngc-api-key from secret-ngc-api-key-volume (ro,path="ngc-api-key")
/secrets/openai-api-key from secret-openai-api-key-volume (ro,path="openai-api-key")
/tmp/via-ngc-model-cache from ngc-model-cache-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-97g2h (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
ngc-model-cache-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: vss-ngc-model-cache-pvc
ReadOnly: false
workload-cm-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: vss-workload-cm
Optional: false
configs-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: vss-configs-cm
Optional: false
scripts-cm-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: vss-scripts-cm
Optional: false
secret-openai-api-key-volume:
Type: Secret (a volume populated by a Secret)
SecretName: openai-api-key-secret
Optional: false
secret-ngc-api-key-volume:
Type: Secret (a volume populated by a Secret)
SecretName: ngc-api-key-secret
Optional: false
secret-graph-db-username-volume:
Type: Secret (a volume populated by a Secret)
SecretName: graph-db-creds-secret
Optional: false
secret-graph-db-password-volume:
Type: Secret (a volume populated by a Secret)
SecretName: graph-db-creds-secret
Optional: false
kube-api-access-97g2h:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Warning FailedScheduling 71s default-scheduler 0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.

@kelvin.lwin could you first share the GPU model you are using, the number of GPUs, the GPU memory, the system memory, and the driver version?

  • GPU model:
  • number of your GPUs:
  • memory of GPU:
  • memory of your system:
  • driver version:
It is possible that your environment does not meet the requirements in supported-platforms, in which case the model files cannot be converted successfully.

Then you can save the pod list, logs, and pod description to text files and attach them.

sudo microk8s kubectl get pod -A > all_pod.txt
sudo microk8s kubectl logs <vss-vss-deployment-POD-NAME>  > vss_log.txt
sudo microk8s kubectl describe pod <vss-vss-deployment-POD-NAME>   > vss_describe.txt
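Since the scheduler reported "Insufficient nvidia.com/gpu", it can also help to compare what the node advertises with what is already allocated (a sketch; replace the node name with yours):

sudo microk8s kubectl describe node <node-name> | grep -i -A 5 'Allocatable'
sudo microk8s kubectl describe node <node-name> | grep -i -A 12 'Allocated resources'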

4X H100 (Does this need to be 8X?)
GPU 80GB
System 1TB
Driver Version: 535.161.08
vss_describe.txt (9.3 KB)
vss_log.txt (236 Bytes)
all_pod.txt (3.2 KB)

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08                 Driver Version: 535.161.08     CUDA Version: 12.2  |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off | 00000000:4E:00.0   Off |                    0 |
| N/A   36C    P0              71W / 700W |      0MiB / 81559MiB   |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off | 00000000:5F:00.0   Off |                    0 |
| N/A   35C    P0              68W / 700W |      0MiB / 81559MiB   |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off | 00000000:CB:00.0   Off |                    0 |
| N/A   36C    P0              73W / 700W |      0MiB / 81559MiB   |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off | 00000000:DB:00.0   Off |                    0 |
| N/A   36C    P0              69W / 700W |      0MiB / 81559MiB   |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

default                  nemo-embedding-embedding-deployment-689d64765-b7pgn           0/1     ImagePullBackOff   0              6m5s
default                  nemo-rerank-ranking-deployment-865fdd9c67-jcmjp               0/1     ErrImagePull       0              6m5s

From all_pod.txt, the NeMo-related pods failed to start properly on your side. Maybe there are not enough resources on your GPUs. Could you remove them from the chart YAML file and try again?
You can also get the logs from these two pods using the commands below.

sudo microk8s kubectl logs <nemo-embedding-POD-NAME>  > nemo_embedding_log.txt
sudo microk8s kubectl describe pod <nemo_embedding-POD-NAME>   > nemo_embedding_describe.txt
sudo microk8s kubectl logs <nemo-rerank-POD-NAME>  > nemo-rerank_log.txt
sudo microk8s kubectl describe pod <nemo-rerank-POD-NAME>   > nemo-rerank_describe.txt

I've tried a few sets of API keys (thinking it was a permissions issue), and now the images aren't pulling for those NeMo pods or for the blueprint itself.

I deleted the helm chart release, edited Chart.yaml to delete the two NeMo dependencies, and created a new tgz file. However, it's still trying to pull the NeMo images and hitting the same error as above.

Could you refer to our guide below on using the remote NeMo rerank and embedding endpoints?
remote-nemo-rerank-and-embedding-endpoint

Based on your documentation I found that 2.1.0 is ready, so I downloaded and tried it; these are the NeMo logs I got. The pods show CrashLoopBackOff after 33 restarts. The logs are attached here:

nemo-rerank_log.txt (5.4 KB)
nemo-rerank_describe.txt (3.7 KB)
nemo_embedding_log.txt (4.5 KB)
nemo_embedding_describe.txt (3.7 KB)

Now I'll try to remove them from this version and see whether we get any further.

I successfully removed the two NeMo services; here are the files from that run using version 2.1:

vss_log_2_1.txt (236 Bytes)
vss_describe_2_1.txt (9.5 KB)
all_pod_2_1.txt (2.7 KB)

Maybe the most relevant part is this error?
Defaulted container "vss" out of: vss, check-milvus-up (init), check-neo4j-up (init), check-llm-up (init)

Error from server (BadRequest): container "vss" in pod "vss-vss-deployment-68f96d9ff9-nwjv7" is waiting to start: PodInitializing
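One way to see which init container the pod is stuck on (a sketch; pod names are placeholders, and the LLM pod name is an assumption to be looked up from the pod list):

# logs of the init container that waits for the LLM service
sudo microk8s kubectl logs <vss-vss-deployment-POD-NAME> -c check-llm-up

# find and check the LLM NIM pod it is waiting on
sudo microk8s kubectl get pods -A | grep -i nim
sudo microk8s kubectl logs <nim-llm-POD-NAME>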

Yes, this problem is most likely caused by insufficient GPU resources. We encountered similar problems when we tried to deploy it with fewer than 8 GPUs.
Could you first try deploying it on a machine with 8 GPUs, each with 80 GB of memory?