VSS FAQ

  1. First, please read our Guide carefully to learn how to deploy and debug the VSS.
    https://docs.nvidia.com/vss/index.html

  2. You can first search the FAQ in our Guide to see if there is a problem similar to yours.
    https://docs.nvidia.com/vss/content/faq.html

  3. You can also search this visual-ai-agent forum to see if someone has hit a similar problem.

1. [EA version] access issue
If you’re using the EA version and you’re having permission issues, please check the following first.

  1. Check which subscriptions you are under and whether your subscription has expired: https://org.ngc.nvidia.com/subscriptions
    You should see both NVIDIA Developer Program and VSS Early Access.
  2. Ensure that you generate an API key by following: NGC User Guide - NVIDIA Docs. You can verify the key with the quick check below.
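
Once you have generated an API key, a quick way to verify that it works is to log in to the NGC container registry with it. This is a minimal sketch using standard docker CLI flags; nvcr.io is the NGC registry host, the username is the literal string $oauthtoken, and <YOUR_NGC_API_KEY> is a placeholder for your own key.

# Log in to nvcr.io with your NGC API key; the username is literally "$oauthtoken"
echo "<YOUR_NGC_API_KEY>" | docker login nvcr.io --username '$oauthtoken' --password-stdin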

2. [Deployment] Use fewer GPU resources to deploy
If you want to deploy with fewer GPU resources than described in our Guide, please configure the VSS using a Helm overrides file.

The following overrides file can be used to deploy the VSS on 4x A100 (80 GB) GPUs.
override.yaml (1.4 KB)

You can also refer to our Guide and replace llama-3.1-70b-instruct with llama-3_1-8b-instruct to use fewer GPU resources, as sketched below.
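
As an illustration only, an overrides snippet that swaps in the smaller model might look like the sketch below. The exact key names and image path depend on your VSS Helm chart version, so treat them as assumptions and check the attached override.yaml and the Guide for the authoritative structure. The file is applied by adding -f override.yaml to your helm install command.

# Hypothetical sketch -- key names and image path vary by chart version; verify against the Guide.
nim-llm:
  image:
    repository: nvcr.io/nim/meta/llama-3_1-8b-instruct   # model name as written in the Guide
    tag: latest
  resources:
    limits:
      nvidia.com/gpu: 1   # fewer GPUs than the 70B default requires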

3. [Deployment] Use the network proxy method to deploy the VSS
If your network uses a proxy, please refer to #34 to learn how to deploy the VSS in this scenario. The following overrides file can be used when you are behind a network proxy.
override-proxy.yaml (2.6 KB)
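
As a rough illustration of what such an overrides file configures, the sketch below sets the usual proxy environment variables on the VSS container. The applicationSpecs/container key paths and the <proxy-host>/<proxy-port> values are assumptions for illustration; please rely on the attached override-proxy.yaml and topic #34 for the real structure.

# Hypothetical sketch -- the key layout below is an assumption, not the official file.
vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: HTTP_PROXY
            value: "http://<proxy-host>:<proxy-port>"
          - name: HTTPS_PROXY
            value: "http://<proxy-host>:<proxy-port>"
          - name: NO_PROXY
            value: "localhost,127.0.0.1,.svc,.cluster.local"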

4. [source code] Customize the source code
If you want to customize our code, please go into the container and modify it yourself.
You can refer to topic 326660, which shows how to customize the UI. You can simply customize the via_demo_client.py source code and related files in the vss-engine container image.
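
For example, you can locate and edit the file inside the running container, or copy it in and out of the pod with kubectl cp (standard kubectl usage; the pod name and <path-inside-container> below are placeholders). Note that changes made inside a running container are lost when the pod restarts, so for a persistent customization you need to rebuild the container image.

# Enter the vss-engine container and locate the UI source file
sudo microk8s kubectl exec -it vss-vss-deployment-POD-NAME -- /bin/bash
find / -name via_demo_client.py 2>/dev/null
# Or copy the file out of the pod, edit it locally, and copy it back
sudo microk8s kubectl cp vss-vss-deployment-POD-NAME:<path-inside-container>/via_demo_client.py ./via_demo_client.py
sudo microk8s kubectl cp ./via_demo_client.py vss-vss-deployment-POD-NAME:<path-inside-container>/via_demo_client.py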

5. [preliminary debug] If the deployment of some pods fails, how to perform a preliminary analysis

  • You can run the following commands to check the logs and status of the pods.
sudo microk8s kubectl logs vss-vss-deployment-POD-NAME
sudo microk8s kubectl describe pod vss-vss-deployment-POD-NAME
  • You can also check the problem by entering the pod directly with the following command.
sudo microk8s kubectl exec -it vss-vss-deployment-POD-NAME -- /bin/bash

You can analyze the problem from the log information yourself, or post the logs in your forum topic.
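
If you are not sure of the exact pod name to use in the commands above, you can list the pods first (standard kubectl usage; the grep filter is just an illustration).

sudo microk8s kubectl get pods
# Show only the VSS-related pods
sudo microk8s kubectl get pods | grep vss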

6. [cuda error] “gpu-operator-resources nvidia-cuda-validator-<NAME>” pod failed
If the “nvidia-cuda-validator-<NAME>” pod fails to start after you have enabled the nvidia and hostpath-storage add-ons, check whether your nvidia-fabricmanager version is exactly the same as the driver version. If not, please reinstall the fabric manager with the commands below.

# Replace <xxx.xx.xx> with your installed driver version (see nvidia-smi)
driver_version=<xxx.xx.xx>
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
# Download and install the fabric manager package that matches the driver version
wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
# Enable and start the service
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
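
After reinstalling, you can confirm that the versions now match and that the service is healthy. The commands below are standard nvidia-smi and systemd usage.

# Installed driver version -- must match the fabric manager package version
nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
# Confirm the fabric manager service started cleanly
systemctl status nvidia-fabricmanager --no-pager
sudo journalctl -u nvidia-fabricmanager --no-pager | tail -n 20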