VSS FAQ

  1. First, please read our Guide carefully to deploy and debug the VSS.
    https://docs.nvidia.com/vss/index.html

  2. You can first search the FAQ in our Guide to see if there is a similar problem to yours.
    https://docs.nvidia.com/vss/content/faq.html

  3. You can search in this visual-ai-agent forum to see if there is a similar problem to yours.

1. [EA version] Access issue
If you are using the EA version and are having permission issues, please check the following first.

  1. Check which subscriptions your account has and whether they have expired: https://org.ngc.nvidia.com/subscriptions
    You should see NVIDIA Developer Program and VSS Early Access.
  2. Ensure you generate an API key by following: NGC User Guide - NVIDIA Docs
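
A quick way to confirm that the key is valid for pulling NGC container images is to log in to nvcr.io with it; the username is always the literal string $oauthtoken:

docker login nvcr.io
# Username: $oauthtoken
# Password: <your NGC API key>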

2. [Deployment] Use fewer GPU resources to deploy
If you want to deploy with fewer GPU resources than described in our Guide, please configure VSS using a Helm overrides file.

The following overrides file can be used when you want to deploy VSS with 4x A100 (80 GB).
override.yaml (1.4 KB)

You can also refer to our Guide and replace llama-3.1-70b-instruct with llama-3_1-8b-instruct to use fewer GPU resources.
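
As a minimal sketch (reusing the chart tarball and image pull secret names shown in section 8; adjust them to your setup), the overrides file is applied with the -f flag at install time:

sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.3.0.tgz \
    --set global.ngcImagePullSecretName=ngc-docker-reg-secret \
    -f override.yaml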

3. [Deployment] Use the network proxy method to deploy the VSS
If your network uses a proxy, please refer to #34 to learn how to deploy VSS in this scenario. The following overrides file is used when you are behind a network proxy.
override-proxy.yaml (2.6 KB)
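
The exact settings are in the overrides file and in #34, but the general idea is the standard one: the host and container runtime that pull images must be able to see your proxy through the usual environment variables (the values below are placeholders; adjust them to your proxy address and local subnets):

export HTTP_PROXY=http://<proxy_host>:<proxy_port>
export HTTPS_PROXY=http://<proxy_host>:<proxy_port>
export NO_PROXY=localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16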

4. [source code] Customize the source code
If you want to customize our code, please enter the container and modify it there.
You can refer to this 326660 topic, which shows how to customize the UI. You can customize the via_demo_client.py source code and related files in the vss-engine container image, for example by copying an edited file into the running pod as sketched below.
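
A minimal sketch of pushing an edited file into a running pod; the destination path is hypothetical and depends on where the file lives inside the vss-engine image, so locate it first from inside the container:

# Locate via_demo_client.py inside the vss-engine container
sudo microk8s kubectl exec -it vss-vss-deployment-POD-NAME -- find / -name via_demo_client.py 2>/dev/null
# Copy your edited file into the pod (destination path is hypothetical; adjust the namespace if not default)
sudo microk8s kubectl cp ./via_demo_client.py default/vss-vss-deployment-POD-NAME:/opt/nvidia/via/via_demo_client.py

Note that changes made this way are lost when the pod restarts; for a persistent change, rebuild the container image.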

5. [preliminary debug] If the deployment of some pods fails, how to perform a preliminary analysis

  • You can run the following commands to check the logs and status of the pod.
sudo microk8s kubectl logs vss-vss-deployment-POD-NAME
sudo microk8s kubectl describe pod vss-vss-deployment-POD-NAME
  • You can check the problem by entering the pod directly with the following command
sudo microk8s kubectl exec -it vss-vss-deployment-POD-NAME -- /bin/bash

You can analyze the problem from the log output, or post the logs in your forum topic.
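
To find the exact pod name, and to see logs from a container that has already crashed and restarted, the standard kubectl options apply:

# List all pods to find the failing pod's full name and current status
sudo microk8s kubectl get pods -A
# If the container has restarted, inspect the previous container's logs
sudo microk8s kubectl logs vss-vss-deployment-POD-NAME --previous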

6. [cuda error] “gpu-operator-resources nvidia-cuda-validator-<NAME>” pod failed
After you have enabled the nvidia and hostpath-storage add-ons, the “nvidia-cuda-validator-<NAME>” pod fails to start. Check whether your fabric manager version exactly matches your driver version. If not, please reinstall nvidia-fabricmanager by following the commands below.

# Set this to your installed NVIDIA driver version (see "nvidia-smi")
driver_version=<xxx.xx.xx>
# The driver's major version selects the matching fabric manager package
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
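
To confirm the versions actually match after the reinstall, compare the driver version reported by nvidia-smi with the installed fabric manager package:

# Installed NVIDIA driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Installed fabric manager package version
dpkg -l | grep nvidia-fabricmanager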

7. [Network Ports] The network ports used in the VSS deployment
During the VSS deployment there may be port conflicts. It is recommended to first use the netstat tool to check the current port usage.

sudo apt install net-tools
sudo netstat -tuln

The following are some of the default network ports of VSS.

  • Deploy Using Helm
    The network ports are all allocated internally by Kubernetes and then mapped to the host. You only need to focus on the mapping of two ports, the backend port and the frontend port, which you can get with the command below.
sudo microk8s kubectl get svc vss-service
  • Deploy Using Docker Compose
    This mode uses the local network ports directly. The default port usage is as follows.
    LLM NIM: 8000
    Reranker NIM: 9235
    Embedding NIM: 9234
    FRONTEND_PORT: 9100
    BACKEND_PORT: 8100

If you modify the default network ports, please make the corresponding modifications in the relevant files, such as local_deployment/config.yaml and local_deployment/guardrails/config.yml.
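
Before deployment, a quick check for a conflict on one of these defaults (the backend port 8100 is used here as an example) looks like this; an empty result means the port is free:

# Check whether a specific default port (here the backend port 8100) is already in use
sudo netstat -tuln | grep 8100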

8. [Deploy Using Any VLM and LLM] How to Deploy VSS with Any VLM and LLM Models Providing an OpenAI-Compatible REST API

We use the following three models as examples:
  • Qwen2.5-VL-7B-Instruct
  • NVILA-8B-Video
  • DeepSeek-R1-Distill-Qwen-32B

Assuming you have successfully deployed the LLM and VLM models independently, the steps are as follows:
Deploy the QwenVL independently

docker run --runtime nvidia --gpus all \
    --name my_vllm_container_QwenVL \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<your_key>" \
    -p 8000:8000 --ipc=host \
    vllm/vllm-openai:latest \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --served-model-name QwenVL

Deploy the DeepSeek independently

docker run --runtime nvidia --gpus all \
    --name my_vllm_container_deepseek \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<your_key>" \
    -p 8000:8000 --ipc=host \
    vllm/vllm-openai:latest \
    --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
    --served-model-name deepseek
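
Both example commands map host port 8000, so if the two containers run on the same machine, change one of the host port mappings (for example -p 8001:8000) and use that port in the overrides file. Before connecting VSS to the endpoints, you can also verify that each OpenAI-compatible server responds; the IP and port below are placeholders for wherever the container is running:

# List the models served by the endpoint
curl http://<llm_host_ip>:8000/v1/models
# Minimal chat-completion smoke test against the served model name used above
curl http://<llm_host_ip>:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "QwenVL", "messages": [{"role": "user", "content": "Say hello."}]}'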

Deploy the NVILA-8B-Video independently

Deploy VSS Using Helm
You can deploy the VSS with QwenVL and DeepSeek using the following YAML file:
override_QwenVL_deepseek.yaml (3.5 KB)
Run the command:

sudo microk8s helm install vss-blueprint \
    nvidia-blueprint-vss-2.3.0.tgz \
    --set global.ngcImagePullSecretName=ngc-docker-reg-secret \
    -f override_QwenVL_deepseek.yaml
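
After the install, watch the pods until everything reaches the Running state; any pod stuck in Init, CrashLoopBackOff, or Error can be debugged as described in section 5:

sudo microk8s kubectl get pods -w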

To deploy VSS with LLama-3.1-70b and NVILA-8B-Video, use this YAML file:
override_NVILA_8B-VIDEO.yaml (2.9 KB)
Important: Modify the IP addresses in the YAML file to match the service IPs of your deployed LLM and VLM instances.
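
The service IP is simply the address at which the OpenAI-compatible endpoint is reachable from the VSS cluster: the host's IP address when the model runs in a standalone Docker container as above, or the cluster service IP when it is exposed as a Kubernetes service.

# Host IP of the machine running the vLLM containers
hostname -I
# Cluster IPs, if the models are exposed as Kubernetes services instead
sudo microk8s kubectl get svc -A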