- First, please read our Guide carefully to deploy and debug the VSS: https://docs.nvidia.com/vss/index.html
- You can first search the FAQ in our Guide to see if there is a similar problem to yours: https://docs.nvidia.com/vss/content/faq.html
- You can also search this visual-ai-agent forum to see if there is a similar problem to yours.
1. [EA version] Access issues
If you are using the EA version and are having permission issues, please check the following first.
- Check which subscriptions you have and whether your subscription has expired: https://org.ngc.nvidia.com/subscriptions
There should be NVIDIA Developer Program and VSS Early Access.
- Ensure you generate an API key by following: NGC User Guide - NVIDIA Docs
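Once your key is generated, one quick way to confirm it is valid for the NGC container registry is to log in with it; the username is the literal string $oauthtoken and the password is your API key.
# Quote $oauthtoken so the shell does not expand it; paste your NGC API key when prompted for the password
docker login nvcr.io --username '$oauthtoken'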
2. [Deployment] Use fewer GPU resources to deploy
If you want to deploy with fewer GPU resources than in our Guide, please configure the VSS using a Helm overrides file.
The following overrides file is used when you want to deploy the VSS with 4x A100 (80 GB).
override.yaml (1.4 KB)
You can also refer to our Guide and replace llama-3.1-70b-instruct with llama-3_1-8b-instruct to use fewer GPU resources.
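Either overrides approach is applied the same way: pass the overrides file to helm install with -f, using the same package and secret names as in our Guide (also shown in topic 8 below).
sudo microk8s helm install vss-blueprint \
nvidia-blueprint-vss-2.3.0.tgz \
--set global.ngcImagePullSecretName=ngc-docker-reg-secret \
-f override.yaml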
3. [Deployment] Use the network proxy method to deploy the VSS
If your network uses a proxy, please refer to #34 to learn how to deploy the VSS in this scenario. The following overrides file is used when you are using a network proxy.
override-proxy.yaml (2.6 KB)
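In addition to the overrides file, a minimal sketch (the proxy address below is a placeholder) is to export the standard proxy environment variables in the shell where you run the deployment commands:
# Replace proxy.example.com:3128 with your proxy; add your cluster/host addresses to NO_PROXY as needed
export HTTP_PROXY=http://proxy.example.com:3128
export HTTPS_PROXY=http://proxy.example.com:3128
export NO_PROXY=localhost,127.0.0.1
export http_proxy=$HTTP_PROXY
export https_proxy=$HTTPS_PROXY
export no_proxy=$NO_PROXY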
4. [source code] Customize the source code
If you want to customize our code, please enter the container and modify it there.
You can refer to this topic: 326660. It shows how to customize the UI: you can customize the via_demo_client.py source code and related files in the vss-engine container image.
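For a Helm deployment, a sketch of one way to apply your change (the in-container path of via_demo_client.py is a placeholder here; look it up first):
# Locate via_demo_client.py inside the running pod
sudo microk8s kubectl exec -it vss-vss-deployment-POD-NAME -- find / -name via_demo_client.py 2>/dev/null
# Copy your modified file over the original (replace <path-inside-container> with the path found above)
sudo microk8s kubectl cp ./via_demo_client.py vss-vss-deployment-POD-NAME:<path-inside-container>/via_demo_client.py
Note that changes made inside a running pod are lost when the pod restarts; for a persistent change, rebuild or commit the vss-engine container image with your modification.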
5. [preliminary debug] How to perform a preliminary analysis when some pods fail to deploy
- You can run the following command to check the logs of the pods.
sudo microk8s kubectl logs vss-vss-deployment-POD-NAME
sudo microk8s kubectl describe pod vss-vss-deployment-POD-NAME
- You can also check the problem by entering the pod directly with the following command.
sudo microk8s kubectl exec -it vss-vss-deployment-POD-NAME -- /bin/bash
You can analyze the problem based on the log information, or post the logs in your forum topic.
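To find the exact POD-NAME to use in the commands above, list the pods first:
sudo microk8s kubectl get pods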
6. [cuda error] The "gpu-operator-resources nvidia-cuda-validator-<NAME>" pod failed
If the "nvidia-cuda-validator-<NAME>" pod fails to start after you have enabled the nvidia and hostpath-storage add-ons, check whether your fabric manager version exactly matches the driver version. If not, please reinstall nvidia-fabricmanager with the commands below.
driver_version=<xxx.xx.xx>
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
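To compare the two versions before reinstalling (or to verify afterwards), you can check them as follows:
# Driver version reported by the NVIDIA driver
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Installed fabric manager package version and service status
dpkg -l | grep nvidia-fabricmanager
systemctl status nvidia-fabricmanager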
7. [Network Ports] The network ports used in the VSS deployment
During the VSS deployment, there may be issues with port conflicts. It is recommended to first use the netstat tool to check the current port usage.
sudo apt install net-tools
sudo netstat -tuln
The following are some of the default network ports of VSS.
- Deploy Using Helm
The network ports are all allocated internally by Kubernetes and then mapped to the host. You only need to focus on the mapping of two ports, the backend port and the frontend port. You can get them with the command below.
sudo microk8s kubectl get svc vss-service
- Deploy Using Docker Compose
The local network ports are used directly in this mode. The default port usage is as follows.
LLM NIM: 8000
Reranker NIM: 9235
Embedding NIM: 9234
FRONTEND_PORT: 9100
BACKEND_PORT: 8100
If you modify the default network ports, please make the corresponding modifications in the relevant files, such as local_deployment/config.yaml and local_deployment/guardrails/config.yml.
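For example, you can quickly check whether any of the default Docker Compose ports above are already in use before deploying:
sudo netstat -tuln | grep -E ':(8000|8100|9100|9234|9235) '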
8. [Deploy Using Any VLM and LLM] How to Deploy VSS with Any VLM and LLM Models Providing an OpenAI-Compatible REST API
We use the following three models as examples:
Qwen2.5-VL-7B-Instruct
NVILA-8B-Video
DeepSeek-R1-Distill-Qwen-32B.
Assuming you have successfully deployed the LLM and VLM models independently, the steps are as follows:
Deploy the QwenVL independently
docker run --runtime nvidia --gpus all \
--name my_vllm_container_QwenVL \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=<your_key>" \
-p 8000:8000 --ipc=host vllm/vllm-openai:latest \
--model Qwen/Qwen2.5-VL-7B-Instruct \
--served-model-name QwenVL
Deploy the DeepSeek independently
docker run --runtime nvidia --gpus all \
--name my_vllm_container_deepseek \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=<your_key>" \
-p 8000:8000 --ipc=host vllm/vllm-openai:latest \
--model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
--served-model-name deepseek
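Note that both example commands above map the container to host port 8000; if you run the VLM and the LLM on the same machine, map one of them to a different host port (e.g. -p 8001:8000). After each container starts, you can verify that its OpenAI-compatible endpoint is up and serving the expected model name (adjust host and port to your deployment):
# Should list QwenVL (or deepseek), matching the --served-model-name you set
curl http://localhost:8000/v1/models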
Deploy the NVILA-8B-Video independently
- Set up the deployment environment according to the VILA source code.
- Add NVILA-8B-Video to the model: Literal[...] options in the server.py source code.
- Focus on the running-vila-api-server steps and modify the --conv-mode vicuna_v1 option in the CLI command accordingly.
Deploy VSS Using Helm
You can deploy the VSS with QwenVL and DeepSeek using the following YAML file:
override_QwenVL_deepseek.yaml (3.5 KB)
Run the command:
sudo microk8s helm install vss-blueprint \
nvidia-blueprint-vss-2.3.0.tgz \
--set global.ngcImagePullSecretName=ngc-docker-reg-secret \
-f override_QwenVL_deepseek.yaml
To deploy VSS with Llama-3.1-70b and NVILA-8B-Video, use this YAML file:
override_NVILA_8B-VIDEO.yaml (2.9 KB)
Important: Modify the IP addresses in the YAML file to match the service IPs of your deployed LLM and VLM instances.
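After the installation completes, you can confirm that the pods come up and look up the frontend/backend port mapping as described in topic 7:
sudo microk8s kubectl get pods
sudo microk8s kubectl get svc vss-service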