I have LaunchPad access with 8 NVIDIA H100 NVL GPUs. I am able to deploy and use VSS with NVILA, but I am unable to replace NVILA with the gpt-4o model. I followed the instructions in the NVIDIA documentation (Configure the VLM — Video Search and Summarization Agent) and modified the overrides file to use gpt-4o through an Azure OpenAI API key.
Below are the commands:
OPENAI_API_KEY='XXXXXXXXXXXXXXX'
NGC_API_KEY='nvapi-XXXXXXXXXXXXXXXXXXX'
kubectl create secret docker-registry ngc-docker-reg-secret --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=$NGC_API_KEY
kubectl create secret generic graph-db-creds-secret --from-literal=username=neo4j --from-literal=password=password
kubectl create secret generic openai-api-key-secret --from-literal=OPENAI_API_KEY=$OPENAI_API_KEY
helm fetch https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-vss-2.2.0.tgz --username='$oauthtoken' --password=$NGC_API_KEY
helm install vss-blueprint nvidia-blueprint-vss-2.2.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f overrides.yaml
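Before digging into the cluster, I also sanity-checked the Azure endpoint from outside VSS. A minimal sketch, assuming the standard Azure OpenAI chat-completions URL format (resource endpoint + deployment name + api-version query parameter); the endpoint, deployment, and version values are the ones from my overrides file:

```shell
# Assumed values taken from my overrides file; replace with your own.
AZURE_OPENAI_ENDPOINT="https://usncoai0kua.openai.azure.com"
DEPLOYMENT="gpt-4o"
API_VERSION="2024-05-01-preview"

# Azure OpenAI addresses a model by deployment name, not by model name:
URL="$AZURE_OPENAI_ENDPOINT/openai/deployments/$DEPLOYMENT/chat/completions?api-version=$API_VERSION"
echo "$URL"

# Uncomment to test the credentials directly (Azure expects an "api-key" header):
# curl -s "$URL" \
#   -H "api-key: $AZURE_OPENAI_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d '{"messages":[{"role":"user","content":"ping"}]}'
```

If the curl call returns a normal completion, the endpoint, deployment name, API version, and key are all valid, and the problem is in the VSS deployment rather than the Azure side.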
I get an error while deploying; image attached for reference.
Content of the overrides file:
vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          image:
            repository: nvcr.io/nvidia/blueprint/vss-engine
            tag: 2.2.0 # Update to override with custom VSS image
          env:
          - name: VLM_MODEL_TO_USE
            value: openai-compat
          - name: OPENAI_API_KEY
            valueFrom:
              secretKeyRef:
                name: openai-api-key-secret
                key: OPENAI_API_KEY
          - name: DISABLE_GUARDRAILS
            value: "false" # "true" to disable guardrails.
          - name: TRT_LLM_MODE
            value: "" # int4_awq (default), int8 or fp16. (for VILA only)
          - name: VLM_BATCH_SIZE
            value: "" # Default is determined based on GPU memory. (for VILA only)
          - name: VIA_VLM_OPENAI_MODEL_DEPLOYMENT_NAME
            value: "gpt-4o" # Set to use a VLM exposed as a REST API with an OpenAI-compatible API (e.g. gpt-4o)
          - name: VIA_VLM_ENDPOINT
            value: "https://usncoai0kua.openai.azure.com" # Default is the OpenAI API. Override to use a custom API.
          - name: VIA_VLM_API_KEY
            value: "XXXXXXXXXXXXXXXXXXXXXXX" # API key to use when calling VIA_VLM_ENDPOINT
          - name: OPENAI_API_VERSION
            value: "2024-05-01-preview"
          - name: AZURE_OPENAI_API_VERSION
            value: "2024-05-01-preview"
          - name: AZURE_OPENAI_ENDPOINT
            value: "https://usncoai0kua.openai.azure.com"
  resources:
    limits:
      nvidia.com/gpu: 2 # Set to 8 for 2 x 8H100 node deployment
  # nodeSelector:
  #   kubernetes.io/hostname: <node-1>

nim-llm:
  resources:
    limits:
      nvidia.com/gpu: 4
  # nodeSelector:
  #   kubernetes.io/hostname: <node-2>

nemo-embedding:
  resources:
    limits:
      nvidia.com/gpu: 1 # Set to 2 for 2 x 8H100 node deployment
  # nodeSelector:
  #   kubernetes.io/hostname: <node-2>

nemo-rerank:
  resources:
    limits:
      nvidia.com/gpu: 1 # Set to 2 for 2 x 8H100 node deployment
  # nodeSelector:
  #   kubernetes.io/hostname: <node-2>
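One thing I suspect (an assumption on my part, not something I have confirmed in the VSS source): a client in openai-compat mode can only reach Azure OpenAI if it sends Azure's "api-key" header and the api-version query parameter, whereas the standard OpenAI API uses a Bearer token and no deployment segment in the URL. A small sketch of the difference, with hypothetical values matching my overrides:

```python
def request_parts(endpoint: str, deployment: str, api_version: str,
                  api_key: str, azure: bool):
    """Return (url, headers) for a chat-completions call in each auth style."""
    if azure:
        # Azure OpenAI: deployment name in the path, api-version as a query
        # parameter, and the key in an "api-key" header.
        url = (f"{endpoint}/openai/deployments/{deployment}"
               f"/chat/completions?api-version={api_version}")
        headers = {"api-key": api_key, "Content-Type": "application/json"}
    else:
        # Standard OpenAI-compatible API: fixed path, Bearer-token auth.
        url = f"{endpoint}/v1/chat/completions"
        headers = {"Authorization": f"Bearer {api_key}",
                   "Content-Type": "application/json"}
    return url, headers

# Hypothetical values from my overrides file ("XXXX" stands in for the key):
url, headers = request_parts(
    "https://usncoai0kua.openai.azure.com",
    "gpt-4o", "2024-05-01-preview", "XXXX", azure=True)
print(url)
```

If the VSS openai-compat backend only ever sends the Bearer-token style, that would explain why the same key works against Azure directly but fails inside VSS.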
