Unable to configure gpt-4o with VSS instead of VILA using OpenAI Azure API key

I have LaunchPad access with 8 NVIDIA H100 NVL GPUs. I am able to deploy and use VSS with NVILA, but I am unable to use the gpt-4o model with VSS in place of NVILA. I followed the instructions in the NVIDIA documentation (Configure the VLM, Video Search and Summarization Agent). Attaching images of the overrides file that I modified to use gpt-4o through an OpenAI Azure API key.

Below are the commands:
OPENAI_API_KEY='XXXXXXXXXXXXXXX'

NGC_API_KEY='nvapi-XXXXXXXXXXXXXXXXXXX'

kubectl create secret docker-registry ngc-docker-reg-secret --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=$NGC_API_KEY

kubectl create secret generic graph-db-creds-secret --from-literal=username=neo4j --from-literal=password=password

kubectl create secret generic openai-api-key-secret --from-literal=OPENAI_API_KEY=$OPENAI_API_KEY

helm fetch https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-vss-2.2.0.tgz --username='$oauthtoken' --password=$NGC_API_KEY

helm install vss-blueprint nvidia-blueprint-vss-2.2.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f overrides_gpt.yaml
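Before running the `helm install`, it can help to confirm that every secret the chart references actually exists in the target namespace. A minimal sketch (the namespace name `vsstest` is an assumption; the secret list is taken from the commands above plus `ngc-api-key-secret`, which the blueprint chart also mounts):

```shell
# Sanity check: confirm each secret the chart references exists
# in the target namespace before installing.
NAMESPACE=vsstest  # assumed namespace; replace with your own
for s in ngc-docker-reg-secret graph-db-creds-secret openai-api-key-secret ngc-api-key-secret; do
  if kubectl get secret "$s" -n "$NAMESPACE" >/dev/null 2>&1; then
    echo "ok:      $s"
  else
    echo "MISSING: $s"
  fi
done
```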

I am getting an error while deploying; attached image for reference:

Content of overrides file:


vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          image:
            repository: nvcr.io/nvidia/blueprint/vss-engine
            tag: 2.2.0 # Update to override with custom VSS image
          env:
            - name: VLM_MODEL_TO_USE
              value: openai-compat
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-api-key-secret
                  key: OPENAI_API_KEY
            - name: DISABLE_GUARDRAILS
              value: "false" # "true" to disable guardrails.
            - name: TRT_LLM_MODE
              value: ""  # int4_awq (default), int8 or fp16. (for VILA only)
            - name: VLM_BATCH_SIZE
              value: ""  # Default is determined based on GPU memory. (for VILA only)
            - name: VIA_VLM_OPENAI_MODEL_DEPLOYMENT_NAME
              value: "gpt-4o"  # Set to use a VLM exposed as a REST API with OpenAI compatible API (e.g. gpt-4o)
            - name: VIA_VLM_ENDPOINT
              value: "https://usncoai0kua.openai.azure.com"  # Default OpenAI API. Override to use a custom API
            - name: VIA_VLM_API_KEY
              value: "XXXXXXXXXXXXXXXXXXXXXXX"  # API key to set when calling VIA_VLM_ENDPOINT
            - name: OPENAI_API_VERSION
              value: "2024-05-01-preview"
            - name: AZURE_OPENAI_API_VERSION
              value: "2024-05-01-preview"
            - name: AZURE_OPENAI_ENDPOINT
              value: "https://usncoai0kua.openai.azure.com"

  resources:
    limits:
      nvidia.com/gpu: 2   # Set to 8 for 2 x 8H100 node deployment
  # nodeSelector:
  #   kubernetes.io/hostname: <node-1>

nim-llm:
  resources:
    limits:
      nvidia.com/gpu: 4
  # nodeSelector:
  #   kubernetes.io/hostname: <node-2>

nemo-embedding:
  resources:
    limits:
      nvidia.com/gpu: 1  # Set to 2 for 2 x 8H100 node deployment
  # nodeSelector:
  #   kubernetes.io/hostname: <node-2>

nemo-rerank:
  resources:
    limits:
      nvidia.com/gpu: 1  # Set to 2 for 2 x 8H100 node deployment
  # nodeSelector:
  #   kubernetes.io/hostname: <node-2>
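As a quick sanity check on the Azure values in the overrides above: the Azure OpenAI chat-completions URL is built from the endpoint, the deployment name, and the API version. A minimal sketch that only assembles the URL from those same values without sending any request (the endpoint and deployment are the ones from the overrides file):

```python
# Assemble the Azure OpenAI chat-completions URL from the values
# used in the overrides file, to verify they combine as expected.
def azure_chat_url(endpoint: str, deployment: str, api_version: str) -> str:
    # Azure OpenAI REST path: /openai/deployments/<deployment>/chat/completions
    return (f"{endpoint.rstrip('/')}/openai/deployments/"
            f"{deployment}/chat/completions?api-version={api_version}")

url = azure_chat_url(
    endpoint="https://usncoai0kua.openai.azure.com",
    deployment="gpt-4o",
    api_version="2024-05-01-preview",
)
print(url)
```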



vss_deploy_logs.txt (8.3 KB)
vss_blueprint_logs.txt (3.6 KB)
rerank_logs.txt (3.3 KB)
embed_logs.txt (3.3 KB)

Could you refer to our FAQ and attach the detailed log information?

Attached are detailed logs for all pods that are not running.

  Warning  FailedMount  2m10s (x2177 over 3d1h)  kubelet  MountVolume.SetUp failed for volume "secret-ngc-api-key-volume" : secret "ngc-api-key-secret" not found

Have you created the ngc-api-key-secret secret?

Yes, I have created and used the NGC API key for deploying VSS with NVILA. I am able to deploy that successfully, but when I change the model to gpt-4o, the deployment fails.

kubectl get secrets

Could you check if your helm chart and the secret are deployed in the same namespace?

Yes, they are in the same namespace.
secrets


Pods:

From the image you attached, there is no ngc-api-key-secret in your vsstest namespace.
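For reference, the missing secret can be created from the same NGC key used earlier. A sketch of the generic-secret form (the key name `NGC_API_KEY` follows the blueprint documentation, and the `vsstest` namespace is taken from this thread; verify both against your chart version):

```shell
# Create the generic secret that the chart mounts as
# "secret-ngc-api-key-volume"; $NGC_API_KEY was exported earlier.
kubectl create secret generic ngc-api-key-secret \
  --from-literal=NGC_API_KEY=$NGC_API_KEY -n vsstest
```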

Thanks for pointing that out. It solved the deployment error. Now I am able to access the UI, but when I try to use the summarization API I get the error below:

Attached logs for your reference.
summary_logs.txt (761.7 KB)

Response body for summarization:

{
  "id": "418fe290-a491-4c24-b32e-afcf08e3ee5f",
  "prompt": "Write a concise and clear dense caption for the provided video",
  "model": "gpt-4o",
  "api_type": "internal",
  "response_format": {
    "type": "text"
  },
  "stream": false,
  "stream_options": {
    "include_usage": false
  },
  "max_tokens": 512,
  "temperature": 0.2,
  "top_p": 1,
  "top_k": 100,
  "seed": 10,
  "chunk_duration": 60,
  "chunk_overlap_duration": 10,
  "summary_duration": 60,
  "media_info": {
    "type": "offset",
    "start_offset": 0,
    "end_offset": 4000000000
  },
  "user": "user-123",
  "caption_summarization_prompt": "Prompt for caption summarization",
  "summary_aggregation_prompt": "Prompt for summary aggregation",
  "graph_rag_prompt_yaml": "",
  "tools": [],
  "summarize": true,
  "enable_chat": true,
  "num_frames_per_chunk": 10,
  "vlm_input_width": 10,
  "vlm_input_height": 10,
  "summarize_batch_size": 5,
  "rag_type": "graph-rag",
  "rag_top_k": 5,
  "rag_batch_size": 5
}

Hi @ina.khandelwal, could you try with a sample video packaged inside the VSS container? We have tried it and it works well.

  • Summarize the bridge video using the Gradio UI (to get the file added to the backend)
  • From the Swagger UI, execute the GET /files API to get the asset ID of the bridge file
  • Then use the summarize API from the Swagger UI. Change the “id” to the asset ID of the bridge file and remove “api_type” before running the summarize API.


This is the request body I tried after removing api_type. The status is 200, but I am not getting any content in the response:

{
  "id": "fb796833-74f0-4fe5-b288-4b2af5fb0e10",
  "prompt": "Write a concise and clear dense caption for the provided warehouse video",
  "model": "gpt-4o",
  "response_format": {
    "type": "text"
  },
  "stream": false,
  "stream_options": {
    "include_usage": false
  },
  "max_tokens": 512,
  "temperature": 0.2,
  "top_p": 1,
  "top_k": 100,
  "seed": 10,
  "chunk_duration": 60,
  "chunk_overlap_duration": 10,
  "summary_duration": 60,
  "media_info": {
    "type": "offset",
    "start_offset": 0,
    "end_offset": 4000000000
  },
  "user": "user-123",
  "caption_summarization_prompt": "Prompt for caption summarization",
  "summary_aggregation_prompt": "Prompt for summary aggregation",
  "graph_rag_prompt_yaml": "",
  "tools": [],
  "summarize": true,
  "enable_chat": false,
  "num_frames_per_chunk": 10,
  "vlm_input_width": 10,
  "vlm_input_height": 10,
  "summarize_batch_size": 5,
  "rag_type": "graph-rag",
  "rag_top_k": 5,
  "rag_batch_size": 5
}

Output:

I tried to use the same parameters as yours but get the error below in that case:

Did you use the “bridge video” from our samples?

The UI is working fine, but I want to use the APIs, and I get that error when using the summarization API. It should work for other videos as well; even for the default videos the API is not working.

After you switched the VLM to gpt-4o, does the UI still work?
We suggest that you first load the “bridge video” using the Gradio UI to obtain the file ID. Then you can use the API to get the summarization.

Do you mean that even using our “bridge video” is not working?

The UI is working fine for all videos, but when I try to use the API for summarization on any video, default or external, it does not return the content.