VSS Blueprint: API Key Error When Using OpenAI GPT-4o Instead of llm-svc

  • Hardware Platform (GPU model and numbers)
    NVIDIA H100 80GB HBM3 * 8

  • System Memory
    total 2.0Ti
    available 1.7Ti

  • Ubuntu Version
    Ubuntu 22.04.4 LTS

  • NVIDIA GPU Driver Version (valid for GPU only)
    NVIDIA-SMI 550.90.07

  • Issue Type (questions)

Hello NVIDIA team,

I am currently trying to use OpenAI’s GPT-4o model instead of the default llm-svc in the VSS Blueprint Helm Chart. To achieve this, I configured the overrides.yaml file as follows:

Configuration (overrides.yaml):

nim-llm:
  env:
  - name: NVIDIA_VISIBLE_DEVICES
    value: "0,1,2,3"
  resources:
    limits:
      nvidia.com/gpu: 0    # request no GPUs via the device plugin; GPUs are selected with NVIDIA_VISIBLE_DEVICES
  
vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          startupProbe:
            failureThreshold: 360
          env:
          - name: VLM_MODEL_TO_USE
            value: openai-compat
          - name: OPENAI_API_KEY
            valueFrom:
              secretKeyRef:
                name: openai-api-key-secret
                key: OPENAI_API_KEY
          - name: OPENAI_API_KEY_NAME
            value: OPENAI_API_KEY 
          - name: MODEL_PATH
            value: "ngc:nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"
          - name: NVIDIA_VISIBLE_DEVICES
            value: "4,5,6,7"
          - name: ASSET_STORAGE_DIR # custom upload directory
            value: "/tmp/custom-asset-dir"
          - name: EXAMPLE_STREAMS_DIR  # custom example directory
            value: "/tmp/custom-example-streams-dir"

  resources:
    limits:
      nvidia.com/gpu: 0
  extraPodVolumes:
  - name: custom-asset-dir
    hostPath:
      path: /home/nvadmin/Workspace/blueprint/video_uploads # custom upload directory on host
  - name: custom-example-streams-dir
    hostPath:
      path: /home/nvadmin/Workspace/blueprint/video_examples  # custom example directory on host
  extraPodVolumeMounts:
  - name: custom-asset-dir
    mountPath: /tmp/custom-asset-dir
  - name: custom-example-streams-dir
    mountPath: /tmp/custom-example-streams-dir
  configs:
    ca_rag_config.yaml:
      chat:
        embedding:
          base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
        llm:
          base_url: https://api.openai.com/v1
          model: gpt-4o
        reranker:
          base_url: http://nemo-rerank-ranking-deployment-ranking-service:8000/v1
      summarization:
        embedding:
          base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
        llm:
          base_url: https://api.openai.com/v1
          model: gpt-4o
    guardrails_config.yaml:
      models:
      - engine: nim
        model: gpt-4o
        parameters:
          base_url: https://api.openai.com/v1
        type: main
      - engine: nim_patch
        model: nvidia/llama-3.2-nv-embedqa-1b-v2
        parameters:
          base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
        type: embeddings


nemo-embedding:
  applicationSpecs:
    embedding-deployment:
      containers:
        embedding-container:
          env:
          - name: NGC_API_KEY
            valueFrom:
              secretKeyRef:
                key: NGC_API_KEY
                name: ngc-api-key-secret
          - name: NVIDIA_VISIBLE_DEVICES
            value: '4'
  resources:
    limits:
      nvidia.com/gpu: 0

nemo-rerank:
  applicationSpecs:
    ranking-deployment:
      containers:
        ranking-container:
          env:
          - name: NGC_API_KEY
            valueFrom:
              secretKeyRef:
                key: NGC_API_KEY
                name: ngc-api-key-secret
          - name: NVIDIA_VISIBLE_DEVICES
            value: '4'
  resources:
    limits:
      nvidia.com/gpu: 0
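
The OPENAI_API_KEY above is pulled from a pre-created Kubernetes secret (openai-api-key-secret). To confirm the secret stores exactly the expected value, with no stray whitespace or trailing newline, the stored value can be decoded and inspected; a minimal sketch, assuming the kubernetes Python client and that the blueprint is deployed in the default namespace:

import base64
from kubernetes import client, config

# Runs from outside the cluster with a valid kubeconfig; adjust the
# namespace if the blueprint is not deployed in "default".
config.load_kube_config()
v1 = client.CoreV1Api()

secret = v1.read_namespaced_secret("openai-api-key-secret", "default")
raw = base64.b64decode(secret.data["OPENAI_API_KEY"])

# A trailing newline (e.g. from `echo` without -n) silently corrupts the key.
print(f"stored key length: {len(raw)}")
if raw != raw.strip():
    print("WARNING: secret value has leading/trailing whitespace or a newline")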

After deploying with this configuration, I encountered the following error (vss-deployment):

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/via/via-engine/via_server.py", line 1154, in run
    self._stream_handler = ViaStreamHandler(self._args)
  File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 422, in __init__
    response = asyncio.run(
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/rails/llm/llmrails.py", line 688, in generate_async
    new_events = await self.runtime.generate_events(
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/colang/v1_0/runtime/runtime.py", line 167, in generate_events
    next_events = await self._process_start_action(events)
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/colang/v1_0/runtime/runtime.py", line 363, in _process_start_action
    result, status = await self.action_dispatcher.execute_action(
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/action_dispatcher.py", line 253, in execute_action
    raise e
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/action_dispatcher.py", line 214, in execute_action
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/library/self_check/input_check/actions.py", line 71, in self_check_input
    response = await llm_call(llm, prompt, stop=stop)
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/llm/utils.py", line 96, in llm_call
    raise LLMCallException(e)
nemoguardrails.actions.llm.utils.LLMCallException: LLM Call Exception: [###] {'message': 'Incorrect API key provided: **********. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}
{'error': {'message': 'Incorrect API key provided: **********. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/via/via-engine/via_server.py", line 2481, in <module>
    server.run()
  File "/tmp/via/via-engine/via_server.py", line 1156, in run
    raise ViaException(f"Failed to load VIA stream handler - {str(ex)}")
via_exception.ViaException: ViaException - code: InternalServerError message: Failed to load VIA stream handler - LLM Call Exception: [###] {'message': 'Incorrect API key provided: **********. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}
{'error': {'message': 'Incorrect API key provided: **********. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
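
Note that the traceback points at the self_check_input action in nemoguardrails, i.e. the guardrails "main" model call made at startup, rather than the CA-RAG summarization path. The same call path can be exercised in isolation with something like the sketch below (hypothetical; it mirrors the models block from guardrails_config.yaml above, requires the nemoguardrails package, and which environment variable actually supplies the key to the nim engine is exactly what it isolates):

from nemoguardrails import LLMRails, RailsConfig

# Mirror the "main" model entry from guardrails_config.yaml above.
yaml_content = """
models:
  - type: main
    engine: nim
    model: gpt-4o
    parameters:
      base_url: https://api.openai.com/v1
"""

config = RailsConfig.from_content(yaml_content=yaml_content)
rails = LLMRails(config)

# A correctly wired key returns a normal completion; a bad one should raise
# the same invalid_api_key LLMCallException seen in the log above.
print(rails.generate(messages=[{"role": "user", "content": "ping"}]))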

However, when I manually executed the following script inside the vss-vss-deployment pod, it seemed to work correctly, which suggests that the API key itself is not the issue:

import openai

# The client reads OPENAI_API_KEY from the environment automatically,
# so this only succeeds if the key injected into the pod is valid.
models = openai.models.list()
print(models)

SyncPage[Model](data=[Model(id='omni-moderation-2024-09-26', created=1732734466, object='model', owned_by='system'), Model(id='gpt-4o-mini-audio-preview-2024-12-17', created=1734115920, object='model', owned_by='system'), Model(id='dall-e-3', created=1698785189, object='model', owned_by='system'), Model(id='dall-e-2', created=1698798177, object='model', owned_by='system'), Model(id='gpt-4o-audio-preview-2024-10-01', created=1727389042, object='model', owned_by='system'), Model(id='o1', created=1734375816, object='model', owned_by='system'), Model(id='gpt-4o-audio-preview', created=1727460443, object='model', owned_by='system'), Model(id='gpt-4o-mini-realtime-preview-2024-12-17', created=1734112601, object='model', owned_by='system'), Model(id='o1-2024-12-17', created
....

The script successfully retrieved the available models from OpenAI, indicating that authentication is working in this environment.
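
To match the failing call path more closely, the same environment can also be checked with an actual chat completion, since guardrails fails on a completion request rather than a model listing; a minimal sketch using the same environment variable:

import os
from openai import OpenAI

# Passing the key explicitly makes the dependency on the injected
# environment variable visible.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Reply with 'ok'."}],
    max_tokens=5,
)
print(resp.choices[0].message.content)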

Question:
How can I properly configure VSS Blueprint to use OpenAI’s GPT-4o instead of llm-svc without encountering the API key error? Are there any additional environment variables or configurations I should modify?

Any guidance would be greatly appreciated.

Thank you!

Could you refer to our configuring-for-gpt-4o guide to modify your overrides.yaml file?

Yes, I added this configuration:

          - name: VLM_MODEL_TO_USE
            value: openai-compat
          - name: OPENAI_API_KEY
            valueFrom:
              secretKeyRef:
                name: openai-api-key-secret
                key: OPENAI_API_KEY

It works fine when I add just the values above.

The ultimate goal is to use the OpenAI API and not run the nim-llm pod at all.
I want to use OpenAI instead of “http://nim-llm:8000”,
so I added the following config to my values.
Then I get the OpenAI API authentication error.

  configs:
    ca_rag_config.yaml:
      chat:
        embedding:
          base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
        llm:
          base_url: https://api.openai.com/v1
          model: gpt-4o
        reranker:
          base_url: http://nemo-rerank-ranking-deployment-ranking-service:8000/v1
      summarization:
        embedding:
          base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
        llm:
          base_url: https://api.openai.com/v1
          model: gpt-4o
    guardrails_config.yaml:
      models:
      - engine: nim
        model: gpt-4o
        parameters:
          base_url: https://api.openai.com/v1
        type: main
      - engine: nim_patch
        model: nvidia/llama-3.2-nv-embedqa-1b-v2
        parameters:
          base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
        type: embeddings
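
For reference, a quick way to check which key-related variables the container actually exposes to the guardrails runtime (a minimal sketch, safe to paste into a Python shell inside the pod since values are masked):

import os

# Print every *API_KEY* variable visible to the container, masked.
for name, value in sorted(os.environ.items()):
    if "API_KEY" in name:
        masked = value[:3] + "*" * (len(value) - 3) if len(value) > 3 else "***"
        print(f"{name} = {masked}")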

So the VLM was successfully replaced with GPT-4o, but replacing the LLM failed, right? We’ll look into the problem as soon as possible.

Yes, that’s correct. We want to replace the LLM with the OpenAI API. We’ll wait for your response. Thanks!