- Hardware Platform (GPU model and numbers): NVIDIA H100 80GB HBM3 × 8
- System Memory: 2.0Ti total, 1.7Ti available
- Ubuntu Version: Ubuntu 22.04.4 LTS
- NVIDIA GPU Driver Version (valid for GPU only): NVIDIA-SMI 550.90.07
- Issue Type: Questions
Hello NVIDIA team,

I am currently trying to use OpenAI’s GPT-4o model instead of the default llm-svc in the VSS Blueprint Helm Chart. To achieve this, I configured the overrides.yaml file as follows.

Configuration (overrides.yaml):
```yaml
nim-llm:
  env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: "0,1,2,3"
  resources:
    limits:
      nvidia.com/gpu: 0  # no limit
vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          startupProbe:
            failureThreshold: 360
          env:
            - name: VLM_MODEL_TO_USE
              value: openai-compat
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-api-key-secret
                  key: OPENAI_API_KEY
            - name: OPENAI_API_KEY_NAME
              value: OPENAI_API_KEY
            - name: MODEL_PATH
              value: "ngc:nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"
            - name: NVIDIA_VISIBLE_DEVICES
              value: "4,5,6,7"
            - name: ASSET_STORAGE_DIR  # custom upload directory
              value: "/tmp/custom-asset-dir"
            - name: EXAMPLE_STREAMS_DIR  # custom example directory
              value: "/tmp/custom-example-streams-dir"
          resources:
            limits:
              nvidia.com/gpu: 0
  extraPodVolumes:
    - name: custom-asset-dir
      hostPath:
        path: /home/nvadmin/Workspace/blueprint/video_uploads  # custom upload directory on host
    - name: custom-example-streams-dir
      hostPath:
        path: /home/nvadmin/Workspace/blueprint/video_examples  # custom example directory on host
  extraPodVolumeMounts:
    - name: custom-asset-dir
      mountPath: /tmp/custom-asset-dir
    - name: custom-example-streams-dir
      mountPath: /tmp/custom-example-streams-dir
  configs:
    ca_rag_config.yaml:
      chat:
        embedding:
          base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
        llm:
          base_url: https://api.openai.com/v1
          model: gpt-4o
        reranker:
          base_url: http://nemo-rerank-ranking-deployment-ranking-service:8000/v1
      summarization:
        embedding:
          base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
        llm:
          base_url: https://api.openai.com/v1
          model: gpt-4o
    guardrails_config.yaml:
      models:
        - engine: nim
          model: gpt-4o
          parameters:
            base_url: https://api.openai.com/v1
          type: main
        - engine: nim_patch
          model: nvidia/llama-3.2-nv-embedqa-1b-v2
          parameters:
            base_url: http://nemo-embedding-embedding-deployment-embedding-service:8000/v1
          type: embeddings
nemo-embedding:
  applicationSpecs:
    embedding-deployment:
      containers:
        embedding-container:
          env:
            - name: NGC_API_KEY
              valueFrom:
                secretKeyRef:
                  key: NGC_API_KEY
                  name: ngc-api-key-secret
            - name: NVIDIA_VISIBLE_DEVICES
              value: '4'
          resources:
            limits:
              nvidia.com/gpu: 0
nemo-rerank:
  applicationSpecs:
    ranking-deployment:
      containers:
        ranking-container:
          env:
            - name: NGC_API_KEY
              valueFrom:
                secretKeyRef:
                  key: NGC_API_KEY
                  name: ngc-api-key-secret
            - name: NVIDIA_VISIBLE_DEVICES
              value: '4'
          resources:
            limits:
              nvidia.com/gpu: 0
```
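For completeness, the openai-api-key-secret referenced by the secretKeyRef above is a plain Kubernetes Secret. A minimal sketch of an equivalent manifest (the key value here is a placeholder, not the real key):

```yaml
# Sketch of the Secret referenced above; metadata.name and the stringData
# key must match the secretKeyRef in overrides.yaml exactly.
apiVersion: v1
kind: Secret
metadata:
  name: openai-api-key-secret
type: Opaque
stringData:
  OPENAI_API_KEY: "sk-..."  # placeholder value
```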
After deploying with this configuration, I encountered the following error in the vss-deployment pod:
```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/via/via-engine/via_server.py", line 1154, in run
    self._stream_handler = ViaStreamHandler(self._args)
  File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 422, in __init__
    response = asyncio.run(
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/rails/llm/llmrails.py", line 688, in generate_async
    new_events = await self.runtime.generate_events(
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/colang/v1_0/runtime/runtime.py", line 167, in generate_events
    next_events = await self._process_start_action(events)
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/colang/v1_0/runtime/runtime.py", line 363, in _process_start_action
    result, status = await self.action_dispatcher.execute_action(
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/action_dispatcher.py", line 253, in execute_action
    raise e
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/action_dispatcher.py", line 214, in execute_action
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/library/self_check/input_check/actions.py", line 71, in self_check_input
    response = await llm_call(llm, prompt, stop=stop)
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/llm/utils.py", line 96, in llm_call
    raise LLMCallException(e)
nemoguardrails.actions.llm.utils.LLMCallException: LLM Call Exception: [###] {'message': 'Incorrect API key provided: **********. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}
{'error': {'message': 'Incorrect API key provided: **********. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/via/via-engine/via_server.py", line 2481, in <module>
    server.run()
  File "/tmp/via/via-engine/via_server.py", line 1156, in run
    raise ViaException(f"Failed to load VIA stream handler - {str(ex)}")
via_exception.ViaException: ViaException - code: InternalServerError message: Failed to load VIA stream handler - LLM Call Exception: [###] {'message': 'Incorrect API key provided: **********. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}
{'error': {'message': 'Incorrect API key provided: **********. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
```
However, when I manually executed the following script inside the vss-vss-deployment pod, it worked correctly, which suggests that the API key itself is not the issue:
```python
import openai  # the client reads OPENAI_API_KEY from the environment

models = openai.models.list()
print(models)
```
Output (truncated):

```
SyncPage[Model](data=[Model(id='omni-moderation-2024-09-26', created=1732734466, object='model', owned_by='system'), Model(id='gpt-4o-mini-audio-preview-2024-12-17', created=1734115920, object='model', owned_by='system'), Model(id='dall-e-3', created=1698785189, object='model', owned_by='system'), Model(id='dall-e-2', created=1698798177, object='model', owned_by='system'), Model(id='gpt-4o-audio-preview-2024-10-01', created=1727389042, object='model', owned_by='system'), Model(id='o1', created=1734375816, object='model', owned_by='system'), Model(id='gpt-4o-audio-preview', created=1727460443, object='model', owned_by='system'), Model(id='gpt-4o-mini-realtime-preview-2024-12-17', created=1734112601, object='model', owned_by='system'), Model(id='o1-2024-12-17', created
....
```
The script successfully retrieved the available models from OpenAI, indicating that authentication is working in this environment.
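Since the call that fails in the traceback is a chat completion issued by the guardrails self_check_input action rather than a model listing, a closer reproduction of that path would be something like the following sketch (same OPENAI_API_KEY from the pod environment, gpt-4o as configured above):

```python
from openai import OpenAI

# Sketch: reproduce the failing call path from the traceback -- a chat
# completion against gpt-4o -- instead of only listing models. The client
# picks up OPENAI_API_KEY from the environment, as in the deployment.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=5,
)
print(response.choices[0].message.content)
```

If this also succeeds with the same key, the mismatch is presumably in how the guardrails configuration resolves its API key rather than in the key itself.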
Question:
How can I properly configure the VSS Blueprint to use OpenAI’s GPT-4o instead of llm-svc without encountering the API key error? Are there any additional environment variables or configurations I should modify?
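In case a variable-name mismatch is the culprit, here is a quick sketch (nothing VSS-specific, just standard os.environ) of how I can list the key-related variables that are actually set inside the vss container:

```python
import os

# Sketch: print API-key-related environment variables inside the vss
# container, masking values, to rule out a variable-name mismatch.
for name, value in sorted(os.environ.items()):
    if "API_KEY" in name or "OPENAI" in name:
        print(name, "=", (value[:4] + "...") if value else "<empty>")
```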
Any guidance would be greatly appreciated.
Thank you!