All env variables are set properly, but I couldn't find the overrides.yaml file to uncomment the related line, so I left it commented. I get this error every time I try to deploy the VSS with the config, and I remove the tmp folder after each attempt.
Since I was following the AWS deploy instructions, I didn’t notice the overrides.yaml file.
I added this overrides file, but got the same error while trying to deploy the config. I kept the overrides file unchanged from the version at the link you provided.
- name: VLM_MODEL_TO_USE
  value: vila-1.5 # Or "openai-compat" or "custom"
# Specify path in case of VILA-1.5 and custom model. Can be either
# a NGC resource path or a local path. For custom models this
# must be a path to the directory containing "inference.py" and
# "manifest.yaml" files.
- name: MODEL_PATH
  value: "ngc:nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"
- name: DISABLE_GUARDRAILS
  value: "false" # "true" to disable guardrails.
- name: TRT_LLM_MODE
  value: "" # int4_awq (default), int8 or fp16. (for VILA only)
- name: VLM_BATCH_SIZE
  value: "" # Default is determined based on GPU memory. (for VILA only)
- name: VIA_VLM_OPENAI_MODEL_DEPLOYMENT_NAME
  value: "" # Set to use a VLM exposed as a REST API with OpenAI compatible API (e.g. gpt-4o)
- name: VIA_VLM_ENDPOINT
  value: "" # Default OpenAI API. Override to use a custom API
- name: VIA_VLM_API_KEY
  value: "" # API key to set when calling VIA_VLM_ENDPOINT
- name: OPENAI_API_VERSION
  value: ""
- name: AZURE_OPENAI_API_VERSION
  value: ""

resources:
  limits:
    nvidia.com/gpu: 2 # Set to 8 for 2 x 8H100 node deployment
# nodeSelector:
#   kubernetes.io/hostname: <node-1>

nim-llm:
  resources:
    limits:
      nvidia.com/gpu: 4
  # nodeSelector:
  #   kubernetes.io/hostname: <node-2>

nemo-embedding:
  resources:
    limits:
      nvidia.com/gpu: 1 # Set to 2 for 2 x 8H100 node deployment
  # nodeSelector:
  #   kubernetes.io/hostname: <node-2>

nemo-rerank:
  resources:
    limits:
      nvidia.com/gpu: 1 # Set to 2 for 2 x 8H100 node deployment
  # nodeSelector:
  #   kubernetes.io/hostname: <node-2>
Thanks for your message. The problem with processing the config file was caused by one of the environment variables, which is fixed now, but I ran into another problem with this log:
preparing artifacts
applying TF shape
CTRL-C to abort
╷
│ Error: error configuring S3 Backend: error validating provider credentials: error calling sts:GetCallerIdentity: InvalidClientTokenId: The security token included in the request is invalid.
│ status code: 403, request id: b9b523ef-4809-471a-9e82-c43aaab5c08f
│
│
╵
╷
│ Error: Backend initialization required, please run "terraform init"
│
│ Reason: Initial configuration of the requested backend "s3"
│
│ The "backend" is the interface that Terraform uses to store state,
│ perform operations, etc. If this message is showing up, it means that the
│ Terraform configuration you're using is using a custom configuration for
│ the Terraform backend.
│
│ Changes to backend configurations require reinitialization. This allows
│ Terraform to set up the new configuration, copy existing state, etc. Please run
│ "terraform init" with either the "-reconfigure" or "-migrate-state" flags to
│ use the current configuration.
│
│ If the change reason above is incorrect, please verify your configuration
│ hasn't changed and try again. At this point, no changes to your existing
│ configuration or state have been made.
╵
failed to determine IaC changes
This looks like an AWS credentials or permissions issue. Could you verify your AWS access credentials? Also check that the IAM user or role associated with those credentials has the necessary permissions to call sts:GetCallerIdentity and to access the S3 bucket used for the Terraform state backend.
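If it helps, here is a quick way to confirm which identity those credentials resolve to, assuming the AWS CLI is installed and the deploy runs in the same shell (the exact export lines below depend on how your credentials are provided):

# Confirm which identity the current credentials resolve to (the same STS call Terraform makes)
aws sts get-caller-identity

# Show where the CLI is picking up credentials from (env vars, shared profile, etc.)
aws configure list

# If the keys were rotated or a session token expired, re-export them before retrying, e.g.:
# export AWS_ACCESS_KEY_ID=...
# export AWS_SECRET_ACCESS_KEY=...
# export AWS_SESSION_TOKEN=...   # only needed for temporary credentials

# Once sts:GetCallerIdentity succeeds, reinitialize the S3 backend as the log suggests:
terraform init -reconfigure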