I deployed VSS 2.3.1 according to the guide, and when I tried to use it with a modified Riva model, I encountered the following error.
Our environment is 8 nodes, each an NVIDIA B200 machine with 8 GPUs.
GPU memory: 180 GB HBM3e per GPU (1.4 TB total per node); system memory (DIMM): 2 TB (32 DIMMs).
These are the riva values from the VSS 2.3.1 chart:
riva:
  enabled: true
  namespace: vss
  nodeSelector:
    kubernetes.io/hostname: dgx-b200-03
  storageClassName: gpfs
  applicationSpecs:
    riva-deployment:
      containers:
        riva-container:
          env:
          - name: NIM_TAGS_SELECTOR
            value: mode=all
          image:
            repository: nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual
            tag: 1
  workloadSpecs:
    dummy: {}
    wl-env:
      wl_env:
      - name: NGC_API_KEY
        valueFrom:
          secretKeyRef:
            key: NGC_API_KEY
            name: infisical-managed-secrets
      - name: NIM_HTTP_API_PORT
        value: '9000'
      - name: NIM_GRPC_API_PORT
        value: '50051'
      - name: NIM_CACHE_PATH
        value: /mnt/nim-cache
      - name: DISABLE_RIVA_HTTP_SERVER
        value: "False"
      - name: NIM_DISABLE_GRPC_STARTUP
        value: "False"
      - name: NIM_DISABLE_TRITON_STARTUP
        value: "False"
      wl_units: 1
With the original nvcr.io/nim/nvidia/parakeet-0-6b-ctc-en-us model, no error occurred. After switching models, the following error occurs during Riva model deployment and the process fails:
[I] Loading model: /tmp/tmpjyzzlv2i/model.onnx
[''][I] Folding Constants | Pass 1
[''][W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: /onnxruntime_src/onnxruntime/core/graph/model.cc:180 onnxruntime::Model::Model(onnx::ModelProto&&, const onnxruntime::PathString&, const onnxruntime::IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 11, max supported IR version: 10
[''][I] Total Nodes | Original: 8085, After Folding: 5218 | 2867 Nodes Folded
[''][I] Folding Constants | Pass 2
[''][W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: /onnxruntime_src/onnxruntime/core/graph/model.cc:180 onnxruntime::Model::Model(onnx::ModelProto&&, const onnxruntime::PathString&, const onnxruntime::IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 11, max supported IR version: 10
[''][I] Total Nodes | Original: 5218, After Folding: 4875 | 343 Nodes Folded
[''][I] Folding Constants | Pass 3
[''][W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: /onnxruntime_src/onnxruntime/core/graph/model.cc:180 onnxruntime::Model::Model(onnx::ModelProto&&, const onnxruntime::PathString&, const onnxruntime::IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 11, max supported IR version: 10
[''][I] Total Nodes | Original: 4875, After Folding: 4875 | 0 Nodes Folded
[I] Saving ONNX model to: /tmp/tmpjyzzlv2i/model.onnx
[''][W] Model size (2045.0 MiB) exceeds protobuf size threshold (1907.0 MiB). Will save weight data to an external file.
To control the location of this file, use the `external_data_path` parameter or the `--external-data-path` command-line option.
[''][W] ModelImporter.cpp:804: Make sure output 11696 has Int64 binding.
[I] Configuring with profiles:[
Profile 0:
{audio_signal [min=[1, 80, 801], opt=[128, 80, 801], max=[128, 80, 801]],
length [min=[1], opt=[128], max=[128]]}
]
[''][W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
[''][I] Building engine with configuration:
Flags | [FP16, TF32, OBEY_PRECISION_CONSTRAINTS]
Engine Capability | EngineCapability.STANDARD
Memory Pools | [WORKSPACE: 182642.38 MiB, TACTIC_DRAM: 182642.38 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
Tactic Sources | [EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
Preview Features | [PROFILE_SHARING_0806]
[''][I] Finished engine building in 219.074 seconds
It seems the issue is "Unsupported model IR version: 11, max supported IR version: 10".
When the model is then deployed on riva-server, running curl http://localhost:9000/v1/health/live fails to connect to the server.
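For reference, here is the same liveness check as a small script I use to poll the server (a minimal sketch; port 9000 matches the NIM_HTTP_API_PORT value in the chart above, and the function name is my own):

```python
import urllib.error
import urllib.request


def riva_is_live(url: str, timeout: float = 5.0) -> bool:
    """Return True if the NIM HTTP liveness endpoint answers with 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused / DNS failure / timeout: server is not live.
        return False


if __name__ == "__main__":
    # Port 9000 comes from NIM_HTTP_API_PORT in the values above.
    print(riva_is_live("http://localhost:9000/v1/health/live"))
```

In my case this returns False for the modified model, while the original parakeet-0-6b-ctc-en-us deployment answers the same endpoint normally.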
Questions:
- This looks like an IR version compatibility issue. What steps should I take to resolve it?
- To use this model in Riva, should I convert the ONNX model, or update ONNXRuntime/TensorRT?
- There is also a warning about the model size exceeding the protobuf threshold. Could this be contributing to the failure?