Error occurs when changing the model in VSS 2.3.1 Riva

I deployed VSS 2.3.1 according to the guide, and when I tried to use it with a modified Riva model, I encountered the following error.

We are currently running this on NVIDIA B200 machines (8 GPUs per node, 8 nodes).

GPU memory: 180 GB HBM3e per GPU (about 1.4 TB total per node); system memory: 2 TB (32 DIMMs).

Riva values from the VSS 2.3.1 Helm chart:

riva:
  enabled: true
  namespace: vss  
  nodeSelector: 
    kubernetes.io/hostname: dgx-b200-03
  storageClassName: gpfs
  applicationSpecs:
    riva-deployment:
      containers:
        riva-container:
          env:
          - name: NIM_TAGS_SELECTOR
            value: mode=all
          image:
            repository: nvcr.io/nim/nvidia/parakeet-1-1b-rnnt-multilingual
            tag: 1
  workloadSpecs:
    dummy: {}
    wl-env:
      wl_env:
      - name: NGC_API_KEY
        valueFrom:
          secretKeyRef:
            key: NGC_API_KEY
            name: infisical-managed-secrets
      - name: NIM_HTTP_API_PORT
        value: '9000'
      - name: NIM_GRPC_API_PORT
        value: '50051'
      - name: NIM_CACHE_PATH
        value: /mnt/nim-cache

      - name: DISABLE_RIVA_HTTP_SERVER
        value: "False"
      - name: NIM_DISABLE_GRPC_STARTUP
        value: "False"
      - name: NIM_DISABLE_TRITON_STARTUP
        value: "False"
      wl_units: 1

No error occurred with the original nvcr.io/nim/nvidia/parakeet-0-6b-ctc-en-us model. After switching models, the following error occurs during Riva model deployment and the process fails:

[I] Loading model: /tmp/tmpjyzzlv2i/model.onnx
[''][I] Folding Constants | Pass 1
[''][W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: /onnxruntime_src/onnxruntime/core/graph/model.cc:180 onnxruntime::Model::Model(onnx::ModelProto&&, const onnxruntime::PathString&, const onnxruntime::IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 11, max supported IR version: 10
[''][I]     Total Nodes | Original:  8085, After Folding:  5218 |  2867 Nodes Folded
[''][I] Folding Constants | Pass 2
[''][W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: /onnxruntime_src/onnxruntime/core/graph/model.cc:180 onnxruntime::Model::Model(onnx::ModelProto&&, const onnxruntime::PathString&, const onnxruntime::IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 11, max supported IR version: 10
[''][I]     Total Nodes | Original:  5218, After Folding:  4875 |   343 Nodes Folded
[''][I] Folding Constants | Pass 3
[''][W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: /onnxruntime_src/onnxruntime/core/graph/model.cc:180 onnxruntime::Model::Model(onnx::ModelProto&&, const onnxruntime::PathString&, const onnxruntime::IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 11, max supported IR version: 10
[''][I]     Total Nodes | Original:  4875, After Folding:  4875 |     0 Nodes Folded
[I] Saving ONNX model to: /tmp/tmpjyzzlv2i/model.onnx
[''][W] Model size (2045.0 MiB) exceeds protobuf size threshold (1907.0 MiB). Will save weight data to an external file.
    To control the location of this file, use the `external_data_path` parameter or the `--external-data-path` command-line option.
[''][W] ModelImporter.cpp:804: Make sure output 11696 has Int64 binding.
[I] Configuring with profiles:[
        Profile 0:
            {audio_signal [min=[1, 80, 801], opt=[128, 80, 801], max=[128, 80, 801]],
             length [min=[1], opt=[128], max=[128]]}
    ]
[''][W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
[''][I] Building engine with configuration:
    Flags                  | [FP16, TF32, OBEY_PRECISION_CONSTRAINTS]
    Engine Capability      | EngineCapability.STANDARD
    Memory Pools           | [WORKSPACE: 182642.38 MiB, TACTIC_DRAM: 182642.38 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
    Tactic Sources         | [EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [PROFILE_SHARING_0806]
[''][I] Finished engine building in 219.074 seconds

The issue seems to be this message:
Unsupported model IR version: 11, max supported IR version: 10.
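For reference, the IR version is stored as the very first field of the ONNX ModelProto, so it can be inspected without installing the onnx package. The sketch below is purely illustrative (the header bytes are hypothetical, not from this model); it assumes the exporter serialized ir_version first, which the official ONNX serializer normally does:

```python
def onnx_ir_version(model_bytes: bytes) -> int:
    """Read ModelProto.ir_version (field 1, a varint) from raw ONNX bytes."""
    if not model_bytes or model_bytes[0] != 0x08:  # tag 0x08 = field 1, wire type varint
        raise ValueError("ir_version is not the first field of this model")
    value, shift = 0, 0
    for b in model_bytes[1:]:
        value |= (b & 0x7F) << shift   # accumulate 7 payload bits per byte
        if not b & 0x80:               # high bit clear: last byte of the varint
            return value
        shift += 7
    raise ValueError("truncated varint")

# Hypothetical header bytes encoding IR 11 (the version the log complains about):
print(onnx_ir_version(b"\x08\x0b"))
```

A model that reports 11 here would need to be re-exported with an older onnx package (or, if its operators are otherwise compatible, have its ir_version field rewritten via the onnx Python API) before an onnxruntime that tops out at IR 10 can load it.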

When trying to deploy the model on riva-server, curl http://localhost:9000/v1/health/live fails to connect to the server.

Questions:

  • It looks like an IR version compatibility issue — what steps should I take to resolve this?

  • For using this model in Riva, should I convert the ONNX model or update ONNXRuntime/TensorRT?

  • Additionally, there’s a warning about the model size exceeding the protobuf threshold. Could this also be contributing to the failure?
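On the third question: the 1907.0 MiB threshold in the warning appears to be protobuf's roughly 2 GB serialized-message cap expressed in MiB, and since the tool falls back to saving the weights to an external file, it is likely benign rather than a cause of the failure. A quick, purely illustrative sanity check of the number:

```python
PROTOBUF_CAP = 2_000_000_000          # bytes; protobuf refuses to serialize past ~2 GB
threshold_mib = PROTOBUF_CAP / 2**20  # convert to MiB
print(round(threshold_mib))           # matches the 1907 MiB figure in the log
```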

Can you please share the complete logs?

The errors above can be ignored; the engine builds successfully. Something went wrong after that point, which is why you were unable to deploy.

After increasing the liveness, readiness, and startup probe values, the application executed successfully during testing. Thank you.
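For anyone hitting the same symptom: probe timings are controlled by the standard Kubernetes probe fields. The exact keys exposed by the VSS chart may differ, so treat this values fragment as an illustrative sketch rather than the chart's actual schema:

```
riva:
  applicationSpecs:
    riva-deployment:
      containers:
        riva-container:
          startupProbe:
            periodSeconds: 30        # poll less often while the TRT engine builds
            failureThreshold: 120    # allow up to ~60 min before the pod is restarted
          livenessProbe:
            initialDelaySeconds: 300
            timeoutSeconds: 10
```

The key point is that the first startup includes the TensorRT engine build (over 219 seconds in the log above), so probes tuned for a warm start will kill the pod before the server ever comes up.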
