"Unable to create TensorRT engine" when loading models in riva-speech:1.7.0-beta-server

We are running a few fine-tuned Citrinet models in riva-speech:1.7.0-beta-server in k8s, and we’ve been unable to load them in the Riva-speech server after modifying our riva-build and riva-deploy pipeline - the speech server fails with this error: "Unable to create TensorRT engine".

Some context on riva-build and riva-deploy pipeline:

Originally, we ran the “riva-build” and “riva-deploy” commands of riva-speech:1.7.0-beta-servicemaker on a separate VM and then copied the exported models over to where the Riva-speech server running in k8s could access them.
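
For reference, the build and deploy steps look roughly like this (paths, model name and the <key> placeholder are illustrative, not our exact invocation):

docker run --gpus all -it --rm -v /data:/servicemaker-dev \
    nvcr.io/nvidia/riva/riva-speech:1.7.0-beta-servicemaker

# build the intermediate .rmir from the fine-tuned .riva acoustic model
riva-build speech_recognition \
    /servicemaker-dev/custom-model.rmir:<key> \
    /servicemaker-dev/custom-model.riva:<key>

# generate the Triton model repository, including the TensorRT engine
riva-deploy /servicemaker-dev/custom-model.rmir:<key> /servicemaker-dev/models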

Recently we re-created the riva-build and riva-deploy parts of this as a Kubeflow pipeline, and since then the Riva-speech server has been unable to load the models, failing with this error:

E1125 00:08:13.321963 22 model_repository_manager.cc:1215] failed to load ‘riva-trt-custom-model-am-streaming-offline’ version 1: Internal: unable to create TensorRT engine

The only warnings present in the riva-deploy logs in Kubeflow but not in the riva-deploy logs on the VM are these:

[TensorRT] WARNING: Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[TensorRT] WARNING: Convolution + generic activation fusion is disable due to incompatible driver or nvrtc
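
Since the warning mentions an incompatible driver, one way to see which driver the riva-deploy container is actually picking up is something like:

# driver version and GPU name visible inside the container
nvidia-smi --query-gpu=driver_version,name --format=csv,noheader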

GPU, OS and Riva version are the same everywhere:

Hardware - GPU: T4
Operating System - Ubuntu 20.04
Riva Version: 1.7.0

There are some differences in driver versions, but they don’t exactly explain why the combination of environments 2 + 1 below works while 3 + 1 does not:

  1. Riva-speech server in k8s:
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+

  2. VM with riva-build & riva-deploy (output works in Riva-speech server):
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
    |-------------------------------+----------------------+----------------------+

  3. Kubeflow pipeline with riva-build and riva-deploy (output doesn’t work in Riva-speech server):
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+

It has been suggested that we try to downgrade the CUDA version in the k8s cluster where Kubeflow is running. Unfortunately, according to the documentation, CUDA 11.0 is the latest supported version (Running GPUs  |  Kubernetes Engine Documentation  |  Google Cloud).

What I did instead was to set up another VM with an earlier NVIDIA driver version (460.91.03) and run riva-build and riva-deploy there, and here are the results:

  1. The riva-deploy pipeline still shows the incompatible-driver warning, same as when we run it in Kubeflow:

[TensorRT] WARNING: Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[TensorRT] WARNING: Convolution + generic activation fusion is disable due to incompatible driver or nvrtc

  2. The model exported on this VM works in the Riva-speech server, despite the difference in CUDA versions.

I’ve also noticed that the model that doesn’t work in the Riva-speech server is twice as big (543 MB) as the one that works (274 MB).
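
The size difference is visible directly on the generated engine file in the Triton model repository, e.g. (the path prefix here is illustrative; the model directory is the one from the error message):

# compare the serialized TensorRT engine sizes between the two builds
ls -lh /data/models/riva-trt-custom-model-am-streaming-offline/1/model.plan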

To sum it up:

  1. riva-build/deploy with CUDA 11.5 and 495 drivers on a VM → works in the Riva speech server w/ CUDA 11.4 and 450 drivers

  2. riva-build/deploy with CUDA 11.4 and 460 drivers on a VM → works in the Riva speech server w/ CUDA 11.4 and 450 drivers

  3. riva-build/deploy in Kubeflow, with CUDA 11.4 and 450 drivers (running on k8s nodes that only support CUDA 11.0) → does NOT work in the Riva speech server w/ CUDA 11.4 and 450 drivers

  4. The latest supported CUDA version in Kubernetes (GKE) is 11.0 - however, the Riva-speech server running in k8s has no problem loading models exported on VMs with newer driver versions (460 and 495)

Is there anything you could recommend we do to make it work in k8s & Kubeflow with CUDA 11.0?

Hi @darya.trofimova,

I think the issue above is due to an older NVIDIA driver version. GKE node images currently use NVIDIA driver version 450.119.04, which is compatible with CUDA 11.0. (The latest supported CUDA version is 11.0 on both COS (1.18.6-gke.3504+) and Ubuntu (1.19.8-gke.1200+).)

Please refer to the application considerations section in the link below:
CUDA Compatibility :: GPU Deployment and Management Documentation.

Maybe you can try deploying the solution to an AWS EKS cluster. You can refer to the Riva doc below for more details:
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/samples/rivaasreks.html#

AWS offers the latest NVIDIA driver versions as well.
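
For example, a small GPU node group with T4s and a recent driver can be brought up with something like the following (cluster name, region, instance type and node count are placeholders; the linked Riva doc describes the full setup):

# illustrative only - adjust name, region, instance type and node count
eksctl create cluster --name riva-cluster --region us-west-2 \
    --node-type g4dn.xlarge --nodes 2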

While executing step 4 in the section below,
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/samples/rivaasreks.html#defining-and-launching-the-eks-cluster
please use the following helm install command instead. It’s a known documentation issue, which we are trying to correct as soon as possible:

helm install --namespace riva riva . \
    --set ngcCredentials.password=`echo -n $NGC_API_KEY | base64 -w0` \
    --set modelRepoGenerator.modelDeployKey=`echo -n tlt_encode | base64 -w0` \
    --set riva.speechServices.asr=true \
    --set riva.speechServices.tts=true \
    --set riva.speechServices.nlp=true

I hope this helps you proceed with the deployment of Riva on your Kubernetes cluster.

Thanks

Hi @darya.trofimova,

Reconsidering the error mentioned above, it seems it might be due to a mismatch between the TensorRT (TRT) version used to generate the model engine after fine-tuning and the TRT version used by the Triton server on the backend.

Could you please check the TRT version used in the server backend, and regenerate the model engine with a compatible TRT version in the TAO conversion step?

As per the Riva software compatibility matrix, TensorRT 8.0.1.6 is used in the backend service:
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/support-matrix.html#id15
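
To confirm, you can check the TensorRT version inside both the servicemaker container (where riva-deploy generates the engine) and the riva-speech server container, for example:

# TensorRT packages installed in the container
dpkg -l | grep -i tensorrt
# version of the TensorRT Python bindings
python3 -c "import tensorrt as trt; print(trt.__version__)"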

Maybe the TAO Toolkit 3.0-21.11 for x86 + GPU (CUDA 11.3 / cuDNN 8.1 / TensorRT 8.0) package can be used here.

Based on the version deployed at your end, please choose the matching TAO Toolkit version accordingly, and let us know in case the issue persists.

Regards,
Sunil Kumar