Deploying Riva models on Vultr A100 vGPU instance fails with CUDA errors

Hardware - GPU: A100 vGPU (10 GB VRAM, 1/7 GPU allocated), Vultr hosting
Hardware - CPU
Operating System: Ubuntu 22.04
Riva Version: 2.11
Nvidia Driver Version: 525.85.05

riva-build speech_recognition \
    conformer.rmir:tlt_encode Conformer-CTC-PE_large_Riva_ASR_set_3.0_ep107_trt_exportable.riva:tlt_encode \
    --name=conformer-en-US-asr-streaming \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --ms_per_timestep=40 \
    --endpointing.start_history=200 \
    --nn.fp16_needs_obey_precision_pass \
    --endpointing.residue_blanks_at_start=-2 \
    --chunk_size=0.8 \
    --left_padding_size=1.6 \
    --right_padding_size=1.6 \
    --decoder_type=flashlight \
    --flashlight_decoder.asr_model_delay=-1 \
    --decoding_language_model_binary=lm.binary \
    --decoding_vocab=vocab.txt \
    --flashlight_decoder.lm_weight=0.8 \
    --flashlight_decoder.word_insertion_score=1.0 \
    --flashlight_decoder.beam_size=32 \
    --flashlight_decoder.beam_threshold=20. \
    --flashlight_decoder.num_tokenization=1 \
    --language_code=en-US \
    --wfst_tokenizer_model=tokenize_and_classify.far \
    --wfst_verbalizer_model=verbalize.far \
    --force
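For completeness, the deploy step that follows the build is the usual riva-deploy invocation; the output model-repository path below is an assumption for illustration, not taken from my actual scripts:

```shell
# Hedged sketch: deploy the RMIR produced by riva-build into a Triton
# model repository. The /data/models path is an illustrative assumption.
riva-deploy -f \
    conformer.rmir:tlt_encode \
    /data/models
```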

When deploying the above build, it fails with the following errors:

[06/10/2023-10:33:04] [TRT] [E] 1: [graphContext.h::~MyelinGraphContext::35] Error Code 1: Myelin (No Myelin Error exists)
[06/10/2023-10:33:04] [TRT] [W] Skipping tactic 0x0000000000000000 due to Myelin error: CUDA error 800 failed to create CUDA stream

[06/10/2023-10:33:18] [TRT] [E] 4: [optimizer.cpp::computeCosts::3710] Error Code 4: Internal Error (Could not find any implementation for node {ForeignNode[746 + (Unnamed Layer* 20) [Shuffle]…MatMul_269]} due to insufficient workspace. See verbose log for requested sizes.)
[06/10/2023-10:33:18] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::738] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

I’ve tried different models and all fail with the same error. Because of the error about insufficient workspace, I tried setting nn.trt_max_workspace_size to 6 GB, but it made no difference. Running exactly the same build and deploy scripts locally on my home server with an RTX 3060 works fine.
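For anyone trying to reproduce this: the workspace override was passed as an extra flag to the same riva-build command. Expressing 6 GB as a byte count is my assumption of how the flag is interpreted; check riva-build speech_recognition --help on your version:

```shell
# Hedged sketch: same riva-build invocation as above, with an explicit
# TensorRT workspace cap added. 6 GB written as bytes is an assumption.
riva-build speech_recognition \
    conformer.rmir:tlt_encode Conformer-CTC-PE_large_Riva_ASR_set_3.0_ep107_trt_exportable.riva:tlt_encode \
    --nn.trt_max_workspace_size=6000000000 \
    --name=conformer-en-US-asr-streaming \
    --force
    # ...remaining flags as in the original build command
```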

Hi @steve.pritchard

Thanks for your interest in Riva.

Apologies for the error.
The 525 drivers come with CUDA 12 installed.
Riva requires CUDA 11.8.89; can you downgrade the CUDA version and try?

https://docs.nvidia.com/deeplearning/riva/user-guide/docs/support-matrix.html#id2

Thanks

Thanks for the reply. Unfortunately, the driver version is fixed on the Vultr vGPU platform (I checked with their support). However, I thought the NVIDIA host drivers were backwards compatible with the CUDA runtime in Docker containers, i.e. CUDA 11.8.89 inside the container should work with a CUDA 12 host driver?
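One quick way to check what the container actually sees (commands assumed to be run on the Vultr instance; the image tag is illustrative, and nvcc being present in the image is an assumption):

```shell
# On the host: driver version and the maximum CUDA version it supports.
nvidia-smi

# Inside the Riva container: the CUDA toolkit version it ships with.
# For driver/container compatibility, the driver's reported CUDA version
# (12.x for driver 525) generally needs to be >= the container's toolkit
# version (11.8 here).
docker run --rm --gpus all nvcr.io/nvidia/riva/riva-speech:2.11.0 nvcc --version
```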