Deploying Riva models on Vultr A100 vGPU instance fails with CUDA errors

Hardware - GPU: A100 vGPU (10 GB VRAM, 1/7 GPU allocated), Vultr hosting
Hardware - CPU
Operating System: Ubuntu 22.04
Riva Version: 2.11
Nvidia Driver Version: 525.85.05

riva-build speech_recognition \
    conformer.rmir:tlt_encode Conformer-CTC-PE_large_Riva_ASR_set_3.0_ep107_trt_exportable.riva:tlt_encode \
    --name=conformer-en-US-asr-streaming \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --ms_per_timestep=40 \
    --endpointing.start_history=200 \
    --nn.fp16_needs_obey_precision_pass \
    --endpointing.residue_blanks_at_start=-2 \
    --chunk_size=0.8 \
    --left_padding_size=1.6 \
    --right_padding_size=1.6 \
    --decoder_type=flashlight \
    --flashlight_decoder.asr_model_delay=-1 \
    --decoding_language_model_binary=lm.binary \
    --decoding_vocab=vocab.txt \
    --flashlight_decoder.lm_weight=0.8 \
    --flashlight_decoder.word_insertion_score=1.0 \
    --flashlight_decoder.beam_size=32 \
    --flashlight_decoder.beam_threshold=20. \
    --flashlight_decoder.num_tokenization=1 \
    --language_code=en-US \
    --wfst_tokenizer_model=tokenize_and_classify.far \
    --wfst_verbalizer_model=verbalize.far \
    --force
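For completeness, the deploy step that follows the build is the usual riva-deploy invocation; the output model-repository path below is an assumption for illustration, not taken from my actual scripts:

```shell
# Hedged sketch: deploy the RMIR produced by riva-build into a Triton
# model repository. The /data/models path is an illustrative assumption.
riva-deploy -f \
    conformer.rmir:tlt_encode \
    /data/models
```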

When deploying the above build, it fails with the following errors:

[06/10/2023-10:33:04] [TRT] [E] 1: [graphContext.h::~MyelinGraphContext::35] Error Code 1: Myelin (No Myelin Error exists)
[06/10/2023-10:33:04] [TRT] [W] Skipping tactic 0x0000000000000000 due to Myelin error: CUDA error 800 failed to create CUDA stream

[06/10/2023-10:33:18] [TRT] [E] 4: [optimizer.cpp::computeCosts::3710] Error Code 4: Internal Error (Could not find any implementation for node {ForeignNode[746 + (Unnamed Layer* 20) [Shuffle]…MatMul_269]} due to insufficient workspace. See verbose log for requested sizes.)
[06/10/2023-10:33:18] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::738] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

I’ve tried different models and all fail with the same error. Because of the error about insufficient workspace, I tried setting nn.trt_max_workspace_size to 6 GB, but it made no difference. Running exactly the same build and deploy scripts locally on my home server with an RTX 3060 works fine.
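For anyone trying to reproduce this: the workspace override was passed as an extra flag to the same riva-build command. Expressing 6 GB as a byte count is my assumption of how the flag is interpreted; check riva-build speech_recognition --help on your version:

```shell
# Hedged sketch: same riva-build invocation as above, with an explicit
# TensorRT workspace cap added. 6 GB written as bytes is an assumption.
riva-build speech_recognition \
    conformer.rmir:tlt_encode Conformer-CTC-PE_large_Riva_ASR_set_3.0_ep107_trt_exportable.riva:tlt_encode \
    --nn.trt_max_workspace_size=6000000000 \
    --name=conformer-en-US-asr-streaming \
    --force
    # ...remaining flags as in the original build command
```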

Hi @steve.pritchard

Thanks for your interest in Riva.

Apologies for the error.
The 525 drivers come with CUDA 12 installed.
Riva requires CUDA 11.8.89; can you downgrade the CUDA version and try?

https://docs.nvidia.com/deeplearning/riva/user-guide/docs/support-matrix.html#id2

Thanks

Thanks for the reply. Unfortunately, the driver version is fixed on the Vultr vGPU platform (I checked with their support). However, I thought the NVIDIA host drivers were backwards compatible with the CUDA runtime in Docker containers, i.e. CUDA 11.8.89 inside the container should work with a CUDA 12 host driver?
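One quick way to check what the container actually sees (commands assumed to be run on the Vultr instance; the image tag is illustrative, and nvcc being present in the image is an assumption):

```shell
# On the host: driver version and the maximum CUDA version it supports.
nvidia-smi

# Inside the Riva container: the CUDA toolkit version it ships with.
# For driver/container compatibility, the driver's reported CUDA version
# (12.x for driver 525) generally needs to be >= the container's toolkit
# version (11.8 here).
docker run --rm --gpus all nvcr.io/nvidia/riva/riva-speech:2.11.0 nvcc --version
```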