Hardware - GPU: A100 vGPU (10 GB VRAM, 1/7 of the GPU allocated), hosted on Vultr
Hardware - CPU
Operating System: Ubuntu 22.04
Riva Version: 2.11
Nvidia Driver Version: 525.85.05
riva-build speech_recognition \
conformer.rmir:tlt_encode Conformer-CTC-PE_large_Riva_ASR_set_3.0_ep107_trt_exportable.riva:tlt_encode \
--name=conformer-en-US-asr-streaming \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--endpointing.start_history=200 \
--nn.fp16_needs_obey_precision_pass \
--endpointing.residue_blanks_at_start=-2 \
--chunk_size=0.8 \
--left_padding_size=1.6 \
--right_padding_size=1.6 \
--decoder_type=flashlight \
--flashlight_decoder.asr_model_delay=-1 \
--decoding_language_model_binary=lm.binary \
--decoding_vocab=vocab.txt \
--flashlight_decoder.lm_weight=0.8 \
--flashlight_decoder.word_insertion_score=1.0 \
--flashlight_decoder.beam_size=32 \
--flashlight_decoder.beam_threshold=20. \
--flashlight_decoder.num_tokenization=1 \
--language_code=en-US \
--wfst_tokenizer_model=tokenize_and_classify.far \
--wfst_verbalizer_model=verbalize.far \
--force
Deploying the model produced by the above build fails with the following errors:
[06/10/2023-10:33:04] [TRT] [E] 1: [graphContext.h::~MyelinGraphContext::35] Error Code 1: Myelin (No Myelin Error exists)
[06/10/2023-10:33:04] [TRT] [W] Skipping tactic 0x0000000000000000 due to Myelin error: CUDA error 800 failed to create CUDA stream
[06/10/2023-10:33:18] [TRT] [E] 4: [optimizer.cpp::computeCosts::3710] Error Code 4: Internal Error (Could not find any implementation for node {ForeignNode[746 + (Unnamed Layer* 20) [Shuffle]…MatMul_269]} due to insufficient workspace. See verbose log for requested sizes.)
[06/10/2023-10:33:18] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::738] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
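To separate the TensorRT errors from the warnings in a longer deploy log, something like the following might help. It is only a minimal sketch: `trt_errors` and `trt_error_codes` are hypothetical helper names, not part of Riva or TensorRT, and it assumes the log uses the same `[TRT] [E] ... Error Code N:` layout as the lines above.

```shell
#!/bin/sh
# Hypothetical helpers for reading a saved deploy log (not Riva tooling).

# Print only the TensorRT error lines from a log file (or stdin if no file is given).
trt_errors() {
    grep -F '[TRT] [E]' "$@"
}

# Count how often each "Error Code N" appears among those error lines.
trt_error_codes() {
    trt_errors "$@" |
        sed -n 's/.*Error Code \([0-9][0-9]*\):.*/\1/p' |
        sort -n |
        uniq -c
}
```

With the output saved to a file (say `deploy.log`, an assumed name), `trt_error_codes deploy.log` prints one line per distinct error code with its count, which makes it easier to see whether every model fails at the same step.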
I’ve tried different models and all fail with the same error. Because of the error about insufficient workspace, I also tried setting nn.trt_max_workspace_size to 6 GB, but it made no difference. Running exactly the same build and deploy scripts locally on my home server with an RTX 3060 works fine.
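Since the same scripts work on the RTX 3060, it may be worth comparing what the vGPU slice actually exposes with the workspace being requested. The sketch below is one way to do that. `workspace_fits` is a hypothetical helper, the 6144 MiB figure is simply the 6 GB mentioned above, and it assumes `nvidia-smi` reports numeric memory values on the vGPU (some virtualised setups may not). As far as I know, CUDA error 800 is `cudaErrorNotPermitted` in the CUDA runtime, so the failure may come from a restriction on the slice rather than from memory alone; this check only rules the memory side in or out.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if free memory (MiB) covers the requested workspace (MiB).
workspace_fits() {
    free_mib=$1
    requested_mib=$2
    [ "$free_mib" -ge "$requested_mib" ]
}

requested_mib=6144   # assumption: the 6 GB workspace from the post, in MiB

# Only query the GPU if nvidia-smi is installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version,memory.total,memory.free \
               --format=csv,noheader,nounits |
    while IFS=, read -r name driver total free; do
        free=$(printf '%s' "$free" | tr -d ' ')   # drop the space after the comma
        if workspace_fits "$free" "$requested_mib"; then
            echo "$name: ${free} MiB free, enough for ${requested_mib} MiB workspace"
        else
            echo "$name: ${free} MiB free, less than ${requested_mib} MiB workspace"
        fi
    done
fi
```

Running this on both the Vultr instance and the home server gives a like-for-like view of driver version and free memory at the time of deployment.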