Please provide the following information when requesting support.
Hardware - GPU - L4
Hardware - CPU - 8 vCPUs
Operating System - Linux/Ubuntu
Riva Version - 2.19
Hello. I need to deploy a whisper-large-v3-turbo model fine-tuned on a specific domain. As I understand it, Riva currently supports only the standard OpenAI Whisper models: whisper-large-v3 and whisper-large-v3-turbo. I tried to build the encoder and decoder .engine files myself following the instructions in the official TensorRT-LLM repository. However, when deploying the model in Riva, I get errors like:
[07/22/2025-10:25:24] [TRT-LLM] [E] Found tensor names: ['input_ids', 'position_ids',
'encoder_input_lengths', 'encoder_max_input_length', 'encoder_output', 'host_past_key_value_lengths',
'host_context_lengths', 'sequence_length', 'context_lengths', 'host_request_types',
'host_runtime_perf_knobs', 'host_context_progress', 'last_token_ids', 'cross_attention_mask',
'cross_attention_packed_mask', 'cache_indirection', 'host_max_attention_window_sizes',
'host_sink_token_length', 'kv_cache_block_offsets', 'host_kv_cache_block_offsets',
'host_kv_cache_pool_pointers', 'host_kv_cache_pool_mapping', 'cross_kv_cache_block_offsets',
'host_cross_kv_cache_block_offsets', 'host_cross_kv_cache_pool_pointers',
'host_cross_kv_cache_pool_mapping', 'cross_kv_cache_gen', 'logits']
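For reference, I built the engines roughly as follows, based on the Whisper example in the TensorRT-LLM repository. The flag names and values below are from memory and differ between TensorRT-LLM versions, so please treat them as a sketch of what I ran rather than the exact commands:

```shell
# Convert the fine-tuned Whisper checkpoint into TensorRT-LLM format
# (convert_checkpoint.py is in examples/whisper of the TensorRT-LLM repo).
python3 convert_checkpoint.py \
    --model_dir ./whisper-large-v3-turbo-finetuned \
    --output_dir ./trtllm_checkpoint

# Build the encoder engine.
trtllm-build \
    --checkpoint_dir ./trtllm_checkpoint/encoder \
    --output_dir ./engines/encoder \
    --max_batch_size 8

# Build the decoder engine. My guess is that Riva expects different
# plugin / KV-cache settings here, which would explain the
# tensor-name mismatch in the error above.
trtllm-build \
    --checkpoint_dir ./trtllm_checkpoint/decoder \
    --output_dir ./engines/decoder \
    --max_batch_size 8 \
    --max_seq_len 200
```

The error suggests the decoder engine I built exposes a different set of input/output tensors than the runtime inside Riva expects.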
It looks like Riva uses custom configurations when building the decoder. Are there currently any instructions on how to deploy a fine-tuned OpenAI or Hugging Face Whisper model in the Riva pipeline? Or is the Riva team planning to add such support in an upcoming release?