Deployment of fine-tuned Whisper models in Riva

Please provide the following information when requesting support.

Hardware - GPU - L4
Hardware - CPU - 8 vCPUs
Operating System - Linux/Ubuntu
Riva Version - 2.19

Hello. I need to deploy a whisper-large-v3-turbo model fine-tuned on a specific domain. As I understand it, Riva currently only supports the standard OpenAI Whisper models: whisper-large-v3 and whisper-large-v3-turbo. I tried to build the decoder and encoder .engine files myself following the instructions in the official TensorRT-LLM repository, but when deploying the model in Riva I get errors like:

[07/22/2025-10:25:24] [TRT-LLM] [E] Found tensor names: ['input_ids', 'position_ids',
'encoder_input_lengths', 'encoder_max_input_length', 'encoder_output', 'host_past_key_value_lengths',
'host_context_lengths', 'sequence_length', 'context_lengths', 'host_request_types',
'host_runtime_perf_knobs', 'host_context_progress', 'last_token_ids', 'cross_attention_mask',
'cross_attention_packed_mask', 'cache_indirection', 'host_max_attention_window_sizes',
'host_sink_token_length', 'kv_cache_block_offsets', 'host_kv_cache_block_offsets',
'host_kv_cache_pool_pointers', 'host_kv_cache_pool_mapping', 'cross_kv_cache_block_offsets',
'host_cross_kv_cache_block_offsets', 'host_cross_kv_cache_pool_pointers',
'host_cross_kv_cache_pool_mapping', 'cross_kv_cache_gen', 'logits']

It looks like Riva uses a custom configuration when building the decoder. Are there currently any instructions on how to deploy a fine-tuned OpenAI or Hugging Face Whisper model in the Riva pipeline? Or is the Riva team planning to add such support in an upcoming version of Riva?
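For what it's worth, the tensor names in the error log themselves hint at how the decoder engine was built: names like kv_cache_block_offsets and cross_kv_cache_block_offsets only show up when the engine uses TensorRT-LLM's paged (cross-)KV cache, so a mismatch between those build-time options and what Riva's runtime expects is one plausible cause. The name-to-feature mapping below is my own interpretation, not an official Riva or TensorRT-LLM contract:

```python
# Sketch: infer build-time features of a TensorRT-LLM decoder engine from its
# I/O tensor names. FOUND is copied from the error log above; the mapping from
# tensor name to feature is an assumption on my part.
FOUND = [
    "input_ids", "position_ids", "encoder_input_lengths",
    "encoder_max_input_length", "encoder_output",
    "host_past_key_value_lengths", "host_context_lengths",
    "sequence_length", "context_lengths", "host_request_types",
    "host_runtime_perf_knobs", "host_context_progress", "last_token_ids",
    "cross_attention_mask", "cross_attention_packed_mask", "cache_indirection",
    "host_max_attention_window_sizes", "host_sink_token_length",
    "kv_cache_block_offsets", "host_kv_cache_block_offsets",
    "host_kv_cache_pool_pointers", "host_kv_cache_pool_mapping",
    "cross_kv_cache_block_offsets", "host_cross_kv_cache_block_offsets",
    "host_cross_kv_cache_pool_pointers", "host_cross_kv_cache_pool_mapping",
    "cross_kv_cache_gen", "logits",
]

def engine_features(tensor_names):
    """Guess which cache options an engine was built with from its tensor names."""
    names = set(tensor_names)
    return {
        "paged_kv_cache": "kv_cache_block_offsets" in names,
        "paged_cross_kv_cache": "cross_kv_cache_block_offsets" in names,
    }

print(engine_features(FOUND))
```

Comparing this against the tensor names of a decoder engine produced by Riva's own build pipeline (if one can be extracted from a Riva-deployed model) would show which build flags differ.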

There is a script to help you perform this task. Can you open a support ticket?

@amargolin I’m new here. Can you tell me what you mean by opening a support ticket and how to do it?

whisper-to-riva-main-75841680.tar.gz (9.0 KB)