Please provide the following information when requesting support.
Hardware - GPU - L4
Hardware - CPU - 8 vCPUs
Operating System - Linux/Ubuntu
Riva Version - 2.19
Hello. I need to deploy a whisper-large-v3-turbo model fine-tuned on a specific domain. As I understand it, Riva currently supports only the standard OpenAI Whisper models: whisper-large-v3 and whisper-large-v3-turbo. I tried to build the encoder and decoder .engine files myself following the instructions in the official TensorRT-LLM repository. However, when deploying the model in Riva, I get errors like:
[07/22/2025-10:25:24] [TRT-LLM] [E] Found tensor names: ['input_ids', 'position_ids',
'encoder_input_lengths', 'encoder_max_input_length', 'encoder_output', 'host_past_key_value_lengths',
'host_context_lengths', 'sequence_length', 'context_lengths', 'host_request_types',
'host_runtime_perf_knobs', 'host_context_progress', 'last_token_ids', 'cross_attention_mask',
'cross_attention_packed_mask', 'cache_indirection', 'host_max_attention_window_sizes',
'host_sink_token_length', 'kv_cache_block_offsets', 'host_kv_cache_block_offsets',
'host_kv_cache_pool_pointers', 'host_kv_cache_pool_mapping', 'cross_kv_cache_block_offsets',
'host_cross_kv_cache_block_offsets', 'host_cross_kv_cache_pool_pointers',
'host_cross_kv_cache_pool_mapping', 'cross_kv_cache_gen', 'logits']
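For reference, I built the engines roughly as follows, based on the Whisper example in the TensorRT-LLM repository. The flag names and values below are from memory and differ between TensorRT-LLM versions, so please treat them as a sketch of what I ran rather than the exact commands:

```shell
# Convert the fine-tuned Whisper checkpoint into TensorRT-LLM format
# (convert_checkpoint.py is in examples/whisper of the TensorRT-LLM repo).
python3 convert_checkpoint.py \
    --model_dir ./whisper-large-v3-turbo-finetuned \
    --output_dir ./trtllm_checkpoint

# Build the encoder engine.
trtllm-build \
    --checkpoint_dir ./trtllm_checkpoint/encoder \
    --output_dir ./engines/encoder \
    --max_batch_size 8

# Build the decoder engine. My guess is that Riva expects different
# plugin / KV-cache settings here, which would explain the
# tensor-name mismatch in the error above.
trtllm-build \
    --checkpoint_dir ./trtllm_checkpoint/decoder \
    --output_dir ./engines/decoder \
    --max_batch_size 8 \
    --max_seq_len 200
```

The error suggests the decoder engine I built exposes a different set of input/output tensors than the runtime inside Riva expects.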
It looks like Riva uses custom configurations when building the decoder. Are there currently any instructions on how to deploy a fine-tuned OpenAI or Hugging Face Whisper model in the Riva pipeline? Or is the Riva team planning to add such support in an upcoming release?