Production Inference Path for Fine-Tuned Canary-v2 (TensorRT or RIVA Support)

gabriel.pimenta · February 6, 2026, 5:39pm

Dear all,

We are Dharma-AI, an AI startup currently developing an automatic speech-to-text transcription product. We are fine-tuning NVIDIA’s Canary-v2 model for Portuguese, with a specific focus on the vocabulary and terminology of the Brazilian legal domain. Fine-tuning is already underway, and we are finalizing a first version of the model so that we can begin inference testing in a production-like environment.

For this stage, we would like to leverage an NVIDIA inference SDK, such as TensorRT or RIVA, aiming to ensure performance, scalability, and alignment with the NVIDIA ecosystem. However, throughout our research and practical experimentation, we have encountered the following technical limitations:

TensorRT: as far as we have been able to verify, TensorRT currently has no direct support for running speech-to-text models such as Canary-v2 from the .nemo format. Using the model with TensorRT would require converting it to ONNX first, but so far we have not found official support for exporting fine-tuned .nemo models to .onnx.

RIVA: we understand that RIVA provides more native support for ASR models, but it requires the model to be in the .riva format. The nemo2riva tool exists for converting .nemo to .riva, but our tests indicate that it does not support the Canary-v2 architecture, which makes this path infeasible at the moment.

Given this scenario, we would appreciate clarification from the community and/or the NVIDIA technical team on the following points:

Is there currently (or planned in the near future) an official and supported path to run fine-tuned Canary-v2 models in production using TensorRT or RIVA?
If not, what would be the best practice recommended by NVIDIA for deploying this type of model in production?

We thank you in advance for any guidance or technical references that could help us in this deployment workflow.
