Unfortunately, this is not possible. Even if I shared the file, you would not be able to test it. I am using a custom Conformer-CTC model for a non-English language, which I cannot share. The model was built with nemo:1.8.2 following the standard training procedures, and later converted first to riva, then to rmir, and finally to a model, following the steps described in the Riva 2.2.0 documentation. I have no problems running the model with nemo alone; the issues appear only when I deploy it to Riva. I have tried nn.use_onnx_runtime as well as converting to trt, which, by the way, takes an awfully long time to convert to a model (~34 min for trt vs. ~2 min for onnx). I am running Riva on a T4 GPU. I have tried disabling vad, and I am planning to try a greedy decoder to rule out any issue caused by the language model and lexicon. Regardless of the approach, I always get final results where the first word in the transcript has an invalid, constant timestamp, e.g.
The value differs slightly depending on whether I run on Riva 2.0.0 or 2.2.0, but it is always the first word of a transcript: either the first word in the entire transcript of the file, or a word mid-file immediately following a previous vad segment (is_final).
Update: conversion to rmir with a greedy decoder succeeds with no errors, and so does deploying to Riva, i.e. the model loads successfully. However, when I try to transcribe anything I get no transcripts back. I am running examples/transcribe_file_verbose.py with additional printouts of every response, but with the greedy decoder there are simply no responses.
The build took almost 20 minutes to convert the model to a TRT plan. We tried the service-maker and riva-server from version 2.2.0. Still, most of the time the transcripts get the weird start_time (constant and much higher than end_time), and the confidence value is always 1.0.
I have uploaded an example audio file. In my case the invalid timestamp appears between one final, which finishes at 21240 ms, and the next final, whose first word ends at 22400 ms. That word has an invalid start timestamp of 1302720. All interim results prior to this final had the same startTime for this word. There are of course other examples, but I'm not listing them.
I have also tried transcribing with the en-US Conformer-CTC model, and there I see no words where startTime >> endTime.
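For anyone who wants to scan their own finals for the same symptom, here is a minimal sketch of the check, assuming each word has already been copied out of the response into a plain record. The `WordInfo` class and `find_invalid_words` helper are hypothetical names for illustration, not part of the Riva client API. Only the values 1302720 and 22400 come from the example above; the word texts and the second word's timestamps are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class WordInfo:
    # Hypothetical stand-in for one word entry of a final result.
    word: str
    start_time: int  # milliseconds
    end_time: int    # milliseconds


def find_invalid_words(words: Iterable[WordInfo]) -> List[WordInfo]:
    """Return the words whose start time is later than their end time."""
    return [w for w in words if w.start_time > w.end_time]


# First entry uses the timestamps reported above; everything else is a placeholder.
final_words = [
    WordInfo("first", 1302720, 22400),
    WordInfo("second", 22440, 22800),
]

for w in find_invalid_words(final_words):
    print(f"{w.word!r}: start_time={w.start_time} ms > end_time={w.end_time} ms")
```

Running the same check over every final of a file should show whether the bad value is limited to the first word after each is_final, as described above.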
Thank you so much for sharing the audio file; we really appreciate it.
I will share this audio file with the team and provide updates on the issue soon.
@rvinobha Unfortunately in my case this did not help. I have tested with pytorch:22.06-py3+nemo:1.11.0rc0 and riva 2.3.0. I built the model rmir with the following command: