Jarvis Hindi model giving gibberish output

Hi, we are trying to deploy a Hindi Jarvis model for a live (streaming) speech-to-text service. We converted a NeMo Hindi model to a Jarvis model using the following steps:

AWS EC2: p2x.large
GPU: NVIDIA Tesla T4
CUDA version: 11.2
NVIDIA driver version: 460.80

Note: In the second step, we used the 1.0.0-b.3 servicemaker because we are using version 1.0.0-b.3 of the Jarvis API.
Using the above steps, we were able to convert the NeMo model to Jarvis models.
Conversion pipeline: NeMo → quartz.onnx → quartznet_asr.enemo → quartznet_asr.jmir → Jarvis models.
After setting the model location in the config file of Jarvis API 1.0.0-b.3, we were able to start the model by running ./jarvis_start.sh. However, when we test it with a sample Hindi audio file, the output is gibberish.

NeMo model output and expected Jarvis model output:

Current output of the Jarvis Hindi model:

Hi @namanveer2000,
Could you please share the model and a reproducible script so that we can try debugging it?

Thanks!

We converted the NeMo model to an ejrvs file instead of an enemo file, using the nemo2jarvis pip package available in the jarvis_quickstart:1.3.0-beta resource. We then carried out the rest of the conversion (nemo → ejrvs → jmir → models) with the latest servicemaker, jarvis-speech:1.3.0-beta-servicemaker, and the gibberish output was resolved.
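
For anyone following the same route, here is a rough sketch of the three stages (nemo → ejrvs → jmir → models) written as a small Python helper that only assembles and prints the commands; it does not run anything. The command names and argument order (`nemo2jarvis --out`, `jarvis-build speech_recognition`, `jarvis-deploy`) are my assumption of how the 1.3.0-beta tools are invoked, so please check them against the documentation for your release. The file names, the models directory, and the optional `key` suffix are hypothetical placeholders, not values from this thread.

```python
import shlex


def build_pipeline_commands(nemo_path, ejrvs_path, jmir_path, models_dir, key=None):
    """Return the three conversion commands as argv lists. Nothing is executed.

    The command names and argument order are assumptions, not confirmed in
    this thread; verify them against the docs of the release you are using.
    """
    def with_key(path):
        # Assumed convention: an optional encryption key is appended as ":<key>".
        return f"{path}:{key}" if key else path

    return [
        # Stage 1 (run where the nemo2jarvis pip package is installed): .nemo -> .ejrvs
        ["nemo2jarvis", "--out", ejrvs_path, nemo_path],
        # Stage 2 (inside the servicemaker container): .ejrvs -> .jmir
        ["jarvis-build", "speech_recognition", with_key(jmir_path), with_key(ejrvs_path)],
        # Stage 3 (inside the servicemaker container): .jmir -> deployable models
        ["jarvis-deploy", with_key(jmir_path), models_dir],
    ]


if __name__ == "__main__":
    # Hypothetical file names and paths, for illustration only.
    commands = build_pipeline_commands(
        "hindi_quartznet.nemo",
        "hindi_quartznet.ejrvs",
        "hindi_quartznet.jmir",
        "/data/models",
    )
    for argv in commands:
        print(shlex.join(argv))
```

The point of the sketch is only that the intermediate artifact is an .ejrvs file produced by nemo2jarvis, and that the servicemaker version used for the build and deploy stages matches the Jarvis release being run.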

@AakankshaS, please reply to this query, or raise an NVIDIA internal bug.