I fine-tuned the “speechtotext_english_citrinet_1024.tlt” model with the TAO Toolkit for the en-US language code, then exported it as “asr-model.riva” in RIVA format, again using the TAO Toolkit.
[NeMo I 2022-06-28 13:04:47 export:75] Experiment logs saved to '/results'
[NeMo I 2022-06-28 13:04:47 export:76] Exported model to '/results/asr-model.riva'
[NeMo I 2022-06-28 13:04:48 export:83] Exported model is compliant with Riva
To deploy the model, I followed the speech-to-text-deployment.ipynb instructions. To convert the .riva file to .rmir:
!docker run --rm --gpus all -v $MODEL_LOC:/data nvcr.io/nvidia/riva/riva-speech:2.2.1-servicemaker -- \
riva-build speech_recognition -f /data/asr.rmir:$KEY /data/asr-model.riva:$KEY --offline \
--decoder_type=greedy
2022-06-28 15:44:06,590 [WARNING] Property 'binary' is deprecated. Please use the callback system instead.
2022-06-28 15:44:10,781 [INFO] Packing binaries for nn/ONNX : {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')}
2022-06-28 15:44:10,781 [INFO] Copying onnx:model_graph.onnx -> nn:nn-model_graph.onnx
2022-06-28 15:44:23,547 [INFO] Packing binaries for lm_decoder/ONNX : {'vocab_file': '/tmp/tmpno0x5ykt/riva_decoder_vocabulary.txt', 'tokenizer_model': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'tokenizer.model')}
2022-06-28 15:44:23,547 [INFO] Copying vocab_file:/tmp/tmpno0x5ykt/riva_decoder_vocabulary.txt -> lm_decoder:lm_decoder-riva_decoder_vocabulary.txt
2022-06-28 15:44:23,547 [INFO] Copying tokenizer_model:tokenizer.model -> lm_decoder:lm_decoder-tokenizer.model
2022-06-28 15:44:23,548 [INFO] Packing binaries for rescorer/ONNX : {'vocab_file': '/tmp/tmpno0x5ykt/riva_decoder_vocabulary.txt'}
2022-06-28 15:44:23,548 [INFO] Copying vocab_file:/tmp/tmpno0x5ykt/riva_decoder_vocabulary.txt -> rescorer:rescorer-riva_decoder_vocabulary.txt
2022-06-28 15:44:23,549 [INFO] Packing binaries for vad/ONNX : {'vocab_file': '/tmp/tmpno0x5ykt/riva_decoder_vocabulary.txt'}
2022-06-28 15:44:23,549 [INFO] Copying vocab_file:/tmp/tmpno0x5ykt/riva_decoder_vocabulary.txt -> vad:vad-riva_decoder_vocabulary.txt
2022-06-28 15:44:23,549 [INFO] Saving to /data/asr.rmir
Then, to deploy it, I first edited config.sh in riva_quickstart_v2.2.1, setting the riva_model_loc parameter to the defined $MODEL_LOC path and the use_existing_rmirs flag to true. I also manually copied asr.rmir to $riva_model_loc/rmir.
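For reference, the relevant config.sh edits look like this (variable names as in the quickstart; the path is a placeholder for my $MODEL_LOC):

```shell
# config.sh in riva_quickstart_v2.2.1 (relevant lines only)

# Point the quickstart at the directory that already holds rmir/asr.rmir
riva_model_loc="/path/to/MODEL_LOC"   # placeholder; set to your $MODEL_LOC

# Skip rebuilding and use the RMIR already copied into $riva_model_loc/rmir
use_existing_rmirs=true
```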
After that, I ran riva_init.sh to deploy the asr.rmir model:
2022-06-28 15:47:17,434 [INFO] Writing Riva model repository to '/data/models'...
2022-06-28 15:47:17,434 [INFO] The riva model repo target directory is /data/models
2022-06-28 15:47:24,667 [INFO] Using tensorrt with fp16
2022-06-28 15:47:24,667 [INFO] Extract_binaries for nn -> /data/models/riva-trt-riva-asr-am-streaming-offline/1
2022-06-28 15:47:24,667 [INFO] extracting {'onnx': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'model_graph.onnx')} -> /data/models/riva-trt-riva-asr-am-streaming-offline/1
2022-06-28 15:47:34,475 [INFO] Printing copied artifacts:
2022-06-28 15:47:34,475 [INFO] {'onnx': '/data/models/riva-trt-riva-asr-am-streaming-offline/1/model_graph.onnx'}
2022-06-28 15:47:34,476 [INFO] Building TRT engine from ONNX file
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] 'Shape tensor cast elision' routine failed with: None
[libprotobuf WARNING /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 564516907
[06/28/2022-15:47:41] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[06/28/2022-15:48:24] [TRT] [W] Output type must be INT32 for shape outputs
[06/28/2022-15:48:24] [TRT] [W] Output type must be INT32 for shape outputs
.
.
.
[06/28/2022-15:48:37] [TRT] [W] Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are:
[06/28/2022-15:48:37] [TRT] [W] (# 0 (SHAPE audio_signal))
[06/28/2022-15:48:37] [TRT] [W] (# 0 (SHAPE length))
[06/28/2022-15:48:47] [TRT] [W] Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are:
[06/28/2022-15:48:47] [TRT] [W] (# 0 (SHAPE audio_signal))
[06/28/2022-15:48:47] [TRT] [W] (# 0 (SHAPE length))
.
.
.
2022-06-28 16:04:46,759 [INFO] Writing engine to model repository: /data/models/riva-trt-riva-asr-am-streaming-offline/1/model.plan
2022-06-28 16:04:47,076 [INFO] Extract_binaries for featurizer -> /data/models/riva-asr-feature-extractor-streaming-offline/1
2022-06-28 16:04:47,078 [INFO] Extract_binaries for vad -> /data/models/riva-asr-voice-activity-detector-ctc-streaming-offline/1
2022-06-28 16:04:47,078 [INFO] extracting {'vocab_file': '/tmp/tmpno0x5ykt/riva_decoder_vocabulary.txt'} -> /data/models/riva-asr-voice-activity-detector-ctc-streaming-offline/1
2022-06-28 16:04:47,079 [INFO] Extract_binaries for lm_decoder -> /data/models/riva-asr-ctc-decoder-cpu-streaming-offline/1
2022-06-28 16:04:47,080 [INFO] extracting {'vocab_file': '/tmp/tmpno0x5ykt/riva_decoder_vocabulary.txt', 'tokenizer_model': ('nemo.collections.asr.models.ctc_bpe_models.EncDecCTCModelBPE', 'tokenizer.model')} -> /data/models/riva-asr-ctc-decoder-cpu-streaming-offline/1
2022-06-28 16:04:47,080 [INFO] {'vocab_file': '/data/models/riva-asr-ctc-decoder-cpu-streaming-offline/1/riva_decoder_vocabulary.txt', 'tokenizer_model': '/data/models/riva-asr-ctc-decoder-cpu-streaming-offline/1/tokenizer.model'}
2022-06-28 16:04:47,081 [INFO] Extract_binaries for riva-asr -> /data/models/riva-asr/1
+ [[ amd64 == \a\r\m\6\4 ]]
+ echo
+ echo 'Riva initialization complete. Run ./riva_start.sh to launch services.'
Riva initialization complete. Run ./riva_start.sh to launch services.
As can be seen, the model is around 570 MB, the whole process takes 17–18 minutes, and it completes without any error. BUT the deployed models in data/models:
total 20K
drwxr-xr-x 3 root root 4.0K Jun 28 16:04 riva-asr
drwxr-xr-x 3 root root 4.0K Jun 28 16:04 riva-asr-ctc-decoder-cpu-streaming-offline
drwxr-xr-x 3 root root 4.0K Jun 28 16:04 riva-asr-feature-extractor-streaming-offline
drwxr-xr-x 3 root root 4.0K Jun 28 16:04 riva-asr-voice-activity-detector-ctc-streaming-offline
drwxr-xr-x 3 root root 4.0K Jun 28 16:04 riva-trt-riva-asr-am-streaming-offline
are almost empty.
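To quantify “almost empty”, here is a small stdlib-only sketch (my own diagnostic, not part of Riva) that sums the bytes under each model directory; after a successful deployment, riva-trt-riva-asr-am-streaming-offline should contain a model.plan of several hundred MB:

```python
import os

def repo_sizes(root):
    """Total on-disk bytes per top-level model directory under the repo."""
    sizes = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue
        total = 0
        for dirpath, _dirs, files in os.walk(path):
            for fname in files:
                total += os.path.getsize(os.path.join(dirpath, fname))
        sizes[name] = total
    return sizes

# Usage:
#   for name, nbytes in repo_sizes("/data/models").items():
#       print(f"{name}: {nbytes / 1e6:.1f} MB")
```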
By running riva_start.sh I got:
Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models...retrying in 10 seconds
Waiting for Riva server to load all models...retrying in 10 seconds
Riva server is ready...
which shows that all deployed models are loaded on the Riva server. BUT when I send a request to this server to transcribe the audio file below, by running this code:
import grpc
import wave

# proto stubs from the Riva Python client bindings, as in the quickstart examples
import riva_api.riva_asr_pb2 as rasr
import riva_api.riva_asr_pb2_grpc as rasr_srv

audio_file = "example.wav"
server = "localhost:50051"

wf = wave.open(audio_file, 'rb')  # not used below; the raw file bytes (WAV header included) are sent
with open(audio_file, 'rb') as fh:
    data = fh.read()

channel = grpc.insecure_channel(server)
client = rasr_srv.RivaSpeechRecognitionStub(channel)
config = rasr.RecognitionConfig(
    sample_rate_hertz=16000,
    language_code="en-US",
    max_alternatives=5,
    enable_automatic_punctuation=False,
    audio_channel_count=1,
)
request = rasr.RecognizeRequest(config=config, audio=data)
response = client.Recognize(request)
print(response)
it seems that the loaded models don’t run properly. In fact, audio files are often not transcribed at all, or the transcript is only a word or two long:
results {
alternatives {
transcript: "no "
confidence: 1.0
}
channel_tag: 1
audio_processed: 4.800000190734863
}
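Since audio_processed reports only ~4.8 s, one thing worth ruling out is a mismatch between the audio file and the RecognitionConfig. A minimal stdlib sketch (my own check, not part of the Riva client) comparing the WAV header against the values used above:

```python
import wave

def check_wav(path, expected_rate=16000, expected_channels=1):
    """Compare a WAV file's header against the RecognitionConfig values."""
    with wave.open(path, 'rb') as wf:
        info = {
            'sample_rate': wf.getframerate(),
            'channels': wf.getnchannels(),
            'sample_width_bytes': wf.getsampwidth(),
            'duration_s': wf.getnframes() / wf.getframerate(),
        }
    ok = (info['sample_rate'] == expected_rate
          and info['channels'] == expected_channels
          and info['sample_width_bytes'] == 2)  # 16-bit PCM expected
    return ok, info
```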
BUT when I test the fine-tuned model on this audio file using TAO, I get:
!tao speech_to_text_citrinet infer \
-e $SPECS_DIR/speech_to_text_citrinet/infer.yaml \
-g 1 \
-k $KEY \
-m $RESULTS_DIR/citrinet/finetune/checkpoints/finetuned-model.tlt \
-r $RESULTS_DIR/citrinet/infer \
file_paths=[$DATA_DIR/example.wav]
Test config :
manifest_filepath: null
sample_rate: 16000
batch_size: 32
shuffle: false
use_start_end_token: false
[NeMo I 2022-06-28 08:44:17 features:255] PADDING: 16
[NeMo I 2022-06-28 08:44:17 features:272] STFT using torch
Transcribing: 100%|███████████████████████████████| 1/1 [00:01<00:00, 1.59s/it]
[NeMo I 2022-06-28 08:44:31 infer:69] The prediction results:
[NeMo I 2022-06-28 08:44:31 infer:71] File: /data/example.wav
[NeMo I 2022-06-28 08:44:31 infer:72] Predicted transcript: tms naers and throat were clear
[NeMo I 2022-06-28 08:44:31 infer:75] Experiment logs saved to '/results/citrinet/infer'
2022-06-28 08:44:33,734 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
which shows that my model works correctly but is not properly deployed on the Riva server.
I need to know why this happens and what I should do to fix it.
Hardware: GPU T4
Ubuntu: 22.04
Riva Version: 2.2.0