GPU: RTX 2080 Ti
Operating System: Ubuntu 18.04
Riva Version: 2.7.0
How to reproduce the issue?
Hi! I have a NeMo model (Citrinet 512) and I followed the Riva Overview to deploy it to Riva. These are the steps I did:
- start servicemaker
- nemo2riva
- riva-build:

```
riva-build speech_recognition \
  /servicemaker-dev/asr_online_beam_model_experiment.rmir:tlt_encode \
  /servicemaker-dev/asr_0_1.riva:tlt_encode \
  --streaming=True \
  --name=citrinet-en-US-asr-streaming \
  --decoder_type=flashlight \
  --decoding_language_model_binary=/servicemaker-dev/lm.binary \
  --decoding_vocab=/servicemaker-dev/vocab.txt \
  --language_code=en-US \
  --nn.fp16_needs_obey_precision_pass
```
- deploy with riva_init.sh
- start the server with riva_start.sh
```
Waiting for Riva server to load all models...retrying in 10 seconds
Riva server is ready...
```
For offline inference I get a good transcript, but when I try online (streaming) inference I only get an empty transcript:

```
riva_streaming_asr_client --audio_file httpswwwyoutubecomwatchvarOnNqlWGE8_chunk001.wav --language_code=en-US
```
```
I1202 01:40:22.248736 129 riva_streaming_asr_client.cc:154] Using Insecure Server Credentials
Loading eval dataset...
filename: /opt/riva/httpswwwyoutubecomwatchvarOnNqlWGE8_chunk001.wav
Done loading 1 files
-----------------------------------------------------------
File: /opt/riva/httpswwwyoutubecomwatchvarOnNqlWGE8_chunk001.wav
Final transcripts:
Audio processed: -7.37696e+37 sec.
-----------------------------------------------------------
Not printing latency statistics because the client is run without the --simulate_realtime option and/or the number of requests sent is not equal to number of requests received. To get latency statistics, run with --simulate_realtime and set the --chunk_duration_ms to be the same as the server chunk duration
Run time: 0.156079 sec.
Total audio processed: 6.421 sec.
Throughput: 41.1393 RTFX
```
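(As an aside for anyone reproducing this: the latency note above says to match --chunk_duration_ms to the server chunk duration. The arithmetic relating a chunk duration to samples and bytes is simple; this is just an illustrative sketch assuming 16 kHz mono 16-bit PCM audio, with a helper name of my own.)

```python
# Sketch: relate a streaming chunk duration (ms) to samples and bytes,
# assuming 16 kHz mono 16-bit PCM audio (an assumption; match your model).
SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2  # 16-bit PCM

def chunk_size(chunk_duration_ms):
    """Return (samples, bytes) contained in one streaming chunk."""
    samples = SAMPLE_RATE_HZ * chunk_duration_ms // 1000
    return samples, samples * BYTES_PER_SAMPLE

# e.g. a 100 ms chunk at 16 kHz:
print(chunk_size(100))  # (1600, 3200)
```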
And this is the server log:
```
I1202 01:40:22.251299 189 grpc_riva_asr.cc:935] ASRService.StreamingRecognize called.
I1202 01:40:22.251327 189 grpc_riva_asr.cc:962] ASRService.StreamingRecognize performing streaming recognition with sequence id: 1375652540
I1202 01:40:22.251346 189 grpc_riva_asr.cc:1019] Using model citrinet-en-US-asr-streaming for inference
I1202 01:40:22.251386 189 grpc_riva_asr.cc:1035] Model sample rate= 16000 for inference
I1202 01:40:22.251515 189 riva_asr_stream.cc:214] Detected format: encoding = 1 numchannels = 1 samplerate = 16000 bitspersample = 16
I1202 01:40:22.406829 189 grpc_riva_asr.cc:1136] ASRService.StreamingRecognize returning OK
```
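(One thing I did rule out: the input format. The server log reports 16 kHz mono 16-bit PCM, and a quick check of the WAV header with Python's standard wave module confirms the file matches. A minimal sketch, with a helper name of my own:)

```python
import wave

def wav_format(path):
    """Read basic PCM format parameters from a WAV file header."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_rate": w.getframerate(),
            "bits_per_sample": w.getsampwidth() * 8,
            "duration_sec": w.getnframes() / w.getframerate(),
        }

# For the format the server detected, expect:
# channels=1, sample_rate=16000, bits_per_sample=16
```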
I already tried using --nn.use_trt_fp32, but I got an error because a layer uses FP16:

```
Error Code 4: Internal Error (fp16 precision has been set for a layer or layer output, but fp16 is not configured in the builder)
[12/02/2022-03:37:22] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
```
So I removed --nn.use_trt_fp32; the model deploys and the server starts successfully, but I still get an empty transcript. How can I solve this empty-transcript problem?