Dear team,
I am trying to deploy the mandarin-citrinet ASR model in Riva. I understand that non-English ASR models are not included in Riva by default, so I would like to know how to deploy a custom ASR model.
I followed the approach described in the documentation, i.e.:
- Converted the pre-trained NeMo model from NGC to Riva format using the nemo2riva script.
- Since this is a Mandarin model, followed these instructions to build the Riva-format model into an RMIR.
- Followed the custom model deployment instructions provided here (creating a custom directory, setting the required locations, making the changes in the config file, etc.).
The “riva-speech” server starts successfully, the Triton Inference Server comes up, and I can see that the GPU is being utilised.
However, when I run inference using the example script provided in “Riva_speech_API_demo”, the response contains no transcripts. The output of response.results is:
[channel_tag: 1
]
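A minimal sketch of how the client side could guard against this empty result instead of failing on a missing transcript. The Result and Alternative classes are hypothetical stand-ins, not the Riva types; the field names alternatives and transcript are assumed to match the Riva ASR response layout, and only channel_tag is confirmed by the output above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alternative:
    """Stand-in for one recognition hypothesis (hypothetical type)."""
    transcript: str = ""
    confidence: float = 0.0


@dataclass
class Result:
    """Stand-in for one entry of the response results (hypothetical type)."""
    channel_tag: int = 0
    alternatives: List[Alternative] = field(default_factory=list)


def collect_transcripts(results: List[Result]) -> List[str]:
    """Return the top transcript of every result that has at least one hypothesis."""
    transcripts = []
    for result in results:
        if not result.alternatives:
            # Same situation as the output above: channel_tag is set, no hypotheses.
            print(f"channel {result.channel_tag}: no alternatives returned")
            continue
        transcripts.append(result.alternatives[0].transcript)
    return transcripts


# Mirrors the response printed above: a single result carrying only channel_tag.
empty_response = [Result(channel_tag=1)]
print(collect_transcripts(empty_response))
```

With the real response object, the same loop would run over the results returned by the server, assuming the field names match.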
The riva-speech container logs are provided below. Note the warning lines of the form Registration of 'mandarin-citrinet-offline' failed with unknown service type, logged for both the offline and the stream model, which could be the issue.
Could anybody please help me debug this? I would also appreciate comments on what else might be wrong, and on the current steps for deploying a custom non-English NeMo model in Riva.
==========================
=== Riva Speech Skills ===
==========================
NVIDIA Release 21.07 (build 25292380)
Copyright (c) 2018-2021, NVIDIA CORPORATION. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
NOTE: Legacy NVIDIA Driver detected. Compatibility mode ENABLED.
> Riva waiting for Triton server to load all models...retrying in 1 second
I0818 13:12:20.173826 70 metrics.cc:228] Collecting metrics for GPU 0: Tesla T4
I0818 13:12:20.178053 70 onnxruntime.cc:1722] TRITONBACKEND_Initialize: onnxruntime
I0818 13:12:20.178083 70 onnxruntime.cc:1732] Triton TRITONBACKEND API version: 1.0
I0818 13:12:20.178088 70 onnxruntime.cc:1738] 'onnxruntime' TRITONBACKEND API version: 1.0
I0818 13:12:20.468899 70 pinned_memory_manager.cc:206] Pinned memory pool is created at '0x7f83f4000000' with size 268435456
I0818 13:12:20.469326 70 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 1000000000
I0818 13:12:20.477179 70 model_repository_manager.cc:1066] loading: mandarin-citrinet-offline-feature-extractor-streaming-offline:1
I0818 13:12:20.577498 70 model_repository_manager.cc:1066] loading: mandarin-citrinet-offline-ctc-decoder-cpu-streaming-offline:1
I0818 13:12:20.578012 70 custom_backend.cc:201] Creating instance mandarin-citrinet-offline-feature-extractor-streaming-offline_0_0_gpu0 on GPU 0 (7.5) using libtriton_riva_asr_features.so
I0818 13:12:20.677806 70 model_repository_manager.cc:1066] loading: mandarin-citrinet-offline-voice-activity-detector-ctc-streaming-offline:1
I0818 13:12:20.678146 70 custom_backend.cc:198] Creating instance mandarin-citrinet-offline-ctc-decoder-cpu-streaming-offline_0_0_cpu on CPU using libtriton_riva_asr_decoder_cpu.so
W:parameter_parser.cc:106: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
W:parameter_parser.cc:106: Parameter voc_string could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
I0818 13:12:20.778145 70 model_repository_manager.cc:1066] loading: mandarin-citrinet-stream-ctc-decoder-cpu-streaming:1
I0818 13:12:20.778462 70 custom_backend.cc:198] Creating instance mandarin-citrinet-offline-voice-activity-detector-ctc-streaming-offline_0_0_cpu on CPU using libtriton_riva_asr_vad.so
I0818 13:12:20.878551 70 model_repository_manager.cc:1066] loading: mandarin-citrinet-stream-feature-extractor-streaming:1
I0818 13:12:20.878861 70 custom_backend.cc:198] Creating instance mandarin-citrinet-stream-ctc-decoder-cpu-streaming_0_0_cpu on CPU using libtriton_riva_asr_decoder_cpu.so
W:parameter_parser.cc:106: Parameter forerunner_start_offset_ms could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
W:parameter_parser.cc:106: Parameter voc_string could not be set from parameters
W:parameter_parser.cc:107: Default value will be used
I0818 13:12:20.970309 70 model_repository_manager.cc:1240] successfully loaded 'mandarin-citrinet-offline-ctc-decoder-cpu-streaming-offline' version 1
I0818 13:12:20.978910 70 model_repository_manager.cc:1066] loading: mandarin-citrinet-stream-voice-activity-detector-ctc-streaming:1
I0818 13:12:20.979565 70 custom_backend.cc:201] Creating instance mandarin-citrinet-stream-feature-extractor-streaming_0_0_gpu0 on GPU 0 (7.5) using libtriton_riva_asr_features.so
I0818 13:12:21.079298 70 model_repository_manager.cc:1066] loading: riva-trt-mandarin-citrinet-offline-am-streaming-offline:1
I0818 13:12:21.079645 70 custom_backend.cc:198] Creating instance mandarin-citrinet-stream-voice-activity-detector-ctc-streaming_0_0_cpu on CPU using libtriton_riva_asr_vad.so
> Riva waiting for Triton server to load all models...retrying in 1 second
I0818 13:12:21.176807 70 model_repository_manager.cc:1240] successfully loaded 'mandarin-citrinet-stream-ctc-decoder-cpu-streaming' version 1
I0818 13:12:21.178420 70 model_repository_manager.cc:1240] successfully loaded 'mandarin-citrinet-offline-voice-activity-detector-ctc-streaming-offline' version 1
I0818 13:12:21.179609 70 model_repository_manager.cc:1066] loading: riva-trt-mandarin-citrinet-stream-am-streaming:1
I0818 13:12:21.458073 70 model_repository_manager.cc:1240] successfully loaded 'mandarin-citrinet-stream-voice-activity-detector-ctc-streaming' version 1
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
I0818 13:12:31.170981 70 model_repository_manager.cc:1240] successfully loaded 'mandarin-citrinet-stream-feature-extractor-streaming' version 1
I0818 13:12:31.171033 70 model_repository_manager.cc:1240] successfully loaded 'mandarin-citrinet-offline-feature-extractor-streaming-offline' version 1
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
I0818 13:12:37.718432 70 plan_backend.cc:384] Creating instance riva-trt-mandarin-citrinet-offline-am-streaming-offline_0_0_gpu0 on GPU 0 (7.5) using model.plan
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
I0818 13:12:39.477380 70 plan_backend.cc:768] Created instance riva-trt-mandarin-citrinet-offline-am-streaming-offline_0_0_gpu0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0818 13:12:39.486597 70 model_repository_manager.cc:1240] successfully loaded 'riva-trt-mandarin-citrinet-offline-am-streaming-offline' version 1
> Riva waiting for Triton server to load all models...retrying in 1 second
I0818 13:12:41.271908 70 plan_backend.cc:384] Creating instance riva-trt-mandarin-citrinet-stream-am-streaming_0_0_gpu0 on GPU 0 (7.5) using model.plan
> Riva waiting for Triton server to load all models...retrying in 1 second
> Riva waiting for Triton server to load all models...retrying in 1 second
I0818 13:12:43.092757 70 plan_backend.cc:768] Created instance riva-trt-mandarin-citrinet-stream-am-streaming_0_0_gpu0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0818 13:12:43.102851 70 model_repository_manager.cc:1240] successfully loaded 'riva-trt-mandarin-citrinet-stream-am-streaming' version 1
I0818 13:12:43.103639 70 model_repository_manager.cc:1066] loading: mandarin-citrinet-offline:1
I0818 13:12:43.203978 70 model_repository_manager.cc:1066] loading: mandarin-citrinet-stream:1
I0818 13:12:43.304271 70 model_repository_manager.cc:1240] successfully loaded 'mandarin-citrinet-offline' version 1
I0818 13:12:43.304522 70 model_repository_manager.cc:1240] successfully loaded 'mandarin-citrinet-stream' version 1
I0818 13:12:43.304640 70 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0818 13:12:43.304694 70 server.cc:543]
+-------------+-----------------------------------------------------------------+--------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt | <built-in> | {} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
+-------------+-----------------------------------------------------------------+--------+
I0818 13:12:43.304800 70 server.cc:586]
+-------------------------------------------------------------------------+---------+--------+
| Model | Version | Status |
+-------------------------------------------------------------------------+---------+--------+
| mandarin-citrinet-offline | 1 | READY |
| mandarin-citrinet-offline-ctc-decoder-cpu-streaming-offline | 1 | READY |
| mandarin-citrinet-offline-feature-extractor-streaming-offline | 1 | READY |
| mandarin-citrinet-offline-voice-activity-detector-ctc-streaming-offline | 1 | READY |
| mandarin-citrinet-stream | 1 | READY |
| mandarin-citrinet-stream-ctc-decoder-cpu-streaming | 1 | READY |
| mandarin-citrinet-stream-feature-extractor-streaming | 1 | READY |
| mandarin-citrinet-stream-voice-activity-detector-ctc-streaming | 1 | READY |
| riva-trt-mandarin-citrinet-offline-am-streaming-offline | 1 | READY |
| riva-trt-mandarin-citrinet-stream-am-streaming | 1 | READY |
+-------------------------------------------------------------------------+---------+--------+
I0818 13:12:43.304907 70 tritonserver.cc:1658]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.9.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0] | /data/models |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 1000000000 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0818 13:12:43.306418 70 grpc_server.cc:4028] Started GRPCInferenceService at 0.0.0.0:8001
I0818 13:12:43.306699 70 http_server.cc:2761] Started HTTPService at 0.0.0.0:8000
I0818 13:12:43.348255 70 http_server.cc:2780] Started Metrics Service at 0.0.0.0:8002
> Triton server is ready...
I0818 13:12:43.363310 216 grpc_health.cc:27] RivaHealthService initialized with server: localhost:8001
I0818 13:12:43.363358 216 grpc_riva_asr.cc:148] Setting uri for ASRServiceImpl
I0818 13:12:43.363363 216 grpc_riva_asr.cc:149] Initializing different models
I0818 13:12:43.363832 216 model_registry.cc:36] RivaModelRegistry initialized with server: localhost:8001
I0818 13:12:43.364866 216 model_registry.cc:65] Server Name: triton, Server version: 2.9.0
I0818 13:12:43.365250 216 model_registry.cc:86] Our model repository has a total of: 10 models
I0818 13:12:43.365267 216 model_registry.cc:91] Model names: mandarin-citrinet-offline, Model version: 1
I0818 13:12:43.369211 216 model_registry.cc:104] 'Successfully registering mandarin-citrinet-offline'
I0818 13:12:43.369323 216 model_registry.cc:91] Model names: mandarin-citrinet-offline-ctc-decoder-cpu-streaming-offline, Model version: 1
I0818 13:12:43.370424 216 model_registry.cc:91] Model names: mandarin-citrinet-offline-feature-extractor-streaming-offline, Model version: 1
I0818 13:12:43.371452 216 model_registry.cc:91] Model names: mandarin-citrinet-offline-voice-activity-detector-ctc-streaming-offline, Model version: 1
I0818 13:12:43.372296 216 model_registry.cc:91] Model names: mandarin-citrinet-stream, Model version: 1
I0818 13:12:43.373432 216 model_registry.cc:104] 'Successfully registering mandarin-citrinet-stream'
I0818 13:12:43.373492 216 model_registry.cc:91] Model names: mandarin-citrinet-stream-ctc-decoder-cpu-streaming, Model version: 1
I0818 13:12:43.374459 216 model_registry.cc:91] Model names: mandarin-citrinet-stream-feature-extractor-streaming, Model version: 1
I0818 13:12:43.375448 216 model_registry.cc:91] Model names: mandarin-citrinet-stream-voice-activity-detector-ctc-streaming, Model version: 1
I0818 13:12:43.376302 216 model_registry.cc:91] Model names: riva-trt-mandarin-citrinet-offline-am-streaming-offline, Model version: 1
I0818 13:12:43.377012 216 model_registry.cc:91] Model names: riva-trt-mandarin-citrinet-stream-am-streaming, Model version: 1
I0818 13:12:43.377703 216 model_registry.cc:109] Successfully registered: 2 models.
I0818 13:12:43.377724 216 client.cc:38] RivaLanguageUnderstandingClient initialized with server: localhost:8001
I0818 13:12:43.378015 216 client.cc:54] Our model repository has: 10 models.
W0818 13:12:43.379010 216 client.cc:78] Registration of 'mandarin-citrinet-offline' failed with unknown service type
W0818 13:12:43.382831 216 client.cc:78] Registration of 'mandarin-citrinet-stream' failed with unknown service type
I0818 13:12:43.387140 216 grpc_riva_asr.cc:173] Punctuation model does not exist on server
I0818 13:12:43.387164 216 grpc_riva_asr.cc:177] Seeding RNG used for correlation id with time: 1629292363
I0818 13:12:43.435235 216 grpc_riva_asr.cc:148] Setting uri for ASRServiceImpl
I0818 13:12:43.435253 216 grpc_riva_asr.cc:149] Initializing different models
I0818 13:12:43.435261 216 model_registry.cc:36] RivaModelRegistry initialized with server: localhost:8001
I0818 13:12:43.435756 216 model_registry.cc:65] Server Name: triton, Server version: 2.9.0
I0818 13:12:43.436023 216 model_registry.cc:86] Our model repository has a total of: 10 models
I0818 13:12:43.436043 216 model_registry.cc:91] Model names: mandarin-citrinet-offline, Model version: 1
I0818 13:12:43.437112 216 model_registry.cc:104] 'Successfully registering mandarin-citrinet-offline'
I0818 13:12:43.437167 216 model_registry.cc:91] Model names: mandarin-citrinet-offline-ctc-decoder-cpu-streaming-offline, Model version: 1
I0818 13:12:43.438241 216 model_registry.cc:91] Model names: mandarin-citrinet-offline-feature-extractor-streaming-offline, Model version: 1
I0818 13:12:43.439265 216 model_registry.cc:91] Model names: mandarin-citrinet-offline-voice-activity-detector-ctc-streaming-offline, Model version: 1
I0818 13:12:43.440058 216 model_registry.cc:91] Model names: mandarin-citrinet-stream, Model version: 1
I0818 13:12:43.441023 216 model_registry.cc:104] 'Successfully registering mandarin-citrinet-stream'
I0818 13:12:43.441071 216 model_registry.cc:91] Model names: mandarin-citrinet-stream-ctc-decoder-cpu-streaming, Model version: 1
I0818 13:12:43.442020 216 model_registry.cc:91] Model names: mandarin-citrinet-stream-feature-extractor-streaming, Model version: 1
I0818 13:12:43.443114 216 model_registry.cc:91] Model names: mandarin-citrinet-stream-voice-activity-detector-ctc-streaming, Model version: 1
I0818 13:12:43.443967 216 model_registry.cc:91] Model names: riva-trt-mandarin-citrinet-offline-am-streaming-offline, Model version: 1
I0818 13:12:43.444602 216 model_registry.cc:91] Model names: riva-trt-mandarin-citrinet-stream-am-streaming, Model version: 1
I0818 13:12:43.445252 216 model_registry.cc:109] Successfully registered: 2 models.
I0818 13:12:43.445274 216 client.cc:38] RivaLanguageUnderstandingClient initialized with server: localhost:8001
I0818 13:12:43.445533 216 client.cc:54] Our model repository has: 10 models.
W0818 13:12:43.446538 216 client.cc:78] Registration of 'mandarin-citrinet-offline' failed with unknown service type
W0818 13:12:43.450173 216 client.cc:78] Registration of 'mandarin-citrinet-stream' failed with unknown service type
I0818 13:12:43.454082 216 grpc_riva_asr.cc:173] Punctuation model does not exist on server
I0818 13:12:43.454099 216 grpc_riva_asr.cc:177] Seeding RNG used for correlation id with time: 1629292363
I0818 13:12:43.496172 216 riva_server.cc:93] ASR Service connected to Triton at localhost:8001
I0818 13:12:43.496188 216 riva_server.cc:96] Riva Conversational AI Server listening on 0.0.0.0:50051
I0818 13:14:09.441917 244 grpc_riva_asr.cc:398] ASRService.Recognize called.
I0818 13:14:09.442003 244 riva_asr_stream.cc:219] Detected format: encoding = 1 numchannels = 1 samplerate = 16000 bitspersample = 16
I0818 13:14:09.442009 244 grpc_riva_asr.cc:453] ASRService.Recognize performing streaming recognition with sequence id: 167864285
I0818 13:14:09.442050 244 grpc_riva_asr.cc:471] Using model mandarin-citrinet-offline for inference
I0818 13:14:09.442132 244 grpc_riva_asr.cc:486] Model sample rate= 16000 for inference
I0818 13:14:09.484850 244 grpc_riva_asr.cc:553] ASRService.Recognize returning OK
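
To make the warnings in a log like the one above easier to spot, here is a small sketch that filters for warning, error, and fatal lines. It assumes the timestamped lines follow the glog layout seen above (level letter, date, time, thread id, source file and line, then the message). The function name warnings_and_errors is made up for illustration. Lines in the other format (W:parameter_parser.cc:106: ...) do not match the pattern and are skipped.

```python
import re

# One glog-style line: <level><MMDD> <HH:MM:SS.micros> <thread> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] ?(?P<message>.*)$"
)


def warnings_and_errors(log_text):
    """Return (source_file, message) pairs for every W, E, or F level line."""
    hits = []
    for raw in log_text.splitlines():
        match = GLOG_LINE.match(raw.strip())
        if match and match.group("level") in ("W", "E", "F"):
            hits.append((match.group("file"), match.group("message")))
    return hits


# Four lines copied from the log above.
sample = """\
I0818 13:12:43.377724 216 client.cc:38] RivaLanguageUnderstandingClient initialized with server: localhost:8001
W0818 13:12:43.379010 216 client.cc:78] Registration of 'mandarin-citrinet-offline' failed with unknown service type
W0818 13:12:43.382831 216 client.cc:78] Registration of 'mandarin-citrinet-stream' failed with unknown service type
I0818 13:12:43.387140 216 grpc_riva_asr.cc:173] Punctuation model does not exist on server
"""

for source_file, message in warnings_and_errors(sample):
    print(f"{source_file}: {message}")
```

On these sample lines, both warnings come from client.cc, the same source file that logs the RivaLanguageUnderstandingClient initialization just before them.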