Local deployment error of SpeechSquad sample

I followed the steps in the SpeechSquad documentation (NVIDIA Jarvis Speech Skills v1.3.0-beta), but got the following error on the server side:

I0508 13:36:26.292239 1 resources.cc:62] jarvis asr connection established to
I0508 13:36:26.292706 1 resources.cc:63] jarvis nlp connection established to
I0508 13:36:26.292716 1 resources.cc:64] jarvis tts connection established to
I0508 13:36:26.315240 1 server.cc:102] grpc server and event loop initialized and accepting connections
E0508 13:38:21.191830 9 context.cc:172] asr error detected - issuing cancellation on squad stream
E0508 13:38:21.195380 10 context.cc:172] asr error detected - issuing cancellation on squad stream
E0508 13:38:21.195479 13 context.cc:172] asr error detected - issuing cancellation on squad stream
E0508 13:38:21.195513 11 context.cc:172] asr error detected - issuing cancellation on squad stream
E0508 13:38:21.195511 12 context.cc:172] asr error detected - issuing cancellation on squad stream

What is the reason?

Hi @jackhe
Could you please share the log files and system info so we can help better?


Hey @SunilJB. I get the same error that @Jackhe saw when following those steps. I was trying to get an idea of how Jarvis scales on a Tesla T4. I saw in the notes that SpeechSquad is not able to get latency information from the Jarvis server. I will keep checking back to see when this is supported.

Do you know where I could find expected ranges for the number of conversations (2 audio channels) I can handle using just ASR, ASR/NLP, or ASR/NLP/TTS, without web framework constraints? The plan is to use only native Python, probably with aiortc to support WebRTC audio sources.

I ran nvidia-bug-report.sh and am attaching its output, along with a text file containing the server and client output:
nvidia-bug-report.log.gz (744.5 KB)
speechsquad_T4.txt (3.6 KB)
If there is other data I should gather, let me know. So far Jarvis is great!


Hi @rmcinnis1 ,

It seems SpeechSquad uses the “quartznet-asr-trt-ensemble-vad-streaming” model by default.

Could you please check whether the Jarvis server has the QuartzNet model running?
Otherwise, you can pass an argument such as -asr_model_name jasper-asr-trt-ensemble-vad-streaming, along with the other arguments (e.g. -tts_service_url), to point to the Jasper ASR model.



Thanks, @SunilJB! I checked config.sh for the speech server and quartznet was commented out, so I included the option you provided, and now I get results, including latency. This might help me get data points on scale for the project. I have a lot of learning to do, though.
ubuntu@ip-172-31-27-91:~/nvidia/speechsquad$ sudo docker run -it --net=host -v $(pwd)/speechsquad_sample_public_v1:/work/test_files/speech_squad/ nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 speechsquad_perf_client --squad_questions_json=/work/test_files/speech_squad/recorded_questions.jl --squad_dataset_json=/work/test_files/speech_squad/manifest.json --speech_squad_uri= --chunk_duration_ms=800 --executor_count=1 --num_iterations=1 --num_parallel_requests=64 --print_results=false
Loading eval dataset…
Done loading 5 files for process 0
Generating load…
…Waiting for all responses…

Done with measurements
Generating Statistics Report…
================ Process 0================

tracing.speech_squad.asr_latency (ms):
Median 90th 95th 99th Avg
250.15 350.82 350.82 350.82 224.57

tracing.speech_squad.nlp_latency (ms):
Median 90th 95th 99th Avg
9.427 114.33 114.33 114.33 31.705

tracing.speech_squad.tts_latency (ms):
Median 90th 95th 99th Avg
128.33 291.36 291.36 291.36 150.93

Client Latency (ms):
Median 90th 95th 99th Avg
439.66 515.62 515.62 515.62 408.55
================ Final Report ================
Run time: 4.8552 sec.
Total audio processed: 17.811 sec.
Throughput: 3.6683 RTFX
Number of failed audio clips: 0
Average Latencies ====>
Client Latency:408.55 ms
tracing.server_latency.natural_query:0 ms
tracing.server_latency.speech_synthesis:0 ms
tracing.server_latency.streaming_recognition:0 ms
tracing.speech_squad.asr_latency:224.57 ms
tracing.speech_squad.nlp_latency:31.705 ms
tracing.speech_squad.tts_latency:150.93 ms
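
As a sanity check on the report above, the throughput figure appears to be the total audio duration divided by the wall-clock run time, and the three per-stage averages add up to roughly the average client latency. The short sketch below recomputes both from the posted numbers. The RTFX definition used here is an assumption inferred from those numbers, not something taken from the SpeechSquad documentation.

```python
# Figures copied from the Final Report above.
run_time_s = 4.8552
audio_s = 17.811
stage_avg_ms = {"asr": 224.57, "nlp": 31.705, "tts": 150.93}
client_avg_ms = 408.55

# Assumption: RTFX = seconds of audio processed per second of wall-clock time.
rtfx = audio_s / run_time_s

# Sum of the per-stage averages, compared with the end-to-end client average.
pipeline_ms = sum(stage_avg_ms.values())
remainder_ms = client_avg_ms - pipeline_ms

print(f"RTFX is about {rtfx:.3f}")
print(f"stage sum = {pipeline_ms:.3f} ms, remainder = {remainder_ms:.3f} ms")
```

Under that assumption the recomputed throughput matches the reported 3.6683 RTFX to within rounding, and the ASR, NLP and TTS averages account for nearly all of the average client latency.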


Hi @SunilJB ,
I am running SpeechSquad on the latest v1.2.1-beta version, and I believe the ASR model is jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming in config.sh.
I tried passing the --asr_model_name argument when running the container, but it had no effect, and I still get the error mentioned above:

asr error detected - issuing cancellation on squad stream

Can you please tell us how to run SpeechSquad with the new Citrinet ASR model? We will be using this model, not Jasper.

P.S. In the SpeechSquad documentation for the new Jarvis version (NVIDIA Jarvis Speech Skills v1.2.1-beta), the first step says:
docker pull nvcr.io/nvidia/speech_squad:1.0.0-b.1

But it should be:
docker pull nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1

The /jarvis part is missing, and without it the pull fails with:
Error response from daemon: pull access denied for nvcr.io/nvidia/speech_squad, repository does not exist or may require ‘docker login’: denied: requested access to the resource is denied.

The same image path with /jarvis missing also appears for all the other sample applications.


Hi @shilpa.suresh
Could you please try the solution suggested above and check whether it resolves the issue?
If the issue persists, could you please share the logs so we can help better?


Hi @SunilJB,
What should the model name be for Citrinet? I want to test the latency of the new Citrinet model, which is the default in the new version.

Hi @shilpa.suresh
The ASR model to use can be passed with the -asr_model_name argument mentioned above.
During the jarvis_init process, the JMIR files in $jarvis_model_loc/jmir are inspected and optimized for deployment. The optimized versions are stored in $jarvis_model_loc/models. The Jarvis server exclusively uses these optimized versions.
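
Since the server only serves what ends up in $jarvis_model_loc/models, one way to look for a valid -asr_model_name value is to list that directory. The sketch below assumes that each deployed model appears there as a subdirectory and that streaming ASR pipelines have both "asr" and "streaming" in their names, as the model names quoted in this thread do. The helper name is made up for illustration, and if the model location is a Docker volume rather than a host path, the listing would have to run inside a container that mounts it.

```python
import os
from pathlib import Path


def list_streaming_asr_models(models_dir):
    """Return sorted names of subdirectories that look like streaming ASR models.

    Assumption: one subdirectory per deployed model, with "asr" and
    "streaming" in the name of each streaming ASR pipeline.
    """
    root = Path(models_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and "asr" in entry.name and "streaming" in entry.name
    )


if __name__ == "__main__":
    # jarvis_model_loc is the variable mentioned above; it must be set in the
    # environment (and point at a readable path) for this demo to print anything.
    location = os.environ.get("jarvis_model_loc")
    if location:
        for name in list_streaming_asr_models(Path(location) / "models"):
            print(name)
```

A listing like this may return more than one matching name, so it narrows the choice rather than settling it.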


Hi @SunilJB ,
Since I wanted to test the citrinet I ran the following command:

docker run -it --rm --net=host nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 speechsquad_server -tts_service_url= -nlp_service_url= -asr_service_url= -asr_model_name=citrinet-1024-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming

As you mentioned, I took the model name from the $jarvis_model_loc/models folder, but the error persists.

I am attaching both speechsquad server and client logs.

speech_squad_client_log.txt (2.4 KB)
speech_squad_server_log.txt (1.8 KB)

Hi @shilpa.suresh
Could you please try citrinet-1024-asr-trt-ensemble-vad-streaming as asr_model_name?
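
That suggestion changes only the model name in the server command posted above. A minimal sketch that assembles the command from the flags and image tag already shown in this thread follows. The three service URLs were not shown, so they are left as parameters with a placeholder value, and the function name is hypothetical.

```python
import shlex

# Image tag as posted earlier in the thread.
IMAGE = "nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1"


def speechsquad_server_cmd(asr_url, nlp_url, tts_url, asr_model_name):
    """Build the argv for the speechsquad_server container.

    Flags mirror the command posted in this thread; the URL values are
    whatever your Jarvis services listen on (not shown in the thread).
    """
    return [
        "docker", "run", "-it", "--rm", "--net=host", IMAGE,
        "speechsquad_server",
        f"-tts_service_url={tts_url}",
        f"-nlp_service_url={nlp_url}",
        f"-asr_service_url={asr_url}",
        f"-asr_model_name={asr_model_name}",
    ]


if __name__ == "__main__":
    # "HOST:PORT" is a placeholder, not a real address.
    cmd = speechsquad_server_cmd(
        "HOST:PORT", "HOST:PORT", "HOST:PORT",
        "citrinet-1024-asr-trt-ensemble-vad-streaming",
    )
    print(shlex.join(cmd))
```

Printing the command instead of running it makes it easy to compare against the earlier invocation before starting the container.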


Hi @SunilJB ,
I tried citrinet-1024-asr-trt-ensemble-vad-streaming as asr_model_name and got the SpeechSquad results. Thank you so much for your help; I truly appreciate it.

Thanks and Regards,
