Hey @SunilJB. I get the same error that @Jackhe saw when following those steps. I was trying to get some idea of how Jarvis scales on a Tesla T4. I saw in the notes that speech squad is not able to get latency info from the Jarvis server. I will keep checking to see when this is supported.
Do you know where I could get some ranges on the number of conversations (2 audio channels) I can expect using just ASR, ASR/NLP, or ASR/NLP/TTS, without web framework constraints? The plan is to use only native Python, probably with aiortc to support WebRTC audio sources.
I ran nvidia-bug-report.sh and attached the result along with a text file containing the server and client output: nvidia-bug-report.log.gz (744.5 KB), speechsquad_T4.txt (3.6 KB). If there is something else I should gather, let me know. So far Jarvis is great!
It seems speechsquad uses "quartznet-asr-trt-ensemble-vad-streaming" by default.
Could you please check if jarvis server has quartznet model running?
Otherwise, you can use something like the -asr_model_name jasper-asr-trt-ensemble-vad-streaming argument, along with the other arguments (-tts_service_url), to point to the Jasper ASR model.
Thanks! @SunilJB. I checked config.sh for the speech server and quartznet was commented out so I included the option you provided and now I get results including latency. This might help me get data points on scale for the project. I have a lot of learning to do though.
ubuntu@ip-172-31-27-91:~/nvidia/speechsquad$ sudo docker run -it --net=host -v $(pwd)/speechsquad_sample_public_v1:/work/test_files/speech_squad/ nvcr.io/nvidia/jarvis/speech_squad:1.0.0-b.1 speechsquad_perf_client --squad_questions_json=/work/test_files/speech_squad/recorded_questions.jl --squad_dataset_json=/work/test_files/speech_squad/manifest.json --speech_squad_uri=0.0.0.0:1337 --chunk_duration_ms=800 --executor_count=1 --num_iterations=1 --num_parallel_requests=64 --print_results=false
Loading eval dataset…
Done loading 5 files for process 0
Generating load…
…Waiting for all responses…
Done with measurements
Generating Statistics Report…
================ Process 0================
Client Latency (ms):
Median 90th 95th 99th Avg
439.66 515.62 515.62 515.62 408.55
================ Final Report ================
Run time: 4.8552 sec.
Total audio processed: 17.811 sec.
Throughput: 3.6683 RTFX
Number of failed audio clips: 0
Average Latencies ====>
Client Latency:408.55 ms
tracing.server_latency.natural_query:0 ms
tracing.server_latency.speech_synthesis:0 ms
tracing.server_latency.streaming_recognition:0 ms
tracing.speech_squad.asr_latency:224.57 ms
tracing.speech_squad.nlp_latency:31.705 ms
tracing.speech_squad.tts_latency:150.93 ms
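For anyone who wants to compare runs like this, here is a minimal Python sketch that pulls the numbers out of the final report text. It assumes the report keeps the "name: value unit" layout shown above; `parse_report` and `REPORT` are names made up for this example, not part of speechsquad, and reading RTFX as audio seconds divided by run time is my interpretation, which happens to match the figures in this run.

```python
import re

# Excerpt of the final report above, pasted as-is.
REPORT = """\
Run time: 4.8552 sec.
Total audio processed: 17.811 sec.
Throughput: 3.6683 RTFX
Number of failed audio clips: 0
Average Latencies ====>
Client Latency:408.55 ms
tracing.server_latency.natural_query:0 ms
tracing.server_latency.speech_synthesis:0 ms
tracing.server_latency.streaming_recognition:0 ms
tracing.speech_squad.asr_latency:224.57 ms
tracing.speech_squad.nlp_latency:31.705 ms
tracing.speech_squad.tts_latency:150.93 ms
"""

# Matches lines of the form "<name>: <number> <optional unit>".
LINE_RE = re.compile(r"^(?P<name>[^:]+):\s*(?P<value>[0-9]+(?:\.[0-9]+)?)\s*\S*$")


def parse_report(text):
    """Return {metric name: float value} for every 'name: number unit' line."""
    metrics = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            metrics[match.group("name").strip()] = float(match.group("value"))
    return metrics


metrics = parse_report(REPORT)

# Sum of the three per-stage latencies reported by speech squad.
pipeline_ms = sum(
    metrics[f"tracing.speech_squad.{stage}_latency"] for stage in ("asr", "nlp", "tts")
)

# Assumed definition: seconds of audio processed per second of wall-clock time.
rtfx = metrics["Total audio processed"] / metrics["Run time"]

print(f"ASR + NLP + TTS: {pipeline_ms:.3f} ms")
print(f"Client latency:  {metrics['Client Latency']:.2f} ms")
print(f"Audio / runtime: {rtfx:.4f}")
```

On this run the three stages add up to about 407.2 ms, so nearly all of the 408.55 ms client latency is accounted for by ASR, NLP, and TTS, with ASR being the largest share.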
Hi @SunilJB ,
I am running speech squad on the latest v1.2.1-beta version and I believe the ASR model is jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming in config.sh.
I tried giving --asr_model_name argument while running the container but it did not affect it and I am getting the error mentioned above which is:
asr error detected - issuing cancellation on squad stream
Can you please tell us how to run speech squad with the new Citrinet ASR model? We will be using this model and not Jasper.
Also, the /jarvis part is missing from the image URL, and if it is not added, docker gives this error:
Error response from daemon: pull access denied for nvcr.io/nvidia/speech_squad, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
The same URL, with /jarvis missing, is given for all the sample applications.
Hi @shilpa.suresh
Could you please try the solution suggested above and check whether it resolves the issue?
If the issue persists, could you please share the log so we can help better?
Hi @SunilJB,
What should the model name be for Citrinet? I want to test the latency of the new Citrinet model, which is the default in the new version.
Hi @shilpa.suresh
The model to be used for ASR can be passed using the asr_model_name argument mentioned above.
During the jarvis_init process, the JMIR files in $jarvis_model_loc/jmir
are inspected and optimized for deployment. The optimized versions are
stored in $jarvis_model_loc/models. The jarvis server exclusively uses these optimized versions.
Hi @SunilJB ,
I tried citrinet-1024-asr-trt-ensemble-vad-streaming as asr_model_name and I got the speech squad results. Thank you so much for your help. Truly appreciate it.