Voice Demo Container for Jetson Xavier NX not working

Cloud-Native Demo on Jetson
The first thing I did was download the new demo.

  1. Three containers are working as advertised.
  2. The voice demo container does not work.

Is there anything I am missing?

I have checked my USB headset's mic. It works. But my guess is that the container is not getting my voice.

Thanks in advance.

Hi,
You may run the arecord or aplay commands first to make sure the device is good.
Reference commands:

$ aplay -D hw:1,0 test_stereo_44100Hz_16bit_PCM.wav
$ arecord -D hw:1,0 -c2 -d 10 -r 44100 -f S16_LE a.wav

Thanks for your quick reply!

I really like my new Xavier NX. The difference between the Nano and the NX is night and day.
My Nano is now quietly dedicated to supporting my JetBot, with no further duties. I really want to see the demo working on my Xavier NX.

First, my apologies: I should have written down what I did. Here is what I did:

  1. I flashed my SD card with the NX image to make sure it was in a fresh state.
  2. Ran the arecord check, and the generated wave file played back. It worked.
  3. Ran the 4-container demo. All but the voice container worked.
  4. Stopped the demo.
  5. Visited Voice Demo for Jetson/L4T | NVIDIA NGC.
  6. Launched the Triton container.
  7. Launched the client container.
  8. Ran ./scripts/list_microphones.sh from inside the client container.
    The input device ID of my mic is 24, as in the example.
  9. Ran "python3 src/chatbot.py --mic 24 --push-to-talk Space"
    with the correct mic parameter. I see the state change between mute and live when I hold the space bar.
  10. Just in case, I also tried without --push-to-talk Space. No change.

I hope this gives you enough detail.
Here is a summary:

  1. I flashed my SD card and set up my SSD as the main drive.
  2. The mic was working.
  3. The client container recognizes my mic as device #24.
  4. The client pops up the GUI.
  5. There is no sound sampling display.

For your review, here is the output of the client window when I execute "python3 src/chatbot.py --mic 24 --push-to-talk Space".

You can see some of the errors; [Errno -9997] Invalid sample rate appears to be the issue. Here is the full output trace from the client window. Thanks again!

root@dave-desktop:/workspace# python3 src/chatbot.py --mic 24 --push-to-talk Space
/usr/local/lib/python3.6/dist-packages/librosa/util/decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
Namespace(asr_backend='jarvis-jasper', gui=True, gui_height=540, gui_plot='mel_gl', gui_width=960, mic=24, nlp_backend='trt-bert-qa', nlp_model='base', nlp_negative_answer=False, para='test/passages', push_to_talk='Space', url='localhost:8001', verbose=False, wav=None)
GUI - root window focus in
GLX version: 1.4
Screen is 0
Number of FBconfigs 36
Got a matching visual: index 2 33 xid 0x21
Is Direct?: 1
Done making a first context
GUI - PlotGL initializing, width=960 height=175
Number of extensions: 381
GL_VENDOR : b'NVIDIA Corporation'
GL_RENDERER: b'NVIDIA Tegra Xavier (nvgpu)/integrated'
GL_VERSION : b'4.6.0 NVIDIA 32.4.2'
GL_MAJOR_VERSION: 4
GL_MINOR_VERSION: 6
GL_SHADING_LANGUAGE_VERSION : b'4.60 NVIDIA'
GL_CONTEXT_CORE_PROFILE_BIT : False
GL_CONTEXT_COMPATIBILITY_PROFILE_BIT : False
GUI - done window initialization
NLP - creating NLP backend: trt-bert-qa
NLP - loading BERT TensorRT engine: models/bert/tensorflow/bert_tf_v2_base_fp16_128_v2/bert_fp16.plan
[TensorRT] WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
NLP - loaded BERT TensorRT engine: models/bert/tensorflow/bert_tf_v2_base_fp16_128_v2/bert_fp16.plan
NLP - allocating CUDA memory for BERT…
NLP - done initialization of models/bert/tensorflow/bert_tf_v2_base_fp16_128_v2/bert_fp16.plan
NLP - BERT TensorRT model ready.
GUI - root window focus out
NLP - BERT loaded paragraph text for topic #1 (1_BERT)

BERT is a deep neural network used for natural language processing and understanding. BERT means Bidirectional Encoder Representations from Transformers. BERT is able to perform a variety of NLP tasks such as question answering, intent classification, sentiment analysis, paraphrasing, recommendations, and autocompletion. BERT question answering works by providing a source passage paragraph at runtime which BERT can then answer questions about by selecting the most relevant text to the query from the passage. The BERT model is accelerated on Jetson with GPU using TensorRT.

NLP - BERT loaded paragraph text for topic #2 (2_GTC)

GTC is the GPU Technology Conference. The keynote is held on Tuesday at 2pm in the San Jose State University. Our CEO Jensen Huang will be presenting the keynote. Shuttles will be provided for getting from the convention center to the keynote. The expo is found in hall one and two to your right. The expo is open between the hours of 9 and 5. The racetrack is found in the Jetson pavillion. Room 1 can be found in the lower level of the convention center. Room 2 can be found in the west wing of the convention center.

NLP - BERT loaded paragraph text for topic #3 (3_Jetson_Xavier_NX)

Jetson Xavier NX is a new edge computer from NVIDIA. It has 6 CPU cores, 384 GPU cores, 2 DLA engines, and 8 gigabytes of memory. The module is available for $399 dollars. Jetson Xavier NX is useful for deploying accelerated AI-powered machine learning and computer vision applications for AI at the edge. It runs the latest JetPack 4.4 software, which includes support for Linux, the NVIDIA CUDA toolkit, cuDNN, TensorRT, and Docker containers.

NLP - BERT loaded paragraph text for topic #4 (4_Cloud_Native)

Cloud native is a term used to describe container-based environments. Cloud-native technologies are used to develop applications built with services packaged in containers, deployed as microservices and managed on elastic infrastructure through agile DevOps processes and continuous delivery workflows. Containers offer both efficiency and speed compared with standard virtual machines. Jetson supports cloud native development through the use of Docker containers and pre-trained deep learning models that we have available on NVIDIA GPU Cloud.

NLP - BERT loaded paragraph text for topic #5 (5_Football)

The 2019 NFL season was the 100th season of the National Football League (NFL). The season began on September 5, 2019. The Kansas City Chiefs won the Super Bowl against the San Francisco 49ers on February 2, 2020, at Hard Rock Stadium in Miami, Florida.

NLP - set topic to #1 (1_BERT)

BERT is a deep neural network used for natural language processing and understanding. BERT means Bidirectional Encoder Representations from Transformers. BERT is able to perform a variety of NLP tasks such as question answering, intent classification, sentiment analysis, paraphrasing, recommendations, and autocompletion. BERT question answering works by providing a source passage paragraph at runtime which BERT can then answer questions about by selecting the most relevant text to the query from the passage. The BERT model is accelerated on Jetson with GPU using TensorRT.

NLP - warming up BERT…
ASR - creating ASR backend: jarvis-jasper
opening GRPC channel localhost:8001
Server status:
id: "inference:0"
version: "1.12.0dev"
uptime_ns: 1331331831425
model_status {
key: "jasper-asr-trt-ensemble-vad-streaming"
value {
config {
name: "jasper-asr-trt-ensemble-vad-streaming"
platform: "ensemble"
version_policy {
latest {
num_versions: 1
}
}
max_batch_size: 1
input {
name: "AUDIO_SIGNAL"
data_type: TYPE_FP32
dims: -1
}
input {
name: "SAMPLE_RATE"
data_type: TYPE_UINT32
dims: 1
}
input {
name: "END_FLAG"
data_type: TYPE_UINT32
dims: 1
}
output {
name: "FINAL_TRANSCRIPTS"
data_type: TYPE_STRING
dims: -1
}
output {
name: "FINAL_TRANSCRIPTS_SCORE"
data_type: TYPE_FP32
dims: -1
}
output {
name: "FINAL_WORDS_START_END"
data_type: TYPE_INT32
dims: -1
dims: 2
}
output {
name: "PARTIAL_TRANSCRIPT"
data_type: TYPE_STRING
dims: 1
}
output {
name: "PARTIAL_WORDS_START_END"
data_type: TYPE_INT32
dims: -1
dims: 2
}
output {
name: "AUDIO_PROCESSED"
data_type: TYPE_FP32
dims: 1
}
ensemble_scheduling {
step {
model_name: "feature-extractor-trt-vad-streaming"
model_version: -1
input_map {
key: "AUDIO_SIGNAL"
value: "AUDIO_SIGNAL"
}
input_map {
key: "SAMPLE_RATE"
value: "SAMPLE_RATE"
}
output_map {
key: "AUDIO_FEATURES"
value: "AUDIO_FEATURES"
}
output_map {
key: "AUDIO_PROCESSED"
value: "AUDIO_PROCESSED"
}
}
step {
model_name: "jasper-trt-encoder-streaming"
model_version: -1
input_map {
key: "audio_signal"
value: "AUDIO_FEATURES"
}
output_map {
key: "outputs"
value: "AUDIO_ENCODED"
}
}
step {
model_name: "jasper-trt-decoder-streaming"
model_version: -1
input_map {
key: "encoder_output"
value: "AUDIO_ENCODED"
}
output_map {
key: "output"
value: "CHARACTER_PROBABILITIES"
}
}
step {
model_name: "voice-activity-detector-trt-ctc-streaming"
model_version: -1
input_map {
key: "CLASS_LOGITS"
value: "CHARACTER_PROBABILITIES"
}
output_map {
key: "SEGMENTS_START_END"
value: "SEGMENTS_START_END"
}
}
step {
model_name: "ctc-decoder-cpu-trt-vad-streaming"
model_version: -1
input_map {
key: "CLASS_LOGITS"
value: "CHARACTER_PROBABILITIES"
}
input_map {
key: "END_FLAG"
value: "END_FLAG"
}
input_map {
key: "SEGMENTS_START_END"
value: "SEGMENTS_START_END"
}
output_map {
key: "FINAL_TRANSCRIPTS"
value: "FINAL_TRANSCRIPTS"
}
output_map {
key: "FINAL_TRANSCRIPTS_SCORE"
value: "FINAL_TRANSCRIPTS_SCORE"
}
output_map {
key: "FINAL_WORDS_START_END"
value: "FINAL_WORDS_START_END"
}
output_map {
key: "PARTIAL_TRANSCRIPT"
value: "PARTIAL_TRANSCRIPT"
}
output_map {
key: "PARTIAL_WORDS_START_END"
value: "PARTIAL_WORDS_START_END"
}
}
}
}
version_status {
key: 1
value {
ready_state: MODEL_READY
ready_state_reason {
}
}
}
}
}
ready_state: SERVER_READY

ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda-galen.pcm.front.0:CARD=0'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM front
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda-galen.pcm.surround51.0:CARD=0'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround21
ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda-galen.pcm.surround51.0:CARD=0'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround21
ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda-galen.pcm.surround40.0:CARD=0'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround40
ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda-galen.pcm.surround51.0:CARD=0'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround41
ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda-galen.pcm.surround51.0:CARD=0'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround50
ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda-galen.pcm.surround51.0:CARD=0'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround51
ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda-galen.pcm.surround71.0:CARD=0'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround71
ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda-galen.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM iec958
ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda-galen.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM spdif
ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda-galen.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM spdif
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm_dmix.c:1052:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
ASR - chunk_duration = 0.1
ASR - chunk_size = 1600
ASR - initial capture state: mute
ASR - input is muted by default (hold the Push-to-Talk button to speak)
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2719
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2843
[Errno -9997] Invalid sample rate
Traceback (most recent call last):
File "src/chatbot.py", line 219, in <module>
audio_callback=gui.on_audio if gui else None)
File "/workspace/src/asr/jarvis/asr_engine.py", line 59, in start
self.audio_generator, "mic", transcript_callback)
File "/workspace/src/asr/jarvis/speech_utils.py", line 875, in streaming_recognize
for response in responses:
File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 416, in __next__
return self._next()
File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 706, in _next
raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Exception iterating requests!"
debug_error_string = "None"

Hi @dkcog123, it seems that your microphone doesn't work with a 16 kHz sample rate (this is the rate that the ASR model is trained to recognize). Supporting 16 kHz seems to be common (in my case I use a Logitech USB headset), so I am surprised to find one that doesn't. Which mic are you using?
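
The numbers in your trace point the same way: chunk_size = 1600 at chunk_duration = 0.1 is exactly what you'd get from a 16 kHz stream (a quick sanity check of the arithmetic, assuming chunk_size = rate × duration):

```python
# Values taken from the posted log; the implied capture rate is what
# PortAudio tries to open, and what a 44.1 kHz-only mic would reject.
sample_rate = 16000                             # rate the ASR model expects
chunk_duration = 0.1                            # "ASR - chunk_duration = 0.1"
chunk_size = int(sample_rate * chunk_duration)
print(chunk_size)                               # 1600, matching "ASR - chunk_size = 1600"
```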

When you run the list_microphones.sh script, does it also output the recognized sample rates?

You can also test the ASR independently within the container on a pre-recorded wav file (it should be 16 kHz, mono; there is one already included under test/dusty.wav):

$ python3 src/test_asr.py --mic 24
  • --mic sets the microphone device ID. Run ./scripts/list_microphones.sh to view the active input audio device ID’s on your system.
  • --wav runs the chat from an input audio WAV file instead of microphone (e.g. --wav test/dusty.wav ). The WAV file should have 16KHz sample rate, mono channel.
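
If you want to verify the --wav path with a file whose format you control, a 16 kHz mono WAV can be synthesized with Python's standard wave module (write_test_wav and the tone parameters are my own hypothetical helper, not part of the demo; a sine tone only exercises the audio pipeline, so real speech is still needed for a meaningful transcript):

```python
import math
import struct
import wave

def write_test_wav(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono, 16-bit PCM sine tone; defaults match the ASR's 16 kHz input."""
    n_frames = int(rate * seconds)
    samples = (int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / rate))
               for i in range(n_frames))
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # mono channel
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)   # 16 kHz sample rate
        w.writeframes(b"".join(struct.pack("<h", s) for s in samples))

write_test_wav("tone_16k.wav")
# then, inside the container: python3 src/chatbot.py --wav tone_16k.wav
```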

Thanks for your quick reply!

The Logitech was sold out, so I picked up an MPOW USB headset from Amazon.

Here is the output of the list_microphones.sh script:

Input Device ID 24 - MPOW HC6: USB Audio (hw:2,0) (inputs=1) (sample_rate=44100)

So the sampling rate seems not to be the issue.

Yes, the test file works. So it is a real mystery why the mic does not work.

By the way, is it possible to use a Bluetooth headset instead of a USB headset?

The sample rate used is actually 16000, so perhaps that USB mic doesn’t support the lower sample rate.

I haven't tested a Bluetooth headset. I would test it outside of Docker first to make sure it can record audio OK.
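
Testing outside of Docker could be as simple as the arecord checks below. The card/device numbers are examples; substitute the ones shown by arecord -l. Note that plughw (unlike raw hw) lets ALSA convert the sample rate in software:

```shell
# Ask the hardware directly for 16 kHz; arecord warns if the device
# can only capture at a different rate (e.g. "requested = 16000Hz, got = 44100Hz").
arecord -D hw:2,0 -c1 -r 16000 -f S16_LE -d 5 test_16k_hw.wav

# The same capture through ALSA's "plug" layer, which resamples in software.
# If this records fine while the hw version does not, the mic simply lacks
# native 16 kHz support.
arecord -D plughw:2,0 -c1 -r 16000 -f S16_LE -d 5 test_16k_plug.wav
aplay test_16k_plug.wav
```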

Finally, I have received my new Logitech H390 USB headset. I freshly flashed my SD card and rebuilt my NVMe M.2 SSD to make sure.

The result? Working perfectly!

Thanks again for your time!

OK great, glad to hear it! Thanks for letting us know.

I tried to use 2 devices:
1) a USB camera, 2) a USB microphone. Neither works with jetson-voice.

The info for the mic built into my camera is at /proc/asound/card4/stream0:

USB Camera USB Camera at usb-3610000.xhci-2.4.3, high speed : USB Audio

Capture:
  Status: Stop
  Interface 3
    Altset 1
    Format: S16_LE
    Channels: 1
    Endpoint: 4 IN (ASYNC)
    Rates: 8000, 11025, 16000, 22050, 24000, 44100, 48000
    Data packet interval: 1000 u

My USB microphone's frequency response is 20 Hz to 20 kHz.
My microphone info is at /proc/asound/card2/stream0:

C-Media Electronics Inc. USB PnP Sound Device at usb-3610000.xhci-2.1, full spe : USB Audio

Capture:
  Status: Stop
  Interface 1
    Altset 1
    Format: S16_LE
    Channels: 1
    Endpoint: 2 IN (ADAPTIVE)
    Rates: 48000, 44100

Information inside jetson-voice:

Input Device ID 24 - USB PnP Sound Device: Audio (hw:2,0) (inputs=1) (sample_rate=44100)
Input Device ID 26 - USB Camera: Audio (hw:4,0) (inputs=1) (sample_rate=44100)

When I run python3 src/test_asr.py --mic <ID>:
The USB microphone gets the error [Errno -9997] Invalid sample rate.
The USB camera's microphone turns off automatically. The code may work, but because the mic turns off it doesn't receive a signal, which is why the results are so wrong:
"Phrase: Fine. (1.000000)
Phrase: O e. (1.000000)
Partial: Ae (1.000000)
Partial: O a (1.000000)
Phrase: A e e. (1.000000)
Phrase: E o o. (1.000000)
"

Do you have any way to reduce the microphone sample rate on Ubuntu or inside jetson-voice? I need your support.

Thank you.

Hi @nguyenngocdat1995, I don’t have resampling implemented in this older container. I have been updating the project at GitHub - dusty-nv/jetson-voice: ASR/NLP/TTS deep learning inference library for NVIDIA Jetson using PyTorch and TensorRT but have not yet implemented the microphone resampling.

Can you link me to the mic you have (the C-Media Electronics Inc. USB PnP Sound Device)?
I would like to order a mic that doesn't support a 16 kHz sampling rate, so I can implement/test the resampling. Thanks.
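
For reference, the software resampling being discussed could be sketched in plain numpy as linear interpolation from 44.1 kHz down to 16 kHz. This is a minimal illustration, not the jetson-voice implementation; a production version should low-pass filter before downsampling to avoid aliasing:

```python
import numpy as np

def resample_linear(chunk, orig_rate=44100, target_rate=16000):
    """Naive linear-interpolation resampler for a 1-D float audio chunk.

    Good enough to illustrate the idea; real code should band-limit
    (low-pass filter) the signal before downsampling to avoid aliasing.
    """
    n_out = int(len(chunk) * target_rate / orig_rate)
    t_in = np.arange(len(chunk)) / orig_rate    # input sample times (s)
    t_out = np.arange(n_out) / target_rate      # output sample times (s)
    return np.interp(t_out, t_in, chunk).astype(np.float32)

# A 0.1 s chunk captured at 44.1 kHz becomes a 0.1 s chunk at 16 kHz:
chunk_44k = np.zeros(4410, dtype=np.float32)
chunk_16k = resample_linear(chunk_44k)
print(len(chunk_16k))   # 1600 samples, matching the client's chunk_size
```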

Here is the mic I bought on Amazon Italy:
https://www.amazon.it/gp/product/B08NK6QT8G/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1

I should let you know that I use JetPack 4.4 DP.

Thank you.
Dat