DeepSpeech for Jetson/Xavier

I have built DeepSpeech (a.k.a. Mozilla Voice STT) for Jetson and Xavier (JetPack 4.4): Release DeepSpeech v0.9.3 · domcross/DeepSpeech-for-Jetson-Nano · GitHub

Any feedback welcome…


Here is DeepSpeech v0.9.0 for Jetson/Xavier


And here is DeepSpeech v0.9.2 for Jetson/Xavier


Here is the (belated) release of DeepSpeech v0.9.3 for Jetson/Xavier (Python wheel only).


Dear dkreutz,
Thank you so much for your contribution. I have installed your DeepSpeech 0.9.3 on the Jetson Nano.
I am not sure whether I am using it correctly.

When I try it with the models and sample audio from the documentation, it looks like DeepSpeech only uses my CPU.
Here is the output from the terminal window: output_deepspeech.txt (3.7 KB)

Do we also need to copy the file into our library directory? I have tried that, but it does not change the result shown in the attachment.

Maybe I am worrying about nothing, but your expert point of view would be welcome.
Thank you in advance,
Best regards!


Hi Clement,
Your logfile looks good, libcuda is loaded and GPU is initialized:

Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 607 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)

What might be an issue is that the GPU could claim only 607 MB of memory. That might not be enough to load the model for GPU access. I guess you are running your Nano with the GUI desktop enabled?
I am not sure whether that forces DeepSpeech inference onto the CPU; most of the time I run DeepSpeech on my Xavier AGX, which has 32 GB of RAM, so I don't see the problem there. My Nano is occupied with other tasks, so I can't test this right now.
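On Tegra boards the CPU and GPU share the same RAM, so a quick way to check how much memory is actually free before loading the model is to read /proc/meminfo. A minimal sketch (the field names are standard Linux; the helper names are my own):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of kB values."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key] = int(parts[0])  # value is reported in kB
    return fields

def available_mb(path="/proc/meminfo"):
    # MemAvailable is the kernel's estimate of memory usable without swapping
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info["MemAvailable"] // 1024

# Example: print(f"~{available_mb()} MB available for model loading")
```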

When running inference from the command line like you did, most of the time is spent on loading and initializing the model, which is mostly performed on the CPU. You should see a short peak of GPU usage at the end of inference, though.
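The load-versus-inference split is easy to see from Python. A sketch using the DeepSpeech 0.9.x Python API (`Model`, `stt`); the model and audio paths in the example are placeholders, and the `timed` helper is my own:

```python
import time
import wave

import numpy as np

def timed(label, fn, *args):
    # Small helper: run fn and report its wall-clock time
    t0 = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {time.perf_counter() - t0:.2f} s")
    return result

def read_wav_int16(path):
    # DeepSpeech expects 16 kHz mono 16-bit PCM audio
    with wave.open(path, "rb") as w:
        return np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

def transcribe(model_path, wav_path):
    from deepspeech import Model  # deepspeech 0.9.x wheel
    ds = timed("model load/init", Model, model_path)  # mostly CPU-bound
    audio = read_wav_int16(wav_path)
    return timed("inference", ds.stt, audio)          # GPU peak happens here

# Example (paths are placeholders):
# print(transcribe("deepspeech-0.9.3-models.pbmm", "audio/2830-3980-0043.wav"))
```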

Best regards,


Dear Dominik,

Thank you for your time.
Indeed, I am running the Nano with the GUI desktop open. I also tried to launch the deepspeech command with a Chromium browser open, and only 100 MB of memory was allocated; as a consequence, inference failed.

Under the same conditions as before, I checked the GPU usage with jetson_stats, and indeed I see some peaks during the command execution, as you expected.

Finally, I tried to launch the command in headless mode. The memory allocated was indeed significantly higher (1778 MB), but the time taken by inference was longer (between 3.5 s and 15 s in different tries). I would have expected a correlation between the memory allocated and the inference time. In fact it is not even repeatable: each run gives a different inference time (and it does not simply decrease across tries).

Nevertheless, it is working, so let's start playing and learning with the DeepSpeech models!
Best regards,


When running inference from the command line, it might take longer because there is now more memory available for initialization.

I recommend using DeepSpeech-Server or deepspeech-websocket-server. The model is loaded once, and then you can perform inference on multiple audio files.
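The load-once pattern those servers implement can be sketched directly with the DeepSpeech 0.9.x Python API; the model and file paths here are placeholders:

```python
def transcribe_many(model_path, wav_paths):
    """Load the model once, then run inference on each audio file."""
    import wave

    import numpy as np
    from deepspeech import Model  # deepspeech 0.9.x wheel

    ds = Model(model_path)  # the expensive initialization happens only once
    results = {}
    for path in wav_paths:
        with wave.open(path, "rb") as w:
            audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
        results[path] = ds.stt(audio)
    return results

# Example (paths are placeholders):
# transcribe_many("deepspeech-0.9.3-models.pbmm", ["a.wav", "b.wav"])
```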


Thank you again; I will certainly check the links you shared.
Best regards,

Will these work on other Jetson devices like Xavier-NX? I’d really like to have this running on my JP4.4 Xavier-NX.

Yes. The DeepSpeech wheels are always built with CUDA compute capabilities 5.3, 6.2, and 7.2, which makes them compatible with the Jetson TX1, Jetson Nano, Xavier NX, and Xavier AGX.
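That compatibility check boils down to a lookup against NVIDIA's published compute capabilities per Jetson module; an illustrative helper (the dict below also lists the TX2, which NVIDIA documents as capability 6.2):

```python
# CUDA compute capability per Jetson module (NVIDIA published values)
JETSON_COMPUTE_CAPABILITY = {
    "Jetson TX1": "5.3",
    "Jetson Nano": "5.3",
    "Jetson TX2": "6.2",
    "Xavier NX": "7.2",
    "Xavier AGX": "7.2",
}

# Compute capabilities the wheels are built for
WHEEL_TARGETS = {"5.3", "6.2", "7.2"}

def wheel_supports(board):
    # A build that includes a device's compute capability can run on that device
    return JETSON_COMPUTE_CAPABILITY.get(board) in WHEEL_TARGETS

# e.g. wheel_supports("Xavier NX") -> True
```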


Awesome. You’ve saved me a world of headache. :)


Do you have a manual for compiling it?
I am currently struggling to compile DeepSpeech with CUDA 10.2 :(

Compiling libdeepspeech and the Python bindings is straightforward. See my "release notes" for v0.8.2.

Awesome, but do you have time to compile a version with debug symbols?
If not, where can I find a tutorial to build one myself?
I am trying to profile my deepspeech application, and it would be super helpful if I could see function names.

Sorry, I am afraid I can't help you with that. You may want to have a look at the build documentation. In addition, you will need knowledge of bazel builds and of debugging C/C++ applications in general.
The Jetson/Xavier specific build options are documented in my release notes for v0.8.2

Thank you. I managed to get the debug symbols successfully.

For future reference: I rebuilt the library using different bazel build options and swapped out the original from your wheels in my Python environment. I didn't rebuild the whole wheel.

My build options are as follows. I disabled O3 optimization, since that would undesirably inline some functions:

bazel build --workspace_status_command="bash native_client/" \
  --config=monolithic --config=cuda --config=nonccl --config=noaws \
  --config=nogcp --config=nohdfs --config=v2 \
  --copt=-march=armv8-a --copt=-mtune=cortex-a57 --copt="-g" \
  --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden \
  --copt=-fPIC -c dbg --strip=never --verbose_failures \
  // //native_client:generate_scorer_package

If the build worked correctly, one should see the new library in the bazel-bin/native_client directory. One can confirm that the debug symbols are kept by:

  1. Running file on the library, which reports it as "not stripped", meaning the debug symbols are kept.
  2. Loading the library in gdb; gdb will become unresponsive for a while as it reads all those debug symbols.

Greetings, thanks very much! I’m comparing DeepSpeech performance on Raspberry Pi and Jetson/Xavier. Do you have something like the “Live Transcription from a Microphone” at the bottom of GitHub - touchgadget/DeepSpeech: Install Mozilla DeepSpeech on a Raspberry Pi 4 ? Thanks, Kevin


I didn't try it myself on the Jetson Nano, but the DeepSpeech-examples repository has some streaming examples as well:
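The core of live transcription is the DeepSpeech 0.9.x streaming API (createStream / feedAudioContent / intermediateDecode / finishStream). A sketch that feeds audio in fixed-size chunks, as a microphone callback would; the model path is a placeholder and the chunking helper is my own:

```python
def chunked(audio, size):
    """Split a sample sequence into fixed-size chunks (last may be shorter)."""
    return [audio[i:i + size] for i in range(0, len(audio), size)]

def stream_transcribe(model_path, audio_int16, chunk_samples=320):
    # deepspeech 0.9.x streaming API: feed audio incrementally,
    # print intermediate hypotheses, then finish the stream.
    from deepspeech import Model
    ds = Model(model_path)
    stream = ds.createStream()
    for chunk in chunked(audio_int16, chunk_samples):
        stream.feedAudioContent(chunk)
        print(stream.intermediateDecode())  # partial transcript so far
    return stream.finishStream()            # final transcript
```

With a real microphone, each captured buffer of 16 kHz int16 samples would be passed to feedAudioContent as it arrives instead of pre-chunking a file.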


I have not experimented with DeepSpeech, but am curious about something: Would DeepSpeech be able to recognize and convert speech into phonetics? There are a limited number of phonetics among all “common” languages, and I am reminded of an old project I never completed (which used phonetics to deal with lossy networks for voice). The goal was to take partial phonetics and guess at the full phonetic, or to take completely missing phonetics and try to create something of a word recognition system based on phonetics instead of the usual dictionary. I just find it interesting to break speech down to the phonetic level since there are so many interesting things you can do with phonetics.
