[jetson-voice] ASR/NLP/TTS for Jetson

Reflashing fixed the issue, thank you. The demo containers are running fine now.


I have tried pulling the container, but I get this error:

Unable to find image 'dustynv/jetson-voice:r34.1.1' locally
docker: Error response from daemon: Get "https://registry-1.docker.io/v2/": context deadline exceeded.

Sorry @Out_of_the_BOTS, I've not yet updated and released this container for JetPack 5.0. It's on my todo list but I don't yet have a concrete timeline for when it will be out.
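In the meantime, you can check which tags have actually been published. One way (a sketch using Docker Hub's public API) is:

# list the published tags for the jetson-voice image
curl -s https://hub.docker.com/v2/repositories/dustynv/jetson-voice/tags | python3 -m json.tool | grep '"name"'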

I think I prefer it without the container anyway, as it will run on a robot that needs other models as well.

I am currently running your trt_pose API with TensorRT, torch v1.11, and torchvision v0.12 already installed. What other dependencies does the jetson-voice API need?

jetson-voice has many complex dependencies, including NeMo, so it is only built as a container. You could try building the container yourself, but my guess is that updates are needed in order to support JetPack 5.0.
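If you want to attempt it, the build would look roughly like this (a sketch; check the repo for the actual build script name):

# clone the repo and build the container locally
git clone https://github.com/dusty-nv/jetson-voice
cd jetson-voice
docker/build.sh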

Thank you for taking the time to answer my questions.

I come from a robotics background and want to learn AI so that I can add it to my robots; this is why I am trying to learn the Jetson platform.

As I am learning, I may ask dumb questions; I apologize in advance.

Will the following tutorials work locally on JetPack 5: Tutorials — NVIDIA NeMo? They use NeMo 1.10.0.

I'm not sure that NeMo supports Jetson out of the box; it typically took some work for me to get it built in the jetson-voice container. I build it in my jetson-voice Dockerfile, but I have not yet tried it with the updated NeMo 1.10 version or for JetPack 5.0.

@Out_of_the_BOTS same here; I got it running on the Nano with a bit of a headache.
The best approach is to replicate the steps in the Dockerfile for aarch64.
Let me know if you still struggle with that and I'll try to help (I have saved all the wheels, but I'm not on JetPack 5).

@dusty_nv I'm having issues running my own ASR network, trained on a subset of LibriSpeech. I have the .nemo file, but I cannot run nemo_export_onnx.py due to many missing packages; ultimately it fails with nemo.collections.nlp missing, and pip3 install nemo_toolkit[nlp] won't solve it.
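For reference, the full-toolkit install that the NeMo README suggests is below, though whether it resolves the missing nemo.collections.nlp on aarch64 is untested here:

# NeMo's documented full install (pulls in the nlp collection and its dependencies)
pip3 install Cython
pip3 install nemo_toolkit['all']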

I used the NeMo container on x86_64 and got the .onnx file, but the .json and lm.bin files are missing.
Could you please guide me a bit? Thanks in advance.

sudo docker/run.sh
ARCH: aarch64
reading L4T version from /etc/nv_tegra_release
L4T BSP Version: L4T R32.7.2
CONTAINER: dustynv/jetson-voice:r32.7.1
DEV_VOLUME:
DATA_VOLUME: --volume /home/jetson/Work/nano-tools/jetson-voice/data:/jetson-voice/data
USER_VOLUME:
USER_COMMAND:

root@nanobox:/jetson-voice# scripts/nemo_export_onnx.py --type=asr --model=data/networks/asr/quartznet-15x5_en_few/QuartzNet15x5.nemo --output=Quartznet_few.onnx
[clip]
ModuleNotFoundError: No module named 'webdataset'

I took the NLP part out of the script, since I'm not using it, and installed the following (all in Docker on a Jetson Nano 4GB):

pip3 install webdataset
pip3 install omegaconf
pip3 install pytorch_lightning
pip3 install rapidfuzz
pip3 install hydra hydra-core

apt update && apt install ffmpeg

I managed to run the script, only to get 'Killed', so I need to figure out the right memory configuration to get it running.
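A common workaround for the OOM "Killed" on a 4GB Nano is to add swap on the host before launching the container, e.g.:

# create and enable a 4GB swap file on the host
sudo fallocate -l 4G /mnt/4GB.swap
sudo chmod 600 /mnt/4GB.swap
sudo mkswap /mnt/4GB.swap
sudo swapon /mnt/4GB.swap

Disabling the desktop GUI (sudo systemctl set-default multi-user.target) also frees some memory.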
But after this exercise, it seems the environment got messed up, since I can't use it anymore.
There seems to be an issue with 'transformers'; the examples/asr.py script fails with:

Traceback (most recent call last):
  File "examples/asr.py", line 26, in <module>
    asr = ASR(args.model)
  File "./jetson_voice/asr.py", line 18, in ASR
    return load_resource(resource, factory_map, *args, **kwargs)
  File "./jetson_voice/utils/resource.py", line 85, in load_resource
    return class_type(config, *args, **kwargs)
  File "./jetson_voice/models/asr/asr_engine.py", line 102, in __init__
    preprocessor_class = getattr(importlib.import_module(preprocessor_name[0]), preprocessor_name[1])
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 941, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/__init__.py", line 15, in <module>
    from nemo.collections.asr import data, losses, models, modules
  File "/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/losses/__init__.py", line 15, in <module>
    from nemo.collections.asr.losses.angularloss import AngularSoftmaxLoss
  File "/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/losses/angularloss.py", line 18, in <module>
    from nemo.core.classes import Loss, Typing, typecheck
  File "/usr/local/lib/python3.6/dist-packages/nemo/core/__init__.py", line 16, in <module>
    from nemo.core.classes import *
  File "/usr/local/lib/python3.6/dist-packages/nemo/core/classes/__init__.py", line 16, in <module>
    from nemo.core.classes.common import (
  File "/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py", line 32, in <module>
    from nemo.utils import logging, model_utils
  File "/usr/local/lib/python3.6/dist-packages/nemo/utils/__init__.py", line 22, in <module>
    from nemo.utils.lightning_logger_patch import add_memory_handlers_to_pl_logger
  File "/usr/local/lib/python3.6/dist-packages/nemo/utils/lightning_logger_patch.py", line 18, in <module>
    import pytorch_lightning as pl
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning.callbacks import Callback  # noqa: E402
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
    from pytorch_lightning.callbacks.base import Callback
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/callbacks/base.py", line 26, in <module>
    from pytorch_lightning.utilities.types import STEP_OUTPUT
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/utilities/types.py", line 25, in <module>
    from torchmetrics import Metric
  File "/usr/local/lib/python3.6/dist-packages/torchmetrics/__init__.py", line 14, in <module>
    from torchmetrics import functional  # noqa: E402
  File "/usr/local/lib/python3.6/dist-packages/torchmetrics/functional/__init__.py", line 14, in <module>
    from torchmetrics.functional.audio.pit import permutation_invariant_training, pit_permutate
  File "/usr/local/lib/python3.6/dist-packages/torchmetrics/functional/audio/__init__.py", line 14, in <module>
    from torchmetrics.functional.audio.pit import permutation_invariant_training, pit_permutate  # noqa: F401
  File "/usr/local/lib/python3.6/dist-packages/torchmetrics/functional/audio/pit.py", line 21, in <module>
    from torchmetrics.utilities.imports import _SCIPY_AVAILABLE
  File "/usr/local/lib/python3.6/dist-packages/torchmetrics/utilities/imports.py", line 114, in <module>
    _TRANSFORMERS_AVAILABLE: bool = _package_available("transformers")
  File "/usr/local/lib/python3.6/dist-packages/torchmetrics/utilities/imports.py", line 34, in _package_available
    return find_spec(package_name) is not None
  File "/usr/lib/python3.6/importlib/util.py", line 102, in find_spec
    raise ValueError('{}.__spec__ is None'.format(name))
ValueError: transformers.__spec__ is None

Hi @mirel.t.lazar, are you using the jetson-voice container to run this?

Yes.
After installing the required packages and removing the NLP part, it seems to run, but I haven't figured out the memory part yet, as it gets killed every time.

The jetson-voice container should already have the required packages installed, so I'm unsure why you are having to install them, but it's possible it has something to do with the error.
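One quick way to check whether pip upgraded something the container had pinned:

# report dependency conflicts, then inspect the suspect packages
pip3 check
pip3 list | grep -iE 'transformers|torchmetrics|pytorch-lightning'

If versions did move, reinstalling the extra packages with pip3 install --no-deps <package> avoids pulling in upgrades of the ones already in the image.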

If you just start a fresh jetson-voice container, are you able to run the ASR example with the default built-in model?
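For reference, that would be something like the following (the WAV path is just a placeholder):

# run the built-in QuartzNet model on a 16kHz mono WAV file
examples/asr.py --model=quartznet --wav=<path-to-16kHz-wav>

or simply examples/asr.py with a microphone attached.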

I reflashed a previous image where I had both the jetson-voice packages installed natively and the Docker image, and on both sides I can run the asr.py example.
However, I am back to the issue of running scripts/nemo_export_onnx.py.
I will remove the Docker image and start fresh on this.

My most pressing issue is running my own trained network.
Could you please guide me regarding the language model file lm.bin? How can I obtain it? Is it transferable 'as is' to a newly NeMo-trained network based on the QuartzNet configuration but using a different dataset?
Thanks again.

@dusty_nv I tried scripts/nemo_export_onnx.py on a fresh Docker image, and it failed due to the missing webdataset package:

root@nanobox:/jetson-voice# scripts/nemo_export_onnx.py --type=asr --model=data/networks/asr/quartznet-15x5_en_few/QuartzNet15x5.nemo --output=Quartznet_few.onnx
[clip]
ModuleNotFoundError: No module named 'webdataset'

After installing the required packages, examples/asr.py no longer works, so we have come full circle. This most likely happens because some packages get upgraded and are no longer compatible with the jetson-voice environment.
Since it is a Docker image, we can just start over clean, but the issue remains: the NeMo exporter doesn't seem to work out of the box in Docker.
Other than this, I also have the issue of the missing lm.bin.

If I use a NeMo Docker image to convert to .onnx, I get neither the .json file nor the lm.bin file.

Cheers

Hmm, okay... can you try running the jetson-voice container on x86? Assuming you have the NVIDIA Container Runtime installed on a Linux PC, just clone the jetson-voice repo and run docker/run.sh. I recall exporting the ONNX models from a PC (using nemo_export_onnx.py).
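That is, roughly:

# on the x86_64 PC
git clone https://github.com/dusty-nv/jetson-voice
cd jetson-voice
docker/run.sh
# then, inside the container:
scripts/nemo_export_onnx.py --type=asr --model=<your .nemo file> --output=<output>.onnx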

lm.bin should be under the jetson-voice/data/networks/asr/quartznet-15x5_en directory. If you are unable to get the .json file, you may be able to re-use the one from the pretrained quartznet-15x5_en model (perhaps with some minor tweaks, if needed).
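Something along these lines (a sketch; the exact .json filename may differ):

# copy the pretrained model's config and language model next to your exported network
cp data/networks/asr/quartznet-15x5_en/<config>.json data/networks/asr/quartznet-15x5_en_few/
cp data/networks/asr/quartznet-15x5_en/lm.bin data/networks/asr/quartznet-15x5_en_few/
# then edit the copied .json so the model/engine paths point at your exported ONNX

Since the language model was built for the pretrained model's character set, lm.bin should only transfer cleanly if your network uses the same vocabulary.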

It seems I'm stuck with this, since my only GPU-enabled option is Google Cloud.

jetson-voice$ docker/run.sh 
ARCH:  unknown
unsupported architecture:  unknown
~/voice/jetson-voice$ uname -a
Linux instance-1 4.19.0-21-cloud-amd64 #1 SMP Debian 4.19.249-2 (2022-06-30) x86_64 GNU/Linux

After hacking the os_version.sh script, I get the image to run, but...

root@instance-1:/jetson-voice# examples/asr.py 
Namespace(debug=False, default_backend='tensorrt', global_config=None, list_devices=False, list_models=False, log_level='info', mic=None, model='quartznet', model_dir='/jetson-voice/data/networks', model_manifest='/jetson-voice/data/networks/manifest.json', profile=False, verbose=False, wav=None)
[2022-08-08 21:20:56] resource.py:184 - downloading 'quartznet-15x5_en' from https://nvidia.box.com/shared/static/l8gemvzp85os6xhge16igy1mzvtyvgbd.gz (attempt 1 of 10)
quartznet-15x5_en: 0.00B [00:00, ?B/s]
[2022-08-08 21:20:56] resource.py:191 - module 'urllib' has no attribute 'request'
[2022-08-08 21:20:56] resource.py:202 - failed to download 'quartznet-15x5_en' from https://nvidia.box.com/shared/static/l8gemvzp85os6xhge16igy1mzvtyvgbd.gz (attempt 1 of 10)
[2022-08-08 21:20:56] resource.py:207 - waiting 5 seconds before trying again...
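That urllib failure is a separate, minor bug: in Python 3, urllib.request is a submodule that must be imported explicitly, so the likely fix in resource.py is along these lines (a sketch):

# Python 3 requires the explicit submodule import
import urllib.request   # rather than just: import urllib
urllib.request.urlretrieve(url, filename)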

I was, however, able to export the NeMo model to ONNX and get the .json file too.

Thanks.

OK, great. It seems like the URLs to download the models were blocked by your cloud provider, but those aren't necessary for running the ONNX export. I will take a look at the errors you had before when I build the next version of these containers for aarch64.

Thank you. Let me know if you need more info.

Regarding the network training: the model seems to be accepted by the jetson-voice ASR engine, but it doesn't behave as expected:

[2022-08-09 09:35:54] trt_model.py:59 - loaded TensorRT engine from ./data/networks/asr/quartznet-15x5_en_few/QuartzNet15x5.engine

binding 0 - 'audio_signal'
   input:    True
   shape:    (1, 64, -1)
   dtype:    DataType.FLOAT
   size:     -256
   dynamic:  True
   profiles: [{'min': (1, 64, 10), 'opt': (1, 64, 150), 'max': (1, 64, 300)}]


binding 1 - 'logprobs'
   input:    False
   shape:    (1, -1, 29)
   dtype:    DataType.FLOAT
   size:     -116
   dynamic:  True
   profiles: []

[2022-08-09 09:35:55] asr_engine.py:128 - CTC decoder type: 'greedy'
[clip]
[2022-08-09 09:35:55] audio.py:156 - trying to open audio input 19 with sample_rate=16000 chunk_size=16000
                                                                            
audio stream opened on device 19 (default)
you can begin speaking now... (press Ctrl+C to exit)

.

.

.

.

.

I'm not sure if it is a network or a configuration issue. I trained the network using NeMo 1.6.2 and the default QuartzNet configuration with no other changes; the dataset is a subset of LibriSpeech. I didn't use transfer learning:
python examples/asr/asr_ctc/speech_to_text_ctc.py \
    --config-path=/NeMo/conf/quartznet --config-name=quartznet_15x5 \
    model.train_ds.manifest_filepath=./few-commands.json \
    model.validation_ds.manifest_filepath=./few-commands-test.json \
    trainer.max_epochs=1 trainer.accelerator='cpu'

Thanks again for your help.

Did you confirm your model to be working with acceptable accuracy in NeMo before exporting it to ONNX?
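A quick sanity check in NeMo 1.x would be something like this (a sketch; the file names are placeholders):

# load the trained checkpoint and transcribe a known utterance from the training set
import nemo.collections.asr as nemo_asr
model = nemo_asr.models.EncDecCTCModel.restore_from("QuartzNet15x5.nemo")
print(model.transcribe(paths2audio_files=["sample_16khz.wav"]))

If that already produces empty or wrong text, the problem is the training rather than the export; trainer.max_epochs=1 from scratch is very unlikely to be enough.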

I've only used the pre-trained ASR models available from the NeMo repositories, so unfortunately I'm not much help with training those.

I'm trying, but it gets complicated due to my hardware configurations: where I have the hardware I don't have software support, and the other way around. :D