[jetson-voice] ASR/NLP/TTS for Jetson

I can confirm there is a problem with the trained model.
I managed to run NeMo transcribe_speech.py on the Nano and it works fine so I’ll need to change the approach for training my model.

I'm not as lucky getting the model training running on the Nano, but I'll need to try a few more things. I know it is not ideal, but if I get to use the GPU it might make it worth it for transfer learning at least. I've done this before with the jetson-inference project.

Now I'm looking into updating num_workers for the dataloader to see if I can make it not time out :-)

Validation sanity check: 0it [00:00, ?it/s][NeMo W 2022-08-12 10:20:26 nemo_logging:349] /usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/data_loading.py:133: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 4 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
      f"The dataloader, {name}, does not have many workers which may be a bottleneck."

OK, gotcha - I believe the number of workers is set through the OmegaConf config structure when you create the NeMo trainer, but I'm not sure. Regardless, I don't think that should impact the actual convergence of the model, just the training speed perhaps.
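For reference, a minimal sketch of that override, assuming the usual train_ds/validation_ds layout of NeMo's ASR configs (the YAML filename here is a placeholder):

    from omegaconf import OmegaConf

    # load the model config and raise the dataloader worker count
    # before building the NeMo model/trainer
    cfg = OmegaConf.load('quartznet_15x5.yaml')   # placeholder config path
    cfg.model.train_ds.num_workers = 4            # 4 = CPU count on the Nano
    cfg.model.validation_ds.num_workers = 4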

hello @dusty_nv, I'm trying to run jetson-voice asr.py. AastaLLL helped me get the docker container running, but I receive this error when launching. I read through the forum and searched Google too but found nothing that could help. I tried my mic with a demo and everything works fine.

root@jarvis-desktop:/jetson-voice/examples# python3 asr.py
Namespace(debug=False, default_backend='tensorrt', global_config=None, list_devices=False, list_models=False, log_level='info', mic=None, model='quartznet', model_dir='/jetson-voice/data/networks', model_manifest='/jetson-voice/data/networks/manifest.json', profile=False, verbose=False, wav=None)
[NeMo W 2023-04-15 02:52:30 nemo_logging:349] /usr/local/lib/python3.6/dist-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)

################################################################################

WARNING, path does not exist: KALDI_ROOT=/mnt/matylda5/iveselyk/Tools/kaldi-trunk

(please add 'export KALDI_ROOT=<your_path>' in your $HOME/.profile)

(or run as: KALDI_ROOT=<your_path> python <your_script>.py)

################################################################################

[NeMo I 2023-04-15 02:52:30 features:264] PADDING: 0
[NeMo I 2023-04-15 02:52:30 features:281] STFT using torch
[NeMo W 2023-04-15 02:52:30 nemo_logging:349] /usr/local/lib/python3.6/dist-packages/nemo_toolkit-1.6.2-py3.6.egg/nemo/collections/asr/parts/preprocessing/features.py:314: FutureWarning: Pass sr=16000, n_fft=512 as keyword args. From version 0.10 passing these as positional arguments will result in an error
librosa.filters.mel(sample_rate, self.n_fft, n_mels=nfilt, fmin=lowfreq, fmax=highfreq), dtype=torch.float

[2023-04-15 02:52:31] resource.py:114 - loading model '/jetson-voice/data/networks/asr/quartznet-15x5_en/quartznet.onnx' with jetson_voice.backends.tensorrt.TRTModel
[2023-04-15 02:52:31] trt_model.py:41 - loading cached TensorRT engine from /jetson-voice/data/networks/asr/quartznet-15x5_en/quartznet.engine
[04/15/2023-02:52:35] [TRT] [I] [MemUsageChange] Init CUDA: CPU +225, GPU +0, now: CPU 333, GPU 3573 (MiB)
[04/15/2023-02:52:35] [TRT] [I] Loaded engine size: 42 MiB
[04/15/2023-02:52:37] [TRT] [V] Using cublas as a tactic source
[04/15/2023-02:52:37] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +207, now: CPU 535, GPU 3869 (MiB)
[04/15/2023-02:52:37] [TRT] [V] Using cuDNN as a tactic source
[04/15/2023-02:52:41] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +240, GPU +20, now: CPU 775, GPU 3889 (MiB)
[04/15/2023-02:52:41] [TRT] [V] Deserialization required 5362705 microseconds.
[04/15/2023-02:52:41] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +41, now: CPU 0, GPU 41 (MiB)
[04/15/2023-02:52:41] [TRT] [V] Using cublas as a tactic source
[04/15/2023-02:52:41] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 733, GPU 3847 (MiB)
[04/15/2023-02:52:41] [TRT] [V] Using cuDNN as a tactic source
[04/15/2023-02:52:41] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 733, GPU 3847 (MiB)
[04/15/2023-02:52:41] [TRT] [V] Total per-runner device persistent memory is 36948992
[04/15/2023-02:52:41] [TRT] [V] Total per-runner host persistent memory is 282384
[04/15/2023-02:52:41] [TRT] [V] Allocated activation device memory of size 1076224
[04/15/2023-02:52:41] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +36, now: CPU 0, GPU 77 (MiB)
[2023-04-15 02:52:41] trt_model.py:59 - loaded TensorRT engine from /jetson-voice/data/networks/asr/quartznet-15x5_en/quartznet.engine

binding 0 - 'audio_signal'
input: True
shape: (1, 64, -1)
dtype: DataType.FLOAT
size: -256
dynamic: True
profiles: [{'min': (1, 64, 10), 'opt': (1, 64, 150), 'max': (1, 64, 300)}]

binding 1 - 'logprobs'
input: False
shape: (1, -1, 29)
dtype: DataType.FLOAT
size: -116
dynamic: True
profiles:

[2023-04-15 02:52:42] ctc_beamsearch.py:51 - creating CTCBeamSearchDecoder
[2023-04-15 02:52:42] ctc_beamsearch.py:52 - {'add_punctuation': True,
'alpha': 0.7,
'beam_width': 32,
'beta': 0.0,
'cutoff_prob': 1.0,
'cutoff_top_n': 40,
'language_model': '/jetson-voice/data/networks/asr/quartznet-15x5_en/lm.bin',
'timestep_offset': 5,
'top_k': 3,
'type': 'beamsearch',
'vad_eos_duration': 0.65,
'word_threshold': -1000.0}
[2023-04-15 02:52:49] asr_engine.py:128 - CTC decoder type: 'beamsearch'
Traceback (most recent call last):
  File "asr.py", line 30, in <module>
    chunk_size=asr.chunk_size)
  File "/jetson-voice/jetson_voice/utils/audio.py", line 67, in AudioInput
    raise ValueError('either wav or mic argument must be specified')
ValueError: either wav or mic argument must be specified

KALDI_ROOT=<root/jarvis/desktop/jetson-voice/examples> python .py
no directory

@Eva01 you can ignore those warnings about Kaldi. To run asr.py, you need to specify either a --wav file or a --mic device ID to use, as shown here: https://github.com/dusty-nv/jetson-voice#automatic-speech-recognition-asr
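For reference, the two invocations look roughly like this (the WAV path is a placeholder, and asr.py --list-devices prints the available microphone IDs):

    python3 examples/asr.py --wav /path/to/audio.wav    # transcribe a WAV file
    python3 examples/asr.py --mic 11                    # stream from mic device ID 11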

thank you very much, it works! @dusty_nv, @linuxdev told me you know everything about AI, so I would like to take the chance to ask you: do you know how I can train it as an offline voice command? I saw it uses MatchboxNet and Google Speech Commands, but how can I change the commands and train it? I want to launch it from code-oss, and I want the code to have access to my GPIO for servo motors and other things.

Hi @Eva01, it is already doing speech recognition / speech commands offline (i.e. all the speech processing is done locally onboard the Nano using DNNs). To integrate your GPIO and other peripherals, you would modify the asr.py example or add the ASR to your own script - see the sketch below. You can start the container in --dev mode to make editing easier.
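Here is a minimal sketch of that, assuming the ASR/AudioInput usage from examples/asr.py and the Jetson.GPIO library; the mic device ID, GPIO pin, and keyword are placeholders:

    import Jetson.GPIO as GPIO
    from jetson_voice import ASR, AudioInput

    GPIO.setmode(GPIO.BOARD)
    GPIO.setup(33, GPIO.OUT)          # pin 33 is an arbitrary example pin

    asr = ASR('quartznet')
    stream = AudioInput(mic=11, sample_rate=asr.sample_rate,
                        chunk_size=asr.chunk_size)

    for samples in stream:
        for transcript in asr(samples):
            if transcript['end']:                   # end of utterance
                print(transcript['text'])
                if 'servo' in transcript['text'].lower():
                    GPIO.output(33, GPIO.HIGH)      # drive your servo circuit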

I haven't had to retrain/finetune the ASR models, but if you want to add your own speech commands to MatchboxNet, you can do it in NeMo: https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Speech_Commands.ipynb#scrollTo=I62_LJzc-p2b
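Condensed, the flow in that notebook looks roughly like this; the pretrained checkpoint name, label set, and manifest path below are placeholders, so check the tutorial for the exact values:

    import pytorch_lightning as pl
    import nemo.collections.asr as nemo_asr
    from omegaconf import OmegaConf

    # start from a pretrained speech-command checkpoint (name is an assumption)
    model = nemo_asr.models.EncDecClassificationModel.from_pretrained(
        'commandrecognition_en_matchboxnet3x1x64_v2')

    # swap in your own command vocabulary
    model.change_labels(['servo', 'stop', 'unknown'])

    # point the model at a manifest of your recorded samples (placeholder path)
    model.setup_training_data(OmegaConf.create({
        'manifest_filepath': 'train_manifest.json',
        'sample_rate': 16000,
        'labels': ['servo', 'stop', 'unknown'],
        'batch_size': 32,
        'shuffle': True,
    }))

    trainer = pl.Trainer(gpus=1, max_epochs=10)   # old PL syntax, matching NeMo 1.6
    trainer.fit(model)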

hello @dusty_nv, sorry for the late reply and the spam. I read the documentation and I think I'm beginning to understand the process, but where do I download the voice data? For example, if I want "open servo", I need to save recordings of me saying "servo" using some Python file and download the dataset into one of the programs (I think it's nemo_export_onnx.py); after that I need to use nemo_export_onnx.py, train it with nemo_train_intent.py, and after the model is made, introduce it to asr.py???

I read in the documentation that a script to download the dataset is provided under the scripts sub-directory of the NeMo root directory, but I can't find such a directory or script.

The nemo_train_intent.py script trains a Transformer model (like BERT or DistilBERT) as an intent/slot classifier for NLP, which would typically be a good fit for what you are doing. However, if you're on Jetson Nano, you may not have enough memory to run ASR and NLP at the same time; you would have to try.

Since your commands are limited in scope, I would just use the stock ASR model and do basic string parsing / regex on its output to find the commands, like the sketch below. Then if you need to re-train models later, you can. I haven't trained my own ASR or speech command models - I think it's easier to just use the included ASR model (the full ASR, not speech commands) and then do your own NLP, or train an intent/slot classifier for it.
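Something like this would do for a small command set (the command names and patterns are made up for illustration):

    import re

    COMMANDS = {
        'open_servo':  re.compile(r'\b(open|start)\s+(the\s+)?servo\b'),
        'close_servo': re.compile(r'\b(close|stop)\s+(the\s+)?servo\b'),
    }

    def parse_command(text):
        """Return the first command whose pattern matches the transcript."""
        text = text.lower()
        for name, pattern in COMMANDS.items():
            if pattern.search(text):
                return name
        return None

    print(parse_command('please open the servo'))   # -> 'open_servo'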

thank you, I will do more research on this and open a new topic once I understand the programmer slang

I have one last question, and it's not about jetson-voice. I tried asr.py and I need to repeat myself many times when trying to say "jarvis", so I need to train it. In the meantime, until I get better and train my own ASR model, I would like to try another method: Picovoice Porcupine. But no matter what I tried, my access key doesn't work. I already trained the wake word and have it as a file, but I cannot open it because the file is an unknown type.

Hi @Eva01, sorry about that. If you are using Xavier/Orin, you could run the actual Riva ASR backend (which has better accuracy). I'm not familiar with Picovoice and haven't used it, so I would recommend contacting their support if you have trouble running/installing it or using their API keys.

thank you for the help :))

Is it possible to use a custom voice model with the jetson-voice TTS engine?

@sangeethagr2018 at this point I’d just recommend using Riva directly, and you can fine-tune your own speech models in NeMo and export them to Riva:

https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials/tts-finetune-nemo.html
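For the export step, that tutorial goes through the nemo2riva tool; the gist is roughly this (the filenames are placeholders):

    pip install nemo2riva
    nemo2riva --out finetuned_model.riva finetuned_model.nemo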

In the jetson_voice_ros/asr.py source code, if the user specifies model:=matchboxnet, it raises the error message:
"jetson_voice_ros/asr node does not support ASR classification models"
Is this by design?

ros2 launch ./ros/launch/asr.launch.py model:=matchboxnet input_device:=11

In the source code, jetson-voice/ros/jetson_voice_ros/asr.py:

    # load the ASR model
    self.asr = ASR(self.model_name)
    self.get_logger().info(f"model '{self.model_name}' ready")

    if self.asr.classification:
        raise ValueError(f'jetson_voice_ros/asr node does not support ASR classification models')

@jenhungho yes, that ASR ROS node only supports transcription (audio-to-text), not classification (audio-to-class). You could make a ROS node that does that though - see the sketch below.
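A minimal sketch of such a node, assuming the jetson_voice ASR/AudioInput API from the examples; the topic name, mic device ID, and the result format (class label at index 0) are assumptions:

    import rclpy
    from rclpy.node import Node
    from std_msgs.msg import String
    from jetson_voice import ASR, AudioInput

    def main():
        rclpy.init()
        node = Node('voice_classification')
        pub = node.create_publisher(String, 'voice/command_class', 10)

        asr = ASR('matchboxnet')    # the classification model rejected by the stock node
        stream = AudioInput(mic=11, sample_rate=asr.sample_rate,
                            chunk_size=asr.chunk_size)

        for samples in stream:      # blocking audio capture loop
            results = asr(samples)
            if results is not None:
                pub.publish(String(data=str(results[0])))   # assumed: class label first

        node.destroy_node()
        rclpy.shutdown()

    if __name__ == '__main__':
        main()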


@dusty_nv Does the jetson-voice library also support C? I have found all the scripts in Python only. Please guide me if it is also officially available in C.

Thanks

@deepanshu.pandey it’s Python, and for JetPack 4. For JetPack 5, there are a number of new tutorials/libraries/containers available here:
