Hello @dusty_nv, I'm trying to run asr.py from jetson-voice. AastaLLL helped me get the Docker container running, but I receive the error below when launching the script. I read through the forum and searched Google but found nothing that helped. I tested my mic and ran a demo, and everything worked fine.
root@jarvis-desktop:/jetson-voice/examples# python3 asr.py
Namespace(debug=False, default_backend='tensorrt', global_config=None, list_devices=False, list_models=False, log_level='info', mic=None, model='quartznet', model_dir='/jetson-voice/data/networks', model_manifest='/jetson-voice/data/networks/manifest.json', profile=False, verbose=False, wav=None)
[NeMo W 2023-04-15 02:52:30 nemo_logging:349] /usr/local/lib/python3.6/dist-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
      warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
################################################################################
WARNING, path does not exist: KALDI_ROOT=/mnt/matylda5/iveselyk/Tools/kaldi-trunk
(please add 'export KALDI_ROOT=<your_path>' in your $HOME/.profile)
(or run as: KALDI_ROOT=<your_path> python <your_script>.py)
################################################################################
[NeMo I 2023-04-15 02:52:30 features:264] PADDING: 0
[NeMo I 2023-04-15 02:52:30 features:281] STFT using torch
[NeMo W 2023-04-15 02:52:30 nemo_logging:349] /usr/local/lib/python3.6/dist-packages/nemo_toolkit-1.6.2-py3.6.egg/nemo/collections/asr/parts/preprocessing/features.py:314: FutureWarning: Pass sr=16000, n_fft=512 as keyword args. From version 0.10 passing these as positional arguments will result in an error
librosa.filters.mel(sample_rate, self.n_fft, n_mels=nfilt, fmin=lowfreq, fmax=highfreq), dtype=torch.float
[2023-04-15 02:52:31] resource.py:114 - loading model '/jetson-voice/data/networks/asr/quartznet-15x5_en/quartznet.onnx' with jetson_voice.backends.tensorrt.TRTModel
[2023-04-15 02:52:31] trt_model.py:41 - loading cached TensorRT engine from /jetson-voice/data/networks/asr/quartznet-15x5_en/quartznet.engine
[04/15/2023-02:52:35] [TRT] [I] [MemUsageChange] Init CUDA: CPU +225, GPU +0, now: CPU 333, GPU 3573 (MiB)
[04/15/2023-02:52:35] [TRT] [I] Loaded engine size: 42 MiB
[04/15/2023-02:52:37] [TRT] [V] Using cublas as a tactic source
[04/15/2023-02:52:37] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +207, now: CPU 535, GPU 3869 (MiB)
[04/15/2023-02:52:37] [TRT] [V] Using cuDNN as a tactic source
[04/15/2023-02:52:41] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +240, GPU +20, now: CPU 775, GPU 3889 (MiB)
[04/15/2023-02:52:41] [TRT] [V] Deserialization required 5362705 microseconds.
[04/15/2023-02:52:41] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +41, now: CPU 0, GPU 41 (MiB)
[04/15/2023-02:52:41] [TRT] [V] Using cublas as a tactic source
[04/15/2023-02:52:41] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 733, GPU 3847 (MiB)
[04/15/2023-02:52:41] [TRT] [V] Using cuDNN as a tactic source
[04/15/2023-02:52:41] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 733, GPU 3847 (MiB)
[04/15/2023-02:52:41] [TRT] [V] Total per-runner device persistent memory is 36948992
[04/15/2023-02:52:41] [TRT] [V] Total per-runner host persistent memory is 282384
[04/15/2023-02:52:41] [TRT] [V] Allocated activation device memory of size 1076224
[04/15/2023-02:52:41] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +36, now: CPU 0, GPU 77 (MiB)
[2023-04-15 02:52:41] trt_model.py:59 - loaded TensorRT engine from /jetson-voice/data/networks/asr/quartznet-15x5_en/quartznet.engine
binding 0 - 'audio_signal'
   input: True
   shape: (1, 64, -1)
   dtype: DataType.FLOAT
   size: -256
   dynamic: True
   profiles: [{'min': (1, 64, 10), 'opt': (1, 64, 150), 'max': (1, 64, 300)}]
binding 1 - 'logprobs'
   input: False
   shape: (1, -1, 29)
   dtype: DataType.FLOAT
   size: -116
   dynamic: True
   profiles:
[2023-04-15 02:52:42] ctc_beamsearch.py:51 - creating CTCBeamSearchDecoder
[2023-04-15 02:52:42] ctc_beamsearch.py:52 - {'add_punctuation': True,
 'alpha': 0.7,
 'beam_width': 32,
 'beta': 0.0,
 'cutoff_prob': 1.0,
 'cutoff_top_n': 40,
 'language_model': '/jetson-voice/data/networks/asr/quartznet-15x5_en/lm.bin',
 'timestep_offset': 5,
 'top_k': 3,
 'type': 'beamsearch',
 'vad_eos_duration': 0.65,
 'word_threshold': -1000.0}
[2023-04-15 02:52:49] asr_engine.py:128 - CTC decoder type: 'beamsearch'
Traceback (most recent call last):
  File "asr.py", line 30, in
    chunk_size=asr.chunk_size)
  File "/jetson-voice/jetson_voice/utils/audio.py", line 67, in AudioInput
    raise ValueError('either wav or mic argument must be specified')
ValueError: either wav or mic argument must be specified
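
Reading the traceback together with the Namespace line near the top of the log (mic=None, wav=None), the failure looks like an argument check rather than a Docker or microphone problem: the script was started without an audio source. Below is a minimal sketch of that kind of check, not the actual jetson-voice code. The function names `parse_args` and `pick_audio_source` are hypothetical, and the `--wav` / `--mic` flag spellings are an assumption inferred from the `wav` and `mic` attribute names in the Namespace output.

```python
import argparse


def parse_args(argv):
    """Parse the two audio-source options (flag names are assumed, see note above)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--wav', default=None, type=str, help='path to an input wav file')
    parser.add_argument('--mic', default=None, type=str, help='microphone device name or ID')
    return parser.parse_args(argv)


def pick_audio_source(wav=None, mic=None):
    """Hypothetical stand-in for the check reported at audio.py line 67.

    Returns a (kind, value) tuple, or raises the same ValueError message
    seen in the traceback when neither source was given.
    """
    if mic is not None and mic != '':
        return ('mic', mic)
    if wav is not None and wav != '':
        return ('wav', wav)
    raise ValueError('either wav or mic argument must be specified')
```

With no arguments, `parse_args([])` yields `wav=None, mic=None`, and `pick_audio_source` raises the same ValueError as in the log above. Passing either option, for example `parse_args(['--wav', 'sample.wav'])`, gets past the check.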