I am running this on a Jetson Orin NX with 16 GB of memory and a 2 TB NVMe drive, on JetPack 6.2 with L4T R36.4.3. The application runs and the model can speak, and my audio input does register, but nothing happens with it. Please help.
There are several errors during execution. The application runs and does text-to-speech, but not speech-to-text. I'm just learning how to use these LLMs. Thank you for your help.
DISPLAY environmental variable is already set: ":0"
localuser:root being added to access control list
xauth: file /tmp/.docker.xauth does not exist
docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/orin/jetson-containers/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/video1 --device /dev/video2 --device /dev/video3 --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 -v /run/jtop.sock:/run/jtop.sock --name jetson_container_20250318_090650 --env HUGGINGFACE_TOKEN=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 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.agents.web_chat --api=mlc --model meta-llama/Meta-Llama-3-8B-Instruct --asr=riva --tts=piper
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
The token has not been saved to the git credentials helper. Pass add_to_git_credential=True in this function directly or --add-to-git-credential if using via huggingface-cli if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /data/models/huggingface/token
Login successful
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 11591.40it/s]
Fetching 17 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 3318.90it/s]
09:07:02 | INFO | loading /data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a with MLC
09:07:06 | INFO | NumExpr defaulting to 8 threads.
09:07:06 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
09:07:08 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=918000, multiprocessors=8, max_thread_dims=[1024, 1024, 64], api_version=12060, driver_version=None
09:07:08 | INFO | loading Meta-Llama-3-8B-Instruct from /data/models/mlc/dist/Meta-Llama-3-8B-Instruct/ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/Meta-Llama-3-8B-Instruct-q4f16_ft-cuda.so
09:07:08 | WARNING | model library /data/models/mlc/dist/Meta-Llama-3-8B-Instruct/ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/Meta-Llama-3-8B-Instruct-q4f16_ft-cuda.so was missing metadata
┌─────────────────────────┬─────────────────────────────────────────────────────────────────────────────┐
│ architectures │ ['LlamaForCausalLM'] │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ attention_bias │ False │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ attention_dropout │ 0.0 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ bos_token_id │ 128000 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ eos_token_id │ 128009 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ hidden_act │ silu │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ hidden_size │ 4096 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ initializer_range │ 0.02 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ intermediate_size │ 14336 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ max_position_embeddings │ 8192 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ model_type │ llama │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ num_attention_heads │ 32 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ num_hidden_layers │ 32 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ num_key_value_heads │ 8 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ pretraining_tp │ 1 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ rms_norm_eps │ 1e-05 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ rope_scaling │ │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ rope_theta │ 500000.0 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ tie_word_embeddings │ False │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ torch_dtype │ bfloat16 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ transformers_version │ 4.40.0.dev0 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ use_cache │ True │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ vocab_size │ 128256 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ name │ Meta-Llama-3-8B-Instruct │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ api │ mlc │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ mm_projector_path │ /data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snaps │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ quant │ q4f16_ft │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ type │ llama │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ max_length │ 8192 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ prefill_chunk_size │ -1 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ load_time │ 10.129087102 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ params_size │ 3895.7578125 │
└─────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘
09:07:12 | INFO | using chat template 'llama-3' for model Meta-Llama-3-8B-Instruct
09:07:12 | INFO | model 'Meta-Llama-3-8B-Instruct', chat template 'llama-3' stop tokens: ['<|end_of_text|>', '<|eot_id|>'] → [128001, 128009]
09:07:12 | INFO | Warming up LLM with query 'What is 2+2?'
09:07:13 | INFO | Warmup response: 'Easy peasy!\n\nThe answer to 2+2 is... 4!<|eot_id|>'
09:07:13 | INFO | plugin | connected ChatQuery to PrintStream on channel 0
09:07:14 | INFO | plugin | connected VADFilter to RivaASR on channel 0
09:07:14 | INFO | plugin | connected RivaASR to PrintStream on channel 0
09:07:14 | INFO | plugin | connected RivaASR to PrintStream on channel 1
09:07:14 | INFO | plugin | connected RivaASR to asr_partial on channel 1
09:07:14 | INFO | plugin | connected RivaASR to asr_final on channel 0
09:07:14 | INFO | plugin | connected RivaASR to ChatQuery on channel 0
09:07:14 | INFO | loading Piper TTS model from /data/models/piper/en_US-libritts-high.onnx
2025-03-18 09:07:15.652016193 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 28 Memcpy nodes are added to the graph torch-jit-export for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-18 09:07:15.680400714 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-18 09:07:15.680460982 [W:onnxruntime:, session_state.cc:1170 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
09:07:16 | WARNING | Piper TTS failed to set speaker to 'None', ignoring... (None)
09:07:17 | INFO | plugin | connected PiperTTS to RateLimit on channel 0
09:07:17 | INFO | plugin | connected ChatQuery to PiperTTS on channel 1
09:07:17 | INFO | plugin | connected UserPrompt to ChatQuery on channel 0
09:07:17 | INFO | plugin | connected RivaASR to on_asr_partial on channel 1
09:07:17 | INFO | plugin | connected ChatQuery to on_llm_reply on channel 0
09:07:17 | INFO | plugin | connected RateLimit to on_tts_samples on channel 0
09:07:17 | INFO | mounting webserver path /tmp/uploads to /uploads
09:07:17 | INFO | starting webserver @ https://0.0.0.0:8050
09:07:17 | SUCCESS | WebChat - system ready
Serving Flask app 'nano_llm.web.server'
Debug mode: on
Exception in thread RivaASR:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/NanoLLM/nano_llm/plugins/speech/riva_asr.py", line 117, in run
    self.generate(self.audio_queue)
  File "/opt/NanoLLM/nano_llm/plugins/speech/riva_asr.py", line 134, in generate
    for response in responses:
  File "/usr/local/lib/python3.10/dist-packages/riva/client/asr.py", line 387, in streaming_response_generator
    for response in self.stub.StreamingRecognize(generator, metadata=self.auth.get_auth_metadata()):
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 543, in __next__
    return self._next()
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 952, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Failed to connect to remote host: connect: Connection refused (111)"
    debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Failed to connect to remote host: connect: Connection refused (111)", grpc_status:14, created_time:"2025-03-18T09:07:17.493306563-07:00"}"
>
09:07:17 | INFO | WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
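From the traceback it looks like the RivaASR plugin is trying to reach a Riva server at 127.0.0.1:50051 and getting "Connection refused". As a quick sanity check (just a sketch, with the host and port taken from that error message rather than from any config I've verified), I can run this inside the container to see whether anything is listening there:

import socket

# Check whether anything is listening on the Riva gRPC endpoint.
# 127.0.0.1:50051 is taken from the "Connection refused" error above;
# it is an assumption, not a verified setting.
host, port = "127.0.0.1", 50051
try:
    with socket.create_connection((host, port), timeout=2):
        print(f"something is listening on {host}:{port}")
except OSError as err:
    print(f"nothing reachable on {host}:{port}: {err}")

If nothing is listening there, I'm guessing I need to start a Riva speech server separately before launching the web_chat agent, but I'm not sure what the intended setup is.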