I am running this on a Jetson Orin NX with 16 GB of memory and a 2 TB NVMe drive, on JetPack 6.2 with L4T R36.4.3. The application runs and the model can speak, and my audio input does register, but nothing happens with it. Please help.
There are several errors during execution. The application runs and does text-to-speech, but not speech-to-text. I'm just learning how to use these LLMs. Thank you for your help.
DISPLAY environmental variable is already set: ":0"
localuser:root being added to access control list
xauth: file /tmp/.docker.xauth does not exist
docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/orin/jetson-containers/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/video1 --device /dev/video2 --device /dev/video3 --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 -v /run/jtop.sock:/run/jtop.sock --name jetson_container_20250318_090650 --env HUGGINGFACE_TOKEN=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 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.agents.web_chat --api=mlc --model meta-llama/Meta-Llama-3-8B-Instruct --asr=riva --tts=piper
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
The token has not been saved to the git credentials helper. Pass add_to_git_credential=True in this function directly or --add-to-git-credential if using via huggingface-cli if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /data/models/huggingface/token
Login successful
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 11591.40it/s]
Fetching 17 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 3318.90it/s]
09:07:02 | INFO | loading /data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a with MLC
09:07:06 | INFO | NumExpr defaulting to 8 threads.
09:07:06 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
09:07:08 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=918000, multiprocessors=8, max_thread_dims=[1024, 1024, 64], api_version=12060, driver_version=None
09:07:08 | INFO | loading Meta-Llama-3-8B-Instruct from /data/models/mlc/dist/Meta-Llama-3-8B-Instruct/ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/Meta-Llama-3-8B-Instruct-q4f16_ft-cuda.so
09:07:08 | WARNING | model library /data/models/mlc/dist/Meta-Llama-3-8B-Instruct/ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/Meta-Llama-3-8B-Instruct-q4f16_ft-cuda.so was missing metadata
┌─────────────────────────┬─────────────────────────────────────────────────────────────────────────────┐
│ architectures │ ['LlamaForCausalLM'] │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ attention_bias │ False │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ attention_dropout │ 0.0 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ bos_token_id │ 128000 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ eos_token_id │ 128009 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ hidden_act │ silu │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ hidden_size │ 4096 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ initializer_range │ 0.02 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ intermediate_size │ 14336 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ max_position_embeddings │ 8192 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ model_type │ llama │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ num_attention_heads │ 32 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ num_hidden_layers │ 32 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ num_key_value_heads │ 8 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ pretraining_tp │ 1 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ rms_norm_eps │ 1e-05 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ rope_scaling │ │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ rope_theta │ 500000.0 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ tie_word_embeddings │ False │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ torch_dtype │ bfloat16 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ transformers_version │ 4.40.0.dev0 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ use_cache │ True │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ vocab_size │ 128256 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ name │ Meta-Llama-3-8B-Instruct │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ api │ mlc │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ mm_projector_path │ /data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snaps │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ quant │ q4f16_ft │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ type │ llama │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ max_length │ 8192 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ prefill_chunk_size │ -1 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ load_time │ 10.129087102 │
├─────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ params_size │ 3895.7578125 │
└─────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘
09:07:12 | INFO | using chat template 'llama-3' for model Meta-Llama-3-8B-Instruct
09:07:12 | INFO | model 'Meta-Llama-3-8B-Instruct', chat template 'llama-3' stop tokens: ['<|end_of_text|>', '<|eot_id|>'] → [128001, 128009]
09:07:12 | INFO | Warming up LLM with query 'What is 2+2?'
09:07:13 | INFO | Warmup response: 'Easy peasy!\n\nThe answer to 2+2 is... 4!<|eot_id|>'
09:07:13 | INFO | plugin | connected ChatQuery to PrintStream on channel 0
09:07:14 | INFO | plugin | connected VADFilter to RivaASR on channel 0
09:07:14 | INFO | plugin | connected RivaASR to PrintStream on channel 0
09:07:14 | INFO | plugin | connected RivaASR to PrintStream on channel 1
09:07:14 | INFO | plugin | connected RivaASR to asr_partial on channel 1
09:07:14 | INFO | plugin | connected RivaASR to asr_final on channel 0
09:07:14 | INFO | plugin | connected RivaASR to ChatQuery on channel 0
09:07:14 | INFO | loading Piper TTS model from /data/models/piper/en_US-libritts-high.onnx
2025-03-18 09:07:15.652016193 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 28 Memcpy nodes are added to the graph torch-jit-export for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-18 09:07:15.680400714 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-18 09:07:15.680460982 [W:onnxruntime:, session_state.cc:1170 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
09:07:16 | WARNING | Piper TTS failed to set speaker to 'None', ignoring... (None)
09:07:17 | INFO | plugin | connected PiperTTS to RateLimit on channel 0
09:07:17 | INFO | plugin | connected ChatQuery to PiperTTS on channel 1
09:07:17 | INFO | plugin | connected UserPrompt to ChatQuery on channel 0
09:07:17 | INFO | plugin | connected RivaASR to on_asr_partial on channel 1
09:07:17 | INFO | plugin | connected ChatQuery to on_llm_reply on channel 0
09:07:17 | INFO | plugin | connected RateLimit to on_tts_samples on channel 0
09:07:17 | INFO | mounting webserver path /tmp/uploads to /uploads
09:07:17 | INFO | starting webserver @ https://0.0.0.0:8050
09:07:17 | SUCCESS | WebChat - system ready
Serving Flask app 'nano_llm.web.server'
Debug mode: on
Exception in thread RivaASR:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/NanoLLM/nano_llm/plugins/speech/riva_asr.py", line 117, in run
    self.generate(self.audio_queue)
  File "/opt/NanoLLM/nano_llm/plugins/speech/riva_asr.py", line 134, in generate
    for response in responses:
  File "/usr/local/lib/python3.10/dist-packages/riva/client/asr.py", line 387, in streaming_response_generator
    for response in self.stub.StreamingRecognize(generator, metadata=self.auth.get_auth_metadata()):
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 543, in __next__
    return self._next()
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 952, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Failed to connect to remote host: connect: Connection refused (111)"
    debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:50051: Failed to connect to remote host: connect: Connection refused (111)", grpc_status:14, created_time:"2025-03-18T09:07:17.493306563-07:00"}"
>
09:07:17 | INFO | WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
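From the traceback it looks like the RivaASR plugin is trying to reach a Riva server at 127.0.0.1:50051 and getting "Connection refused". As a quick sanity check (just a sketch, with the host and port taken from that error message rather than from any config I've verified), I can run this inside the container to see whether anything is listening there:

import socket

# Check whether anything is listening on the Riva gRPC endpoint.
# 127.0.0.1:50051 is taken from the "Connection refused" error above;
# it is an assumption, not a verified setting.
host, port = "127.0.0.1", 50051
try:
    with socket.create_connection((host, port), timeout=2):
        print(f"something is listening on {host}:{port}")
except OSError as err:
    print(f"nothing reachable on {host}:{port}: {err}")

If nothing is listening there, I'm guessing I need to start a Riva speech server separately before launching the web_chat agent, but I'm not sure what the intended setup is.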