@siyu_ok I would use Agent Studio to set up the pipeline, visually inspect what is happening, and independently test the ASR, LLM, and TTS. You can also manually run some of the tests under nano_llm/test to confirm the ASR and TTS functionality first.
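For example, inside the NanoLLM container (the studio module path follows the NanoLLM docs; the test filenames under nano_llm/test vary between versions, so list the directory first and substitute a real name):

# launch Agent Studio to build and inspect the pipeline in the browser
python3 -m nano_llm.studio

# list the standalone tests, then run one as a module, e.g.
# python3 -m nano_llm.test.<name>
ls /opt/NanoLLM/nano_llm/test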
@dusty_nv Thank you for the information! I modified the voice_chat pipeline, and now it works. But an exception is raised after a few conversations:
Exception in thread Thread-2 (_run):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 529, in _run
    self._generate(stream)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 507, in _generate
    prefill(self.embed_tokens([self.tokenizer.eos_token_id], return_tensors='tvm'), stream.kv_cache)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 283, in embed_tokens
    raise RuntimeError(f"{self.config.name} does not have embed() in {self.module_path}")
RuntimeError: phi-2 does not have embed() in /data/models/mlc/dist/phi-2-ctx2048/phi-2-q4f16_ft/phi-2-q4f16_ft-cuda.so
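For reference, whether that compiled .so actually exports an embed() function can be checked directly with TVM. A hedged sketch (tvm.runtime.load_module is the standard TVM module loader; the path is copied from the traceback above):

import tvm

# load the compiled MLC model library named in the error message
lib = tvm.runtime.load_module(
    "/data/models/mlc/dist/phi-2-ctx2048/phi-2-q4f16_ft/phi-2-q4f16_ft-cuda.so")

# get_function() raises AttributeError if the symbol is absent;
# query_imports=True also searches the module's imported submodules
try:
    lib.get_function("embed", query_imports=True)
    print("embed() is present")
except AttributeError:
    print("embed() is missing from this compiled model")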
Hi @siyu_ok, does this only occur after the chat history fills up, or does it happen with a fresh chat too? Can you try changing --max-context-len to see if that alters the behavior?
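For example, something like this (a hedged invocation; substitute your model and check --help for the exact flags your build supports):

python3 -m nano_llm.agents.voice_chat \
    --api mlc \
    --model microsoft/phi-2 \
    --max-context-len 512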
@dusty_nv the same issue occurred with --max-context-len=512.
I then tested Llama-3-8B-Instruct, and it worked well with no errors, so I think it's probably a model-related problem.
OK gotcha @siyu_ok, thanks for letting me know. In that case, you might want to try it with a different LLM backend (like --api=hf). I have been meaning to upgrade the version of MLC/TVM this uses to pick up the latest fixes.
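For example, the same voice_chat launch as before, just pointed at the HF backend (again hedged; check your build's --help):

python3 -m nano_llm.agents.voice_chat \
    --api hf \
    --model microsoft/phi-2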