How to run the voice_chat agent of NanoLLM?

Hi,
I installed a USB microphone and a USB speaker, then ran the following:

sudo jetson-containers run -v ~/NanoLLM/:/opt/NanoLLM eb86 python3 -m nano_llm.agents.voice_chat --api mlc --model /data/models/phi-2 --quantization q4f16_ft --asr=whisper --tts=piper

It doesn’t seem to work; there is no output when I speak into the microphone.
How do I run the voice_chat agent correctly?

@dusty_nv Can you help me with this? Thanks!

@siyu_ok I would use Agent Studio to set up the pipeline, visually inspect what is happening, and independently test the ASR, LLM, and TTS. You can also manually run some of the tests under nano_llm/test to confirm the ASR and TTS functionality first.
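For example, from inside the container you could try something along these lines (the module names below are illustrative, so check what is actually under /opt/NanoLLM/nano_llm/test first):

# list the available test scripts
ls /opt/NanoLLM/nano_llm/test

# hypothetical invocations - substitute the real script/module names from the listing above
python3 -m nano_llm.test.asr --asr=whisper
python3 -m nano_llm.test.tts --tts=piper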

@dusty_nv Thanks for the information! I modified the pipeline of voice_chat, and now it works. But an exception is raised after a few conversations:

Exception in thread Thread-2 (_run):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 529, in _run
    self._generate(stream)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 507, in _generate
    prefill(self.embed_tokens([self.tokenizer.eos_token_id], return_tensors='tvm'), stream.kv_cache)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 283, in embed_tokens
    raise RuntimeError(f"{self.config.name} does not have embed() in {self.module_path}")
RuntimeError: phi-2 does not have embed() in /data/models/mlc/dist/phi-2-ctx2048/phi-2-q4f16_ft/phi-2-q4f16_ft-cuda.so

Do you have any idea what could be causing this exception? Thanks!

Hi @siyu_ok, does this only occur after the chat history fills up, or does it also happen with a fresh chat? Can you try changing --max-context-len to see if that alters the behavior?
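For example, your original command with a smaller context window (512 here is just an example value):

sudo jetson-containers run -v ~/NanoLLM/:/opt/NanoLLM eb86 python3 -m nano_llm.agents.voice_chat --api mlc --model /data/models/phi-2 --quantization q4f16_ft --asr=whisper --tts=piper --max-context-len=512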

@dusty_nv The same issue occurred with --max-context-len=512.
I then tested Llama-3-8B-Instruct, which ran with no errors, so I think it’s probably a model-specific problem.

OK, gotcha @siyu_ok, thanks for letting me know. In that case, you might want to try it with a different LLM backend (like --api=hf). I have been meaning to upgrade the version of MLC/TVM this uses to pick up the latest fixes there.
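For example, something like the following (a sketch of your original command switched to the HF backend; the q4f16_ft quantization argument is MLC-specific, so it is dropped here):

sudo jetson-containers run -v ~/NanoLLM/:/opt/NanoLLM eb86 python3 -m nano_llm.agents.voice_chat --api hf --model /data/models/phi-2 --asr=whisper --tts=piper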

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.