Can't run llamaspeak

Hi Dusty

I'm using the same command as the tutorial:
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \
  $(autotag nano_llm) \
  python3 -m nano_llm.agents.web_chat --api=mlc \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --asr=riva --tts=piper

using my own Hugging Face token. I get this error at the end of the above command:

/opt/NanoLLM/nano_llm/utils/tensor.py:109: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/pytorch/torch/csrc/utils/tensor_numpy.cpp:206.)
return torch.from_numpy(tensor).to(device=device, dtype=convert_dtype(dtype, to='pt'), **kwargs)
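(Side note: the UserWarning above is generally benign - PyTorch is just flagging a read-only NumPy buffer. A minimal sketch of the usual workaround, illustrative only and not NanoLLM's actual code:

import numpy as np
import torch

arr = np.frombuffer(b"\x00" * 8, dtype=np.float32)  # arrays built from a bytes buffer are read-only
print(arr.flags.writeable)                          # False - torch.from_numpy(arr) would emit the warning
t = torch.from_numpy(arr.copy())                    # copying first makes the data writable, no warning

The real failure is the traceback that follows.)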
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/agents/web_chat.py", line 310, in <module>
    agent = WebChat(**vars(args))
  File "/opt/NanoLLM/nano_llm/agents/web_chat.py", line 55, in __init__
    self.llm.functions = BotFunctions()
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py", line 86, in __new__
    cls.load(test=test)
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py", line 265, in load
    cls.test()
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py", line 275, in test
    logging.info(f"Bot function descriptions:\n{cls.generate_docs()}")
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py", line 167, in generate_docs
    docs = '\n'.join(['* ' + x.docs for x in cls.functions if x.enabled])
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py", line 167, in <listcomp>
    docs = '\n'.join(['* ' + x.docs for x in cls.functions if x.enabled])
TypeError: can only concatenate str (not "dict") to str
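(Side note: that final TypeError just means one of the registered bot functions exposes a dict where a docs string is expected. A two-line repro with made-up data, not NanoLLM's actual structures:

docs_entries = ["TIME() - Returns the current time.", {"docs": "WEATHER"}]  # second entry is a dict, not a str
print("\n".join("* " + d for d in docs_entries))  # TypeError: can only concatenate str (not "dict") to str

So the failure is in the function registry's data, not in the user's command.)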


Hi @paulrrh - for now, instead of $(autotag nano_llm), can you try running dustynv/nano_llm:24.6-r36.2.0?

Hi Dusty, and thanks for the reply.

jetson-containers run --env HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxx \
  $(autotag nano_llm) \
  python3 -m nano_llm.agents.web_chat --api=mlc \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --asr=riva --tts=piper
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.3.0 JETPACK_VERSION=6.0 CUDA_VERSION=12.2
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:24.6-r36.2.0

This is what happens at the end:
14:05:10 | INFO | using chat template 'llama-3' for model Meta-Llama-3-8B-Instruct
14:05:10 | INFO | model 'Meta-Llama-3-8B-Instruct', chat template 'llama-3' stop tokens: ['<|end_of_text|>', '<|eot_id|>'] -> [128001, 128009]
14:05:10 | INFO | Warming up LLM with query 'What is 2+2?'
14:05:11 | INFO | Warmup response: 'Easy peasy!\n\nThe answer to 2+2 is... 4!<|eot_id|>'
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/agents/web_chat.py", line 327, in <module>
    agent = WebChat(**vars(args))
  File "/opt/NanoLLM/nano_llm/agents/web_chat.py", line 32, in __init__
    super().__init__(**kwargs)
  File "/opt/NanoLLM/nano_llm/agents/voice_chat.py", line 42, in __init__
    self.vad = VADFilter(**kwargs).add(self.asr) if self.asr else None
  File "/opt/NanoLLM/nano_llm/plugins/audio/vad_filter.py", line 35, in __init__
    self.vad = load_vad()
  File "/opt/whisper_trt/whisper_trt/vad.py", line 134, in load_vad
    make_cache_dir()
  File "/opt/whisper_trt/whisper_trt/cache.py", line 36, in make_cache_dir
    os.makedirs(_CACHE_DIR)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/root/.cache/whisper_trt'
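(Side note: a minimal sketch of this failure mode, using an illustrative path rather than whisper_trt's actual _CACHE_DIR - os.makedirs raises FileExistsError whenever the path already exists, unless exist_ok=True is passed:

import os, tempfile

cache_dir = os.path.join(tempfile.gettempdir(), "whisper_trt_demo")  # stand-in for _CACHE_DIR
os.makedirs(cache_dir)                 # first call succeeds
# os.makedirs(cache_dir)               # calling it again raises FileExistsError: [Errno 17]
os.makedirs(cache_dir, exist_ok=True)  # no error for an existing directory

Here the traceback shows the plain os.makedirs(_CACHE_DIR) form, so any pre-existing /root/.cache/whisper_trt trips it.)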

Any clues? There is a file called whisper in the same directory.
I renamed whisper_trt to whisper_trt2,
then ran the command again:
whisper_trt2 disappeared, but whisper_trt existed again.

I tried !!!
Cheers

Also tried
jetson-containers run dustynv/nano_llm:24.6-r36.2.0
then ran
python3 -m nano_llm.agents.video_stream --video-input /dev/video0 --video-output webrtc://@:8554/output

inside the container.
This accessed the camera, but I could not access
webrtc://@:8554/output
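(Side note: if I'm reading the jetson-utils streaming docs right, webrtc://@:8554/output isn't a URL a browser can open directly - you view the stream by browsing to the Jetson's IP on port 8554, e.g. https://192.168.1.215:8554, and Chrome/Chromium tends to handle the WebRTC page more reliably than Firefox.)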

OK sorry, yea - I will go back through and re-confirm that all the previous agents still work with the recent changes that support the new ASR/VAD models and pipelines for Agent Studio. For now, you can roll back to a previous version of the nano_llm container that has llamaspeak in a known-good state:

https://hub.docker.com/repository/docker/dustynv/nano_llm/tags

for example dustynv/nano_llm:24.5.1-r36.2.0 or dustynv/nano_llm:24.5-r36.2.0

Hi Dusty
Thanks again for your help.
Tried
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xxxxxxx $(autotag nano_llm) python3 -m nano_llm.agents.web_chat --api=mlc --model meta-llama/Meta-Llama-3-8B-Instruct --asr=riva --tts=piper

-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:24.5-r36.2.0

For example, if the user asks for the temperature, call the WEATHER() function.
14:08:00 | INFO | Testing bot functions:
14:08:00 | INFO | * SAVE() => 'None' (SAVE("") - save information about the user, for example SAVE("Mary likes to garden"))
14:08:00 | INFO | * TIME() => '2:08 PM' (TIME() - Returns the current time.)
14:08:00 | INFO | * DATE() => 'Saturday, June 6 2024' (DATE() - Returns the current date.)
14:08:00 | INFO | * LOCATION() => 'Exeter, England' (LOCATION() - Returns the current location, like the name of the city.)
14:08:00 | ERROR | Exception occurred testing bot function WEATHER()

Traceback (most recent call last):
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py", line 242, in test
    logging.info(f" * {function.name}() => '{function.function()}' ({function.docs})")
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/weather.py", line 24, in WEATHER
    raise ValueError(f"$ACCUWEATHER_KEY or $OPENWEATHER_KEY should be set to your respective API key to use weather data")
ValueError: $ACCUWEATHER_KEY or $OPENWEATHER_KEY should be set to your respective API key to use weather data

14:08:00 | ERROR | Exception occurred testing bot function WEATHER_FORECAST()

Traceback (most recent call last):
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/__init__.py", line 242, in test
    logging.info(f" * {function.name}() => '{function.function()}' ({function.docs})")
  File "/opt/NanoLLM/nano_llm/plugins/bot_functions/weather.py", line 52, in WEATHER_FORECAST
    raise ValueError(f"$ACCUWEATHER_KEY or $OPENWEATHER_KEY should be set to your respective API key to use weather data")
ValueError: $ACCUWEATHER_KEY or $OPENWEATHER_KEY should be set to your respective API key to use weather data

14:08:00 | INFO | mounting webserver path /tmp/uploads to /uploads
14:08:00 | INFO | starting webserver @ https://0.0.0.0:8050
14:08:00 | SUCCESS | WebChat - system ready

 * Serving Flask app 'nano_llm.web.server'
 * Debug mode: on
14:08:00 | INFO | WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on https://127.0.0.1:8050
 * Running on https://192.168.1.215:8050
14:08:00 | INFO | Press CTRL+C to quit
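(Side note: the WEATHER() / WEATHER_FORECAST() exceptions above come from the built-in function self-test and appear non-fatal here - the webserver still starts. Judging by the traceback, passing an API key into the container, e.g. adding --env OPENWEATHER_KEY=<your key> alongside HUGGINGFACE_TOKEN, should presumably silence them.)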

Now able to produce a llamaspeak screen.

No TTS Voice
TTS Speaker
I am, I think, now out of my depth.
No response from the microphone, although it knows the device, the same as the speaker.
Any documentation to help?
Cheers in anticipation
paulrrh

OK, thanks for bearing with me there @paulrrh - I fixed the errors you previously ran into (fixes for llamaspeak web_agent · dusty-nv/NanoLLM@8845cf6 · GitHub), updated the NanoLLM container, and confirmed the llamaspeak agent is still working (I tried it with Whisper ASR, so use --asr=whisper when running it).

First, do a sudo docker pull dustynv/nano_llm:r36.2.0 to update that image to the latest. Then run this to start the container:

jetson-containers run --env HUGGINGFACE_TOKEN=hf_xxxxx \
  dustynv/nano_llm:r36.2.0 \
     python3 -m nano_llm.agents.web_chat --api=mlc --debug \
      --model meta-llama/Meta-Llama-3-8B-Instruct \
      --asr=whisper --tts=piper

If you are still unable to get web mic/speaker audio, what I would try next is using Agent Studio:

In addition to letting you customize the pipeline, this has the added benefit of visual feedback on the current audio levels and processing performance in the system - so you can see "oh, the audio is not coming through from the mic" or "the voice activity threshold is too high for the ASR to activate".

If you also pull the latest jetson-containers (or run jetson-containers update), presets for the ASR/TTS pipelines should appear in the tool (as shown on the tutorial page). You can also watch me lay out these pipelines step-by-step in the first ~15 minutes of this video - essentially re-creating the llamaspeak agent:

Hi Dusty

Thanks for your reply. Will try the above on Thursday.
Keep up the good work.
Paulrrh

Hi Dusty
Ran

jetson-containers run --env HUGGINGFACE_TOKEN=hf_xxxxx \
  dustynv/nano_llm:r36.2.0 \
    python3 -m nano_llm.agents.web_chat --api=mlc --debug \
      --model meta-llama/Meta-Llama-3-8B-Instruct \
      --asr=whisper --tts=piper

12:34:47 | INFO | loading Whisper model 'base.en' with TensorRT
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/agents/web_chat.py", line 327, in <module>
    agent = WebChat(**vars(args))
  File "/opt/NanoLLM/nano_llm/agents/web_chat.py", line 32, in __init__
    super().__init__(**kwargs)
  File "/opt/NanoLLM/nano_llm/agents/voice_chat.py", line 38, in __init__
    self.asr = AutoASR.from_pretrained(asr=asr, **kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/speech/auto_asr.py", line 38, in from_pretrained
    return WhisperASR(**{**kwargs, 'model' : asr})
  File "/opt/NanoLLM/nano_llm/plugins/speech/whisper_asr.py", line 70, in __init__
    self.model = load_trt_model(self.model_name, verbose=True)
  File "/opt/whisper_trt/whisper_trt/model.py", line 414, in load_trt_model
    make_cache_dir()
  File "/opt/whisper_trt/whisper_trt/cache.py", line 36, in make_cache_dir
    os.makedirs(_CACHE_DIR, exist_ok=True)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/root/.cache/whisper_trt'

Similar to my original error.

If I remove --asr=whisper,
I get the llamaspeak screen as in my previous post.

Also ran
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \
  $(autotag nano_llm) \
  python3 -m nano_llm.studio

From Audio I get the correct input and output devices, nothing else.

From the Agent menu I get:
New, Save, Load, Insert, Clear Cache

Load does nothing, as there are no files in nano_llm/presets.
The screen is nothing like the video.
I have an Agent icon, Audio icon, Microphone and Speaker icon.
Nothing else !!!
I’m not sure where to go from here.
Sorry for being a pain!
Paulrrh

Try updating your jetson-containers (either run jetson-containers update or do a git pull in your jetson-containers directory). These presets get stored under your jetson-containers/data/nano_llm/presets

If the UI still doesn’t populate, please run it with python3 -m nano_llm.studio --debug and send me the logs.

Seeing as I was unable to reproduce this issue here, your comment above made me think that your jetson-containers had not been updated, and that the jetson-containers/data/whisper directory doesn't exist for you (which /root/.cache/whisper_trt gets symlinked to). And in the corner case where the link exists in the filesystem but is broken, I am finding that Python still throws that exception (even with exist_ok=True).

Update your jetson-containers repo as I mentioned above, and it should start working. I checked this by renaming my jetson-containers/data/whisper directory, seeing the error you got, and then moving it back.
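(Side note: that corner case is easy to reproduce in plain Python - the paths below are illustrative:

import os, tempfile

base = tempfile.mkdtemp()
link = os.path.join(base, "whisper_trt")
os.symlink(os.path.join(base, "missing"), link)  # dangling symlink - the target doesn't exist
os.makedirs(link, exist_ok=True)  # still raises FileExistsError: mkdir() fails, and isdir() follows the broken link and returns False

So exist_ok=True only forgives paths that resolve to a real directory, not a broken symlink.)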

Hi Dusty
See the git pull output below - I tried git fetch then git merge, with similar output to git pull.
Then:

Ran with some success

jetson-containers run --env HUGGINGFACE_TOKEN=hf_xxxxx \
  dustynv/nano_llm:r36.2.0 \
     python3 -m nano_llm.agents.web_chat --api=mlc --debug \
      --model meta-llama/Meta-Llama-3-8B-Instruct \
      --asr=whisper --tts=piper

16:04:31 | INFO | using chat template 'llama-3' for model Meta-Llama-3-8B-Instruct
16:04:31 | INFO | model 'Meta-Llama-3-8B-Instruct', chat template 'llama-3' stop tokens: ['<|end_of_text|>', '<|eot_id|>'] -> [128001, 128009]
16:04:31 | INFO | Warming up LLM with query 'What is 2+2?'
16:04:31 | DEBUG | text embedding cache hit <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful and friendly AI assistant.<|eot_id|>
16:04:31 | DEBUG | chat embed entries=2 shape=(1, 31, 4096) position=0
16:04:31 | DEBUG | allocated new KV cache in 53.6 ms (existing cache refcounts=)
16:04:32 | INFO | Warmup response: 'Easy peasy!\n\nThe answer to 2+2 is... 4!<|eot_id|>'
16:04:32 | DEBUG | connected ChatQuery to PrintStream on channel=0
16:04:32 | INFO | loading Whisper model 'base.en' with TensorRT
2024-07-04 16:04:35.932204405 [W:onnxruntime:, graph.cc:3572 CleanUnusedInitializersAndNodeArgs] Removing initializer '131'. It is not used by any node and should be removed from the model.
2024-07-04 16:04:35.932259571 [W:onnxruntime:, graph.cc:3572 CleanUnusedInitializersAndNodeArgs] Removing initializer '136'. It is not used by any node and should be removed from the model.
2024-07-04 16:04:35.932275826 [W:onnxruntime:, graph.cc:3572 CleanUnusedInitializersAndNodeArgs] Removing initializer '139'. It is not used by any node and should be removed from the model.
2024-07-04 16:04:35.932287058 [W:onnxruntime:, graph.cc:3572 CleanUnusedInitializersAndNodeArgs] Removing initializer '140'. It is not used by any node and should be removed from the model.
2024-07-04 16:04:35.932296401 [W:onnxruntime:, graph.cc:3572 CleanUnusedInitializersAndNodeArgs] Removing initializer '134'. It is not used by any node and should be removed from the model.
2024-07-04 16:04:35.932399182 [W:onnxruntime:, graph.cc:3572 CleanUnusedInitializersAndNodeArgs] Removing initializer '628'. It is not used by any node and should be removed from the model.
2024-07-04 16:04:35.932412237 [W:onnxruntime:, graph.cc:3572 CleanUnusedInitializersAndNodeArgs] Removing initializer '623'. It is not used by any node and should be removed from the model.
2024-07-04 16:04:35.932421229 [W:onnxruntime:, graph.cc:3572 CleanUnusedInitializersAndNodeArgs] Removing initializer '629'. It is not used by any node and should be removed from the model.
2024-07-04 16:04:35.932428876 [W:onnxruntime:, graph.cc:3572 CleanUnusedInitializersAndNodeArgs] Removing initializer '620'. It is not used by any node and should be removed from the model.
2024-07-04 16:04:35.932436844 [W:onnxruntime:, graph.cc:3572 CleanUnusedInitializersAndNodeArgs] Removing initializer '625'. It is not used by any node and should be removed from the model.
16:04:36 | DEBUG | connected VADFilter to WhisperASR on channel=0
16:04:36 | DEBUG | connected WhisperASR to PrintStream on channel=0
16:04:36 | DEBUG | connected WhisperASR to PrintStream on channel=1
16:04:36 | DEBUG | connected WhisperASR to asr_partial on channel=1
16:04:36 | DEBUG | connected WhisperASR to asr_final on channel=0
16:04:36 | DEBUG | connected WhisperASR to ChatQuery on channel=0
16:04:36 | DEBUG | Downloading https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/voices.json to /data/models/piper/voices.json
16:04:36 | DEBUG | Loading /data/models/piper/voices.json
16:04:36 | INFO | loading Piper TTS model from /data/models/piper/en_US-libritts-high.onnx
2024-07-04 16:04:39.733214787 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 28 Memcpy nodes are added to the graph torch-jit-export for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-07-04 16:04:39.771005265 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-07-04 16:04:39.771073327 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
16:04:40 | WARNING | Piper TTS failed to set speaker to 'None', ignoring... (None)
16:04:40 | DEBUG | running Piper TTS model warm-up for en_US-libritts-high
16:04:40 | DEBUG | generating Piper TTS with en_US-libritts-high for 'This is a test of the text to speech.'
16:04:41 | DEBUG | finished TTS request, streamed 45114 samples at 22.1KHz - 2.05 sec of audio in 1.00 sec (RTFX=2.0440)
16:04:41 | DEBUG | connected PiperTTS to RateLimit on channel=0
16:04:41 | DEBUG | connected ChatQuery to PiperTTS on channel=1
16:04:41 | DEBUG | connected UserPrompt to ChatQuery on channel=0
16:04:41 | DEBUG | connected WhisperASR to on_asr_partial on channel=1
16:04:41 | DEBUG | connected ChatQuery to on_llm_reply on channel=0
16:04:41 | DEBUG | connected RateLimit to on_tts_samples on channel=0
16:04:41 | INFO | mounting webserver path /tmp/uploads to /uploads
16:04:41 | DEBUG | webserver root directory: /opt/NanoLLM/nano_llm/web upload directory: /tmp/uploads
16:04:41 | INFO | starting webserver @ https://0.0.0.0:8050
16:04:41 | SUCCESS | WebChat - system ready

 * Serving Flask app 'nano_llm.web.server'
 * Debug mode: on
16:04:41 | INFO | WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on https://127.0.0.1:8050
 * Running on https://192.168.1.215:8050

STILL NO jetson-containers/data/whisper

Tried git pull and got the following response:

/jetson-containers$ git pull
Updating cc0abb08..2e492ba9
error: Your local changes to the following files would be overwritten by merge:
README.md
jetson_containers/container.py
packages/audio/audiocraft/Dockerfile
packages/audio/voicecraft/Dockerfile
packages/audio/xtts/Dockerfile
packages/audio/xtts/README.md
packages/cuda/cuda/config.py
packages/diffusion/stable-diffusion-webui/Dockerfile
packages/diffusion/stable-diffusion-webui/install_extensions.sh
packages/jetson-inference/build.sh
packages/llm/awq/Dockerfile
packages/llm/awq/build.sh
packages/llm/awq/config.py
packages/llm/awq/quantize.py
packages/llm/mlc/README.md
packages/llm/mlc/docs.md
packages/llm/mlc/patches/607dc5a.diff
packages/llm/mlc/test.sh
packages/llm/nano_llm/Dockerfile
packages/llm/nano_llm/README.md
packages/llm/nano_llm/config.py
packages/llm/ollama/Dockerfile
packages/llm/ollama/README.md
packages/llm/tensorrt_llm/benchmark.sh
packages/llm/tensorrt_llm/build.sh
packages/llm/tensorrt_llm/config.py
packages/llm/tensorrt_llm/install.sh
packages/llm/tensorrt_llm/test.py
packages/llm/tensorrt_llm/test_llama.sh
packages/llm/xformers/Dockerfile
packages/numpy/Dockerfile
packages/openai-triton/Dockerfile
packages/openai-triton/build.sh
packages/openai-triton/config.py
packages/openai-triton/install.sh
packages/pytorch/torch2trt/Dockerfile
packages/rag/llama-index/Dockerfile
packages/rag/llama-index/README.md
packages/rag/llama-index/docs.md
packages/rag/llama-index/samples/LlamaIndex_Local-Models_L4T.ipynb
packages/ros/Dockerfile.ros.noetic
packages/ros/config.py
packages/vectordb/faiss_lite/Dockerfile
packages/vectordb/nanodb/Dockerfile
packages/vit/nanoowl/Dockerfile
packages/vit/nanoowl/README.md
packages/vit/nanoowl/docs.md
packages/vit/tam/Dockerfile
Please commit your changes or stash them before you merge.
error: The following untracked working tree files would be overwritten by merge:
packages/audio/whisper_trt/Dockerfile
packages/audio/whisper_trt/README.md
packages/audio/whisper_trt/test.py
packages/holoscan/Dockerfile
packages/holoscan/README.md
packages/holoscan/docs.md
packages/holoscan/test.sh
packages/llm/awq/install.sh
packages/llm/llama-factory/Dockerfile
packages/llm/llama-factory/README.md
packages/llm/llama-factory/test.py
packages/llm/mlc/benchmark.sh
packages/pytorch/torch2trt/patches/flattener.py
packages/rag/jetson-copilot/.streamlit/config.toml
packages/rag/jetson-copilot/Dockerfile
packages/rag/jetson-copilot/README.md
packages/rag/jetson-copilot/app.py
packages/rag/jetson-copilot/docs.md
packages/rag/jetson-copilot/static/jetson-soc.png
packages/rag/jetson-copilot/static/user-purple.png
packages/rag/llama-index/Dockerfile.samples
packages/rag/llama-index/config.py
packages/vectordb/nanodb/test.py
packages/vit/clip_trt/Dockerfile
packages/vit/clip_trt/README.md
packages/vit/clip_trt/test.py
packages/vit/nanoowl/test.sh
Please move or remove them before you merge.
Aborting
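(Side note: assuming the local edits aren't needed, git stash --include-untracked followed by git pull should clear both of those lists; git stash pop restores the stashed changes afterwards, possibly with merge conflicts to resolve.)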

Sorry, I made a typo - I meant to say jetson-containers/data/models/whisper. I think it created that directory for you when you pulled jetson-containers, because WhisperASR loaded for you now.

Hi Dusty
I have found my problem with the system. I changed my default browser from Firefox to Chrome and, lo and behold, I have the correct output in the browser - or at least one I can communicate with.
Now the real fun begins.
Thanks very much for your help
