NanoLLM: How to use the local model

Hi,
I have downloaded the phi-2 model to local disk, and I tried to run NanoLLM chat using the local model path as follows:

python3 -m nano_llm.chat --api mlc \
  --model /root/phi-2/ \
  --quantization q4f16_ft

I got the following error:

/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/opt/NanoLLM/nano_llm/__init__.py", line 2, in <module>
    from .nano_llm import NanoLLM
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 14, in <module>
    from .vision import CLIPVisionModel, MMProjector
  File "/opt/NanoLLM/nano_llm/vision/__init__.py", line 3, in <module>
    from .clip import CLIPVisionModel
  File "/opt/NanoLLM/nano_llm/vision/clip.py", line 2, in <module>
    from clip_trt import CLIPVisionModel
  File "/opt/clip_trt/clip_trt/__init__.py", line 2, in <module>
    from .text import CLIPTextModel
  File "/opt/clip_trt/clip_trt/text.py", line 10, in <module>
    import torch2trt
  File "/usr/local/lib/python3.10/dist-packages/torch2trt/__init__.py", line 1, in <module>
    from .torch2trt import *
  File "/usr/local/lib/python3.10/dist-packages/torch2trt/torch2trt.py", line 2, in <module>
    import tensorrt as trt
  File "/usr/lib/python3.10/dist-packages/tensorrt/__init__.py", line 67, in <module>
    from .tensorrt import *
ImportError: /usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so: file too short

So how do I use the local path when running NanoLLM? Thx!

Hi @siyu_ok - this error is unrelated to running from a local model path - it instead looks like a Docker issue with mounting your drivers when --runtime nvidia is used. Are you able to run python3 -c 'import tensorrt' in other containers, like nvcr.io/nvidia/l4t-jetpack:r36.3.0? (presuming you are on JetPack 6)
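
For example, a quick way to check that from a stock JetPack container (image tag assumed from above) would be:

# test that TensorRT imports with the NVIDIA runtime in a stock JetPack container
docker run --runtime nvidia -it --rm nvcr.io/nvidia/l4t-jetpack:r36.3.0 python3 -c 'import tensorrt'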

BTW, to access your local model from inside the container, you will either want to store it under your jetson-containers/data/models directory (which is automatically mounted, and you would refer to it like /data/models/phi-2 inside the container) - or you can mount your own directory into the container when you start it:

# mount ~/my_models into the container under /models
jetson-containers run -v ~/my_models:/models $(autotag nano_llm)
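
Then inside the container you would point --model at the mounted location - for example (the paths here are just illustrative):

# run chat against a model stored under the automatically-mounted data directory
python3 -m nano_llm.chat --api mlc \
  --model /data/models/phi-2 \
  --quantization q4f16_ft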

Hi @dusty_nv Thanks for your reply! I get the same error when I run python3 -c 'import tensorrt' in the nvcr.io/nvidia/l4t-jetpack:r36.3.0 container.
How can I solve the issue? Thx!

OK gotcha - did you upgrade this device from JetPack 5 via apt? You are in fact on JetPack 6, right? You might try reinstalling the nvidia-container* packages from apt - or barring that, reflashing the device and confirming that the GPU works for you in a container with a fresh install.
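
For example, something along these lines (exact package names may vary by JetPack release):

# reinstall the NVIDIA container runtime packages and restart docker
sudo apt-get install --reinstall nvidia-container-toolkit
sudo systemctl restart docker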

Does the nvidia runtime show up for you under docker info ?

$ docker info | grep nvidia
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: nvidia
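
If the nvidia runtime is missing there or isn't the default, it is normally configured in /etc/docker/daemon.json - roughly like this (your file may differ) - followed by restarting the docker service:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}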

Also you should have these files installed that mount the driver components into the container when --runtime nvidia is used:

ls -ll /etc/nvidia-container-runtime/host-files-for-container.d/
total 20
-rw-r--r-- 1 root root   995 Apr 24 23:05 devices.csv
-rw-r--r-- 1 root root 15806 Apr 24 23:05 drivers.csv

@dusty_nv Thanks for your tips. I have resolved the issue above by reflashing the device and reinstalling nvidia-container*.
Now, when I run

jetson-containers run eb86 python3 -m nano_llm.agents.voice_chat --api mlc --model /data/models/phi-2 --quantization q4f16_ft --asr=whisper --tts=piper

(eb86 is the image ID of dustynv/nano_llm:24.7-r36.2.0)

I got the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/agents/voice_chat.py", line 119, in <module>
    agent = VoiceChat(**vars(args)).run()
  File "/opt/NanoLLM/nano_llm/agents/voice_chat.py", line 56, in __init__
    self.tts = AutoTTS.from_pretrained(tts=tts, **kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/speech/auto_tts.py", line 66, in from_pretrained
    return PiperTTS(**kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/speech/piper_tts.py", line 57, in __init__
    self.voices_info = get_voices(self.cache_path, update_voices=True)
  File "/usr/local/lib/python3.10/dist-packages/piper/download.py", line 34, in get_voices
    with urlopen(voices_url) as response, open(
  File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>

I have downloaded en_US-libritts-high.onnx and en_US-libritts-high.onnx.json into jetson-containers/data/models/piper/, and exported PIPER_CACHE to jetson-containers/data/models/piper/.
How can I solve it? Thx!

It would seem that in your case it is never able to connect to download the voices - or was it just a temporary networking issue? If the former, you may need to go into the NanoLLM code there and try changing it to update_voices=False.
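
One quick way to make that change inside the container (just a sketch - the file and flag come from the traceback above):

# flip update_voices=True to False in the Piper TTS plugin so it uses the cached voices list
sed -i 's/update_voices=True/update_voices=False/' /opt/NanoLLM/nano_llm/plugins/speech/piper_tts.py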

You can clone the NanoLLM sources outside of the container and mount them in, like shown here:
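
A rough sketch of that (repository URL and host-side path are assumptions here):

# clone the NanoLLM sources on the host and mount them over /opt/NanoLLM in the container
git clone https://github.com/dusty-nv/NanoLLM
jetson-containers run -v $(pwd)/NanoLLM:/opt/NanoLLM $(autotag nano_llm)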
