NanoLLM: How to use the local model

Hi,
I have downloaded the phi-2 model to local disk, and I tried to run NanoLLM chat using the local model path as follows:

python3 -m nano_llm.chat --api mlc \
  --model /root/phi-2/ \
  --quantization q4f16_ft

I got the following error:

/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/opt/NanoLLM/nano_llm/__init__.py", line 2, in <module>
    from .nano_llm import NanoLLM
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 14, in <module>
    from .vision import CLIPVisionModel, MMProjector
  File "/opt/NanoLLM/nano_llm/vision/__init__.py", line 3, in <module>
    from .clip import CLIPVisionModel
  File "/opt/NanoLLM/nano_llm/vision/clip.py", line 2, in <module>
    from clip_trt import CLIPVisionModel
  File "/opt/clip_trt/clip_trt/__init__.py", line 2, in <module>
    from .text import CLIPTextModel
  File "/opt/clip_trt/clip_trt/text.py", line 10, in <module>
    import torch2trt
  File "/usr/local/lib/python3.10/dist-packages/torch2trt/__init__.py", line 1, in <module>
    from .torch2trt import *
  File "/usr/local/lib/python3.10/dist-packages/torch2trt/torch2trt.py", line 2, in <module>
    import tensorrt as trt
  File "/usr/lib/python3.10/dist-packages/tensorrt/__init__.py", line 67, in <module>
    from .tensorrt import *
ImportError: /usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so: file too short

So how do I use the local path when running NanoLLM? Thx!

Hi @siyu_ok - this error is unrelated to running from a local model path - it instead looks like a Docker issue with mounting your drivers when --runtime nvidia is used. Are you able to run python3 -c 'import tensorrt' in other containers, like nvcr.io/nvidia/l4t-jetpack:r36.3.0? (presuming you are on JetPack 6)
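
For example, a quick way to check that from a stock JetPack container (image tag assumed from above) would be:

# test that TensorRT imports with the NVIDIA runtime in a stock JetPack container
docker run --runtime nvidia -it --rm nvcr.io/nvidia/l4t-jetpack:r36.3.0 python3 -c 'import tensorrt'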

BTW, to access your local model from inside the container, you will either want to store it under your jetson-containers/data/models directory (which is automatically mounted, and you would refer to it like /data/models/phi-2 inside the container) - or you can mount your own directory into the container when you start it:

# mount ~/my_models into the container under /models
jetson-containers run -v ~/my_models:/models $(autotag nano_llm)
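
Then inside the container you would point --model at the mounted location - for example (the paths here are just illustrative):

# run chat against a model stored under the automatically-mounted data directory
python3 -m nano_llm.chat --api mlc \
  --model /data/models/phi-2 \
  --quantization q4f16_ft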

Hi @dusty_nv Thanks for your reply! I get the same error when I run python3 -c 'import tensorrt' in the nvcr.io/nvidia/l4t-jetpack:r36.3.0 container.
How can I solve the issue? Thx!

OK gotcha - did you upgrade this device from JetPack 5 via apt? You are in fact on JetPack 6, right? You might try reinstalling the nvidia-container* packages from apt - or barring that, reflashing the device and confirming that the GPU works for you in a container with a fresh install.
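
For example, something along these lines (exact package names may vary by JetPack release):

# reinstall the NVIDIA container runtime packages and restart docker
sudo apt-get install --reinstall nvidia-container-toolkit
sudo systemctl restart docker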

Does the nvidia runtime show up for you under docker info ?

$ docker info | grep nvidia
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: nvidia
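
If the nvidia runtime is missing there or isn't the default, it is normally configured in /etc/docker/daemon.json - roughly like this (your file may differ) - followed by restarting the docker service:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}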

Also you should have these files installed that mount the driver components into the container when --runtime nvidia is used:

ls -ll /etc/nvidia-container-runtime/host-files-for-container.d/
total 20
-rw-r--r-- 1 root root   995 Apr 24 23:05 devices.csv
-rw-r--r-- 1 root root 15806 Apr 24 23:05 drivers.csv

@dusty_nv Thanks for your tips. I have resolved the issue above by reflashing the device and reinstalling nvidia-container*.
Now, when I run

jetson-containers run eb86 python3 -m nano_llm.agents.voice_chat --api mlc --model /data/models/phi-2 --quantization q4f16_ft --asr=whisper --tts=piper

(eb86 is the image ID of dustynv/nano_llm:24.7-r36.2.0)

I got the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/agents/voice_chat.py", line 119, in <module>
    agent = VoiceChat(**vars(args)).run()
  File "/opt/NanoLLM/nano_llm/agents/voice_chat.py", line 56, in __init__
    self.tts = AutoTTS.from_pretrained(tts=tts, **kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/speech/auto_tts.py", line 66, in from_pretrained
    return PiperTTS(**kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/speech/piper_tts.py", line 57, in __init__
    self.voices_info = get_voices(self.cache_path, update_voices=True)
  File "/usr/local/lib/python3.10/dist-packages/piper/download.py", line 34, in get_voices
    with urlopen(voices_url) as response, open(
  File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>

I have downloaded en_US-libritts-high.onnx and en_US-libritts-high.onnx.json into jetson-containers/data/models/piper/, and exported PIPER_CACHE to jetson-containers/data/models/piper/.
How can I solve it? Thx!

It would seem that in your case it is never able to connect to download the voices - or was it just a temporary networking issue? If the former, you may need to go into the NanoLLM code there and try changing it to update_voices=False.
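
One quick way to make that change inside the container (just a sketch - the file and flag come from the traceback above):

# flip update_voices=True to False in the Piper TTS plugin so it uses the cached voices list
sed -i 's/update_voices=True/update_voices=False/' /opt/NanoLLM/nano_llm/plugins/speech/piper_tts.py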

You can clone the NanoLLM sources outside of the container and mount them in, like shown here:
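
A rough sketch of that (repository URL and host-side path are assumptions here):

# clone the NanoLLM sources on the host and mount them over /opt/NanoLLM in the container
git clone https://github.com/dusty-nv/NanoLLM
jetson-containers run -v $(pwd)/NanoLLM:/opt/NanoLLM $(autotag nano_llm)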
