local_llm vs NanoLLM: Help getting NanoLLM up and running


I’m currently working with the Orin Nano dev kit, and my goal is to get a video streaming demo working with any of the VILA models.


  • I’ve successfully upgraded to JetPack 6
  • Attempted to run the example as written (it uses a different VLM, but I decided to stick to the script; the tooltip said it’s still supported on my Orin Nano dev kit):
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm --api=mlc \
    --model liuhaotian/llava-v1.6-vicuna-7b \
    --max-context-len 768 \
    --max-new-tokens 128

I got the following output:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 129, in _get_module_details
    spec = importlib.util.find_spec(mod_name)
  File "/usr/lib/python3.10/importlib/util.py", line 94, in find_spec
    parent = __import__(parent_name, fromlist=['__path__'])
ModuleNotFoundError: No module named 'local_llm'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/__main__.py", line 9, in <module>
    runpy.run_module('local_llm.chat', run_name='__main__')
  File "/usr/lib/python3.10/runpy.py", line 220, in run_module
    mod_name, mod_spec, code = _get_module_details(mod_name)
  File "/usr/lib/python3.10/runpy.py", line 138, in _get_module_details
    raise error(msg.format(mod_name, type(ex).__name__, ex)) from ex
ImportError: Error while finding module specification for 'local_llm.chat' (ModuleNotFoundError: No module named 'local_llm')
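The last frames of the traceback show the mechanism: `python3 -m nano_llm` executes `/opt/NanoLLM/nano_llm/__main__.py`, which hands off to `runpy.run_module('local_llm.chat', run_name='__main__')`. No `local_llm` package is installed in the container, so `runpy` raises `ImportError` before any model code runs. The minimal sketch below uses only the Python standard library to check which of the two chat modules Python can actually locate. The helper name `module_available` is hypothetical and is not part of NanoLLM.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if Python can locate module `name`.

    For a dotted name, find_spec() imports the parent package first, so a
    missing parent (e.g. `local_llm`) surfaces as ModuleNotFoundError.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # The parent package itself is missing: the same root cause as the
        # ImportError raised by runpy in the traceback.
        return False


if __name__ == "__main__":
    # Module names taken from the traceback and from the suggested workaround.
    for name in ("local_llm.chat", "nano_llm.chat"):
        status = "found" if module_available(name) else "missing"
        print(f"{name}: {status}")
```

Run inside the container, this shows at a glance whether `nano_llm/__main__.py` is pointing at a module that doesn't exist.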

I’m confused about the local_llm → NanoLLM transition. I thought that it was a single library that was renamed.

  • Is NanoLLM an evolution of local_llm that still depends on parts of the old library that aren’t included in its install?
  • Is NanoLLM only partway through the transition, so some of the examples don’t work yet?


A couple notes:

I’ll post an update once I (hopefully) get it working.

Hi @Ashis.Ghosh, since you are on Orin Nano, I would recommend starting with Efficient-Large-Model/VILA-2.7b. (I should update the docs to reflect this, and confirm again that the 7B VLMs still fit in 8GB of memory.)

Alas, you have uncovered some lingering references to local_llm in the NanoLLM code that I missed during the transition; sorry about that. It is indeed one library, and I will fix these errors. For now, try invoking the nano_llm.chat module directly, which bypasses the stale reference in nano_llm/__main__.py:

jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.chat --api=mlc \
    --model Efficient-Large-Model/VILA-2.7b \
    --max-context-len 768 \
    --max-new-tokens 128 \
    --prompt /data/prompts/images.json 

If that works, see whether liuhaotian/llava-v1.6-vicuna-7b runs as well.


Thanks for the response @dusty_nv! Looks like I beat you to the punch by a small amount.

Happy to report that this demo did work!

Awesome, thanks @Ashis.Ghosh, glad to hear it! I am in the process of updating MLC and NanoLLM now, and will fix the issues you encountered 👍


Thanks @dusty_nv!

Also, unfortunately, it looks like the 7B version of VILA is NOT working.

I made it all the way through quantization (~25 minutes) and then loaded the video stream, but it hangs with the 8GB of RAM maxed out and produces no output. It has been sitting for about 10 minutes; I’ll give it ~5 more to see whether it recovers before killing it. Let me know if there’s anything else I can try.

I haven’t compared their configs on HuggingFace to confirm this, but I thought both 7B models were the same size, so maybe memory was already very tight the first time? You might also want to try it in console-based chat mode without video first, just to confirm that it loads.
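To put the "tight on memory" guess into rough numbers, here is a back-of-envelope sketch. It assumes ~4-bit quantized weights and a Llama-style 7B language model with 32 layers, a hidden size of 4096, no grouped-query attention, and an fp16 KV cache. These figures are assumptions, not values read from the VILA or LLaVA configs. The estimate also ignores the vision encoder, activations, runtime overhead, and the OS, all of which share the Orin Nano’s 8GB of unified memory with the GPU.

```python
def estimate_llm_memory_gib(
    n_params: float,
    bits_per_weight: float,
    n_layers: int,
    hidden_size: int,
    context_len: int,
    kv_bytes_per_value: int = 2,  # fp16 KV cache (assumption)
) -> tuple[float, float]:
    """Return (weights_gib, kv_cache_gib) for a decoder-only transformer.

    Rough estimate only: assumes no grouped-query attention and ignores the
    vision encoder, activations, and framework/runtime overhead.
    """
    gib = 2 ** 30
    weight_bytes = n_params * bits_per_weight / 8
    # One key and one value vector of size hidden_size per layer per token.
    kv_cache_bytes = 2 * n_layers * hidden_size * context_len * kv_bytes_per_value
    return weight_bytes / gib, kv_cache_bytes / gib


if __name__ == "__main__":
    # Hypothetical Llama-style 7B with 4-bit weights, using the
    # --max-context-len 768 value from the commands in this thread.
    weights, kv = estimate_llm_memory_gib(7e9, 4, 32, 4096, 768)
    print(f"weights ~{weights:.2f} GiB, KV cache ~{kv:.3f} GiB")
```

Under these assumptions the language model alone needs roughly 3.3 GiB for weights plus about 0.4 GiB of KV cache, before counting the vision encoder, the video pipeline, and everything else running on the board. The KV-cache term grows linearly with `--max-context-len`, which is one reason to keep that value small on an 8GB device.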

Last night I fixed those lingering references to local_llm and re-pushed the updated nano_llm containers:

