How to Run NVILA-8B Model with NanoLLM on Jetson AGX Orin?

Hello

I’m trying to use the NVILA-8B model from the Efficient-Large-Model repository (Efficient-Large-Model/NVILA-8B · Hugging Face) on a Jetson AGX Orin with NanoLLM. However, when I run the following command, the model fails to start:

jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.chat --api=mlc \
    --model Efficient-Large-Model/NVILA-8b

I suspect there isn’t a Docker image that currently includes support for NVILA-8B out of the box.
Does anyone know if there is a prebuilt Docker image that can run NVILA-8B on Jetson AGX Orin, or how to build/configure one so that NVILA-8B can be used with the MLC back end? Any help or instructions would be greatly appreciated.
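For reference, this is roughly what I would try for building the nano_llm container from source, following the standard jetson-containers workflow (I don’t know whether the resulting image actually pulls in NVILA-8B support, which is exactly what I’m unsure about):

# Sketch of building the nano_llm container from source with jetson-containers
# (standard jetson-containers setup; NVILA-8B support in the result is unverified)
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh
jetson-containers build nano_llm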

Thank you!

Hi,

Could you share the error message with us?
There is a known docker issue due to the recent docker 28.0.0 release.
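If it turns out to be that issue, one commonly suggested workaround is to pin Docker back to a 27.x release until a fix is available (the exact version string depends on your Ubuntu release, so please check what apt offers first):

# Check which docker-ce versions are available for your distro
apt-cache madison docker-ce
# Downgrade to a 27.x build (the version string below is only a placeholder)
sudo apt-get install --allow-downgrades docker-ce=<27.x-version> docker-ce-cli=<27.x-version>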

You can find more info in the comment below:

Thanks.

Thank you for your response. Here is an overview of the error.

Error Overview

Inside the jetson-containers environment, I ran the following command:


python3 -m nano_llm.chat --api=mlc --model Efficient-Large-Model/NVILA-8b

Steps and Issues Encountered:

1. An error occurred stating that mlc_llm.build does not support the qwen2 model.

2. I resolved this by upgrading mlc_llm from version 0.1.0 to 0.19.0:

pip install mlc-llm --upgrade

3. After that, I encountered an incompatibility error between mlc_llm, awq, and tvm, so I upgraded those as well:


pip install awq --upgrade

pip install tvm --upgrade

4. Then I ran the same command again:


python3 -m nano_llm.chat --api=mlc --model Efficient-Large-Model/NVILA-8b

5. However, this time I received an error stating that the mlc_llm.build command was not found.

6. Upon checking mlc_llm version 0.19.0, I noticed that build.py is no longer present, and I am unsure how to build NVILA-8b with the new version.
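From what I can tell, newer mlc_llm releases replace mlc_llm.build with separate convert_weight / gen_config / compile subcommands. Something like the sketch below is what I have been trying to piece together, but the paths, quantization mode, and conv template are guesses on my part, and I don’t know whether the new CLI recognizes NVILA’s qwen2-based config at all:

# Rough sketch of the newer mlc_llm workflow (replaces the old mlc_llm.build);
# all paths, the quantization mode, and the conv template are my assumptions
mlc_llm convert_weight /data/models/NVILA-8B --quantization q4f16_1 -o /data/models/mlc/NVILA-8B
mlc_llm gen_config /data/models/NVILA-8B --quantization q4f16_1 --conv-template chatml -o /data/models/mlc/NVILA-8B
mlc_llm compile /data/models/mlc/NVILA-8B/mlc-chat-config.json --device cuda -o /data/models/mlc/NVILA-8B/NVILA-8B-cuda.so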

Request:

Could you provide guidance on how to build NVILA-8b with the new version?

Alternatively, is there an updated Docker image available for it?

Thanks.

Did you find a solution for this? Or did you manage to run NVILA in another way?

No, I don’t have any solutions yet.

Hi,

We tested NVILA with nano_llm and hit a model-support error, as shown below:

# python3 -m nano_llm.chat --model Efficient-Large-Model/NVILA-8B --api=mlc 
...
07:55:59 | INFO | running MLC quantization:

python3 -m mlc_llm.build --model /data/models/mlc/dist/models/NVILA-8B --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 32768 --artifact-path /data/models/mlc/dist/NVILA-8B/ctx32768 --use-safetensors 


Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/build.py", line 47, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/build.py", line 41, in main
    parsed_args = core._parse_args(parsed_args)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/core.py", line 444, in _parse_args
    parsed = _setup_model_path(parsed)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/core.py", line 494, in _setup_model_path
    validate_config(args.model_path)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/core.py", line 538, in validate_config
    config["model_type"] in utils.supported_model_types
AssertionError: Model type qwen2 not supported.

We will test this with the latest mlc_llm release and provide more info to you later.
Thanks.


Hi, all

Thanks for your patience.
For NVILA, please try the server.py included in this container image: dustynv/vila:r36.4.0-cu128-24.04
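A rough way to get started with that image (the location and arguments of server.py inside the container are not shown here, so please check the scripts in the image for the exact usage):

# Pull the image mentioned above and open a shell in it;
# from there, locate server.py and check its --help for the expected arguments
docker pull dustynv/vila:r36.4.0-cu128-24.04
jetson-containers run dustynv/vila:r36.4.0-cu128-24.04 /bin/bash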

Thanks.
