Is NanoLLM compatible with Xavier AGX?

I have been trying to load NanoLLM on Jetson Xavier AGX 16 GB.

Can anyone confirm this should actually work? It looks like the nanollm container supports R35, but I have had trouble with most of the examples I have tried. Part of my issue may be that I don't understand which parameters I need to change for the different models. I have spent the better part of a week trying different options and rereading the web documentation, hoping I have just missed something. I really want to try the examples with a fairly small model on the hardware I already own before I move forward on getting any new hardware.
I believe it has the newest version of JetPack:
$ cat /etc/nv_tegra_release

R35 (release), REVISION: 5.0, GCID: 35550185, BOARD: t186ref, EABI: aarch64, DATE: Tue Feb 20 04:46:31 UTC 2024

Package: nvidia-jetpack
Version: 5.1.3-b29
Priority: standard
Section: metapackages
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-jetpack-runtime (= 5.1.3-b29), nvidia-jetpack-dev (= 5.1.3-b29)
Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
Download-Size: 29.3 kB
APT-Sources: https://repo.download.nvidia.com/jetson/common r35.5/main arm64 Packages
Description: NVIDIA Jetpack Meta Package
Distributor ID: Ubuntu

Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
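
For reference, the R35 and REVISION: 5.0 fields in /etc/nv_tegra_release combine into L4T 35.5.0, which is the L4T_VERSION that autotag reports. A minimal sketch of that mapping, assuming the release line has the format shown above; `parse_l4t_version` is a hypothetical helper written for illustration, not code from jetson-containers:

```python
import re


def parse_l4t_version(release_line):
    """Turn an /etc/nv_tegra_release line into a (major, minor, patch) tuple.

    Hypothetical helper: it assumes the 'R<major> (release), REVISION: <minor>.<patch>'
    layout seen in the output above.
    """
    match = re.search(
        r"R(\d+)\s+\(release\),\s+REVISION:\s+(\d+)\.(\d+)", release_line
    )
    if match is None:
        raise ValueError("unrecognized nv_tegra_release format")
    return tuple(int(part) for part in match.groups())


line = (
    "R35 (release), REVISION: 5.0, GCID: 35550185, BOARD: t186ref, "
    "EABI: aarch64, DATE: Tue Feb 20 04:46:31 UTC 2024"
)
major, minor, patch = parse_l4t_version(line)
print(f"L4T_VERSION={major}.{minor}.{patch}")
```

On the device itself, the same function could be fed the first line of /etc/nv_tegra_release instead of the hard-coded string.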

When I load it I get:
$ jetson-containers run $(autotag nano_llm)
Namespace(disable=[''], output='/tmp/autotag', packages=['nano_llm'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- L4T_VERSION=35.5.0 JETPACK_VERSION=5.1 CUDA_VERSION=11.4
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r35.4.1
localuser:root being added to access control list

+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/xavier/Documents/Git/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/video1 --device /dev/video2 --device /dev/video3 dustynv/nano_llm:r35.4.1

OK, I think I found the error and a corresponding forum response from @dusty_nv. It looks like MLC does not support sm72, the GPU architecture on Xavier.

After lots of error messages, the run ends with:

AssertionError: sm72 not supported yet.

dusty’s response on the other forum post:

“I believe MLC only supports SM80 and Orin due to the kernel optimizations used”

recommendation:

“…on Xavier I would use llama.cpp container instead, it gets the 2nd-best performance and supports quantization”

So my options are to pivot to llama_cpp or purchase Jetson Orin gear.

Hi Kyle, yes, unfortunately MLC only supports sm_80 and newer (hence the tutorials on Jetson AI Lab that use NanoLLM only list compatibility with Orin).
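
To make the sm72 vs. sm_80 distinction concrete, here is a minimal sketch of the kind of check involved. It is not MLC's actual code: `parse_sm`, `mlc_supported`, and the lookup table are hypothetical names for illustration, and the sm_80 minimum is simply the figure quoted in this thread. The compute capabilities listed (7.2 for Xavier, 8.7 for Orin) are the published values for those modules.

```python
import re

# Compute capability of the GPU on each Jetson family (illustrative table).
JETSON_SM = {
    "AGX Xavier": (7, 2),
    "Xavier NX": (7, 2),
    "AGX Orin": (8, 7),
    "Orin NX": (8, 7),
    "Orin Nano": (8, 7),
}

# Assumption: minimum architecture for MLC as stated in this thread (sm_80).
MLC_MIN_SM = (8, 0)


def parse_sm(tag):
    """Convert a tag such as 'sm72' or 'sm_87' into a (major, minor) tuple."""
    match = re.fullmatch(r"sm_?(\d{2,3})", tag)
    if match is None:
        raise ValueError(f"unrecognized architecture tag: {tag!r}")
    digits = match.group(1)
    # The last digit is the minor version, everything before it is the major.
    return int(digits[:-1]), int(digits[-1])


def mlc_supported(sm):
    """Return True if the (major, minor) capability meets the assumed minimum."""
    return sm >= MLC_MIN_SM


for board, sm in JETSON_SM.items():
    status = "supported" if mlc_supported(sm) else "not supported"
    print(f"{board}: sm_{sm[0]}{sm[1]} -> MLC {status}")
```

Under these assumptions every Xavier module falls below the minimum and every Orin module clears it, which matches the AssertionError above.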

Since that post you found, exllamav2 has gotten faster than llama.cpp, but I’m not sure if exllama is limited to sm_80+ also (there are containers for that here)

Also since that post, Ollama support has been added on Jetson. It uses llama.cpp underneath but is easier to use, so if you are just starting out you may want to consider it as well.