I received Jetson Nano Super Dev Kit (interestingly, the package still says Jetson Nano Dev Kit). I upgraded firmware to 36.4.2 and JetPack version 6.1, and also set power mode to MAXN. I wanted to tryout VILA-2.7b but it exhausted memory and caused dev kit to reboot. Here is the output to load container and nano_llm r36.4.0:
=====
czhu@jnanosp-01:~$ jetson-containers run $(autotag nano_llm) python3 -m nano_llm.chat --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-context-len 32 --max-new-tokens 4
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.4.2 JETPACK_VERSION=5.1 CUDA_VERSION=12.6
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r36.4.0
V4L2_DEVICES:
+ docker run --runtime nvidia -it --rm --network host --shm-size=4g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/czhu/jetson-containers/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 --name jetson_container_20241230_012156 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.chat --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-context-len 32 --max-new-tokens 4
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
Fetching 10 files: 0%| | 0/10 [00:00<?, ?it/s]/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Fetching 10 files: 100%|โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ| 10/10 [00:00<00:00, 15044.13it/s]
Fetching 12 files: 100%|โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ| 12/12 [00:00<00:00, 30156.77it/s]
01:22:32 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA-2.7b/snapshots/2ed82105eefd5926cccb46af9e71b0ca77f12704 with MLC
01:22:46 | INFO | NumExpr defaulting to 6 threads.
01:22:46 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
['/data/models/mlc/dist/VILA-2.7b/ctx32/VILA-2.7b-q4f16_ft/mlc-chat-config.json', '/data/models/mlc/dist/VILA-2.7b/ctx32/VILA-2.7b-q4f16_ft/params/mlc-chat-config.json']
01:22:51 | INFO | running MLC quantization:
python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 32 --artifact-path /data/models/mlc/dist/VILA-2.7b/ctx32 --use-safetensors
Using path "/data/models/mlc/dist/models/VILA-2.7b" for model "VILA-2.7b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Get old param: 0%| | 0/197 [00:00<?, ?tensors/sStart computing and quantizing weights... This may take a while. | 0/327 [00:00<?, ?tensors/s]
Get old param: 1%|โโ | 2/197 [00:03<05:10, 1.59s/tensors]
Set new param: 0%|โ | 1/327 [00:03<20:38, 3.80s/tensors]
=====