We’ve released new VILA models with improved accuracy and speed, reaching up to 7.5 FPS on Orin!
These are supported in the latest 24.5 release of NanoLLM.
If you already have the nano_llm container on your system, first do a docker pull dustynv/nano_llm:r36.2.0 (or r35.4.1), and then you should be able to run this along with the other VLM demos:
jetson-containers run $(autotag nano_llm) \
python3 -m nano_llm.chat --api=mlc \
--model Efficient-Large-Model/VILA1.5-3b \
--prompt /data/prompts/images.json
NanoLLM now also uses TensorRT to accelerate the CLIP/SigLIP vision encoder in the pipeline 👍
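And if you'd rather call VILA from your own Python script instead of the CLI, here's a minimal sketch based on the NanoLLM chat examples - the image path is just a sample file from the container, and the exact argument names may differ slightly between releases, so check the NanoLLM docs:

from nano_llm import NanoLLM, ChatHistory

# load the quantized VILA model with the MLC backend (same model as the CLI example above)
model = NanoLLM.from_pretrained('Efficient-Large-Model/VILA1.5-3b', api='mlc')

# build one chat turn containing an image and a question about it
chat_history = ChatHistory(model)
chat_history.append(role='user', image='/data/images/hoover.jpg')  # any local path or URL
chat_history.append(role='user', msg='Describe the image.')

# embed the chat (this is where the TensorRT-accelerated vision encoder runs)
embedding, _ = chat_history.embed_chat()

# stream the reply token-by-token
for token in model.generate(embedding, kv_cache=chat_history.kv_cache, max_new_tokens=128):
    print(token, end='', flush=True)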