Could someone provide some insight on implementing a RAG agent on a Jetson Nano 8GB? Our indexing requirements are relatively small, so we are primarily interested in optimal LLM query performance with a model in the 3B parameter range. We tried NanoLLM with MLC, but we are getting system hangs with the compatible Llama 8B models.
Hi,
I assume you are using an Orin Nano rather than the original Jetson Nano, is that correct?
For 8GB devices, we recommend trying models with weights <= 4GB.
Below are two RAG-related tutorials for your reference:
LlamaIndex: LlamaIndex - NVIDIA Jetson AI Lab
Jetson Copilot: Jetson Copilot - NVIDIA Jetson AI Lab
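Since your index is small, the retrieval side can stay very lightweight and the LLM remains the bottleneck. Below is a minimal sketch of the retrieval step, with a toy bag-of-words similarity standing in for a real sentence embedder (the corpus, query, and function names are hypothetical; the generation call itself would go through NanoLLM/MLC):

```python
# Toy RAG retrieval for a small in-memory index.
# NOTE: bag-of-words cosine is a placeholder for a proper embedding model.
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Return the top-k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs, k=2):
    """Assemble the retrieved context into a prompt for the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"
```

The tutorials above (LlamaIndex, Jetson Copilot) replace the toy embedder with a real one and the prompt hand-off with an actual model call, but the overall flow is the same.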
Thanks.
Correct, we are using an Orin Nano 8GB. We had originally built a solution with Ollama, but the latency at times was unacceptable with the Llama-3.2-3B model. No doubt we are asking a lot of the Nano, but we just want to make sure we are using the most optimized pipeline for token generation. From what I understand, MLC compiles the model specifically for the platform, so it would potentially offer the best performance?
Hi,
Yes, we also recommend using MLC for LLMs on Jetson.
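For reference, a typical way to launch a 4-bit quantized 3B model through NanoLLM's MLC backend is via jetson-containers (a sketch based on the Jetson AI Lab tutorials; double-check the exact model name and quantization flag against the current NanoLLM docs for your JetPack version):

```shell
# Run NanoLLM's chat example with the MLC backend inside the
# prebuilt container; q4f16_ft keeps a 3B model well under 4GB.
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.chat --api mlc \
    --model meta-llama/Llama-3.2-3B-Instruct \
    --quantization q4f16_ft
```

The first run compiles the model for the Orin's GPU, which takes a while; subsequent runs load the cached engine and should give noticeably better token throughput than an unoptimized runtime.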
Thanks.