I have a RAG application that works fine, and because of the limited memory on the Jetson Orin Nano Developer Kit, I am using a quantized Mistral 7B Instruct model.
I would greatly appreciate step-by-step guidance on deploying the RAG application on the Jetson Orin Nano Developer Kit.
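For context, the model is loaded through transformers with bitsandbytes 4-bit quantization, roughly like the sketch below (the model id and quantization settings are illustrative assumptions, not my exact configuration):

```python
# Rough sketch of the 4-bit load that triggers the bitsandbytes import
# (illustrative only; model id and settings are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint name

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights to fit in the Orin Nano's memory
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # let accelerate place layers on the GPU
)
```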
I have deployed the model on the edge device. However, I get the error below each time I try to run the RAG application, and I would be glad if someone could help me resolve it.
RuntimeError: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback):
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes and open an issue on GitHub.
However, each time I cd into the bitsandbytes directory and run python -m bitsandbytes, I get the following output:
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
COMPILED_WITH_CUDA = True
COMPUTE_CAPABILITIES_PER_GPU = ['8.7']
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Running a quick check that:
+ library is importable
+ CUDA function is callable
WARNING: Please be sure to sanitize sensible info from any such env vars!
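Since the error points at LD_LIBRARY_PATH and missing CUDA libraries, this is the kind of quick check I am planning to run from the same Python environment (a sketch only; the exact library name and the claim about how bitsandbytes loads it are my assumptions):

```python
# Sanity check: can the dynamic loader find the CUDA runtime from this environment?
import ctypes
import os

print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<not set>"))

# Try to load libcudart via the dynamic loader (which honors LD_LIBRARY_PATH),
# roughly the way bitsandbytes would. A failure here suggests the CUDA library
# directory still needs to be added to LD_LIBRARY_PATH.
try:
    ctypes.CDLL("libcudart.so")
    print("libcudart.so loaded successfully")
except OSError as exc:
    print("could not load libcudart.so:", exc)
```

If this fails, I assume I need to add the JetPack CUDA library directory (normally under /usr/local/cuda) to LD_LIBRARY_PATH, but I am not sure whether that alone explains the error above.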