Deploying LLMs

Hi,

I’m working on deploying LLMs with Retrieval-Augmented Generation (RAG) for educational purposes at the edge. Initially, we need a single working prototype, and after successful validation we plan to place an order for 200 such units.

Our development setup currently uses Ollama with LangChain and ChromaDB on Ubuntu 22.04 with an RTX 4090 GPU. The application is designed for server environments, but we now need to adapt it for edge deployment.
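For context, here is a simplified sketch of the kind of pipeline we run (the model name, persistence path, and query are placeholders, not our exact code; it assumes langchain, langchain-community, and chromadb are installed and an Ollama server is running with the model pulled):

```python
# Simplified RAG pipeline sketch (placeholders: model name, persistence
# path, and query; assumes a local Ollama server with "llama3" pulled).
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

embeddings = OllamaEmbeddings(model="llama3")   # placeholder model
vectordb = Chroma(
    persist_directory="./chroma_db",            # placeholder path
    embedding_function=embeddings,
)
llm = Ollama(model="llama3")                    # placeholder model

qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectordb.as_retriever(search_kwargs={"k": 4}),
)
print(qa.invoke({"query": "What does lesson 1 cover?"})["result"])
```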

Given our budget, the NVIDIA Jetson Orin NX 16GB is a feasible option. I would like to know:

  • Does the Jetson Orin NX 16GB support installing and running LangChain and ChromaDB via a standard pip install, as on regular Ubuntu? (A quick smoke test is sketched after this list.)
  • Or is it necessary to use nvidia-docker / the NVIDIA Container Toolkit for proper support and performance?
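If a plain pip install does work, a minimal smoke test like the following (the collection name and documents are arbitrary) is what we would run to validate it:

```python
# Smoke test (assumption: `pip install langchain chromadb` succeeded on
# the Jetson's aarch64 Python; the default embedding model is downloaded
# by ChromaDB on first use).
import langchain
import chromadb

print("langchain", langchain.__version__)
print("chromadb", chromadb.__version__)

client = chromadb.Client()                      # in-memory Chroma instance
col = client.create_collection("smoke_test")    # arbitrary collection name
col.add(ids=["1"], documents=["hello from Jetson"])
print(col.query(query_texts=["hello"], n_results=1))
```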

Any guidance or insights on how best to set up this stack on Jetson Orin NX would be greatly appreciated.

Thank you!

Hi,

We have several prebuilt packages at the link below, but unfortunately we don’t provide prebuilt versions of those two packages:

However, you can build them from source.
To access the GPU on Jetson, you will need to set up the nvidia-container-toolkit.
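Once a container is running with the NVIDIA runtime, a minimal check like the sketch below can confirm the GPU is visible (it assumes an image that bundles PyTorch for Jetson, such as one of the l4t-pytorch images):

```python
# Minimal GPU visibility check, run inside a container started with
# `--runtime nvidia` (assumes a PyTorch-enabled Jetson image,
# e.g. an l4t-pytorch image).
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```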

We also have an example that demonstrates a RAG workflow.
Please find the link below:

Thanks.
