Want to run a local LLM on an NVIDIA Jetson AGX Orin

I am looking to run a local LLM (large language model) on an NVIDIA Jetson AGX Orin using the GPU's CUDA cores. Could anyone provide guidance or share resources on how to achieve this?

I was able to run a local LLM (a .gguf model) on the CPU, but I haven't been able to get it to use the GPU.
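For reference, this is roughly what I am running (a minimal sketch using the llama-cpp-python bindings; the model path is a placeholder). It generates fine, but everything stays on the CPU even with n_gpu_layers set:

```python
from llama_cpp import Llama

# Placeholder path -- substitute your own quantized GGUF model.
MODEL_PATH = "/data/models/llama-2-7b-chat.Q4_K_M.gguf"

# n_gpu_layers=-1 asks llama.cpp to offload all layers to the GPU,
# but this only takes effect if the library was built with CUDA;
# a stock pip wheel is CPU-only, so the setting is silently ignored.
llm = Llama(model_path=MODEL_PATH, n_gpu_layers=-1, verbose=True)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```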

Thank you in advance for your help!

You may check the Jetson AI Lab - Home Assistant Integration topic in the Jetson Projects section of the NVIDIA Developer Forums.

Hi @mausam.jain, we provide containers for llama.cpp, ollama, and text-generation-webui, all built with CUDA enabled in llama.cpp: https://github.com/dusty-nv/jetson-containers
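As a quick sanity check, you can ask the bindings whether the llama.cpp build underneath was actually compiled with GPU support (a sketch against llama-cpp-python's low-level API; the exact helper name may differ between versions):

```python
import llama_cpp

# True only when the underlying llama.cpp library was built with a GPU
# (e.g. CUDA) backend; a CPU-only build returns False, in which case
# n_gpu_layers is silently ignored.
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())
```

Inside the CUDA-enabled containers this should print True, and the verbose load log should report layers being offloaded to the GPU.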

We also have ollama and oobabooga tutorials on Jetson AI Lab (https://www.jetson-ai-lab.com/) that walk through running quantized GGUF models.
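Once the ollama container and server are up, a quick smoke test from Python looks like this (a sketch assuming the `ollama` pip package is installed and a model has already been pulled; the model name is a placeholder):

```python
import ollama

# Talks to the local ollama server on its default port (11434). The model
# must already be pulled via the ollama CLI; "llama2" is a placeholder.
response = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Say hello from the Jetson."}],
)
print(response["message"]["content"])
```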
