Hardware:
Jetson Orin Nano 8GB
Software:
JetPack (Ubuntu 22.04)
Ollama running locally
Model: qwen2.5:1.5b-instruct (~986MB)
Problem:
When running the model with GPU enabled, the inference fails with:
error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
When forcing CPU-only inference, the model sometimes fails with a memory-related error such as:
"index out of range / memory index error"
System state from tegrastats:
RAM ~3.8GB free
lfb 23x4MB (23 contiguous free blocks of 4MB each, i.e. the largest free block is only 4MB)
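For reference, this is how I read the lfb figure out of the tegrastats output. The sample line below is hypothetical (values chosen to match the state above), but the `lfb NxSMB` field format is what tegrastats prints: N contiguous free blocks of S MB each.

```shell
# Hypothetical tegrastats RAM field; "lfb 23x4MB" = 23 free blocks of 4MB each.
line='RAM 3953/7620MB (lfb 23x4MB) SWAP 0/3810MB'

# Pull out the block size (the number after the "x") with GNU grep.
lfb=$(echo "$line" | grep -oP 'lfb \d+x\K\d+')
echo "largest free block: ${lfb}MB"
```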
Observations:
The system has sufficient total RAM, but the largest contiguous free block is only 4MB.
This suggests physical memory fragmentation, which hits Jetson particularly hard because the CPU and GPU share the same physical RAM, so large CUDA buffer allocations compete with regular system memory.
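To cross-check the fragmentation independently of tegrastats, the kernel's buddy allocator state can be inspected directly. Column N counts free blocks of order N (2^N pages, so 4KB up to 4MB with 4KB pages); zeros in the rightmost columns mean no large contiguous blocks are left:

```shell
# Each column N = number of free blocks of 2^N pages per zone.
# Zeros on the right-hand side indicate heavy fragmentation.
cat /proc/buddyinfo
```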
Questions:
- Is there a recommended configuration for running LLMs on Jetson Orin Nano?
- Is it possible to increase contiguous memory for CUDA allocations?
- Are there Jetson-specific optimizations for llama.cpp / Ollama models?
- Would using TensorRT-LLM or another inference backend help?
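One workaround I am considering (untested, so corrections welcome): offload only some of the layers to the GPU so that each individual CUDA buffer is smaller and more likely to fit into the fragmented heap. In Ollama this should be settable per model via a Modelfile; the layer count of 12 below is an arbitrary guess, not a tuned value:

```
FROM qwen2.5:1.5b-instruct
PARAMETER num_gpu 12
```

Then `ollama create qwen-partial -f Modelfile` and `ollama run qwen-partial`. With plain llama.cpp the equivalent knob would be `--n-gpu-layers`.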
CPU inference works sometimes but is slow.
GPU inference consistently fails with CUDA memory allocation errors.