My Jetson Orin Nano Super (8GB) running JetPack 6.2 is failing to load Ollama models that are small enough for the board, such as DeepSeek-R1:1.5B.
The main error is:
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
This is what my tegrastats looks like:
01-04-2026 19:47:49 RAM 1355/7620MB (lfb 14x4MB) SWAP 0/3810MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] EMC_FREQ 1%@2133 GR3D_FREQ 0%@[306] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@44.062C soc2@43.031C soc0@43.75C gpu@43.75C tj@45.343C soc1@45.343C VDD_IN 6738mW/6738mW VDD_CPU_GPU_CV 477mW/477mW VDD_SOC 3065mW/3065mW
The lfb field shows only 14x4 MB blocks, i.e. the largest contiguous free block is just 4 MB (56 MB across the pool).
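As a sanity check, the lfb field can be parsed out of a captured tegrastats line with standard shell tools, and dropping clean page caches sometimes lets the kernel coalesce larger contiguous blocks again (a sketch, not a guaranteed fix; drop_caches requires root):

```shell
#!/bin/sh
# Parse the largest-free-block (lfb) field from a captured tegrastats line
line='RAM 1355/7620MB (lfb 14x4MB) SWAP 0/3810MB'
lfb=$(printf '%s\n' "$line" | sed -n 's/.*lfb \([0-9]*\)x\([0-9]*\)MB.*/\1 \2/p')
set -- $lfb
echo "$1 blocks of $2 MB: largest contiguous block is $2 MB, $(( $1 * $2 )) MB in the pool"

# Ask the kernel to drop clean caches so free pages can be coalesced
# (temporary measure; needs root):
#   sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```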
The power mode is set to MAXN SUPER when trying to load the models
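For completeness, the active power mode can be confirmed from the shell; nvpmodel and jetson_clocks ship with JetPack, and the guard below only keeps the snippet runnable on non-Jetson machines (the exact mode string printed varies by board and release):

```shell
#!/bin/sh
# Query the current nvpmodel power mode and pin clocks to its maximum
if command -v nvpmodel >/dev/null 2>&1; then
    sudo nvpmodel -q      # prints the active power mode
    sudo jetson_clocks    # lock clocks at the maximum for the active mode
else
    echo "nvpmodel not found (not running on a Jetson)"
fi
```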
I have reinstalled JetPack, and it still fails to allocate the CUDA0 buffer.
My Jetson is also headless (multi-user.target)
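To confirm the headless setup (and to flip it back when testing with the GUI enabled), the systemd default boot target can be inspected; a sketch, with a fallback so it degrades cleanly where systemd is not running:

```shell
#!/bin/sh
# Show which target the system boots into; fall back cleanly where systemd is absent
if command -v systemctl >/dev/null 2>&1 && systemctl get-default 2>/dev/null; then
    :  # printed the default target, e.g. "multi-user.target"
else
    echo "systemd not available"
fi
# To switch (root required, takes effect on next boot):
#   sudo systemctl set-default graphical.target   # desktop
#   sudo systemctl set-default multi-user.target  # headless
```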
It seems like CUDA memory allocation is broken in L4T 36.4.x (the release underlying JetPack 6.2).
Is there a more stable JetPack version for CUDA inference?
*** Please note that this reply is generated by LLM automatically ***
Based on the provided context, it appears that you are experiencing issues with loading Ollama models on your Jetson Orin Nano Super 8GB running JetPack 6.2. The main error is “Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer”, followed by “llama_model_load_from_file_impl: failed to load model”.
After reviewing the provided information, here are some potential causes and solutions:
Memory allocation issue: The tegrastats output shows only 56 MB across the largest-free-block pool (lfb 14x4MB), so physical memory may be too fragmented to satisfy the allocation even though plenty of RAM is nominally free. Try closing other resource-intensive processes, or reboot immediately before loading the model so memory is not yet fragmented.
Power mode: You have set the power mode to MAXN SUPER, which should provide maximum performance. You can still try a different power mode (via nvpmodel) to see whether it changes the behavior.
JetPack version: You have reinstalled JetPack 6.2, but it’s possible that there are issues with CUDA memory allocation in this release. You could try an earlier version such as JetPack 5.1.3; note, however, that the MAXN SUPER performance mode was introduced with JetPack 6.2, so it would not be available after a downgrade.
CUDA buffer allocation: The error means the llama.cpp runner could not allocate the model’s weight buffer on the GPU (CUDA0). There is no user-facing setting to “increase” this buffer; instead, reduce the size of the allocation, for example by lowering the context length or offloading fewer layers to the GPU.
Model size: Although you mentioned that the models are small enough for the Jetson, they may still be too large for the memory that is actually free. You can try a more heavily quantized variant of the model (e.g. Q4 instead of Q8) to reduce the memory requirements.
Headless and multi-user target: You mentioned that your Jetson runs headless with the multi-user target, which already frees the RAM the desktop would use. Verify that the display manager is in fact stopped; beyond that, there is little additional memory to reclaim from systemd targets.
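A minimal pre-flight check for the suggestions above (free is part of standard procps; the display-manager name, gdm here, is an assumption and differs between images):

```shell
#!/bin/sh
# Verify how much RAM is genuinely free before asking ollama to load a model
free -m

# Free shared RAM for this session only by stopping the desktop
# (the Jetson GPU shares system memory, so every MB counts):
#   sudo systemctl stop gdm            # display manager name varies (gdm/lightdm)
#   sudo systemctl isolate multi-user.target
```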
In comparison to other Jetson models, the Jetson Orin Nano Super 8GB has a relatively small amount of memory. The Jetson Orin NX 16GB, for example, has more memory and might be better suited for running Ollama models.
Here are some key specifications for the Jetson Orin NX 16GB:
1024 NVIDIA CUDA cores
32 Tensor cores
16GB of memory
Maximum GPU frequency: 918 MHz (MAXN SUPER)
CUDA core performance: 1.88 FP32 TFLOPs, 3.76 FP16 TFLOPs
In contrast, the Jetson Orin Nano Super 8GB has:
1024 NVIDIA CUDA cores
32 Tensor cores
8GB of memory
Maximum GPU frequency: 765 MHz (MAXN SUPER)
CUDA core performance: 1.56 FP32 TFLOPs, 3.12 FP16 TFLOPs
Overall, it’s possible that the Jetson Orin Nano Super 8GB is not sufficient for running Ollama models, and you might need to consider using a more powerful Jetson model or optimizing the model size and memory allocation.
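If reducing the footprint is the way forward, Ollama’s documented Modelfile parameters give two levers: num_ctx shrinks the KV-cache allocation and num_gpu caps how many layers are offloaded to CUDA. A sketch; the base tag and the values 1024/20 are illustrative, not tuned:

```shell
#!/bin/sh
# Build a lower-memory variant of the model via an Ollama Modelfile
cat > Modelfile <<'EOF'
FROM deepseek-r1:1.5b
PARAMETER num_ctx 1024
PARAMETER num_gpu 20
EOF

# Then, on the Jetson:
#   ollama create deepseek-r1-lowmem -f Modelfile
#   ollama run deepseek-r1-lowmem
```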
When I try to load the models, nothing else is running, and I have tried other power modes; it gives the same CUDA0 buffer error. How can I downgrade to JetPack 5.1.3? How can I increase the CUDA buffer size? I know the model is small enough to load, since it used to work. I disabled the GUI to free up more memory, and I still experience this issue with the GUI enabled. Is there no solution to the CUDA0 buffer issue in JetPack 6.x? There are so many forum entries for this issue; why is there still no solution?