My Jetson Orin Nano Super (8GB) running JetPack 6.2 is failing to load Ollama models that are small enough for the board, such as DeepSeek-R1:1.5B.
The main error is:
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
This is what my tegrastats looks like:
01-04-2026 19:47:49 RAM 1355/7620MB (lfb 14x4MB) SWAP 0/3810MB (cached 0MB) CPU [0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] EMC_FREQ 1%@2133 GR3D_FREQ 0%@[306] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@44.062C soc2@43.031C soc0@43.75C gpu@43.75C tj@45.343C soc1@45.343C VDD_IN 6738mW/6738mW VDD_CPU_GPU_CV 477mW/477mW VDD_SOC 3065mW/3065mW
The lfb field shows only 14x4 MB blocks, i.e. the largest contiguous free block is just 4 MB (56 MB across the pool).
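As a sanity check, the lfb field can be parsed out of a captured tegrastats line with standard shell tools, and dropping clean page caches sometimes lets the kernel coalesce larger contiguous blocks again (a sketch, not a guaranteed fix; drop_caches requires root):

```shell
#!/bin/sh
# Parse the largest-free-block (lfb) field from a captured tegrastats line
line='RAM 1355/7620MB (lfb 14x4MB) SWAP 0/3810MB'
lfb=$(printf '%s\n' "$line" | sed -n 's/.*lfb \([0-9]*\)x\([0-9]*\)MB.*/\1 \2/p')
set -- $lfb
echo "$1 blocks of $2 MB: largest contiguous block is $2 MB, $(( $1 * $2 )) MB in the pool"

# Ask the kernel to drop clean caches so free pages can be coalesced
# (temporary measure; needs root):
#   sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```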
The power mode is set to MAXN SUPER when trying to load the models
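For completeness, the active power mode can be confirmed from the shell; nvpmodel and jetson_clocks ship with JetPack, and the guard below only keeps the snippet runnable on non-Jetson machines (the exact mode string printed varies by board and release):

```shell
#!/bin/sh
# Query the current nvpmodel power mode and pin clocks to its maximum
if command -v nvpmodel >/dev/null 2>&1; then
    sudo nvpmodel -q      # prints the active power mode
    sudo jetson_clocks    # lock clocks at the maximum for the active mode
else
    echo "nvpmodel not found (not running on a Jetson)"
fi
```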
I have reinstalled JetPack, and it still fails to allocate the CUDA0 buffer.
My Jetson is also headless (multi-user.target)
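To confirm the headless setup (and to flip it back when testing with the GUI enabled), the systemd default boot target can be inspected; a sketch, with a fallback so it degrades cleanly where systemd is not running:

```shell
#!/bin/sh
# Show which target the system boots into; fall back cleanly where systemd is absent
if command -v systemctl >/dev/null 2>&1 && systemctl get-default 2>/dev/null; then
    :  # printed the default target, e.g. "multi-user.target"
else
    echo "systemd not available"
fi
# To switch (root required, takes effect on next boot):
#   sudo systemctl set-default graphical.target   # desktop
#   sudo systemctl set-default multi-user.target  # headless
```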
It seems like CUDA memory allocation is broken in L4T 36.4.x (the release underlying JetPack 6.2).
Is there a more stable JetPack version for CUDA inference?
*** Please note that this reply is generated by LLM automatically ***
Based on the provided context, it appears that you are experiencing issues with loading Ollama models on your Jetson Orin Nano Super 8GB running JetPack 6.2. The main error is “Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer”, followed by “llama_model_load_from_file_impl: failed to load model”.
After reviewing the provided information, here are some potential causes and solutions:
Memory allocation issue: The tegrastats output shows only 56 MB across the largest-free-block pool (lfb 14x4MB), so physical memory may be too fragmented to satisfy the allocation even though plenty of RAM is nominally free. Try closing other resource-intensive processes, or reboot immediately before loading the model so memory is not yet fragmented.
Power mode: You have set the power mode to MAXN SUPER, which should provide maximum performance. You can still try a different power mode (via nvpmodel) to see whether it changes the behavior.
JetPack version: You have reinstalled JetPack 6.2, but it’s possible that there are issues with CUDA memory allocation in this release. You could try an earlier version such as JetPack 5.1.3; note, however, that the MAXN SUPER performance mode was introduced with JetPack 6.2, so it would not be available after a downgrade.
CUDA buffer allocation: The error means the llama.cpp runner could not allocate the model’s weight buffer on the GPU (CUDA0). There is no user-facing setting to “increase” this buffer; instead, reduce the size of the allocation, for example by lowering the context length or offloading fewer layers to the GPU.
Model size: Although you mentioned that the models are small enough for the Jetson, they may still be too large for the memory that is actually free. You can try a more heavily quantized variant of the model (e.g. Q4 instead of Q8) to reduce the memory requirements.
Headless and multi-user target: You mentioned that your Jetson runs headless with the multi-user target, which already frees the RAM the desktop would use. Verify that the display manager is in fact stopped; beyond that, there is little additional memory to reclaim from systemd targets.
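A minimal pre-flight check for the suggestions above (free is part of standard procps; the display-manager name, gdm here, is an assumption and differs between images):

```shell
#!/bin/sh
# Verify how much RAM is genuinely free before asking ollama to load a model
free -m

# Free shared RAM for this session only by stopping the desktop
# (the Jetson GPU shares system memory, so every MB counts):
#   sudo systemctl stop gdm            # display manager name varies (gdm/lightdm)
#   sudo systemctl isolate multi-user.target
```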
In comparison to other Jetson models, the Jetson Orin Nano Super 8GB has a relatively small amount of memory. The Jetson Orin NX 16GB, for example, has more memory and might be better suited for running Ollama models.
Here are some key specifications for the Jetson Orin NX 16GB:
1024 NVIDIA CUDA cores
32 Tensor cores
16GB of memory
Maximum GPU frequency: 918 MHz (MAXN SUPER)
CUDA core performance: 1.88 FP32 TFLOPs, 3.76 FP16 TFLOPs
In contrast, the Jetson Orin Nano Super 8GB has:
1024 NVIDIA CUDA cores
32 Tensor cores
8GB of memory
Maximum GPU frequency: 765 MHz (MAXN SUPER)
CUDA core performance: 1.56 FP32 TFLOPs, 3.12 FP16 TFLOPs
Overall, it’s possible that the Jetson Orin Nano Super 8GB is not sufficient for running Ollama models, and you might need to consider using a more powerful Jetson model or optimizing the model size and memory allocation.
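If reducing the footprint is the way forward, Ollama’s documented Modelfile parameters give two levers: num_ctx shrinks the KV-cache allocation and num_gpu caps how many layers are offloaded to CUDA. A sketch; the base tag and the values 1024/20 are illustrative, not tuned:

```shell
#!/bin/sh
# Build a lower-memory variant of the model via an Ollama Modelfile
cat > Modelfile <<'EOF'
FROM deepseek-r1:1.5b
PARAMETER num_ctx 1024
PARAMETER num_gpu 20
EOF

# Then, on the Jetson:
#   ollama create deepseek-r1-lowmem -f Modelfile
#   ollama run deepseek-r1-lowmem
```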
When I try to load the models, nothing else is running, and I have tried other power modes; it gives the same CUDA0 buffer error. How can I downgrade to JetPack 5.1.3? How can I increase the CUDA buffer size? I know the model is small enough to load, since it used to work. I disabled the GUI to free up more memory, and I still experience this issue with the GUI enabled. Is there no solution to the CUDA0 buffer issue in JetPack 6.x? There are so many forum entries for this issue; why is there still no solution?