"unable to allocate CUDA0 buffer" after Updating Ubuntu Packages

I confirm that in version r36.4.4, I was able to run both containers (Ollama and Open-Webui) on the machine (Jetson Orin Nano Super Developer Kit). Like @jetson15, I even ran several containers at the same time (n8n, ollama, open-webui, whisper) without any problems.

In r36.4.7, I ran three tests today (the commands I used are sketched after the list):

- Everything in Docker: error

Ollama_docker_20251030.log (15.0 KB)

- Ollama natively + Open-webui in Docker: error

Ollama_openwebui_20251030.log (14.7 KB)

- Ollama natively and command line query (no Docker running): success

Ollama_cmd_lines_20251030.log (15.8 KB)
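For reference, this is roughly how each configuration was set up. The Open WebUI image, port mapping, and model tag are written from memory, so treat them as assumptions rather than the exact commands from the logs:

$ # Test 1: everything in Docker (Ollama via jetson-containers, Open WebUI alongside it)
$ jetson-containers run --name ollama $(autotag ollama)
$ docker run -d --name open-webui -p 3000:8080 -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:main

$ # Test 2: Ollama installed natively (systemd service), Open WebUI still in Docker
$ sudo systemctl start ollama
$ docker start open-webui

$ # Test 3: native Ollama only, queried from the command line, no containers running
$ docker ps -q | xargs -r docker stop
$ ollama run llama3.1:8b "hello"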

1 Like

Before updating, I was able to run models up to 8B on Ollama and Open-Webui. Now I can only run models up to 3B without getting memory errors. I'm going to wait until a simple solution comes out, like I did when a couple of Docker packages stopped Ollama from working entirely.

Is there a simple and intuitive way to revert to version r36.4.4?

Thanks for all the help!
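While waiting, one thing that may help with a package-level downgrade is that apt logs exactly which nvidia-l4t packages the upgrade replaced, including the old version numbers. This is a generic Ubuntu sketch, not a tested downgrade recipe:

$ # show which nvidia-l4t packages recent upgrades touched (old -> new versions are logged)
$ zgrep -h nvidia-l4t /var/log/apt/history.log* | less
$ # show the versions currently installed
$ dpkg -l 'nvidia-l4t-*'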

I have more than enough memory.

On my r36.4.4 JetPack I can run 6 different containers, running things like n8n and vision models, WHILE running an 8B model on Ollama with no problem.

On my r36.4.7 JetPack I can't even run an 8B model ALONE consistently.

2 Likes

Hi, all

Thanks a lot for the testing and feedback.

This memory issue happens in r36.4.7 itself.
It doesn't matter which flashing/upgrading process you use or whether you boot from an SD card or NVMe device.
Large memory allocations may fail (intermittently) after upgrading to r36.4.7.

We are actively discussing this with our internal team.
Will keep you all updated with the latest progress.

Thanks.

2 Likes

Hi all,

I'm also trying to deploy an audio-text-to-text model on my Jetson Orin Nano 8GB and, even with models below 1B parameters (0.7B or even 50M) using Python Transformers, I cannot run them.

In this case, after the download, when the script checks that everything is OK, it hangs after a few seconds and then the board reboots. Do you think this could also be related to the same issue, or is it really a limitation of the Nano?
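If it helps to narrow this down, one hedged way to see whether the reboot is the same allocation issue or a plain out-of-memory kill is to watch memory and the kernel log while the test runs (tegrastats ships with L4T; the grep pattern and the script name below are just placeholders):

$ sudo tegrastats --interval 1000 &                     # RAM/swap usage once per second
$ sudo dmesg --follow | grep -i -E 'oom|nvgpu|nvmap' &  # watch for OOM-killer or GPU allocation messages
$ python3 my_transformers_test.py                       # hypothetical name for the Transformers test script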

The models I tried to run were the following:

Thanks for the help.

Regards, Tiago

Reverted to the earlier version for the moment.


jay@jetson-ai:~/Desktop$ dpkg -s nvidia-l4t-firmware
Package: nvidia-l4t-firmware
Status: install ok installed
Priority: standard
Section: kernel
Installed-Size: 18989
Maintainer: NVIDIA Corporation
Architecture: arm64
Version: 36.4.3-20250107174145
Replaces: linux-firmware
Depends: libc6, libgcc-s1, libstdc++6
Pre-Depends: nvidia-l4t-core (>> 36.4-0), nvidia-l4t-core (<< 36.5-0)
Conffiles:
 /etc/systemd/nvwifibt-pre.sh bb212167f4cc1d5053e7a28032712066
 /etc/systemd/nvwifibt.sh ffe439710a571cc8028ca033e68ef39d
 /etc/systemd/system/nvwifibt.service be3bb07122fdab90c0fe8a34ac85d3ef
Description: NVIDIA Firmware Package
Homepage: http://developer.nvidia.com/jetson
jay@jetson-ai:~/Desktop$ sudo apt-mark hold nvidia-l4t-firmware
[sudo] password for jay: 
nvidia-l4t-firmware set on hold.
jay@jetson-ai:~/Desktop$
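If it's useful to anyone else: holding only nvidia-l4t-firmware may not be enough, since the rest of the L4T userspace packages were upgraded to 36.4.7 as well. A hedged way to hold everything in one go (assuming the usual nvidia-l4t-* package naming):

$ # hold every installed nvidia-l4t-* package so apt upgrade leaves them alone
$ dpkg -l 'nvidia-l4t-*' | awk '/^ii/ {print $2}' | xargs sudo apt-mark hold
$ apt-mark showhold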

How did you downgrade/revert your firmware? I haven't found instructions on how to do that. I rebuilt my nano with 36.4.4, and dpkg reports that I have firmware version 36.4.4 - but when I boot the nano it clearly shows that it still has "Jetson System Firmware version 36.4.7-gcid-42132812 dated 2025-09-18".
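For reference, this is how I'm trying to tell the two apart. The dpkg and release files describe the rootfs, while (as far as I understand, and this is an assumption on my part) the version printed at boot comes from the boot firmware in QSPI, which nvbootctrl should report:

$ cat /etc/nv_tegra_release                        # L4T BSP release on the rootfs
$ dpkg -s nvidia-l4t-firmware | grep '^Version'    # firmware package inside the rootfs
$ sudo nvbootctrl dump-slots-info                  # slot info for the boot firmware in QSPI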

Thanks!

Hi, @tm.alves.rodrigues

The memory issue is reported on the r36.4.7 branch.
Is your environment r36.4.7?

$ cat /etc/nv_tegra_release 
# R36 (release), REVISION: 4.7, GCID: 42132812, BOARD: generic, EABI: aarch64, DATE: Thu Sep 18 22:54:44 UTC 2025
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia

Thanks.

Hi @AastaLLL,

My environment is r36.4.3 and my Jetson firmware is 36.4.7.

But something is also not right with my device: since the first time I turned it on and used it, I have been facing the same memory allocation issues reported here (as you can see in my comments above at "unable to allocate CUDA0 buffer" after Updating Ubuntu Packages - #49 by tm.alves.rodrigues).

I believe the issue is not only related to tegra release r36.4.7.
Let me know if you need more information.

Regards, T

@AastaLLL FYI - I’ve reflashed my Jetson to 36.4.4, and everything is working as expected, i.e., I can even load 8B-parameter models.
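One extra step worth considering after the reflash (the file name below is the default L4T apt source on my setup; treat it as an assumption and check yours): disable or pin the L4T apt feed so a routine apt upgrade doesn't pull the r36.4.7 packages back in.

$ cat /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
$ sudo sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
$ sudo apt update    # the r36.4 repo should no longer be listed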

This is my system, which worked well for over a year and has had this issue since the upgrade in October:

$ cat /etc/nv_tegra_release
# R36 (release), REVISION: 4.7, GCID: 42132812, BOARD: generic, EABI: aarch64, DATE: Thu Sep 18 22:54:44 UTC 2025
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia

I have even reflashed my system with the SDK Manager and started over from scratch, and I end up getting the same errors as everyone else.

Boy do I wish I’d checked the forums before doing the apt-get upgrade this morning.

Rebooted and sure enough ollama stopped working. Same issue as everyone above is reporting.

root@cluster-node-ai:/# ollama run llama3.1:8b
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
root@cluster-node-ai:/# ollama run mistral:instruct
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
root@cluster-node-ai:/# ollama run phi3
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model

All of these models ran without issue before the upgrade.

root@cluster-node-ai:/# cat /etc/nv_tegra_release 
# R36 (release), REVISION: 4.7, GCID: 42132812, BOARD: generic, EABI: aarch64, DATE: Thu Sep 18 22:54:44 UTC 2025
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia

@jsc2718 How did you downgrade your packages to the previous version?

Hi NVIDIA,

many users are still facing memory errors when using the Jetson Orin Nano. The issue has been reported for several weeks, but there's still no official fix. We really need a stable, working version to continue our development and testing. Could you please share an update or an estimated timeline for a solution?

Thanks.

5 Likes

@SirMuttley, @bradford.elliott In Feb 2025, I used a microSD card for the initial install, but quickly moved to an NVMe drive, using dd and gparted to migrate to the new device. Since then, I have been working from the NVMe, leaving the SD configuration in place.
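Roughly what that migration looked like, from memory (device names are assumptions; double-check yours with lsblk before running dd):

$ lsblk                                                          # confirm which device is the SD card and which is the NVMe
$ sudo dd if=/dev/mmcblk0 of=/dev/nvme0n1 bs=64M status=progress conv=fsync
$ # then use gparted on the NVMe to repair the backup GPT and grow the root partition into the free space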

1 Like

@JSC2718, thanks for the offer but I don’t think I need it now. Despite the firmware still showing as 36.4.7, Ollama seems to be working better now. It still seems a bit slow, and jtop reports ‘Jetpack not detected’, but it’s mostly working I think.

@bradford.elliott are you running Ollama natively or in docker?

The ‘Jetpack not detected’ should be fine if you’re running in docker as all the packages you need should be installed in the container. My host install is the minimal OS and then I just do everything with jetson-containers.

If Ollama is slow, you might want to check that it's running on the GPU rather than the CPU.
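A couple of hedged ways to check that (the journalctl line assumes a native install using the systemd service):

$ ollama ps                                 # the PROCESSOR column should report GPU, not CPU
$ journalctl -u ollama -f | grep -i cuda    # look for the CUDA device being detected when a model loads
$ # or watch the GPU load in jtop while a prompt is generating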

Hi all,

Thank you all for the testing and sharing.
We are really sorry about the inconvenience that r36.4.7 has caused.

Although our internal team is still working on the issue, here is an update we can share with you:
The recent updates (r38.2.1->r38.2.2, r36.4.4->r36.4.7, r35.6.2->r35.6.3) contain a security fix for CVE-2025-33182 & CVE-2025-33177.

The patches can be found in the comment below (r35.6.3 version):

The security fix adds a mechanism that prevents allocations from going down the OOM path (to prevent a denial-of-service attack).
This introduces some limits on how much memory can be allocated.
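To make the practical effect easier to see, one rough way to probe the new ceiling is to step through model sizes and check the kernel log when one fails (the model tags are only examples):

$ for m in llama3.2:1b llama3.2:3b llama3.1:8b; do ollama run "$m" "hi" >/dev/null && echo "$m loaded OK" || echo "$m FAILED"; done
$ sudo dmesg | tail -n 50    # look for the failed allocation near the time of the error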

We are discussing how to minimize the impact of this security fix.
Will keep you all updated on the latest status.

Thanks.

8 Likes

So you basically caused a denial of service to prevent another one. Let’s hope there is a fix out soon, that’s a major bug!

5 Likes

Just wanted to quickly chime in and add that I am seeing this too ;) If there isn't a fix coming soon, could someone suggest a version to flash that will work until NVIDIA comes out with a fix?

No workaround, just a hint:

My Jetson firmware is 36.4.7 and I also get the CUDA0 buffer failure.
But after some playing around I can run ollama run llama3.2:3b successfully.

What I did:

  1. RAM optimization -> 🔖 Memory optimization - NVIDIA Jetson AI Lab
  2. Following those instructions, I deliberately ran this command last:
    sudo init 3, then wait for the text console and log in.
  3. I open a second terminal with Ctrl + Alt + 2.
  4. In the second terminal I run jtop.
  5. I go to the MEM screen (press 4) and press c to clear the cache -> the cache drops by nearly 900 MB (a shell equivalent of this step is sketched at the end of this post).
  6. I go back to terminal 1 with Ctrl + Alt + 1.
  7. I run: jetson-containers run --name ollama $(autotag ollama)
  8. I run: ollama run gemma3 -> fails.
  9. I go back to terminal 2 with Ctrl + Alt + 2.
  10. I go to the MEM screen (press 4) and press c to clear the cache -> the cache drops by nearly 500 MB.
  11. I repeat steps 5 to 10 several times in a loop until the cache drops below 300 MB.
  12. Now ollama run gemma3 and ollama run llama3.2:3b both run.
  13. Sometimes llama3.2:3b or gemma3 still crashes; then I do the following:
  14. I go to the MEM screen (press 4) and press c to clear the cache -> the cache drops by nearly 185 MB <- yes, unbelievable but true.
  15. I go back to terminal 1 with Ctrl + Alt + 1.
  16. I run ollama run gemma3 or ollama run llama3.2:3b; both run.
  17. If my memory usage is under 300 MB, both ollama run gemma3 and ollama run llama3.2:3b run successfully.

Even if I start the desktop again with sudo init 5 and then start Chromium and jtop, the cache shows 2.1 GB in use. If I clear the cache, it goes down to 750 MB.

Then I quit the desktop again with:

  1. sudo init 3
  2. Ctrl + Alt + 2
  3. Go to the MEM screen (press 4) and press c to clear the cache -> the cache drops to nearly 180 MB.
  4. Now ollama run gemma3 and ollama run llama3.2:3b both run again.

I hope this helps!
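For completeness, the cache clear that jtop's c key performs can, as far as I know, also be done directly from the shell between attempts; this is the generic Linux drop_caches mechanism, nothing Jetson-specific:

$ free -h                                                        # see how much RAM is sitting in buff/cache
$ sudo sync && sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'    # drop page cache, dentries and inodes
$ free -h                                                        # buff/cache should now be much lower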

With the cache aggressively cleared and kept under roughly 300 MB, Ollama models run successfully on r36.4.7, which suggests the "CUDA0 buffer" error is tied to how much memory can be allocated.