I confirm that in version r36.4.4, I was able to run both containers (Ollama and Open-Webui) on the machine (Jetson Orin Nano Super Developer Kit). Like @jetson15, I even ran several containers at the same time (n8n, ollama, open-webui, whisper) without any problems.
Before updating, I was able to run up to 8b models on Ollama and Open-Webui. Now I can only run up to 3b models without getting memory errors. I’m gonna wait until a simple solution comes out, kinda like I did when a couple docker packages stopped ollama from working at all.
Is there a simple and intuitive way to revert back to version r36.4.4?
This memory issue happens on r36.4.7 itself.
It occurs regardless of the flashing/upgrading method used and regardless of whether the system boots from SD card or NVMe.
After upgrading to r36.4.7, large memory allocations can fail intermittently.
We are actively discussing this with our internal team.
Will keep you all updated with the latest progress.
I’m also trying to deploy an audio-text-to-text model on my Jetson Orin Nano 8GB and, even with models below 1B parameters (0.7B or even 50M) using Python Transformers, I cannot run them.
In this case, after the download, when the script runs a quick test to check that everything is OK, it hangs after a few seconds and then the board reboots. Do you think this could be related to the same issue, or is it really a limitation of the Nano?
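One way to tell whether this is the same allocation failure rather than a hard limit of the Nano is to look at the kernel log from the boot that crashed. A rough sketch, assuming journald is set to keep logs across reboots (Storage=persistent in /etc/systemd/journald.conf); the grep patterns are only a guess at what to look for, otherwise watch sudo dmesg --follow over SSH or a serial console while the test runs:

# After the board comes back up, inspect the previous boot's kernel messages:
sudo journalctl -k -b -1 | grep -i -E "out of memory|oom|nvgpu|alloc" | tail -n 40

# Watch memory headroom live from a second terminal while the model loads:
watch -n 1 free -m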
How did you downgrade/revert your firmware? I haven’t found instructions on how to do that. I rebuilt my Nano with 36.4.4 and dpkg reports firmware version 36.4.4, but when I boot the Nano it clearly shows that it still has “Jetson System Firmware version 36.4.7-gcid-42132812 dated 2025-09-18”.
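As I understand it, the rootfs/BSP version that dpkg reports and the UEFI boot firmware stored in QSPI are tracked separately, so rebuilding the SD card or NVMe with 36.4.4 does not necessarily roll the boot firmware back. A sketch of how to read both (the SMBIOS path is an assumption and may not exist on every image; the version printed on the boot splash screen is the authoritative one):

# Rootfs / BSP version as seen by the installed packages:
cat /etc/nv_tegra_release
dpkg-query --show nvidia-l4t-core

# May show the UEFI boot firmware version if SMBIOS is exposed:
cat /sys/devices/virtual/dmi/id/bios_version 2>/dev/null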
Boy do I wish I’d checked the forums before doing the apt-get upgrade this morning.
Rebooted and sure enough ollama stopped working. Same issue as everyone above is reporting.
root@cluster-node-ai:/# ollama run llama3.1:8b
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
root@cluster-node-ai:/# ollama run mistral:instruct
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
root@cluster-node-ai:/# ollama run phi3
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
All of these models ran without issue before the upgrade.
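Until a fixed release is available, one way to keep a routine apt-get upgrade from pulling the r36.4.7 BSP packages back in is to put the nvidia-l4t packages on hold. A sketch, not an official recommendation; check what dpkg -l actually lists on your system before holding:

# List the installed L4T packages:
dpkg -l | grep nvidia-l4t

# Put them on hold so apt-get upgrade skips them:
dpkg -l | awk '$1=="ii" && $2 ~ /^nvidia-l4t/ {print $2}' | xargs sudo apt-mark hold

# Reverse later with apt-mark unhold on the same package list.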
Many users are still facing memory errors on the Jetson Orin Nano. The issue has been reported for several weeks, but there is still no official fix. We really need a stable, working version to continue our development and testing. Could you please share an update or an estimated timeline for a solution?
@SirMuttley, @bradford.elliott In Feb 2025, I used a microSD card for the initial install, but quickly moved to an NVMe drive, using dd and gparted to migrate to the new device. Since that time, I have been working from the NVMe, leaving the SD configuration in place.
@JSC2718, thanks for the offer but I don’t think I need it now. Despite the firmware still showing as 36.4.7, Ollama seems to be working better now. It still seems a bit slow, and jtop reports ‘Jetpack not detected’, but it’s mostly working I think.
The ‘Jetpack not detected’ should be fine if you’re running in docker as all the packages you need should be installed in the container. My host install is the minimal OS and then I just do everything with jetson-containers.
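For anyone who wants to reproduce that minimal-host workflow, it is basically just cloning jetson-containers and running its installer; a sketch based on that project’s documented quickstart (check the repo README for the current steps):

git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh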
If Ollama is slow you might want to check it’s running on the GPU rather than CPU.
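A quick way to check, sketched below: ollama ps reports how a loaded model is split between CPU and GPU, and tegrastats on the host shows GPU load while a prompt is running (exact output columns vary by version):

# Wherever the ollama client is available (e.g. inside the container):
ollama ps        # the PROCESSOR column should show something like "100% GPU"

# On the host, watch GPU utilization (GR3D_FREQ) while generating:
sudo tegrastats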
Thank you all for the testing and sharing.
We are really sorry about the inconvenience that r36.4.7 brings.
Although our internal team is still working on the issue, here are some updates about the issue that we can share with you:
The recent updates (r38.2.1->r38.2.2, r36.4.4->r36.4.7, r35.6.2->r35.6.3) contain a security fix for CVE-2025-33182 & CVE-2025-33177:
The patches can be found in the below comment (r35.6.3 version):
The security fix adds a mechanism that prevents allocations from going down the OOM path (to prevent a denial-of-service attack).
This introduces some limitations on how much memory can be allocated.
We are discussing how to minimize the impact of this security fix.
Will keep you all updated on the latest status.
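Reading the reports in this thread, the new check seems to count only memory that is free right now, not memory that could be reclaimed from the page cache, which would explain why the cache-clearing workaround further down helps. That is my interpretation, not a confirmed description of the patch. You can see the difference on your own board with:

# MemFree = pages completely unused right now;
# MemAvailable = estimate that also counts reclaimable page cache.
grep -E 'MemFree|MemAvailable|^Cached' /proc/meminfo
free -m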
Just wanted to quickly chime in and add that I am seeing this too ;) If there isn’t a fix coming soon, could someone suggest a version to flash that would work until NVIDIA comes out with a fix?
Following those instructions, the last step I executed was sudo init 3. Here is my procedure:

1. Run sudo init 3, wait for the terminal, and log in.
2. Start a second terminal with ctrl + alt + 2.
3. In the second terminal, run jtop.
4. Go to the MEM screen by pressing 4 and press c to clear the cache → the cache drops by nearly 900 MB.
5. Go back to terminal 1 with ctrl + alt + 1.
6. Run: jetson-containers run --name ollama $(autotag ollama)
7. Run: ollama run gemma3 → fails.
8. Go back to terminal 2 with ctrl + alt + 2.
9. Go to the MEM screen by pressing 4 and press c to clear the cache → the cache drops by nearly 500 MB.
10. Repeat points 5 to 10 several times in a loop until the cache drops below 300 MB.
11. Now ollama run gemma3 and ollama run llama3.2:3b both run.

Sometimes llama3.2:3b or gemma3 still crashes; then I do the following:

1. Go to the MEM screen by pressing 4 and press c to clear the cache → the cache drops to nearly 185 MB (yes, unbelievable but true).
2. Go back to terminal 1 with ctrl + alt + 1.
3. Run ollama run gemma3 or ollama run llama3.2:3b; both run.

If my memory usage is under 300 MB, both ollama run gemma3 and ollama run llama3.2:3b run successfully.

Even if I start the desktop again with sudo init 5 and launch Chromium, jtop shows 2.1 GB of cache in use; if I clear the cache it goes down to 750 MB. Then I quit the desktop again:

1. sudo init 3
2. ctrl + alt + 2
3. Go to the MEM screen by pressing 4 and press c to clear the cache → the cache drops to nearly 180 MB.

Now ollama run gemma3 and ollama run llama3.2:3b both run again.
I hope this helps!
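For anyone who prefers to script this instead of pressing c in jtop: jtop’s clear-cache action is essentially the kernel’s drop_caches mechanism, so the same effect can be had from a shell. A minimal sketch; whether it frees enough for a given model will vary, as the post above shows:

# Flush filesystem buffers, then drop page cache, dentries and inodes
# (equivalent in effect to pressing c on jtop's MEM screen):
sudo sync
echo 3 | sudo tee /proc/sys/vm/drop_caches

# Check how much remains cached before launching the model:
free -m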
By aggressively clearing the cache and keeping cached memory below roughly 300 MB, Ollama models run successfully on Jetson r36.4.7. This suggests the “CUDA0 buffer” error is memory-related.