"unable to allocate CUDA0 buffer" after Updating Ubuntu Packages

After the Ubuntu update it reported:

Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model

I typically would run ollama from jetson-containers, and up until the recent update it worked fine.

I had tried to keep a separation between the host and the containers, so that the host base system would not carry the packages used for the models, but that effort did not seem to help.

Since it was already mentioned that neither rebuilding the Jetson from scratch nor rebuilding the containers from source was fruitful, I wonder whether rolling back the updates would be of any use. Perhaps forbidding any updates was the best approach all along; I cringe at the thought of updating anything every time I see an "Update is ready" or similar message!

Installing and running ollama natively does not help.


I also saw that the firmware was updated. Here is the relevant entry from the apt history log:

Start-Date: 2025-10-09 18:07:03
Commandline: packagekit role='update-packages'
Upgrade: nvidia-l4t-weston:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
udev:arm64 (249.11-0ubuntu3.16, 249.11-0ubuntu3.17)
systemd-oomd:arm64 (249.11-0ubuntu3.16, 249.11-0ubuntu3.17)
nvidia-l4t-vulkan-sc-samples:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-vulkan-sc-sdk:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-firmware:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
wpasupplicant:arm64 (2:2.10-6ubuntu2.2, 2:2.10-6ubuntu2.3)
nvidia-l4t-kernel-oot-headers:arm64 (5.15.148-tegra-36.4.4-20250616085344, 5.15.148-tegra-36.4.7-20250918154033)
systemd-timesyncd:arm64 (249.11-0ubuntu3.16, 249.11-0ubuntu3.17)
nvidia-l4t-oem-config:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-jetson-multimedia-api:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-libwayland-egl1:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
libpam-systemd:arm64 (249.11-0ubuntu3.16, 249.11-0ubuntu3.17)
containerd:arm64 (1.7.27-0ubuntu1~22.04.1, 1.7.28-0ubuntu1~22.04.1)
nvidia-l4t-wayland:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-kernel-oot-modules:arm64 (5.15.148-tegra-36.4.4-20250616085344, 5.15.148-tegra-36.4.7-20250918154033)
nvidia-l4t-kernel:arm64 (5.15.148-tegra-36.4.4-20250616085344, 5.15.148-tegra-36.4.7-20250918154033)
nvidia-l4t-graphics-demos:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
libsystemd0:arm64 (249.11-0ubuntu3.16, 249.11-0ubuntu3.17)
nvidia-l4t-3d-core:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-nvpmodel:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
libnss-systemd:arm64 (249.11-0ubuntu3.16, 249.11-0ubuntu3.17)
nvidia-l4t-cuda-utils:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
libwbclient0:arm64 (2:4.15.13+dfsg-0ubuntu1.8, 2:4.15.13+dfsg-0ubuntu1.9)
libudev-dev:arm64 (249.11-0ubuntu3.16, 249.11-0ubuntu3.17)
nvidia-l4t-tools:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
libsmbclient:arm64 (2:4.15.13+dfsg-0ubuntu1.8, 2:4.15.13+dfsg-0ubuntu1.9)
nvidia-l4t-core:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-kernel-dtbs:arm64 (5.15.148-tegra-36.4.4-20250616085344, 5.15.148-tegra-36.4.7-20250918154033)
systemd:arm64 (249.11-0ubuntu3.16, 249.11-0ubuntu3.17)
libudev1:arm64 (249.11-0ubuntu3.16, 249.11-0ubuntu3.17)
nvidia-l4t-optee:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-cuda:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-dla-compiler:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-nvml:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-openwfd:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-nvfancontrol:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
docker.io:arm64 (27.5.1-0ubuntu3~22.04.2, 28.2.2-0ubuntu1~22.04.1)
nvidia-l4t-libwayland-client0:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-nvsci:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-init:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-gbm:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-jetsonpower-gui-tools:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-vulkan-sc:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-display-kernel:arm64 (5.15.148-tegra-36.4.4-20250616085344, 5.15.148-tegra-36.4.7-20250918154033)
nvidia-l4t-configs:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-pva:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-multimedia:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-multimedia-utils:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-x11:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
docker-ce-rootless-extras:arm64 (5:28.5.0-1~ubuntu.22.04~jammy, 5:28.5.1-1~ubuntu.22.04~jammy)
nvidia-l4t-apt-source:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
runc:arm64 (1.2.5-0ubuntu1~22.04.1, 1.3.0-0ubuntu2~22.04.1)
nvidia-l4t-kernel-headers:arm64 (5.15.148-tegra-36.4.4-20250616085344, 5.15.148-tegra-36.4.7-20250918154033)
nvidia-l4t-bootloader:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
snapd:arm64 (2.68.5+ubuntu22.04.1, 2.71+ubuntu22.04)
nvidia-l4t-gstreamer:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-camera:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-libwayland-cursor0:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-nvpmodel-gui-tools:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
samba-libs:arm64 (2:4.15.13+dfsg-0ubuntu1.8, 2:4.15.13+dfsg-0ubuntu1.9)
nvidia-l4t-initrd:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
systemd-sysv:arm64 (249.11-0ubuntu3.16, 249.11-0ubuntu3.17)
nvidia-l4t-libwayland-server0:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-xusb-firmware:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-jetson-io:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
nvidia-l4t-vulkan-sc-dev:arm64 (36.4.4-20250616085344, 36.4.7-20250918154033)
End-Date: 2025-10-09 18:08:15


Hi,

Could you try running a simple CUDA sample (e.g., deviceQuery) to check GPU functionality first?

Thanks.

I git-cloned the cuda-samples repo, but the CUDA compiler is not installed on the host. I have been building inside Docker, so I will try that method.
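
For reference, roughly the steps I expect to follow (the repo URL is the public NVIDIA/cuda-samples GitHub, and the build commands are my assumptions; newer sample releases build with CMake from the repo root, while older ones ship per-sample Makefiles):

$ git clone https://github.com/NVIDIA/cuda-samples.git
$ cd cuda-samples
$ export PATH=/usr/local/cuda-12.6/bin:$PATH            # assumes the JetPack CUDA 12.6 toolkit path
$ cmake -S . -B build && cmake --build build -j         # or "make" inside Samples/1_Utilities/deviceQuery on older releases
$ ./build/Samples/1_Utilities/deviceQuery/deviceQuery   # binary location may differ between releases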

Host memory status (free) while trying ollama:

               total        used        free      shared  buff/cache   available
Mem:         7802584     2649992      234220      117504     4918372     4796784
Swap:        3901272       81920     3819352
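
I may also watch memory live while the model loads. tegrastats ships with L4T (the interval flag below is in milliseconds); jtop shows similar information:

$ sudo tegrastats --interval 1000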

Hi, may I ask if you have found a workaround? I had the same error pop up despite this model working for me when I initially downloaded it. I'm not sure whether I have the same error because I'm pretty new to this, but I think I have sufficient memory. I run this in Docker and have also tried smaller models like llama3.2:1b; while they work initially, I get the same error message the next day. I only flashed my device a few days ago as well.

No, I do not have any solution at the moment, sorry.

I am planning on trying to run the sample CUDA to check the GPU functionality, but I have not had time to try that yet.

What version of Ollama are you using? I had the same issue. After I updated Ollama to 0.12.6, the issue seems to have been resolved.
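
For a native install, updating and checking the version is typically just the standard installer script from ollama.com (adjust accordingly if you run Ollama inside a container instead):

$ curl -fsSL https://ollama.com/install.sh | sh
$ ollama --version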

Not the OP, but I am experiencing the same issue. I downloaded and ran that, and this is what I get.

Same problem here. It worked the day before, then some kind of update occurred on boot, and now ollama run llama3.2:3b gives:

Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model

I also updated my Ollama version to 0.12.6, and it still gives the same error.


Hi,

The CUDA compiler is located at /usr/local/cuda-12.6/bin.
Please set up the environment variables with the commands below and try again.

$ export PATH=/usr/local/cuda-12.6/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH
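
If you want these to persist across sessions, they can also be appended to your shell profile (adjust the paths to match your CUDA install):

$ echo 'export PATH=/usr/local/cuda-12.6/bin:$PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc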

Just to confirm: is the 'upgrade' you mean here from the default r36.4.4 to r36.4.7?

Thanks.

Thanks for replying!

What happened:

  1. Before Oct 18: Ollama with llama3.2:3b worked fine on r36.4.3

  2. Oct 18: I ran the Ubuntu software updates → upgraded to r36.4.7

  3. After update: Ollama broke with “unable to allocate CUDA0 buffer”

jacques@jacques-desktop:~$ export PATH=/usr/local/cuda-12.6/bin:$PATH
jacques@jacques-desktop:~$ export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH
jacques@jacques-desktop:~$ ollama
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  signin      Sign in to ollama.com
  signout     Sign out from ollama.com
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.
jacques@jacques-desktop:~$ ollama run llama3.2:3b
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
jacques@jacques-desktop:~$

I wonder whether rolling back the nvidia-l4t-firmware:arm64 package from 36.4.7-20250918154033 to version 36.4.4-20250616085344 would help.

I have not tried it, but it seems some folks have explored this option.

Of course, one runs the risk of finding the Jetson in an unrecoverable state.
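
If anyone wants to experiment, a downgrade could in principle be attempted with apt version pinning, along the lines of the sketch below. This is untested on my side, it only works if the older package version is still visible to apt, and the L4T firmware/bootloader packages flash on install, so it carries real risk:

$ apt-cache policy nvidia-l4t-firmware                                            # check which versions apt can still see
$ sudo apt install --allow-downgrades nvidia-l4t-firmware=36.4.4-20250616085344   # install the specific older version
$ sudo apt-mark hold nvidia-l4t-firmware                                          # keep it from being upgraded again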

Here is the deviceQuery output on my Orin:

jay@jetson-ai:~/Downloads/cuda-samples/cuda-samples-12.5/Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “Orin”
CUDA Driver Version / Runtime Version          12.6 / 12.6
CUDA Capability Major/Minor version number:    8.7
Total amount of global memory:                 7620 MBytes (7990005760 bytes)
(008) Multiprocessors, (128) CUDA Cores/MP:    1024 CUDA Cores
GPU Max Clock rate:                            1020 MHz (1.02 GHz)
Memory Clock rate:                             1020 Mhz
Memory Bus Width:                              128-bit
L2 Cache Size:                                 2097152 bytes
Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
Total amount of constant memory:               65536 bytes
Total amount of shared memory per block:       49152 bytes
Total shared memory per multiprocessor:        167936 bytes
Total number of registers available per block: 65536
Warp size:                                     32
Maximum number of threads per multiprocessor:  1536
Maximum number of threads per block:           1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch:                          2147483647 bytes
Texture alignment:                             512 bytes
Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
Run time limit on kernels:                     No
Integrated GPU sharing Host Memory:            Yes
Support host page-locked memory mapping:       Yes
Alignment requirement for Surfaces:            Yes
Device has ECC support:                        Disabled
Device supports Unified Addressing (UVA):      Yes
Device supports Managed Memory:                Yes
Device supports Compute Preemption:            Yes
Supports Cooperative Kernel Launch:            Yes
Supports MultiDevice Co-op Kernel Launch:      Yes
Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.6, CUDA Runtime Version = 12.6, NumDevs = 1
Result = PASS
jay@jetson-ai:~/Downloads/cuda-samples/cuda-samples-12.5/Samples/1_Utilities/deviceQuery$

I'm pretty much just going to reflash my SD card to get back to L4T 36.4.3. Please get back to me if a fix is found. Thank you, everyone.

Hi both,

Thanks for the update.

It looks like the GPU is functional, but there are some issues when allocating a buffer with Ollama.
We will try this internally and provide more info later.

Thanks

Hi,

We tested this in our environment but failed to reproduce the issue.

The original environment is r36.4.4:

$ cat /etc/nv_tegra_release 
# R36 (release), REVISION: 4.4, GCID: 41062509, BOARD: generic, EABI: aarch64, DATE: Mon Jun 16 16:07:13 UTC 2025
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
$ docker run --runtime nvidia -it --rm --network=host -v ~/ollama:/ollama -e OLLAMA_MODELS=/ollama dustynv/ollama:r36.4.0
...
root@tegra-ubuntu:/# ollama run llama3.2:3b
pulling manifest 
...          
verifying sha256 digest 
writing manifest 
success 
>>> 

Upgrade to r36.4.7

$ sudo apt update
$ sudo apt dist-upgrade

Test

$ cat /etc/nv_tegra_release 
# R36 (release), REVISION: 4.7, GCID: 42132812, BOARD: generic, EABI: aarch64, DATE: Thu Sep 18 22:54:44 UTC 2025
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
$ docker run --runtime nvidia -it --rm --network=host -v ~/ollama:/ollama -e OLLAMA_MODELS=/ollama dustynv/ollama:r36.4.0
...
root@tegra-ubuntu:/# ollama run llama3.2:3b
>>> Send a message (/? for help)

We are not sure whether this issue only happens when upgrading from r36.4.3.
We will discuss this internally and share more information later.
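
In the meantime, to compare environments, the installed L4T package versions can be listed with dpkg, for example:

$ dpkg -l 'nvidia-l4t-*' | grep ^ii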

Thanks.

I'm having this issue too, with exactly the same symptoms. I tried a fresh container just as was done above and am still getting the CUDA0 buffer error.

I have found I can load some very small models (0.5b to 1b variants), but they also randomly fail now too.

root@deckard:/# ollama pull llama3.2:3b
pulling manifest

verifying sha256 digest
writing manifest
success
root@deckard:/# ollama run llama3.2:3b
Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
root@deckard:/#

After some experimenting, it works fine in CPU mode. Obviously this is much slower, but it operates fine even with 7b models that are much larger, so it proves it is not really a memory-capacity issue; it seems like some kind of memory fragmentation issue.
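
For anyone who wants to reproduce the CPU-only test, this is one way to do it (my approach, not an official Ollama switch: hiding the GPU from the server process via CUDA_VISIBLE_DEVICES forces the CPU backend; setting num_gpu to 0 in the model options should have a similar effect):

$ pkill ollama                            # stop any server that is already running
$ CUDA_VISIBLE_DEVICES="" ollama serve &
$ ollama run llama3.2:3b                  # now loads on the CPU only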

Same issue here. I can load all the models in CPU mode without any issues, but in GPU mode, only the smaller ones load, and even then, sometimes they fail to load as well.


It says "ollama version is 0.9.5". How can I update it inside the jetson container?

I tried to download a newer version from Docker and then run it in the jetson container.

I also tried the "curl -fsSL https://ollama.com/install.sh | sh" route, but got the same result.

jay@jetson-ai:~$ jetson-containers run  $(autotag ollama) bash                                                                                                         
Namespace(packages=['ollama'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.4.7  JETPACK_VERSION=5.1  CUDA_VERSION=12.6
-- Finding compatible container image for ['ollama']
dustynv/ollama:r36.4-cu129-24.04
V4L2_DEVICES: 
### DISPLAY environmental variable is already set: ":0"
localuser:root being added to access control list
+ docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/jay/Desktop/jetson-containers/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 -v /run/jtop.sock:/run/jtop.sock --name jetson_container_20251021_093332 dustynv/ollama:r36.4-cu129-24.04 bash
root@jetson-ai:/# which ollama
/usr/local/bin/ollama
root@jetson-ai:/# ollama --version
ollama version is 0.9.5
root@jetson-ai:/#

I tried running those commands, but got the same result as before.

Here is the jtop memory status image as well.

Thanks

jay@jetson-ai:~$ docker run --runtime nvidia -it --rm --network=host -v ~/.ollama dustynv/ollama:r36.4-cu129-24.04 bash
root@jetson-ai:/# chmod a+x start_ollama 
root@jetson-ai:/# ./start_ollama 

Starting ollama server


OLLAMA_HOST   0.0.0.0
OLLAMA_LOGS   /data/logs/ollama.log
OLLAMA_MODELS /data/models/ollama/models


ollama server is now started, and you can run commands here like 'ollama run gemma3'

root@jetson-ai:/# ollama run llama3.2:3b
pulling manifest 
pulling dde5aa3fc5ff: 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 2.0 GB                         
pulling 966de95ca8a6: 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.4 KB                         
pulling fcc5a6bec9da: 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 7.7 KB                         
pulling a70ff7e570d9: 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 6.0 KB                         
pulling 56bb8bd477a5: 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏   96 B                         
pulling 34bb5ab01051: 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  561 B                         
verifying sha256 digest 
writing manifest 
success 
Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
root@jetson-ai:/#