CUDA0 Buffer Error

My Jetson Orin Nano Dev Kit Super (8 GB) has become unstable since updating to JetPack 6.2 (L4T 36.4.x). On the older JetPack it could run large 8B models without issues, but now even small 1B or 3B models fail to load. When this happens, the system also freezes and sometimes reboots.

The main error is:

error loading model: unable to allocate CUDA0 buffer

Sometimes kernel logs also show:

nvidia-modeset: ERROR: GPU:0: Failed to allocate 2743000 KBPS Iso and 4294967295 KBPS Dram

nvidia-modeset: ERROR: GPU:0: Unexpectedly failed to lock to max DRAM pre-modeset!

NVRM nvAssertOkFailedNoLog: Assertion failed … @ kern_disp.c:1161

I’m running this in Docker with NVIDIA runtime using the dustynv/ollama:0.6.8-r36.4-cu126-22.04 container. The problem occurs even when running very small LLMs like llama3.2:1b.

Command used:

sudo docker run -d --name ollama --runtime nvidia -e OLLAMA_MAX_LOADED_MODELS=1 -e OLLAMA_NUM_PARALLEL=1 -e OLLAMA_CONTEXT_LENGTH=1024 -p 11434:11434 -v ollama:/data dustynv/ollama:0.6.8-r36.4-cu126-22.04 ollama serve

echo "hi" | sudo docker exec -i ollama ollama run llama3.2:1b
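To capture the full runner error and confirm the container actually sees the NVIDIA runtime, the standard Docker commands can help (a sketch; the container name `ollama` matches the run command above):

```shell
# Follow the ollama server logs; the full llama runner failure appears here
sudo docker logs -f ollama

# Check that the "nvidia" runtime is registered with the Docker daemon
sudo docker info --format '{{json .Runtimes}}' | grep -o '"nvidia"'
```

If the second command prints nothing, the `--runtime nvidia` flag is silently falling back and the container never had GPU access in the first place.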

I’ve tried MAXN_SUPER and 15W modes, jetson_clocks, stopping gdm3 and nvargus-daemon, adding swap, and even clean reinstalling JetPack and L4T packages. The behavior doesn’t change.
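Since the failures seem tied to a specific 36.4.x revision, it is worth confirming exactly which L4T release is installed before and after any reinstall. A minimal sketch (the release file is standard on JetPack images, though its fields vary slightly between releases):

```shell
# The first line of this file identifies the L4T release and revision
head -n 1 /etc/nv_tegra_release
# e.g.: # R36 (release), REVISION: 4.7, GCID: ..., BOARD: generic, ...

# Extract a plain "36.4.7"-style string for scripting
sed -n 's/^# R\([0-9]*\) (release), REVISION: \([0-9.]*\),.*/\1.\2/p' /etc/nv_tegra_release
```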

It looks like CUDA memory allocation or GPU carveout is broken in 36.4.x, possibly related to the new display driver.

Is this a known regression? Which JetPack version is currently stable for CUDA inference on the Orin Nano Super?

Hi,

There is no known regression in CUDA memory allocation.
Which version did you use previously? Could you also share the JetPack version and container of your stable environment, so we can check it further?

Thanks.

got the same issue.

What happened to me:

  1. Ollama with llama3.2:3b worked fine on r36.4.3

  2. Ran Ubuntu software updates → upgraded to r36.4.7 (auto-updated after booting up)

  3. After update: Ollama broke with "unable to allocate CUDA0 buffer"

Something about the update seems to break GPU memory allocation, I think.
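To see exactly which L4T packages the unattended upgrade pulled in, the regular dpkg/apt logs can be queried (a sketch; the paths are the stock Debian/Ubuntu locations):

```shell
# Installed L4T package versions right now
dpkg -l | awk '/nvidia-l4t/ {print $2, $3}'

# When, and to which versions, apt upgraded them (history logs rotate, hence the glob)
zgrep -h 'nvidia-l4t' /var/log/apt/history.log* 2>/dev/null | head
```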

Hi,

Thanks a lot for reporting the CUDA issue.

We also got a very similar report on the topic 347862:

However, we cannot reproduce the CUDA error with the steps shared in that topic (upgrade to r36.4.7 from r36.4.4).
Could you also share the steps that reproduce the issue on your side?
We would like to give it a try as well.

Thanks.

Same message: "unable to allocate CUDA0 buffer". There are already so many threads about this issue; why is there still no solution?

Hi,

We are checking this issue internally.
Will keep you updated on the latest status.

Thanks.

I can confirm that updating Ubuntu via apt update/upgrade breaks (in my case) the Stable Diffusion WebUI Jetson container. The directly installed Ollama program no longer works either.

My temporary workaround: a fresh install of JetPack 6.2.1, then installing the Stable Diffusion WebUI Jetson container and Ollama directly (bash install). NOT UPDATING UBUNTU keeps both programs functioning.

It’s annoying but it works.
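If a fresh 6.2.1 install works, the breaking upgrade can be kept out by pinning the L4T packages so a routine `apt upgrade` skips them (a sketch, assuming the stock Debian packaging JetPack uses):

```shell
# Hold every installed nvidia-l4t-* package so apt won't upgrade it
dpkg-query -W -f='${Package}\n' | grep '^nvidia-l4t-' | xargs -r sudo apt-mark hold

# Verify the holds took effect
apt-mark showhold | grep '^nvidia-l4t-'
```

This is less drastic than disabling Ubuntu updates entirely: security updates still flow, only the L4T/BSP packages stay frozen, and `sudo apt-mark unhold` reverses it later.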

The same thing happened to me. I can’t even load llama3.2:1b, let alone run bigger models.

Hi, all

We are checking on this issue.
Sorry for the inconvenience, and hope to share more information with you soon.

Thanks.

I also have this issue with a Jetson Orin Nano Super after updating the system:

Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model

I have the same issue:

k33g@k33g-jetson:~$ ollama run qwen2.5:1.5b
pulling manifest 
pulling 183715c43589: 100% ▕██████████████████▏ 986 MB
pulling 66b9ea09bd5b: 100% ▕██████████████████▏   68 B
pulling eb4402837c78: 100% ▕██████████████████▏ 1.5 KB
pulling 832dd9e00a68: 100% ▕██████████████████▏  11 KB
pulling 377ac4d7aeef: 100% ▕██████████████████▏  487 B
verifying sha256 digest 
writing manifest 
success 
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model

So the Jetson isn’t usable.

I’m using:

Ubuntu 22.04.5 LTS on Jetson Orin Nano 8GB RAM + JetPack 6

I installed Ollama with this command:

curl -fsSL https://ollama.com/install.sh | sh
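With the script install, Ollama runs as a systemd service named `ollama`, so the runner's full error output lands in the journal rather than the terminal (a sketch; the service name comes from the official installer):

```shell
# Last 50 lines of the ollama service log
journalctl -u ollama --no-pager -n 50

# Or filter for just the allocation failures
journalctl -u ollama --no-pager | grep -F 'unable to allocate CUDA0 buffer'
```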

Hi, both

The "unable to allocate CUDA0 buffer" error is a known issue related to the r36.4.7 update.
Please find more information on the topic below:

Thanks.