My Jetson Orin Nano Dev Kit Super (8 GB) has become unstable since updating to JetPack 6.2 (L4T 36.4.x). It previously ran large 8B models without issues on an older JetPack release, but now even small 1B or 3B models fail to load. The system also freezes and sometimes reboots when this happens.
The main error is:
error loading model: unable to allocate CUDA0 buffer
Sometimes kernel logs also show:
nvidia-modeset: ERROR: GPU:0: Failed to allocate 2743000 KBPS Iso and 4294967295 KBPS Dram
nvidia-modeset: ERROR: GPU:0: Unexpectedly failed to lock to max DRAM pre-modeset!
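One detail in the first nvidia-modeset line may be worth noting: the Dram figure, 4294967295, is exactly 2^32 - 1, the largest unsigned 32-bit value. That could mean it is a "request the maximum" placeholder rather than a real bandwidth number, but this is only a guess and is not confirmed by any driver documentation. The sketch below just extracts the two numbers from such a line and checks that equality; the function name and regular expression are illustrative and are not part of any NVIDIA tool.

```python
import re

# Matches the bandwidth figures in the nvidia-modeset line quoted above.
# The pattern is an assumption based on that single log line.
ISO_DRAM_RE = re.compile(
    r"Failed to allocate (?P<iso>\d+) KBPS Iso and (?P<dram>\d+) KBPS Dram"
)

UINT32_MAX = 2**32 - 1  # 4294967295


def parse_modeset_bandwidth(line):
    """Return (iso_kbps, dram_kbps) from a log line, or None if it does not match."""
    match = ISO_DRAM_RE.search(line)
    if match is None:
        return None
    return int(match.group("iso")), int(match.group("dram"))


if __name__ == "__main__":
    sample = (
        "nvidia-modeset: ERROR: GPU:0: Failed to allocate 2743000 KBPS Iso "
        "and 4294967295 KBPS Dram"
    )
    iso, dram = parse_modeset_bandwidth(sample)
    print(f"Iso request:  {iso} KBPS")
    print(f"Dram request: {dram} KBPS (equals 2**32 - 1: {dram == UINT32_MAX})")
```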
I'm running this in Docker with NVIDIA runtime using the dustynv/ollama:0.6.8-r36.4-cu126-22.04 container. The problem occurs even when running very small LLMs like llama3.2:1b.
I've tried MAXN_SUPER and 15W modes, jetson_clocks, stopping gdm3 and nvargus-daemon, adding swap, and even clean reinstalling JetPack and L4T packages. The behavior doesn't change.
It looks like CUDA memory allocation or GPU carveout is broken in 36.4.x, possibly related to the new display driver.
Is this a known regression? Which JetPack version is currently stable for CUDA inference on the Orin Nano Super?
There is no known regression in CUDA memory allocation.
Which version did you use previously? Could you also share the JetPack version and container of the stable environment with us, so we can check it further?
We also got a very similar report in topic 347862:
However, we cannot reproduce the CUDA error with the steps shared in that topic (upgrade to r36.4.7 from r36.4.4).
Could you also share the steps that reproduce the issue on your side?
We would like to give it a try as well.
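When sharing reproduction details, it may help to include the L4T release string and the memory state just before the model load, since on Jetson the CPU and GPU share the same physical DRAM. Below is a minimal sketch, assuming a standard L4T install where /etc/nv_tegra_release exists; the helper names are hypothetical and this is not an official NVIDIA diagnostic.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def read_text_or_none(path):
    """Return the file contents, or None if the file cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    # Assumed location of the L4T release string on a Jetson.
    release = read_text_or_none("/etc/nv_tegra_release")
    print("L4T release:", release.splitlines()[0] if release else "not found")

    meminfo = read_text_or_none("/proc/meminfo")
    if meminfo:
        info = parse_meminfo(meminfo)
        for key in ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree"):
            if key in info:
                print(f"{key}: {info[key] / 1024:.0f} MiB")
```

Running this once on the working install and once after the update, right before loading the model, would give two snapshots that can be compared side by side.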
I can confirm that updating Ubuntu via update/upgrade breaks (in my case) the Stable Diffusion web UI Jetson container. The directly installed Ollama program also no longer works.
My temporary workaround: a fresh install of JetPack 6.2.1, then the Stable Diffusion web UI Jetson container and Ollama installed directly (bash install). NOT UPDATING UBUNTU keeps both programs functioning.
I also have this issue with a Jetson Orin Nano Super after updating the system:
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer llama_model_load_from_file_impl: failed to load model