I am having the same problem with Ollama and llama3.2:1b, and with most other models.
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
I can only get one model to work:
gemma3:4b
I tried the suggested models; only gemma3:4b worked (x = failed, y = worked):
x - ollama run llama3.2:3b
x - ollama run llama3.2:1b
=> Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
y - ollama run gemma3:4b
x - ollama run starcoder2:3b
=> Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
x - ollama run falcon3:3b
x - ollama run phi4-mini-reasoning:latest
=> Error: 500 Internal Server Error: llama runner process has terminated: cudaMalloc failed: out of memory
=> Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
I just did a clean reinstall from the ISO SD image.
Two different errors!!!
ollama run gemma:2b
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
ollama run gemma:2b
Error: 500 Internal Server Error: llama runner process has terminated: cudaMalloc failed: out of memory
I did not use Docker for this, just the native, fully updated environment.
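Since the same command can fail with either message, it can help to tally which one shows up over repeated attempts. The following is only a minimal sketch: the labels `cuda0_buffer` and `cudamalloc_oom` and the helper names are made up for illustration, and the substrings are copied from the errors quoted in this thread.

```python
from collections import Counter

# Substrings copied verbatim from the error messages quoted in this thread.
# The labels on the right are hypothetical names chosen for this sketch.
SURFACE_ERRORS = {
    "unable to allocate CUDA0 buffer": "cuda0_buffer",
    "cudaMalloc failed: out of memory": "cudamalloc_oom",
}


def classify_error(message: str) -> str:
    """Return a short label for one Ollama error message."""
    for needle, label in SURFACE_ERRORS.items():
        if needle in message:
            return label
    return "other"


def tally(messages):
    """Count how often each kind of error appears in a list of messages."""
    return Counter(classify_error(m) for m in messages)


if __name__ == "__main__":
    # The two outputs of `ollama run gemma:2b` reported above.
    attempts = [
        "Error: 500 Internal Server Error: llama runner process has terminated: "
        "error loading model: unable to allocate CUDA0 buffer",
        "Error: 500 Internal Server Error: llama runner process has terminated: "
        "cudaMalloc failed: out of memory",
    ]
    print(tally(attempts))
```

Pasting the error text from each attempt into `attempts` gives a rough count of how often each message occurs.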
This issue is related to the r36.4.7 update and is intermittent: model loading fails on some attempts, not necessarily on every one.
You might see either 'cudaMalloc failed: out of memory' or 'unable to allocate CUDA0 buffer'.
The underlying error in both cases is the same (in the Ollama log):
NvMapMemAllocInternalTagged: 1075072515 error 12
Our internal team is working on the issue.
We will share more information later.
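To check whether a given failure matches the underlying error quoted above, the Ollama log can be searched for the `NvMapMemAllocInternalTagged` line. The sketch below assumes the log text has already been saved to a string (on a systemd install it can typically be read with `journalctl -u ollama`). It also assumes that the trailing number is a standard Linux errno value, in which case 12 corresponds to ENOMEM; the thread does not confirm that interpretation, and the meaning of the first number is not documented here. The function names are hypothetical.

```python
import errno
import os
import re

# Matches lines such as: NvMapMemAllocInternalTagged: 1075072515 error 12
NVMAP_RE = re.compile(r"NvMapMemAllocInternalTagged: (\d+) error (\d+)")


def find_nvmap_errors(log_text):
    """Return (first_number, error_code) pairs for each matching log line."""
    return [(int(num), int(code)) for num, code in NVMAP_RE.findall(log_text)]


def describe_code(code):
    """Describe an error code, ASSUMING it is a standard Linux errno value."""
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    sample = "NvMapMemAllocInternalTagged: 1075072515 error 12"
    for first_number, code in find_nvmap_errors(sample):
        print(first_number, code, describe_code(code))
```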
These units are still being sold and have been unusable for their intended purpose for MONTHS now.
If it is really this difficult to fix, then at least release a WORKING IMAGE with utilities installed and auto-updates that break them DISABLED!!!
The ISO image on your server is a good option, but it needs some updates before it can install and run Ollama, and by now the ever-increasing list of updates includes others that break proper functioning!
Can NVIDIA provide a workaround, or a recommendation on which version of JetPack to roll back to? Also, is CUDA 13.1 approved for the Nano now, and should we move to that version to solve the problem?
Surely, it is beneath a big player like NVIDIA to only have a broken OS image that has to be updated before a core functionality of the sold product can be used.
I got the patch installed eventually and it is an improvement. However, with bigger LLMs (that DID WORK previously) I still get an error message, even after a fresh boot with no other programs running:
ollama run gemma:7b
Error: 500 Internal Server Error: llama runner process has terminated: cudaMalloc failed: out of memory
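On Jetson boards the GPU shares physical memory with the CPU, so one quick check before loading a larger model is how much memory the kernel reports as available. The sketch below parses the `MemAvailable` field from `/proc/meminfo`. It is a diagnostic aid only: nothing in this thread establishes that low available memory, rather than the r36.4.7 issue, causes the remaining failures. The function name is hypothetical.

```python
import re


def mem_available_mib(meminfo_text):
    """Return MemAvailable in MiB from /proc/meminfo text, or None if absent."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        return None
    # /proc/meminfo reports this field in kB (units of 1024 bytes).
    return int(match.group(1)) / 1024


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(f"MemAvailable: {mem_available_mib(f.read()):.0f} MiB")
```

Comparing this figure against the size of the model being loaded gives a rough idea of whether the allocation could fit at all.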