I am having the same problem with ollama llama3.2:1b - and most models.
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
I can only get one model to work:
gemma3:4b
I tried the suggested models (x = failed, y = worked; only gemma3:4b worked - a quick memory check is sketched after the list):
x - ollama run llama3.2:3b
x - ollama run llama3.2:1b
=> Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
y - ollama run gemma3:4b
x - ollama run starcoder2:3b
=> Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
x - ollama run falcon3:3b
x - ollama run phi4-mini-reasoning:latest
=> Error: 500 Internal Server Error: llama runner process has terminated: cudaMalloc failed: out of memory
=> Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
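Since both errors are allocation failures, a quick check of free memory before a run can rule out a genuinely exhausted heap (on Jetson the GPU shares system RAM). A minimal sketch, assuming the default native install (tegrastats ships with JetPack; the systemctl line assumes Ollama's standard systemd service):

free -h                          # unified CPU/GPU memory on Jetson
sudo tegrastats                  # live RAM/GPU usage; Ctrl+C to stop
sudo systemctl restart ollama    # releases memory held by a crashed runner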
I just did a clean reinstall from the ISO SD image.
Two different errors from the same command:
ollama run gemma:2b
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
ollama run gemma:2b
Error: 500 Internal Server Error: llama runner process has terminated: cudaMalloc failed: out of memory
I did not use Docker for this, just the native, up-to-date environment.
This issue is related to the r36.4.7 update; it fails intermittently rather than on every attempt.
You might see either ‘cudaMalloc failed: out of memory’ or ‘unable to allocate CUDA0 buffer’.
The underlying error for both cases is the following (in the Ollama log):
NvMapMemAllocInternalTagged: 1075072515 error 12
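To confirm you are hitting the same case, look for that line in the Ollama server log. For example, assuming the default systemd service install on Linux:

journalctl -u ollama --no-pager | grep -i NvMapMemAllocInternalTagged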
Our internal team is working on the issue.
We will share more information later.