Llama3.2:3b randomly outputting "GGGGGGGG" when running under ollama on Jetson Orin Nano Super (JP6.2)

Which version of ollama did you use? I updated to the newest version and the problem disappeared for some of the models.

For testing purposes I have left things as stock as possible.

Ollama version is 0.12.3, running only in Docker version 28.4.0 (build d8eb465), not installed natively.

Check this out: I reverted to 0.10.0 and the issue is solved.

I will check this out and will modify the method from the tutorial https://www.youtube.com/watch?v=R0PjKr4d-gU to use the command:

```shell
docker run -d --runtime nvidia -v ollama:/root/.ollama -p 11434:11434 --restart always --name ollama ollama/ollama:0.10.0
```

The full set of instructions from a stock install of the official Orin Nano OS (JetPack / Ubuntu 22.04) would be:

```shell
sudo usermod -aG docker $USER

newgrp docker

docker run -d --runtime nvidia -v ollama:/root/.ollama -p 11434:11434 --restart always --name ollama ollama/ollama:0.10.0

docker exec -it ollama ollama run llama3.2:3b
```
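Once the container is up, a quick way to confirm the pinned version is actually the one responding (a sketch; assumes the port mapping and container name from the commands above):

```shell
# Query the Ollama HTTP API exposed by the -p 11434:11434 mapping above.
# The version reported should match the pinned image tag (0.10.0).
curl -s http://localhost:11434/api/version

# List the models already pulled into the ollama volume:
docker exec ollama ollama list
```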

I am testing with one of the models now and it seems promising: the LLM no longer seems lobotomised, is not forgetting things it should know, and so far there are no weird outputs. I will test other models later.

I still hope the original problem gets fixed, as having to use an older Ollama version means some models might not work…

update:

The qwen2.5:7b LLM has improved, but it still exhibits weird behaviour.

It now outputs:

````plaintext
```plaintext
````

and goes into a loop. The loop can be interrupted and the scenario can then proceed, but it soon happens again.

Sigh, I will have to restore my backup again: after a Docker update, Ollama running in Docker no longer sees the GPU.

I've found out that "--runtime nvidia was deprecated after Docker 20.10; --gpus is the supported way to expose GPUs." Which is weird, as the original tutorial made a point of explicitly changing --gpus to --runtime!
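If that deprecation note is right, the same container can be started with the newer flag. This is only a sketch, assuming the NVIDIA Container Toolkit is configured for Docker; on Jetson boards --runtime nvidia may still be what actually works, so treat this as a variant to try, not a drop-in fix:

```shell
# Alternative invocation using --gpus instead of the deprecated --runtime flag.
# Assumes the NVIDIA Container Toolkit is installed and set up for Docker.
docker run -d --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --restart always \
  --name ollama \
  ollama/ollama:0.10.0
```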

So to have this specialised SBC do what it is supposed to do, we now apparently have to use both an older Ollama and an older Docker! That is NOT a solution…

I will have to restore my backup and block any NVIDIA/CUDA updates, as they make Ollama in Docker unusable:

```plaintext
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
```

I even tried installing Ollama natively again, but that ALSO gives an error. (The same CUDA error, as well as an error that might be related to the Ollama snap not being able to find stuff.)

NVIDIA, please provide a fix, as this makes the board unusable or un-updatable…

Update 1:

Apparently, after rebooting again, some more steps of the updates were applied (even though there was no message saying a reboot was required). After that, the native install seems to work. I will try the Docker install too after the update.

Update 2:

I tried the Docker install. That led to the error message:

```plaintext
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
```

But the weird thing is, I tried the docker exec command again for some reason, and now it WORKED!

However, when exiting one LLM and running another, the CUDA0 error returns. The command `docker restart ollama` works sometimes, but at other times I had to log out and log in again, and even then got the error "Error: 500 Internal Server Error: llama runner process has terminated: cudaMalloc failed: out of memory", which only disappeared after running `docker restart ollama` again.

Another error sometimes crops up alongside the CUDA0 error: "llama_model_load_from_file_impl: failed to load model".
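The restart workaround can be wrapped in a small helper script. This is only a sketch: the container name ollama and the port mapping are taken from the commands earlier in the thread, and the model name is just a default.

```shell
#!/bin/sh
# Hypothetical helper: restart the ollama container to free GPU memory,
# wait for its API to come back, then launch the requested model.
MODEL="${1:-llama3.2:3b}"

docker restart ollama

# Poll the Ollama HTTP API until the server responds (up to ~30 s).
for i in $(seq 1 30); do
  if curl -sf http://localhost:11434/api/version >/dev/null; then
    break
  fi
  sleep 1
done

docker exec -it ollama ollama run "$MODEL"
```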

Weird things are happening with this update!

I have the same issue, which popped up after an update. I have re-flashed using the SDK Manager twice and even reverted to JetPack 6.1.1, with the same results using Ollama native. It was working well with models up to 4 GB, but now it won't even run the smallest 1.5 GB model. I wonder what broke?


Outside Docker, only tinyllama works; gemma:2b seems too big and gives errors.
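Since the failures look memory-related (models up to ~4 GB used to fit, and now even small ones fail), it can help to check free memory just before loading a model. A minimal sketch; tegrastats is the Jetson-specific tool shipped with JetPack:

```shell
# Jetson shares RAM between CPU and GPU, so free system memory bounds
# what CUDA can allocate for a model.
free -h

# Jetson-specific live stats (RAM, GPU load), shipped with JetPack;
# press Ctrl-C to stop:
# sudo tegrastats --interval 1000
```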

Inside Docker, things deteriorated:

```plaintext
docker exec -it ollama ollama run gemma2:2b
Error response from daemon: container X is not running
```

It seems the only way to get this working is to return to the stock OS image, disable updates, and use old versions, and even then there is some weird behaviour in some LLMs… This is NOT what I imagined using this board would be like when I bought it!

I tried doing just that by going back, but I was using the SDK Manager to reflash to 6.1.1, which originally worked well: I could run models up to 4b in size. It appears that something in the Ubuntu kernel might be affecting how the GPU is addressed. I will try the original SD card with the NVMe drive and see what happens. I agree that it is a shame the JetPack updates wrecked everything!

I'm using qwen2.5:7b-instruct and haven't experienced such a problem across lots of tests.

If 0.10.0 works, what would be the main concern with still running Ollama in Docker?

That the update makes it work extremely unreliably.

Are you using a swap file? If so, how did you set it up?

No, I just installed it natively, without Docker or anything. Though my device is an Orin 64G, not a Nano.

OK, so it might be limited to the Orin Nano. Perhaps that can help the devs solve this.

Do you have any update? Just as you said, I happened to need to run a new model, and 0.10.0 doesn't support it.

In the "Ollama errors orin nano" thread, the developers have said they are trying to recreate the issue by installing from the image file.