Llama3.2:3b randomly outputting "GGGGGGGG" when running under ollama on Jetson Orin Nano Super (JP6.2)

Which version of ollama did you use? I updated to the newest version and the problem disappeared for some of the models.

For testing purposes I have left things as stock as possible.

Ollama version is 0.12.3, running only in Docker version 28.4.0 (build d8eb465), not installed natively.

Check this out: I reverted to 0.10.0 and the issue is solved.

I will check this out and will modify the method from the tutorial https://www.youtube.com/watch?v=R0PjKr4d-gU to use the command:

```shell
docker run -d --runtime nvidia -v ollama:/root/.ollama -p 11434:11434 --restart always --name ollama ollama/ollama:0.10.0
```

The full set of instructions from a stock install of the official Orin Nano OS (JetPack / Ubuntu 22.04) would be:

```shell
sudo usermod -aG docker $USER

newgrp docker

docker run -d --runtime nvidia -v ollama:/root/.ollama -p 11434:11434 --restart always --name ollama ollama/ollama:0.10.0

docker exec -it ollama ollama run llama3.2:3b
```
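Once the container is up, a quick way to confirm the pinned version is actually the one responding (a sketch; assumes the port mapping and container name from the commands above):

```shell
# Query the Ollama HTTP API exposed by the -p 11434:11434 mapping above.
# The version reported should match the pinned image tag (0.10.0).
curl -s http://localhost:11434/api/version

# List the models already pulled into the ollama volume:
docker exec ollama ollama list
```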

I am testing with one of the models now and it seems promising: the LLM no longer seems lobotomised, is not forgetting things it should know, and so far there are no weird outputs. I will test other models later.

I still hope the original problem gets fixed, as having to use an older Ollama version means some models might not work…

update:

The qwen2.5:7b LLM has improved, but it still exhibits weird behaviour.

It now outputs:

````plaintext
```plaintext
````

and goes into a loop. The loop can be interrupted and the scenario can then proceed, but it soon happens again.

Sigh, I will have to restore my backup again: after a Docker update, Ollama running in Docker no longer sees the GPU.

I've found out that "--runtime nvidia was deprecated after Docker 20.10; --gpus is the supported way to expose GPUs." Which is weird, as the original tutorial made a point of explicitly changing --gpus to --runtime!
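If that deprecation note is right, the same container can be started with the newer flag. This is only a sketch, assuming the NVIDIA Container Toolkit is configured for Docker; on Jetson boards --runtime nvidia may still be what actually works, so treat this as a variant to try, not a drop-in fix:

```shell
# Alternative invocation using --gpus instead of the deprecated --runtime flag.
# Assumes the NVIDIA Container Toolkit is installed and set up for Docker.
docker run -d --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --restart always \
  --name ollama \
  ollama/ollama:0.10.0
```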

So to have this specialised SBC do what it is supposed to do, we now apparently have to use both an older Ollama and an older Docker! That is NOT a solution…

I will have to restore my backup and block any NVIDIA/CUDA updates, as they make Ollama in Docker unusable:

```plaintext
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
```

I even tried installing Ollama natively again, but that ALSO gives an error. (The same CUDA error, as well as an error that might be related to the Ollama snap not being able to find stuff.)

NVIDIA, please provide a fix, as this makes the board unusable or un-updatable…

Update 1:

Apparently, after rebooting again, some more steps of the updates were applied (even though there was no message saying a reboot was required). After that, the native install seems to work. I will try the Docker install too after the update.

Update 2:

I tried the Docker install. That led to the error message:

```plaintext
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
```

But the weird thing is, I tried the docker exec command again for some reason, and now it WORKED!

However, when exiting one LLM and running another, the CUDA0 error returns. The command `docker restart ollama` works sometimes, but at other times I had to log out and log in again, and even then got the error "Error: 500 Internal Server Error: llama runner process has terminated: cudaMalloc failed: out of memory", which only disappeared after running `docker restart ollama` again.

Another error sometimes crops up alongside the CUDA0 error: "llama_model_load_from_file_impl: failed to load model".
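The restart workaround can be wrapped in a small helper script. This is only a sketch: the container name ollama and the port mapping are taken from the commands earlier in the thread, and the model name is just a default.

```shell
#!/bin/sh
# Hypothetical helper: restart the ollama container to free GPU memory,
# wait for its API to come back, then launch the requested model.
MODEL="${1:-llama3.2:3b}"

docker restart ollama

# Poll the Ollama HTTP API until the server responds (up to ~30 s).
for i in $(seq 1 30); do
  if curl -sf http://localhost:11434/api/version >/dev/null; then
    break
  fi
  sleep 1
done

docker exec -it ollama ollama run "$MODEL"
```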

Weird things are happening with this update!

I have the same issue, which popped up after an update. I have re-flashed using the SDK Manager twice and even reverted to JetPack 6.1.1, with the same results using Ollama native. It was working well with models up to 4 GB, but now it won't even run the smallest 1.5 GB model. I wonder what broke?


Outside Docker, only tinyllama works; gemma:2b seems too big and gives errors.
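Since the failures look memory-related (models up to ~4 GB used to fit, and now even small ones fail), it can help to check free memory just before loading a model. A minimal sketch; tegrastats is the Jetson-specific tool shipped with JetPack:

```shell
# Jetson shares RAM between CPU and GPU, so free system memory bounds
# what CUDA can allocate for a model.
free -h

# Jetson-specific live stats (RAM, GPU load), shipped with JetPack;
# press Ctrl-C to stop:
# sudo tegrastats --interval 1000
```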

Inside Docker, things deteriorated:

```plaintext
docker exec -it ollama ollama run gemma2:2b
Error response from daemon: container X is not running
```

It seems the only way to get this working is to return to the stock OS image, disable updates, and use old versions, and even then there is some weird behaviour in some LLMs… This is NOT what I imagined using this board would be like when I bought it!

I tried doing just that by going back, but I was using the SDK Manager to reflash to 6.1.1, which originally worked well: I could run models up to 4b in size. It appears that something in the Ubuntu kernel might be affecting how the GPU is addressed. I will try the original SD card with the NVMe drive and see what happens. I agree that it is a shame the JetPack updates wrecked everything!

I'm using qwen2.5:7b-instruct and haven't experienced such a problem across lots of tests.

If 0.10.0 works, what would be the main concern with still running Ollama in Docker?

That the update makes it work extremely unreliably.

Are you using a swap file? If so, how did you set it up?

No, I just installed it natively, without Docker or anything. Though my device is an Orin 64G, not a Nano.

OK, so it might be limited to the Orin Nano. Perhaps that can help the devs solve this.

Do you have any update? Just as you said, I happened to need to run a new model, and 0.10.0 doesn't support it.

In the "Ollama errors orin nano" thread, the developers have said they are trying to recreate the issue by installing from the image file.