Ollama in docker causing graphics exceptions and bad responses

I’ve run Ollama on a Mac with reasonable results but was looking to speed it up by moving it to a freshly purchased Jetson Thor AGX. I’ve followed a guide on jetson-ai-lab.com and have it running. Specifically, I’m using the following docker command line:

docker run --runtime nvidia --gpus all -it -v ${HOME}/ollama-data:/data ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04
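As a sanity check before blaming ollama itself, the same image can be used to confirm the container actually sees the GPU (a minimal sketch; it assumes nvidia-smi ships in the image, which it normally does under Thor's SBSA driver):

```shell
# Run nvidia-smi inside the same image to confirm the container
# can see the GPU before starting ollama.
docker run --rm --runtime nvidia --gpus all \
  ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04 nvidia-smi
```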

Occasionally the ollama client receives nonsense data back (usually rendered as random non-Latin characters), and when I look at the kernel log I see:

[ 789.838964] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: SKEDCHECK36_DEPENDENCE_COUNTER_UNDERFLOW failed
[ 789.844159] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x407020=0x100 0x407028=0x3f 0x40702c=0x1200a024 0x407030=0x7
[ 789.850578] NVRM: Xid (PCI:0000:01:00): 13, pid=4012, name=ollama, Graphics Exception: channel 0x00000008, Class 0000cdc0, Offset 00000000, Data 00000000
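For anyone trying to reproduce this, the Xid errors can be watched live while exercising the model, which makes it easy to correlate a garbled response with the moment of the fault (a sketch; reading the kernel log typically needs root):

```shell
# Stream the kernel log and keep only NVRM Xid graphics exceptions,
# so a bad ollama response can be matched to the exact fault.
sudo dmesg --follow | grep --line-buffered 'NVRM: Xid'
```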

Anyone got any ideas how to diagnose this?

Lawrence

Hi,

This is a known issue and our internal team is working on debugging this.
Please find more information below:

Thanks.

Thanks for getting back to me.

Just to ensure that this is indeed the same problem I’ve attached my own ollama.log.

ollama.log (32.3 KB)

Do you have a timeframe for when a fix might be ready? Are there any workarounds? Is there anything I can do to mitigate the problem? And lastly, are other LLM “engines” affected by this problem?

Lawrence

So I’ve realised that Ollama runs fine outside of docker, contrary to what I read on jetson-ai-lab.com. Per Issue: Ollama 0.12.10 fails on NVIDIA Jetson Thor (Regression from 0.12.9) · Issue #13033 · ollama/ollama · GitHub, I’m staying on 0.12.9, which is a couple of revs behind.

Obviously running inside docker would probably be better, but since (at least at the moment) ollama is the only application running on my Jetson, that’s fine.
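For reference, pinning a specific ollama release natively can be done through the official install script, which honours an OLLAMA_VERSION override (worth double-checking the script you download still supports that variable):

```shell
# Install ollama 0.12.9 natively (no docker), pinned to avoid the
# Thor regression tracked in ollama/ollama#13033.
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.12.9 sh

# Verify the pinned version took effect.
ollama --version
```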

Lawrence

Hi,

We see the same dmesg error as the topic shared above:

[  146.285137] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: SKEDCHECK36_DEPENDENCE_COUNTER_UNDERFLOW failed

Is your issue fixed by running ollama natively?
In our testing the failure is intermittent, so you might not hit it every time.

Thanks.

Hi,

Since I’ve switched to running ollama 0.12.9 (following that bug report I linked above) outside of docker I’ve not had that kernel message, nor any unexpected output characters from ollama. I’ve not tried the latest ollama.

I’ve just run a benchmark (GitHub - bykologlu/ollama-benchmark-cli: Benchmark tool for evaluating local LLMs like Mistral and DeepSeek using custom prompts.) and it’s not generated any kernel messages nor generated any unusual (for my locale) characters, so I’m pretty certain it will be reliable.

Lawrence

Hi,

Thanks for the feedback.
Good to know you found a way to solve the issue.

Hi,

I have for Ollama, yes. Am very, very happy. But I’ve still not had any luck whatsoever running anything else in Docker. E.g.:

jetson-containers run $(autotag stable-diffusion-webui)

Results in:

...
Python 3.10.12 (main, Feb  4 2025, 14:57:36) [GCC 11.4.0]
Version: v1.10.1
Commit hash: 82a973c04367123ae98bd9abdf80d9eda9b910e2
Traceback (most recent call last):
  File "/opt/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/opt/stable-diffusion-webui/launch.py", line 39, in main
    prepare_environment()
  File "/opt/stable-diffusion-webui/modules/launch_utils.py", line 387, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

I shall continue digging.
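One way to narrow down the launcher's complaint is to ask torch directly inside the container, bypassing the webui checks entirely (a sketch; it assumes jetson-containers passes a trailing command through to docker run, which it normally does):

```shell
# Ask the container's torch build directly whether it can see the GPU,
# independent of the stable-diffusion-webui launch checks.
jetson-containers run $(autotag stable-diffusion-webui) \
  python3 -c 'import torch; print(torch.cuda.is_available(), torch.version.cuda)'
```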

Cheers and have a great day,

Lawrence

Hi,

Since Thor uses the SBSA GPU driver, the r36-based containers cannot work without rebuilding.
Please find below the containers that support Thor instead:
Community: Packages · NVIDIA AI IOT · GitHub
vLLM: vLLM | NVIDIA NGC
SGLang: SGLang | NVIDIA NGC

Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.