Sure, here’s a log containing the timestamps of the chat messages as well as the error. The screenshot shows the conversation itself. Note that Open WebUI sends several requests in quick succession for each user message, since it also asks the model to generate a title and tags for the conversation.
What I did was soft-reboot the Jetson Thor, then log in remotely over two separate SSH connections, with no active GNOME session. In the first SSH connection, I ran btop to confirm I was starting with ~110 GB of free RAM. In the second SSH connection, I ran these commands:
cd /opt/stacks/ollama/
docker compose up -d
echo 0 | sudo tee /proc/sys/vm/nr_hugepages
docker logs ollama --follow
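For anyone trying to reproduce this, here is a sketch of the sanity checks I effectively did around those commands (the exact btop reading came from the other SSH session; `free` is shown here as a scriptable stand-in):

```shell
#!/bin/sh
# Sanity checks before reproducing the crash (a sketch, not my exact
# terminal history). Assumes a standard Linux procfs layout.

# Confirm roughly how much RAM is free before the model loads
# (btop showed ~110 GB free on my system).
free -h

# Verify huge pages are disabled, i.e. the `echo 0 | sudo tee` command
# above took effect. This should print 0.
cat /proc/sys/vm/nr_hugepages
```

Disabling huge pages is the variable I was testing here; the crash below happened even with `nr_hugepages` at 0.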
Here are the logs, including API call logs:
time=2025-10-21T20:18:09.735Z level=INFO source=routes.go:1511 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-10-21T20:18:09.737Z level=INFO source=images.go:522 msg="total blobs: 5"
time=2025-10-21T20:18:09.737Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-10-21T20:18:09.737Z level=INFO source=routes.go:1564 msg="Listening on [::]:11434 (version 0.12.6)"
time=2025-10-21T20:18:09.738Z level=INFO source=runner.go:80 msg="discovering available GPUs..."
time=2025-10-21T20:18:10.552Z level=INFO source=types.go:112 msg="inference compute" id=GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600 library=CUDA compute=11.0 name=CUDA0 description="NVIDIA Thor" libdirs=ollama,cuda_v13 driver=13.0 pci_id=01:00.0 type=iGPU total="122.8 GiB" available="119.6 GiB"
time=2025-10-21T20:20:06.661Z level=INFO source=server.go:216 msg="enabling flash attention"
time=2025-10-21T20:20:06.661Z level=INFO source=server.go:400 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-90a618fe6ff21b09ca968df959104eb650658b0bef0faef785c18c2795d993e3 --port 36481"
time=2025-10-21T20:20:06.662Z level=INFO source=server.go:676 msg="loading model" "model layers"=37 requested=-1
time=2025-10-21T20:20:06.662Z level=INFO source=server.go:682 msg="system memory" total="122.8 GiB" free="119.1 GiB" free_swap="0 B"
time=2025-10-21T20:20:06.662Z level=INFO source=server.go:690 msg="gpu memory" id=GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600 library=CUDA available="118.5 GiB" free="119.0 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-10-21T20:20:06.674Z level=INFO source=runner.go:1332 msg="starting ollama engine"
time=2025-10-21T20:20:06.679Z level=INFO source=runner.go:1367 msg="Server listening on 127.0.0.1:36481"
time=2025-10-21T20:20:06.684Z level=INFO source=runner.go:1205 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:14 GPULayers:37[ID:GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-21T20:20:06.765Z level=INFO source=ggml.go:134 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=471 num_key_values=30
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA Thor, compute capability 11.0, VMM: yes, ID: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
time=2025-10-21T20:20:06.854Z level=INFO source=ggml.go:104 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 CPU.1.NEON=1 CPU.1.ARM_FMA=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2025-10-21T20:20:07.186Z level=INFO source=runner.go:1205 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:14 GPULayers:37[ID:GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-21T20:20:08.707Z level=INFO source=runner.go:1205 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:8192 KvCacheType: NumThreads:14 GPULayers:37[ID:GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-10-21T20:20:08.707Z level=INFO source=ggml.go:480 msg="offloading 36 repeating layers to GPU"
time=2025-10-21T20:20:08.707Z level=INFO source=ggml.go:487 msg="offloading output layer to GPU"
time=2025-10-21T20:20:08.707Z level=INFO source=ggml.go:492 msg="offloaded 37/37 layers to GPU"
time=2025-10-21T20:20:08.707Z level=INFO source=device.go:206 msg="model weights" device=CUDA0 size="59.8 GiB"
time=2025-10-21T20:20:08.707Z level=INFO source=device.go:211 msg="model weights" device=CPU size="1.1 GiB"
time=2025-10-21T20:20:08.707Z level=INFO source=device.go:217 msg="kv cache" device=CUDA0 size="450.0 MiB"
time=2025-10-21T20:20:08.707Z level=INFO source=device.go:228 msg="compute graph" device=CUDA0 size="129.8 MiB"
time=2025-10-21T20:20:08.707Z level=INFO source=device.go:233 msg="compute graph" device=CPU size="5.6 MiB"
time=2025-10-21T20:20:08.707Z level=INFO source=device.go:238 msg="total memory" size="61.4 GiB"
time=2025-10-21T20:20:08.707Z level=INFO source=sched.go:482 msg="loaded runners" count=1
time=2025-10-21T20:20:08.707Z level=INFO source=server.go:1272 msg="waiting for llama runner to start responding"
time=2025-10-21T20:20:08.708Z level=INFO source=server.go:1306 msg="waiting for server to become available" status="llm server loading model"
time=2025-10-21T20:20:26.038Z level=INFO source=server.go:1310 msg="llama runner started in 19.38 seconds"
[GIN] 2025/10/21 - 20:20:28 | 200 | 22.474630397s | 192.168.2.18 | POST "/api/chat"
[GIN] 2025/10/21 - 20:20:34 | 200 | 6.497649123s | 192.168.2.18 | POST "/api/chat"
[GIN] 2025/10/21 - 20:20:40 | 200 | 5.527840289s | 192.168.2.18 | POST "/api/chat"
[GIN] 2025/10/21 - 20:20:44 | 200 | 4.202008031s | 192.168.2.18 | POST "/api/chat"
panic: failed to sample token
goroutine 981 [running]:
github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0x40002370e0, {0x1c5, {0xaaaae3b130e0, 0x40002e4000}, {0xaaaae3b1dfa8, 0x4000a76df8}, {0x4000b00008, 0x1, 0x1}, {{0xaaaae3b1dfa8, ...}, ...}, ...})
github.com/ollama/ollama/runner/ollamarunner/runner.go:735 +0x138c
created by github.com/ollama/ollama/runner/ollamarunner.(*Server).run in goroutine 38
github.com/ollama/ollama/runner/ollamarunner/runner.go:432 +0x22c
[GIN] 2025/10/21 - 20:21:06 | 500 | 1.502073126s | 192.168.2.18 | POST "/api/chat"
The screenshot shows the conversation from Open WebUI (not hosted on Thor):
Logs from the Open WebUI container (on a different machine):
2025-10-21 20:20:04.645 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/tools/ HTTP/1.1" 200
2025-10-21 20:20:04.713 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200
2025-10-21 20:20:04.754 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/folders/ HTTP/1.1" 200
2025-10-21 20:20:04.770 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "POST /api/v1/chats/af1101e4-bad0-4fe7-adb3-0e15048ac09a HTTP/1.1" 200
2025-10-21 20:20:04.804 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200
2025-10-21 20:20:04.852 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/folders/ HTTP/1.1" 200
2025-10-21 20:20:05.865 | INFO | httpx._client:_send_single_request:1025 - HTTP Request: GET http://qdrant:6333/collections/open-webui_memories/exists "HTTP/1.1 200 OK"
2025-10-21 20:20:05.867 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "POST /api/chat/completions HTTP/1.1" 200
2025-10-21 20:20:05.907 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200
2025-10-21 20:20:05.946 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/folders/ HTTP/1.1" 200
2025-10-21 20:20:24.249 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /_app/version.json HTTP/1.1" 200
2025-10-21 20:20:24.327 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/chats/af1101e4-bad0-4fe7-adb3-0e15048ac09a/tags HTTP/1.1" 200
2025-10-21 20:20:28.607 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "POST /api/chat/completed HTTP/1.1" 200
2025-10-21 20:20:28.649 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "POST /api/v1/chats/af1101e4-bad0-4fe7-adb3-0e15048ac09a HTTP/1.1" 200
2025-10-21 20:20:28.690 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200
2025-10-21 20:20:28.729 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/folders/ HTTP/1.1" 200
2025-10-21 20:20:40.790 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200
2025-10-21 20:20:40.845 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/folders/ HTTP/1.1" 200
2025-10-21 20:20:45.084 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/chats/af1101e4-bad0-4fe7-adb3-0e15048ac09a HTTP/1.1" 200
2025-10-21 20:20:45.134 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/chats/all/tags HTTP/1.1" 200
2025-10-21 20:21:04.815 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "POST /api/v1/chats/af1101e4-bad0-4fe7-adb3-0e15048ac09a HTTP/1.1" 200
2025-10-21 20:21:04.964 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200
2025-10-21 20:21:05.005 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/folders/ HTTP/1.1" 200
2025-10-21 20:21:05.136 | INFO | httpx._client:_send_single_request:1025 - HTTP Request: GET http://qdrant:6333/collections/open-webui_memories/exists "HTTP/1.1 200 OK"
2025-10-21 20:21:05.137 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "POST /api/chat/completions HTTP/1.1" 200
2025-10-21 20:21:05.260 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/chats/?page=1 HTTP/1.1" 200
2025-10-21 20:21:05.300 | INFO | uvicorn.protocols.http.httptools_impl:send:476 - 172.22.0.1:0 - "GET /api/v1/folders/ HTTP/1.1" 200