Jetson Thor: qwen2.5vl via Ollama runs only on CPU, not on GPU

I’m trying a VLM with Ollama. Based on the page “Tags · qwen2.5vl”, we can run the Ollama server + Open WebUI + qwen2.5vl:

1. Install the Ollama server:
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl start ollama
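
A quick sanity check after installing (a sketch; the ollama systemd unit name comes from the install script above):

systemctl status ollama --no-pager    # should report active (running)
ollama --version                      # the CLI responds and prints its version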

2. Install Open WebUI via Docker:
docker run -d \
  --name open-webui \
  -p 3000:8080 \
  -e OLLAMA_API_BASE_URL=http://host.docker.internal:11434 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --gpus all \
  ghcr.io/open-webui/open-webui:main
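
One note in case the WebUI cannot reach Ollama: on a Linux host such as Jetson, host.docker.internal is not resolvable inside containers by default, so it may need to be mapped explicitly (Docker 20.10+; this variant is a sketch, everything else as above):

docker run -d \
  --name open-webui \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_API_BASE_URL=http://host.docker.internal:11434 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --gpus all \
  ghcr.io/open-webui/open-webui:main

docker logs -f open-webui    # watch the startup logs for connection errors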

3. Start the VLM experiment:
a. Download the model: ollama run qwen2.5vl:3b
b. Create a Modelfile (vim Modelfile; see the note on num_gpu after these steps):
FROM qwen2.5vl:3b
PARAMETER num_ctx 512
PARAMETER num_gpu 1
PARAMETER temperature 0.7
SYSTEM "You are a multimodal AI assistant that supports Chinese"
c. Run: ollama create qwen2.5vl:3b -f Modelfile
d. Test: ollama run qwen2.5vl:3b "Hello"
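
A note on num_gpu, based on my understanding of the Modelfile parameter docs: it is the number of model layers to offload to the GPU, not a count of GPUs, so PARAMETER num_gpu 1 offloads only a single layer. To request all layers on the GPU, a sketch like the following can be tried (a large value is effectively capped at the model's layer count):

FROM qwen2.5vl:3b
PARAMETER num_ctx 512
PARAMETER num_gpu 99
PARAMETER temperature 0.7
SYSTEM "You are a multimodal AI assistant that supports Chinese"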

But the run fails with an error (the screenshot of the output is omitted here).

If I change the Modelfile parameter from PARAMETER num_gpu 1 to PARAMETER num_gpu 0, it works,


but that is not what I wanted: it is running on the CPU, not the GPU.
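
For anyone reproducing this: ollama ps reports which processor each loaded model is using, so it is an easy way to confirm the CPU fallback (the output below is an illustrative sketch with placeholder values, not my actual log):

ollama ps
# NAME            ID            SIZE      PROCESSOR    UNTIL
# qwen2.5vl:3b    <model-id>    <size>    100% CPU     <time>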

Please help me figure out what is wrong with my environment. Thanks!

Hi:
You might need to use nvidia-smi to monitor GPU usage, as described in Thor GPU can not detected - #5 by AastaLLL

Please also see my other question: Jetson thor: nvidia-smi show Nvidia thor off

What about watch -n 0.1 nvidia-smi?
Running nvidia-smi checks the usage only once; watch re-runs it every 0.1 seconds.
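
On Jetson, tegrastats is another option; it prints the GPU load in the GR3D_FREQ field (a sketch; the interval is in milliseconds):

sudo tegrastats --interval 1000    # GPU load appears as GR3D_FREQ
watch -n 0.1 nvidia-smi            # or keep a live nvidia-smi view, as above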

Hi,

Could you check whether your Ollama build has CUDA support (libggml-cuda.so)?
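
A quick check along these lines should show whether the CUDA backend is present and loaded (the paths are assumptions; adjust them to your install):

# look for the CUDA backend library in common Ollama install locations
find /usr/lib/ollama /usr/local/lib/ollama -name 'libggml-cuda*' 2>/dev/null

# the server log also reports which backends it discovered at startup
journalctl -u ollama --no-pager | grep -iE 'cuda|gpu' | tail -n 20
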
Or you can install it by following the steps shared in the link below:

Thanks.

I followed the steps in that link, but I only got the libggml-cuda.so / libggml-cpu.so / libggml-base.so files; there was no “ollama” executable.

However, I pulled the Docker image (“Package ollama · GitHub”).
Inside the container, I found the files under /opt/ollama.

This is how I finally solved the problem of building Ollama from source without getting the ollama executable; see the sketch below.
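
Roughly what I did, as a sketch (the image tag is a placeholder for the package I pulled; the /opt/ollama layout comes from that image):

IMAGE=<ollama-image-tag>                    # placeholder; use the tag from the GitHub package page
docker create --name ollama-extract "$IMAGE"
docker cp ollama-extract:/opt/ollama ./ollama-from-image
docker rm ollama-extract
./ollama-from-image/ollama --version        # verify the extracted binary runs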

Thanks for your answer; I’m sharing my solution here for others.
