I would like to add remote access to Ollama, and I’m trying to understand how to set the environment variables that allow cross-origin access. I tried sudo systemctl edit ollama, per the playbook, but I get “No files found for ollama.service”.
Ollama is located at /snap/bin/ollama
Also, where are the models and Modelfiles stored, and where is Ollama’s home directory?
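For reference, this is what I understood the playbook to be doing (the override from the Ollama FAQ, which assumes a non-snap install), plus my best guess at an equivalent for the snap binary - happy to be corrected:

```bash
# What the playbook assumes: an install that registers ollama.service, so that
#   sudo systemctl edit ollama
# lets you add an override like:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
#   Environment="OLLAMA_ORIGINS=*"
#
# With the snap binary there is no ollama.service, so my guess is the same
# variables have to be set on whatever runs the server, e.g. a quick manual test:
OLLAMA_HOST=0.0.0.0 OLLAMA_ORIGINS="*" /snap/bin/ollama serve
```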
Do yourself a service and don’t use Ollama. Use llama.cpp or LM Studio (if you prefer a GUI) instead. Ollama underperforms on Spark and introduces unnecessary complexity compared to llama.cpp.
@eugr Do you have a good guide that shows how to use llama.cpp on Spark with cu130? I’ve been looking for one, but no luck. Thanks in advance.
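For context, the only thing I have to go on is the generic CUDA build from the llama.cpp docs (nothing Spark- or cu130-specific here; it just assumes the CUDA toolkit is already on the box):

```bash
# Generic llama.cpp CUDA build from the upstream docs - not Spark-specific.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
# Binaries (llama-server, llama-cli, ...) end up in build/bin/
```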
If you use Open WebUI with Ollama, you can easily connect it to Tailscale for free and access your Open WebUI/Ollama models from anywhere. It’s really easy to set up. With the DGX Spark hosting my larger LLMs and my 5090 handling the smaller ones, I also combined the two Ollama servers into the same Open WebUI instance; again, a really easy setup if you ask any of the AI assistants for help.
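A minimal sketch of what that looks like, in case it helps someone (the image name and OLLAMA_BASE_URL come from the Open WebUI docs; the Tailscale hostname is a made-up placeholder):

```bash
# Run Open WebUI and point it at an Ollama server reachable over Tailscale.
# Replace spark.your-tailnet.ts.net with your own tailnet hostname.
docker run -d \
  --name open-webui \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://spark.your-tailnet.ts.net:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
# A second Ollama endpoint (the 5090 box, in my case) can be added afterwards
# under Admin Panel -> Settings -> Connections.
```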
Hi - thanks for all your posts; I’ve been reading your updates on different inference engines with interest. I’m trying to decide whether to prioritise llama.cpp or vLLM so that I can stick with one model format and stack for a bit.
For LLMs I want a local chat UI and API access - not a production scenario, but I want the best (fastest and highest-quality) performance without too many constraints, and to avoid downloading and managing multiple formats of the same model.
So I know this is a bit of an ‘it depends’ kind of question, but I’d love to hear your current point of view on it.
If you are the only user, I’d stick with llama.cpp - it will give you the best single-user performance without too much overhead. It will also give you a bigger variety of quants to choose from (e.g. Q6_K_XL). It can support multi-user too in a pinch.
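To make that concrete, a single-model launch looks something like this (the model path and quant are just placeholders) - llama-server gives you both a built-in web UI and an OpenAI-compatible API on the same port:

```bash
# Serve one GGUF with llama-server; the model path is a placeholder.
./build/bin/llama-server \
  -m /models/Qwen3-30B-A3B-Q6_K_XL.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99
# Web UI at http://<host>:8080, OpenAI-compatible API at http://<host>:8080/v1
```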
The only downside is that support for new models may take some time, especially for a new architecture. Some models get day-1 support, some take months (Qwen3-Next, Qwen3-VL). But you can use vLLM for those if needed.
My current personal setup is llama.cpp with llama-swap as a proxy to load models on demand. What’s great about llama-swap is that you can also plug vLLM or any other inference engine into it if needed.
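Roughly (from memory - check the llama-swap README for the exact schema; model names and paths below are placeholders), the config looks like this:

```bash
# Hypothetical llama-swap config: one entry per model, each with the command
# that launches it. llama-swap substitutes ${PORT} and starts/stops servers
# on demand based on the model name in incoming API requests.
mkdir -p ~/llama-swap
cat > ~/llama-swap/config.yaml <<'EOF'
models:
  "qwen3-30b":
    cmd: |
      /path/to/llama-server --port ${PORT}
      -m /models/Qwen3-30B-A3B-Q6_K_XL.gguf -ngl 99
  "gpt-oss-120b":
    cmd: |
      /path/to/llama-server --port ${PORT}
      -m /models/gpt-oss-120b.gguf -ngl 99
EOF
# Flag names from memory; clients then talk to llama-swap's port instead of
# a specific llama-server instance.
llama-swap --config ~/llama-swap/config.yaml --listen :8080
```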