Starting from llama-server, it's actually relatively easy to get OpenCode up and running. The only things that really matter are the IP and port of the LLM you started via llama-server or vllm. It is worth mentioning that
a) after opencode.json has been generated and adapted;
b) and opencode has been restarted
the LLM is already connected, even if the provider and IP:port are not shown, or are still shown incorrectly (a refresh issue?), at least in the GUI version.
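For reference, a minimal opencode.json along these lines worked for me; it declares the local server as a custom OpenAI-compatible provider (the provider name, model ID, and port are placeholders, so adjust them to how you launched llama-server or vllm):

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama-server": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama-server (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "local-model": {
          "name": "Local model via llama-server"
        }
      }
    }
  }
}
```

After editing this, restart OpenCode so it picks up the new provider.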
A) What will matter for us Spark users going forward, and this is where NVIDIA or the community is really called upon, is a clean way to connect to local providers. The IP:port is currently a property of the provider, yet on the Spark we can run several LLMs in parallel, each addressed via a different port. Perhaps we should engage more in the discourse on this.
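To illustrate the parallel-models point: as far as I can tell, the only workaround today is to declare each IP:port combination as its own provider entry, roughly like this (names, ports, and model IDs are made up for illustration):

```json
{
  "provider": {
    "spark-coder": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://127.0.0.1:8080/v1" },
      "models": { "qwen3-coder": {} }
    },
    "spark-reasoner": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://127.0.0.1:8081/v1" },
      "models": { "deepseek-r1": {} }
    }
  }
}
```

It works, but it means every application has to duplicate this port bookkeeping, which is exactly the problem.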
B) It would also be interesting to have a list of useful coding LLMs that make good use of the Spark's resources and are particularly suitable for OpenCode (input from experienced coders would be especially valuable here).
I got OpenCode working really well with a local LLM on the Spark.
I'm currently running Ollama with a Qwen 30B Instruct variant, and it's by far the best one for tool calling that I've tried:
qooba/qwen3-coder-30b-a3b-instruct:q3_k_m
It's running at 82 tokens per second, which is great for local coding.
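For anyone wanting to reproduce this setup, the commands are roughly as follows (this assumes Ollama is installed; by default it serves an OpenAI-compatible API on port 11434, which is what you'd point OpenCode at):

```shell
# Pull the quantized Instruct variant mentioned above
ollama pull qooba/qwen3-coder-30b-a3b-instruct:q3_k_m

# Quick smoke test against Ollama's OpenAI-compatible endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qooba/qwen3-coder-30b-a3b-instruct:q3_k_m",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```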
This model handles tool calls extremely well. In my testing, a lot of the “main” base models struggle with tool calling, so make sure you’re grabbing an Instruct variant if you want reliable tool usage.
I run LiteLLM Proxy locally. It solves this problem and several others: for example, you can attach extra metadata that OpenCode can parse, such as context window size, image support, and prefix-caching support. I also use it to route requests to other servers on my network, to models on OpenRouter, and even to OpenAI and Anthropic ones.
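For anyone curious, the LiteLLM side of this is a config.yaml along these lines; the model names, ports, and metadata values below are illustrative, not a definitive setup:

```yaml
model_list:
  # Local model served by llama-server / vllm on the Spark
  - model_name: qwen3-coder-local
    litellm_params:
      model: openai/qwen3-coder            # openai/ prefix = any OpenAI-compatible backend
      api_base: http://127.0.0.1:8080/v1
      api_key: "none"
    model_info:
      max_input_tokens: 131072             # extra metadata clients can query
      supports_vision: false

  # Cloud fallback via OpenRouter
  - model_name: claude-sonnet
    litellm_params:
      model: openrouter/anthropic/claude-sonnet-4
      api_key: os.environ/OPENROUTER_API_KEY
```

OpenCode then only needs to know the proxy's single endpoint, and the per-model IP:port mapping lives in one place.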
+1 on using LiteLLM Proxy — it really is the cleanest way today to abstract multiple local endpoints and models behind a single API for tools like OpenCode.
Just to add to the discussion, there are also a couple of excellent routing projects worth mentioning that tackle the same problem from complementary angles:
vLLM Semantic Router https://vllm-semantic-router.com
This enables routing requests based on semantic intent, not just a fixed model or port. Very compelling for DGX Spark setups running multiple specialized models in parallel (coding, reasoning, tool-use, etc.).
NVIDIA LLM Router (Blueprint) https://build.nvidia.com/nvidia/llm-router
An official NVIDIA blueprint that points toward a more enterprise-grade orchestration layer, with routing, policies, and fallbacks — clearly aligned with multi-model deployments on hardware like DGX Spark.
Taken together (LiteLLM, Semantic Router, NVIDIA LLM Router), it seems pretty clear that the long-term solution is not having each application manage raw IP:port mappings, but introducing a proper routing + metadata layer between OpenCode and the inference backends.
For anyone running multiple local LLMs on Spark, this kind of architecture quickly becomes essential.