Starting from llama-server, it's actually relatively easy to get OpenCode up and running. The only things that really matter are the IP and port of the LLM you started via llama-server or vllm. It is worth mentioning that
a) after opencode.json has been generated and adapted;
b) and opencode has been restarted
the LLM is already connected, even if the provider and IP:port are not shown, or are still shown incorrectly (a refresh issue?), at least in the GUI version.
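For reference, a minimal opencode.json along these lines worked for me; it declares the local server as a custom OpenAI-compatible provider (the provider name, model ID, and port are placeholders, so adjust them to how you launched llama-server or vllm):

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama-server": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama-server (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "local-model": {
          "name": "Local model via llama-server"
        }
      }
    }
  }
}
```

After editing this, restart OpenCode so it picks up the new provider.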
A) What will matter for us Spark users going forward, and this is where NVIDIA or the community is really called upon, is a clean way to connect to local providers. The IP:port is currently a property of the provider, yet on the Spark we can run several LLMs in parallel, each addressed via a different port. Perhaps we should engage more in the discourse on this.
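To illustrate the parallel-models point: as far as I can tell, the only workaround today is to declare each IP:port combination as its own provider entry, roughly like this (names, ports, and model IDs are made up for illustration):

```json
{
  "provider": {
    "spark-coder": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://127.0.0.1:8080/v1" },
      "models": { "qwen3-coder": {} }
    },
    "spark-reasoner": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://127.0.0.1:8081/v1" },
      "models": { "deepseek-r1": {} }
    }
  }
}
```

It works, but it means every application has to duplicate this port bookkeeping, which is exactly the problem.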
B) It would also be interesting to have a list of useful coding LLMs that make good use of the Spark's resources and are particularly suitable for OpenCode (input from experienced coders would be especially valuable here).
I got OpenCode working really well with a local LLM on the Spark.
I'm currently running Ollama with a Qwen 30B Instruct variant, and it's by far the best one for tool calling that I've tried:
qooba/qwen3-coder-30b-a3b-instruct:q3_k_m
It's running at 82 tokens per second, which is great for local coding.
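For anyone wanting to reproduce this setup, the commands are roughly as follows (this assumes Ollama is installed; by default it serves an OpenAI-compatible API on port 11434, which is what you'd point OpenCode at):

```shell
# Pull the quantized Instruct variant mentioned above
ollama pull qooba/qwen3-coder-30b-a3b-instruct:q3_k_m

# Quick smoke test against Ollama's OpenAI-compatible endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qooba/qwen3-coder-30b-a3b-instruct:q3_k_m",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```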
This model handles tool calls extremely well. In my testing, a lot of the “main” base models struggle with tool calling, so make sure you’re grabbing an Instruct variant if you want reliable tool usage.
I run LiteLLM Proxy locally. It solves this problem and several others: for example, you can attach extra metadata that OpenCode can parse, such as context window size, image support, and prefix-caching support. I also use it to route requests to other servers on my network, to models on OpenRouter, and even to OpenAI and Anthropic ones.
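For anyone curious, the LiteLLM side of this is a config.yaml along these lines; the model names, ports, and metadata values below are illustrative, not a definitive setup:

```yaml
model_list:
  # Local model served by llama-server / vllm on the Spark
  - model_name: qwen3-coder-local
    litellm_params:
      model: openai/qwen3-coder            # openai/ prefix = any OpenAI-compatible backend
      api_base: http://127.0.0.1:8080/v1
      api_key: "none"
    model_info:
      max_input_tokens: 131072             # extra metadata clients can query
      supports_vision: false

  # Cloud fallback via OpenRouter
  - model_name: claude-sonnet
    litellm_params:
      model: openrouter/anthropic/claude-sonnet-4
      api_key: os.environ/OPENROUTER_API_KEY
```

OpenCode then only needs to know the proxy's single endpoint, and the per-model IP:port mapping lives in one place.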
+1 on using LiteLLM Proxy — it really is the cleanest way today to abstract multiple local endpoints and models behind a single API for tools like OpenCode.
Just to add to the discussion, there are also a couple of excellent routing projects worth mentioning that tackle the same problem from complementary angles:
vLLM Semantic Router https://vllm-semantic-router.com
This enables routing requests based on semantic intent, not just a fixed model or port. Very compelling for DGX Spark setups running multiple specialized models in parallel (coding, reasoning, tool-use, etc.).
NVIDIA LLM Router (Blueprint) https://build.nvidia.com/nvidia/llm-router
An official NVIDIA blueprint that points toward a more enterprise-grade orchestration layer, with routing, policies, and fallbacks — clearly aligned with multi-model deployments on hardware like DGX Spark.
Taken together (LiteLLM, Semantic Router, NVIDIA LLM Router), it seems pretty clear that the long-term solution is not having each application manage raw IP:port mappings, but introducing a proper routing + metadata layer between OpenCode and the inference backends.
For anyone running multiple local LLMs on Spark, this kind of architecture quickly becomes essential.