DGX Spark + GPT-OSS 120B: runtime with reliable Tools + Strict support (for Roo Code)

Hi everyone,

I’m using Roo Code and want it to interact fully and correctly with gpt-oss 120B on DGX Spark, including tools/function calling and strict mode (structured outputs: valid JSON with no extra text).

Current issues:

  • SGLang: tools support is unstable.

  • vLLM: problems with tools and strict.

Question: what runtime/server can I use to run gpt-oss 120B so that Tools + Strict work properly and consistently with Roo Code (OpenAI-compatible API / structured outputs)?

If you have a working setup, please share:

  • which runtime/server you’re using,

  • whether tools + strict work without hacks,

  • (if possible) minimal launch flags or config.

Thanks.

I was able to get Roo Code + gpt-oss-120B on DGX Spark working reliably (at least for Tools / function calling) using a vLLM container built from:

https://github.com/eugr/spark-vllm-docker

What works

  • OpenAI-compatible /v1/chat/completions

  • Tools / function calling works consistently (model returns tool_calls correctly).

  • Roo Code can drive tool calls as long as it uses tool_choice: "auto" (gpt-oss behavior).

What does not (yet) work “fully”

  • Strict / structured outputs in the OpenAI sense are not fully supported in this path. If the client sends strict, vLLM logs that it is ignored. For my use case this wasn’t critical - the key point was having stable tool calling.
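
For reference, this is the shape of the OpenAI-style strict request that gets ignored in this path. A minimal Python sketch of the request body (the schema name and fields are illustrative, not from my actual setup):

```python
import json

# Sketch of an OpenAI-style "strict" structured-output request body.
# The schema name and fields are illustrative examples.
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Return the city as JSON."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "city_answer",
            "strict": True,  # the flag vLLM logs as ignored in this setup
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
                "additionalProperties": False,
            },
        },
    },
}

print(json.dumps(payload, indent=2))
```

When a client sends this, the server in this build still answers, but without enforcing the schema, so you can get prose around the JSON.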

Why I didn’t use NIM

I tested the NIM gpt-oss-120B container on DGX Spark, but it does not work on GB10 in my environment (looks like GB10 / CC 12.1 support is not enabled in that container build yet). So I couldn’t get a working NIM runtime for gpt-oss-120B on this hardware.

Minimal working launch (vLLM via spark-vllm-docker)

This is the command I’m running (weights already downloaded locally; no re-download):

docker run \
  --privileged \
  --gpus all \
  -it --rm \
  --network=host --ipc=host \
  --shm-size 64g \
  -v "$HOME/models/gpt-oss-120b:/model" \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  -v "$HOME/tiktoken_encodings:/tiktoken_encodings" \
  -e HF_HUB_OFFLINE=1 \
  -e TIKTOKEN_ENCODINGS_BASE=/tiktoken_encodings \
  vllm-node \
  vllm serve /model \
    --served-model-name "openai/gpt-oss-120b" \
    --host 0.0.0.0 --port 30000 \
    --gpu-memory-utilization 0.9 \
    --max-model-len 131072 \
    --enable-auto-tool-choice \
    --tool-call-parser openai
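
On the client side, Roo Code’s OpenAI-compatible provider then just needs the matching base URL and model ID. A quick sketch of the values that follow from the launch command above (the dict keys here are illustrative, not Roo’s actual config schema):

```python
# Values follow from the launch command: --port 30000 and --served-model-name.
# The dict keys are illustrative, not Roo Code's real settings schema.
provider = {
    "base_url": "http://localhost:30000/v1",
    "api_key": "not-needed-locally",  # vLLM runs without auth by default here
    "model": "openai/gpt-oss-120b",
}

print(provider["base_url"] + " serving " + provider["model"])
```

The model ID must match --served-model-name exactly, otherwise the server rejects the request.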

Proof of tool calling

A request like this returns a valid tool_calls block (with content: null, which is expected in tool-call turns):

{
  "tool_choice": "auto",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_time",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ]
}
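
For completeness, a full request body also carries model and messages around that tools fragment. A Python sketch (the user message is my illustrative example; the model name matches --served-model-name from the launch command above):

```python
import json

# Full chat-completions request body around the tools fragment above.
# The user message is an illustrative example.
request = {
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "What time is it in Paris?"}],
    "tool_choice": "auto",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_time",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# In a tool-call turn the assistant message comes back roughly as:
# {"role": "assistant", "content": null, "tool_calls": [{"type": "function",
#  "function": {"name": "get_time", "arguments": "{\"city\": \"Paris\"}"}}]}
print(json.dumps(request, indent=2))
```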

Bottom line

If your priority is Roo Code + stable Tools/function calling on DGX Spark + gpt-oss-120B, the vLLM runtime from spark-vllm-docker is currently the most practical path I’ve found.

If someone has a NIM image/tag that truly enables GB10 for gpt-oss-120B, I’d be interested to test it, but right now I can’t get NIM working on DGX Spark for this model.

Make sure you pass the following parameters to openai/gpt-oss-120b:

--enable-auto-tool-choice \
--tool-call-parser=openai \
--reasoning-parser=openai_gptoss

I believe the first one can be omitted now, but the other two make sure the proper tool and reasoning parsers are used.

I use gpt-oss and minimax-m2 with vllm without any issues.

You can also run gpt-oss-120b with llama.cpp.


Hi.

Thanks a lot for sharing the working setup and the repo — very helpful.

Quick question: does gpt-oss-120B work reliably for you with a “terminal/shell” tool (i.e., when Roo Code calls something like a terminal/exec tool and then reads stdout/stderr)? Specifically, is the model stable at:

  1. actually producing tool_calls when it should, and

  2. then correctly consuming the tool output (terminal stdout/stderr) and using it in the next steps?

I’m asking because in my case Roo Code occasionally reports that “gpt-oss didn’t call any tools” (as if no tool_calls happened at all). The conversation flow doesn’t crash — Roo Code keeps trying to continue the project — but the agent logic becomes unreliable because expected tool usage gets skipped.

P.S. I spent a lot of time trying to get both stable tool calling and OpenAI-style strict/structured outputs (valid JSON, no extra text) working “properly”, but I couldn’t make it work reliably — not with SGLang, and not with the NIM prebuilt containers I tested.
Do you think it’s realistically possible on DGX Spark (GB10 / SM12.1) to get both Tools + Strict working consistently without hacks? My impression is that when people say they have “everything working”, they may be running the models on different servers/hardware rather than DGX Spark (SM12.1).

Any practical pointers (exact vLLM version/build, client-side Roo Code settings, flags, or known limitations) would be much appreciated.

My experience with Roo has been spotty - at some point it didn’t work well at all, so I used Cline more, although when Roo works, it produces better results. I don’t know whether Roo uses native tool calling, but Cline (which Roo was forked from) has it as a separate toggle, and native tool calling improves reliability a lot - I haven’t had any problems with tool calling lately. So if there is an option to turn on native tool calling, turn it on. Also make sure the context size is set properly in Roo/Cline.

Anyway, I just tried the latest Roo, and it seems to work fine with command line output too.

Having said that, I’ve just switched to Insiders preview of VS Code - Copilot is now able to talk to any OpenAI compatible endpoint, and I like it the most so far.

In Roo Code, I already switched to native tools - and actually they’re enabled by default for the OpenAI API. In general, they work well. The only thing that’s a bit inconvenient is that you need to set the model’s reasoning level in the system prompt; otherwise it tends to reason poorly and sometimes produces nonsense.
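
As a concrete illustration of setting the reasoning level in the system prompt (the exact "Reasoning: high" wording is my assumption, based on gpt-oss’s documented low/medium/high reasoning levels):

```python
# Sketch: pinning gpt-oss's reasoning level via the system prompt.
# The exact "Reasoning: high" phrasing is an assumption based on gpt-oss's
# documented low/medium/high reasoning levels; adjust to your template.
messages = [
    {"role": "system", "content": "You are a coding agent.\nReasoning: high"},
    {"role": "user", "content": "Refactor this function and run the tests."},
]

for m in messages:
    print(m["role"], "->", m["content"].splitlines()[0])
```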

I also tried the VS Code Insiders preview, and it’s definitely good - but it feels more suitable for editing an existing project. If you want to build an MVP from scratch, you have to “push” it constantly.

With Roo Code, I can just give a clear technical spec where I ask it to deploy the project at the end, run the tests, and only say it’s finished once all tests pass successfully. Then it drives the terminal on its own and keeps fixing things until it gets the final working result.

That said, in the VS Code Insiders preview they somehow managed to avoid stuffing the entire prompt with the full conversation history, and it feels faster. It’s like it sends a concrete task to the model instead of the whole dialogue context. I don’t yet understand how they did it, but what I noticed right away is this: a project that Roo Code built over several hours on a single DGX Spark, VS Code built faster - maybe in about an hour - and subjectively the output quality seemed better. However, I didn’t fully verify it: I didn’t run the project, and I’m basing that impression only on my Python experience.


You need to turn on the “Prompt Caching” checkbox in Roo to achieve this. I feed my models through LiteLLM and have this (and the context size) properly reported in the model metadata, so Roo sets it automatically, but when you connect to vLLM directly, you need to set it up by hand.

But the VS Code Insiders preview goes a little further and sends only the relevant parts of long source files for editing. This speeds things up considerably and avoids the LLM “forgetting” some of the existing code.
