vLLM Compatibility Problem with GPT OSS 120B and OpenClaw (spark-vllm-docker)

Hi NVIDIA team,
I would like to use the scripts from "GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks" to quickly set up a GPT OSS 120B server so that OpenClaw can access it to run related services. However, after following the steps below, I encounter a "model not found" issue. Could this be caused by an incompatibility in vLLM?

I also noticed that "GitHub - fidecastro/fix_glm46v: A fix for OpenClaw to work with GLM4.5 and GLM4.6V" addresses related issues. Could you please advise how this problem can be fixed?

Thank you.

  1. Set up the vLLM + gpt-oss-120b server using the following command and link
    GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks

Full setup: build container + download model + run

./run-recipe.sh openai-gpt-oss-120b --solo --setup

  2. Set up OpenClaw
    a. Install and config
    Install: Install - OpenClaw
    Config for vllm: Local Models - OpenClaw
    My config file:
    openclaw.txt (2.3 KB)

  3. The following error occurs when chatting in the OpenClaw UI

(APIServer pid=176) WARNING 02-11 08:06:30 [protocol.py:117] The following fields were present in the request but ignored: {'strict'}
[the warning above repeated 21 more times]
(APIServer pid=176) WARNING 02-11 08:06:30 [protocol.py:117] The following fields were present in the request but ignored: {'store'}
(APIServer pid=176) ERROR 02-11 08:06:30 [serving_chat.py:236] Error with model error=ErrorInfo(message='The model gpt-oss-120b does not exist.', type='NotFoundError', param='model', code=404)
(APIServer pid=176) INFO: 127.0.0.1:47302 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found

From a cursory look, it seems the APIServer is being invoked with a model id that differs from the one being served.

Guessing here: vLLM is serving openai/gpt-oss-120b but you are referencing simply gpt-oss-120b. Fix: try updating the model id in the API calls.
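For illustration only (this is not vLLM's actual code), the lookup behind that 404 amounts to a membership check on the set of ids the server was told to serve:

```python
# Toy illustration of why the request 404s: without --served-model-name,
# vLLM only answers to the full Hugging Face repo id it was launched with.
def lookup_model(requested: str, served_ids: list[str]) -> int:
    """Return an HTTP-like status: 200 if the id is served, 404 otherwise."""
    return 200 if requested in served_ids else 404

# Launched as `vllm serve openai/gpt-oss-120b`, so this is the only valid id.
served = ["openai/gpt-oss-120b"]

print(lookup_model("gpt-oss-120b", served))         # 404, as in the log
print(lookup_model("openai/gpt-oss-120b", served))  # 200
```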

Hi @adg1
My OpenClaw configuration is as follows. How should this part be configured for the best results?

},
"models": {
  "providers": {
    "openai": {
      "baseUrl": "http://127.0.0.1:8000/v1",
      "apiKey": "OPENAI_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "openai/gpt-oss-120b",
          "name": "openai/gpt-oss-120b",
          "reasoning": false,
          "input": [
            "text"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 200000,
          "maxTokens": 8192
        }
      ]
    }
  }
},
"agents": {
  "defaults": {
    "model": {
      "primary": "openai/gpt-oss-120b"
    },
    "workspace": "/home/asus/.openclaw/workspace",
    "maxConcurrent": 4,
    "subagents": {
      "maxConcurrent": 8
    }
  }
},

Hello!

I have no experience with OpenClaw, so unfortunately I cannot be more precise.

That said, the APIServer logs suggest a model id mismatch, though I may be mistaken.

Your OpenClaw configuration does not show any obviously wrong model identifiers, so I wonder whether other identifiers are present elsewhere in the system configuration or code.

I hope this helps you dig out the root cause.


When launching with ./run-recipe.sh or via ./launch-cluster.sh, set the custom argument --served-model-name gpt-oss-120b.

Then, in the OpenClaw config, make the ID the same: "gpt-oss-120b". For example, here is how I alias QuantTrio/MiniMax-M2.1-AWQ to "Keyper-Thinker" in my own setup:

    "models": [
      {
        "id": "Keyper-Thinker",
        "name": "Keyper Thinker",
        "reasoning": false,
        "input": [
          "text"
        ],
        "cost": {
          "input": 0,
          "output": 0,
          "cacheRead": 0,
          "cacheWrite": 0
        },
        "contextWindow": 165000,
        "maxTokens": 8192
      }
    ]
./launch-cluster.sh  \
exec vllm serve \
  QuantTrio/MiniMax-M2.1-AWQ \
  --served-model-name Keyper-Thinker \
  --port 8000 \
  --host 0.0.0.0 \
  --gpu-memory-utilization 0.85 \
  -tp 2 \
  --distributed-executor-backend ray \
  --max-model-len 165000 \
  --load-format fastsafetensors \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2 \
  --trust-remote-code

Looks like your browser autocorrected the double dash to a single one; the argument is --served-model-name.


Fixed. Thanks.


Hi @Keyper-AI and @eugr:
After adding the --served-model-name gpt-oss-120b argument and updating the OpenClaw ID, I no longer see the "gpt-oss-120b does not exist" failure. However, the vLLM server still returns a 400 Bad Request error.

I tested using Ollama with gpt-oss-120b, and it works normally. Is there any specific configuration I should double-check or pay attention to?

Thank you for your help.

  1. Add --served-model-name gpt-oss-120b to the spark-vllm-docker recipe

asus@gx10-9680:~/Desktop/test/eugr/spark-vllm-docker$ git diff
diff --git a/recipes/openai-gpt-oss-120b.yaml b/recipes/openai-gpt-oss-120b.yaml
index 09cfa52..88cbdd6 100644
--- a/recipes/openai-gpt-oss-120b.yaml
+++ b/recipes/openai-gpt-oss-120b.yaml
@@ -36,6 +36,7 @@ command: |
   vllm serve openai/gpt-oss-120b \
     --tool-call-parser openai \
     --reasoning-parser openai_gptoss \
+    --served-model-name gpt-oss-120b \
     --enable-auto-tool-choice \
     --tensor-parallel-size {tensor_parallel} \
     --distributed-executor-backend ray \
  2. Run the following command to start the server
    $ ./run-recipe.sh openai-gpt-oss-120b --solo --setup

  3. Modify the openclaw.json

"models": {
  "providers": {
    "openai": {
      "baseUrl": "http://127.0.0.1:8000/v1",
      "apiKey": "OPENAI_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "gpt-oss-120b",
          "name": "gpt oss 120b",
          "reasoning": false,
          "input": [
            "text"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 200000,
          "maxTokens": 8192
        }
      ]
    }
  }
},
"agents": {
  "defaults": {
    "model": {
      "primary": "openai/gpt-oss-120b"
    },
    "workspace": "/home/asus/.openclaw/workspace",
    "maxConcurrent": 4,
    "subagents": {
      "maxConcurrent": 8
    }
  }
},

  4. Chatting in OpenClaw fails with the following error

(APIServer pid=176) WARNING 02-12 02:05:35 [protocol.py:117] The following fields were present in the request but ignored: {'strict'}
[the warning above repeated 18 more times]
(APIServer pid=176) WARNING 02-12 02:05:35 [protocol.py:117] The following fields were present in the request but ignored: {'store'}
(APIServer pid=176) INFO: 127.0.0.1:60938 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

You still use openai/gpt-oss-120b in agents.defaults here, while using just gpt-oss-120b for the provider model id earlier.
And gpt-oss-120b is a reasoning model.

Also, gpt-oss-120b doesn't support a context window over 131072 tokens.

I don’t use openclaw/clawdbot, so the problem might be elsewhere, but I’d fix these things first.
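Assuming, as the reply above does, that the primary field must literally match a provider model id, a hypothetical sanity check (not part of OpenClaw) over this config shape would flag the mismatch:

```python
import json

# Hypothetical helper: verify that agents.defaults.model.primary appears
# among the ids declared under models.providers.*.models.
def check_primary_matches(config: dict) -> bool:
    provider_ids = {
        m["id"]
        for provider in config["models"]["providers"].values()
        for m in provider["models"]
    }
    return config["agents"]["defaults"]["model"]["primary"] in provider_ids

# Minimal slice of the configuration posted above.
config = json.loads("""
{
  "models": {"providers": {"openai": {"models": [{"id": "gpt-oss-120b"}]}}},
  "agents": {"defaults": {"model": {"primary": "openai/gpt-oss-120b"}}}
}
""")

print(check_primary_matches(config))  # False: the two ids disagree
```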


Hi @Keyper-AI, @eugr and @adg1
It is now working normally with OpenClaw in the Spark-VLLM-Docker + GPT-OSS-120B environment after applying the following steps and modifications. Thank you for your helpful suggestions and methods.

  1. Set up the vLLM + gpt-oss-120b server using the following command and link
    GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks
    a. Add --served-model-name gpt-oss-120b to openai-gpt-oss-120b.yaml

diff --git a/recipes/openai-gpt-oss-120b.yaml b/recipes/openai-gpt-oss-120b.yaml
index 09cfa52..88cbdd6 100644
--- a/recipes/openai-gpt-oss-120b.yaml
+++ b/recipes/openai-gpt-oss-120b.yaml
@@ -36,6 +36,7 @@ command: |
   vllm serve openai/gpt-oss-120b \
     --tool-call-parser openai \
     --reasoning-parser openai_gptoss \
+    --served-model-name gpt-oss-120b \
     --enable-auto-tool-choice \
     --tensor-parallel-size {tensor_parallel} \
     --distributed-executor-backend ray \
b. Full setup: build container + download model + run

./run-recipe.sh openai-gpt-oss-120b --solo --setup

  2. Set up OpenClaw
    a. Install and config
    Install: Install - OpenClaw
    Config for vllm: Local Models - OpenClaw
    My config file:
    openclaw_vllm_gpt-oss-120b_final.txt (3.5 KB)

"models": {
  "providers": {
    "openai": {
      "baseUrl": "http://127.0.0.1:8000/v1",
      "apiKey": "OPENAI_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "gpt-oss-120b",
          "name": "gpt-oss-120b",
          "reasoning": false,
          "input": [
            "text"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 200000,
          "maxTokens": 8192
        }
      ]
    }
  }
},
"agents": {
  "defaults": {
    "model": {
      "primary": "openai/gpt-oss-120b"
    },
    "workspace": "/home/asus/.openclaw/workspace",
    "compaction": {
      "mode": "safeguard"
    },
    "maxConcurrent": 4,
    "subagents": {
      "maxConcurrent": 8
    }
  }
},

  3. The vLLM gpt-oss-120b server now works normally

(APIServer pid=175) WARNING 02-12 06:09:14 [protocol.py:117] The following fields were present in the request but ignored: {'strict'}
[the warning above repeated 18 more times]
(APIServer pid=175) WARNING 02-12 06:09:14 [protocol.py:117] The following fields were present in the request but ignored: {'store'}
(APIServer pid=175) INFO: 127.0.0.1:47234 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=175) INFO 02-12 06:09:16 [loggers.py:257] Engine 000: Avg prompt throughput: 2138.2 tokens/s, Avg generation throughput: 51.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.2%, Prefix cache hit rate: 80.3%
(APIServer pid=175) INFO 02-12 06:09:26 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 80.3%
(APIServer pid=175) INFO 02-12 06:09:36 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 80.3%


BTW, you don't have to modify the recipe now. You can just add extra vLLM arguments after `--` in `run-recipe.sh`, like this:

./run-recipe.sh openai-gpt-oss-120b --solo --setup -- --served-model-name gpt-oss-120b
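A toy sketch of this widely used `--` convention (not run-recipe.sh's actual code): the wrapper consumes arguments up to `--` and forwards everything after it verbatim to the wrapped command, here vLLM:

```python
# Split an argument list at the first `--` separator: the left part is for
# the wrapper script itself, the right part is passed through untouched.
def split_passthrough(argv: list[str]) -> tuple[list[str], list[str]]:
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []

own, passthrough = split_passthrough(
    ["openai-gpt-oss-120b", "--solo", "--setup",
     "--", "--served-model-name", "gpt-oss-120b"]
)
print(own)          # ['openai-gpt-oss-120b', '--solo', '--setup']
print(passthrough)  # ['--served-model-name', 'gpt-oss-120b']
```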

Got it! I learned something new.
Thank you!

We added it a couple of days ago :)

Haha, I’m really lucky! It’s a great way to make the setup much more convenient.

Hi @eugr:
As you mentioned, as my chats get longer the context grows larger, and once it is sent to the gpt-oss-120b server the following error keeps occurring. I assume every model has this issue, right? Or, based on your experience, are there better models or approaches to handle this?
Thank you.

  • Error message:

400 max_tokens must be at least 1, got -17474. (parameter=max_tokens, value=-17474)

The contextPruning setting in OpenClaw might be a good way to reduce the amount of context sent as input.

Don't set "maxTokens": 8192; it's too low and prevents the model from outputting anything longer than that. Just don't set it at all. Also, reduce your context window parameter in the OpenClaw settings to 131072, the maximum that gpt-oss-120b supports.
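One plausible way the negative max_tokens arises (a sketch with illustrative numbers, not OpenClaw's or vLLM's actual code): the client believes the context window is larger than the server's limit, budgets max_tokens as window minus prompt length, and the budget goes negative once the prompt exceeds what the server actually supports:

```python
SERVER_MAX_MODEL_LEN = 131072   # gpt-oss-120b's real context limit
CLIENT_CONTEXT_WINDOW = 200000  # what the openclaw.json above claimed

def remaining_budget(prompt_tokens: int, window: int) -> int:
    """Tokens left for generation under a given context window."""
    return window - prompt_tokens

# Hypothetical prompt size: fits the client's 200k belief, not the server's.
prompt = 148546

print(remaining_budget(prompt, CLIENT_CONTEXT_WINDOW))  # 51454: client is happy
print(remaining_budget(prompt, SERVER_MAX_MODEL_LEN))   # -17474: server rejects
```

Setting contextWindow to 131072 keeps the client's arithmetic inside the server's real limit, so the budget stays positive until the context is genuinely full.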

Hi @eugr:
Thank you for your suggestion. You're absolutely right; as you mentioned, the context window should be aligned with GPT-OSS-120B's limit.

We can experiment later with OpenClaw plus multiple agents (Claude Code), combined with either a single or multiple DGX Spark systems. That would also be an interesting application scenario for DGX Spark edge devices.

Hello Team,

We have a two-node cluster and have been testing several models for use with OpenClaw agents, but we haven’t had much success so far.

Could you please recommend which model has been delivering the best overall performance and user experience?

Thank you.

Hi @rdariolemes
Haha, I'm just a user, but if you list your setup process clearly, that might make it easier for others to help.