vLLM Compatibility Problem with GPT OSS 120B and OpenClaw via spark-vllm-docker

Hi NVIDIA team,
I would like to use the scripts from "GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks" to quickly set up a GPT OSS 120B server so that OpenClaw can access it to run related services. However, after following the steps below, I encounter a "model not found" issue. Could this be caused by some incompatibility in vLLM?

I also noticed that "GitHub - fidecastro/fix_glm46v: A fix for OpenClaw to work with GLM4.5 and GLM4.6V" addresses related issues. Could you please advise how this problem can be fixed?

Thank you.

  1. Set up the vLLM + gpt-oss-120b server using the following command and repository:
    GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks

Full setup: build container + download model + run

./run-recipe.sh openai-gpt-oss-120b --solo --setup

  2. Set up OpenClaw
    a. Install and config
    Install: Install - OpenClaw
    Config for vLLM: Local Models - OpenClaw
    My config file:
    openclaw.txt (2.3 KB)

  3. The following error occurs when chatting in the OpenClaw UI

(APIServer pid=176) WARNING 02-11 08:06:30 [protocol.py:117] The following fields were present in the request but ignored: {'strict'}
(... the same 'strict' warning repeated many times ...)
(APIServer pid=176) WARNING 02-11 08:06:30 [protocol.py:117] The following fields were present in the request but ignored: {'store'}
(APIServer pid=176) ERROR 02-11 08:06:30 [serving_chat.py:236] Error with model error=ErrorInfo(message='The model gpt-oss-120b does not exist.', type='NotFoundError', param='model', code=404)
(APIServer pid=176) INFO: 127.0.0.1:47302 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found

From a cursory look, it seems the APIServer is getting invoked using a model id that differs from the one being served.

My guess: vLLM is serving the model as openai/gpt-oss-120b, but you are referencing it simply as gpt-oss-120b. Try updating the model ID in the API calls.
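
A quick way to confirm is to ask the server which model IDs it actually exposes via the OpenAI-compatible models endpoint (a sketch, assuming the default port 8000 from your config):

curl http://127.0.0.1:8000/v1/models

The "id" fields returned there are the exact names that /v1/chat/completions expects in the "model" field.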

Hi @adg1
My OpenClaw configuration is as follows. How should this part be configured for the best results?

},
"models": {
  "providers": {
    "openai": {
      "baseUrl": "http://127.0.0.1:8000/v1",
      "apiKey": "OPENAI_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "openai/gpt-oss-120b",
          "name": "openai/gpt-oss-120b",
          "reasoning": false,
          "input": [
            "text"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 200000,
          "maxTokens": 8192
        }
      ]
    }
  }
},
"agents": {
  "defaults": {
    "model": {
      "primary": "openai/gpt-oss-120b"
    },
    "workspace": "/home/asus/.openclaw/workspace",
    "maxConcurrent": 4,
    "subagents": {
      "maxConcurrent": 8
    }
  }
},

Hello!

I have no experience with OpenClaw, so unfortunately I cannot be more precise.

That being said, the APIServer logs suggest a model ID mismatch, though I may be mistaken.

From your OpenClaw configuration, no wrong model identifier is apparent, so I wonder whether other identifiers are defined elsewhere in the system configuration or code.

I hope the above helps you dig into the root cause.


When launching with ./run-recipe.sh or via ./launch-cluster.sh, set the custom arg --served-model-name gpt-oss-120b.

Then, in the OpenClaw config, set the model ID to the same value, "gpt-oss-120b". For reference, here is how it looks in my own setup (with a different model):

    "models": [
      {
        "id": "Keyper-Thinker",
        "name": "Keyper Thinker",
        "reasoning": false,
        "input": [
          "text"
        ],
        "cost": {
          "input": 0,
          "output": 0,
          "cacheRead": 0,
          "cacheWrite": 0
        },
        "contextWindow": 165000,
        "maxTokens": 8192
      }
    ]
./launch-cluster.sh  \
exec vllm serve \
  QuantTrio/MiniMax-M2.1-AWQ \
  --served-model-name Keyper-Thinker \
  --port 8000 \
  --host 0.0.0.0 \
  --gpu-memory-utilization 0.85 \
  -tp 2 \
  --distributed-executor-backend ray \
  --max-model-len 165000 \
  --load-format fastsafetensors \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2 \
  --trust-remote-code

Looks like your browser autocorrected the double dash to a single one; the argument is --served-model-name.


Fixed. Thanks!


Hi @Keyper-AI and @eugr:
After adding the --served-model-name gpt-oss-120b argument and updating the OpenClaw model ID, I no longer see the "model not found" failure for gpt-oss-120b. However, the vLLM server still returns a 400 Bad Request error.

I tested using Ollama with gpt-oss-120b, and it works normally. Is there any specific configuration I should double-check or pay attention to?

Thank you for your help.

  1. Add --served-model-name gpt-oss-120b to the spark-vllm-docker recipe

asus@gx10-9680:~/Desktop/test/eugr/spark-vllm-docker$ git diff
diff --git a/recipes/openai-gpt-oss-120b.yaml b/recipes/openai-gpt-oss-120b.yaml
index 09cfa52..88cbdd6 100644
--- a/recipes/openai-gpt-oss-120b.yaml
+++ b/recipes/openai-gpt-oss-120b.yaml
@@ -36,6 +36,7 @@ command: |
   vllm serve openai/gpt-oss-120b \
     --tool-call-parser openai \
     --reasoning-parser openai_gptoss \
+    --served-model-name gpt-oss-120b \
     --enable-auto-tool-choice \
     --tensor-parallel-size {tensor_parallel} \
     --distributed-executor-backend ray \
  2. Run the following command to start the server
    $ ./run-recipe.sh openai-gpt-oss-120b --solo --setup

  3. Modify openclaw.json

"models": {
  "providers": {
    "openai": {
      "baseUrl": "http://127.0.0.1:8000/v1",
      "apiKey": "OPENAI_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "gpt-oss-120b",
          "name": "gpt oss 120b",
          "reasoning": false,
          "input": [
            "text"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 200000,
          "maxTokens": 8192
        }
      ]
    }
  }
},
"agents": {
  "defaults": {
    "model": {
      "primary": "openai/gpt-oss-120b"
    },
    "workspace": "/home/asus/.openclaw/workspace",
    "maxConcurrent": 4,
    "subagents": {
      "maxConcurrent": 8
    }
  }
},

  4. Asking OpenClaw fails with the following error

(APIServer pid=176) WARNING 02-12 02:05:35 [protocol.py:117] The following fields were present in the request but ignored: {'strict'}
(... the same 'strict' warning repeated many times ...)
(APIServer pid=176) WARNING 02-12 02:05:35 [protocol.py:117] The following fields were present in the request but ignored: {'store'}
(APIServer pid=176) INFO: 127.0.0.1:60938 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

You still use openai/gpt-oss-120b here (as the primary model under agents), while the provider's model list now uses just gpt-oss-120b.
And gpt-oss-120b is a reasoning model.

Also, gpt-oss-120b doesn't support a context window over 131072 tokens.

I don’t use openclaw/clawdbot, so the problem might be elsewhere, but I’d fix these things first.
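
To rule OpenClaw out, you could also send a minimal request straight to vLLM using the served name (a sanity-check sketch, assuming the server is still on 127.0.0.1:8000):

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 64}'

If this returns 200 while OpenClaw still gets a 400, the problem lies in the request OpenClaw builds (model ID, context window, etc.) rather than in the server itself.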


Hi @Keyper-AI, @eugr, and @adg1:
OpenClaw now works normally in the spark-vllm-docker + gpt-oss-120b environment after applying the following steps and modifications. Thank you for your helpful suggestions and methods.

  1. Set up the vLLM + gpt-oss-120b server using the following command and repository:
    GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks
    a. Add --served-model-name gpt-oss-120b to openai-gpt-oss-120b.yaml

diff --git a/recipes/openai-gpt-oss-120b.yaml b/recipes/openai-gpt-oss-120b.yaml
index 09cfa52..88cbdd6 100644
--- a/recipes/openai-gpt-oss-120b.yaml
+++ b/recipes/openai-gpt-oss-120b.yaml
@@ -36,6 +36,7 @@ command: |
   vllm serve openai/gpt-oss-120b \
     --tool-call-parser openai \
     --reasoning-parser openai_gptoss \
+    --served-model-name gpt-oss-120b \
     --enable-auto-tool-choice \
     --tensor-parallel-size {tensor_parallel} \
     --distributed-executor-backend ray \

b. Full setup: build container + download model + run

./run-recipe.sh openai-gpt-oss-120b --solo --setup

  2. Set up OpenClaw
    a. Install and config
    Install: Install - OpenClaw
    Config for vLLM: Local Models - OpenClaw
    My config file:
    openclaw_vllm_gpt-oss-120b_final.txt (3.5 KB)

"models": {
  "providers": {
    "openai": {
      "baseUrl": "http://127.0.0.1:8000/v1",
      "apiKey": "OPENAI_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "gpt-oss-120b",
          "name": "gpt-oss-120b",
          "reasoning": false,
          "input": [
            "text"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 200000,
          "maxTokens": 8192
        }
      ]
    }
  }
},
"agents": {
  "defaults": {
    "model": {
      "primary": "openai/gpt-oss-120b"
    },
    "workspace": "/home/asus/.openclaw/workspace",
    "compaction": {
      "mode": "safeguard"
    },
    "maxConcurrent": 4,
    "subagents": {
      "maxConcurrent": 8
    }
  }
},

  3. The vLLM gpt-oss-120b server now works normally

(APIServer pid=175) WARNING 02-12 06:09:14 [protocol.py:117] The following fields were present in the request but ignored: {'strict'}
(... the same 'strict' warning repeated many times ...)
(APIServer pid=175) WARNING 02-12 06:09:14 [protocol.py:117] The following fields were present in the request but ignored: {'store'}
(APIServer pid=175) INFO: 127.0.0.1:47234 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=175) INFO 02-12 06:09:16 [loggers.py:257] Engine 000: Avg prompt throughput: 2138.2 tokens/s, Avg generation throughput: 51.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.2%, Prefix cache hit rate: 80.3%
(APIServer pid=175) INFO 02-12 06:09:26 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 80.3%
(APIServer pid=175) INFO 02-12 06:09:36 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 80.3%


BTW, you don't have to modify the recipe now. You can just add extra vLLM arguments after -- in run-recipe.sh, like this:

./run-recipe.sh openai-gpt-oss-120b --solo --setup -- --served-model-name gpt-oss-120b

Got it! I learned something new.
Thank you!

We added it a couple of days ago :)

Haha, I’m really lucky! It’s a great way to make the setup much more convenient.

Hi @eugr:
As you mentioned, as my chat gets longer and the context grows larger, the following error keeps occurring after the request is sent to the gpt-oss-120b server. I assume every model has this issue, right? Or, based on your understanding, are there better models or approaches to handle this?
Thank you.

  • Error message:

400 max_tokens must be at least 1, got -17474. (parameter=max_tokens, value=-17474)

The contextPruning setting in OpenClaw might be a good way to reduce how much context gets sent to the model.

Don't set "maxTokens": 8192; it's too low and prevents the model from producing anything longer than that. Just don't set it at all. Also, reduce your context window parameter in the OpenClaw settings to 131072, the maximum that gpt-oss-120b supports.
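
For reference, a sketch of how the provider's model entry could look with those changes applied (maxTokens omitted entirely, contextWindow lowered to the 131072 limit; untested, other fields kept from the earlier config):

"models": [
  {
    "id": "gpt-oss-120b",
    "name": "gpt-oss-120b",
    "reasoning": false,
    "input": [
      "text"
    ],
    "cost": {
      "input": 0,
      "output": 0,
      "cacheRead": 0,
      "cacheWrite": 0
    },
    "contextWindow": 131072
  }
]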

Hi @eugr:
Thank you for your suggestion. You're absolutely right; as you mentioned, the context window setting should align with what GPT-OSS-120B actually supports.

We can experiment later with OpenClaw plus multiple agents (Claude Code), combined with either a single or multiple DGX Spark systems. That would also be an interesting application scenario for DGX Spark edge devices.