vLLM Compatibility Problem with GPT OSS 120B and OpenClaw (spark-vllm-docker)

Hi NVIDIA team,
I would like to use the scripts from "GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks" to quickly set up a GPT OSS 120B server so that OpenClaw can access it to run related services. However, after following the steps below, I encounter a "model not found" issue. Could this be caused by an incompatibility in vLLM?

I also noticed that "GitHub - fidecastro/fix_glm46v: A fix for OpenClaw to work with GLM4.5 and GLM4.6V" addresses related issues. Could you please advise how this problem can be fixed?

Thank you.

  1. Set up the vLLM + gpt-oss-120b server using the following command and link
    GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks

Full setup: build container + download model + run

./run-recipe.sh openai-gpt-oss-120b --solo --setup

  2. Set up OpenClaw
    a. Install and config
    Install: Install - OpenClaw
    Config for vllm: Local Models - OpenClaw
    My config file:
    openclaw.txt (2.3 KB)

  3. The following error occurs when chatting in the OpenClaw UI

(APIServer pid=176) WARNING 02-11 08:06:30 [protocol.py:117] The following fields were present in the request but ignored: {'strict'}
[the warning above repeated 21 more times]
(APIServer pid=176) WARNING 02-11 08:06:30 [protocol.py:117] The following fields were present in the request but ignored: {'store'}
(APIServer pid=176) ERROR 02-11 08:06:30 [serving_chat.py:236] Error with model error=ErrorInfo(message='The model gpt-oss-120b does not exist.', type='NotFoundError', param='model', code=404)
(APIServer pid=176) INFO: 127.0.0.1:47302 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found

From a cursory look, it seems the APIServer is being invoked with a model id that differs from the one being served.

Guessing here: vLLM is serving openai/gpt-oss-120b but you are referencing simply gpt-oss-120b. Fix: try updating the model id in the API calls.
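For illustration only (this is not vLLM's actual code), the lookup behind that 404 amounts to a membership check on the set of ids the server was told to serve:

```python
# Toy illustration of why the request 404s: without --served-model-name,
# vLLM only answers to the full Hugging Face repo id it was launched with.
def lookup_model(requested: str, served_ids: list[str]) -> int:
    """Return an HTTP-like status: 200 if the id is served, 404 otherwise."""
    return 200 if requested in served_ids else 404

# Launched as `vllm serve openai/gpt-oss-120b`, so this is the only valid id.
served = ["openai/gpt-oss-120b"]

print(lookup_model("gpt-oss-120b", served))         # 404, as in the log
print(lookup_model("openai/gpt-oss-120b", served))  # 200
```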

Hi @adg1
My OpenClaw configuration is as follows. How should this part be configured for the best results?

},
"models": {
  "providers": {
    "openai": {
      "baseUrl": "http://127.0.0.1:8000/v1",
      "apiKey": "OPENAI_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "openai/gpt-oss-120b",
          "name": "openai/gpt-oss-120b",
          "reasoning": false,
          "input": [
            "text"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 200000,
          "maxTokens": 8192
        }
      ]
    }
  }
},
"agents": {
  "defaults": {
    "model": {
      "primary": "openai/gpt-oss-120b"
    },
    "workspace": "/home/asus/.openclaw/workspace",
    "maxConcurrent": 4,
    "subagents": {
      "maxConcurrent": 8
    }
  }
},

Hello!

I have no experience with OpenClaw, so unfortunately I cannot be more precise.

That said, the APIServer logs suggest a model id mismatch, though I may be mistaken.

Your OpenClaw configuration does not show any obviously wrong model identifiers, so I wonder whether other identifiers are present elsewhere in the system configuration or code.

I hope this helps you dig out the root cause.


When launching with ./run-recipe.sh or via ./launch-cluster.sh, set the custom argument --served-model-name gpt-oss-120b.

Then, in the OpenClaw config, make the ID the same: "gpt-oss-120b". For example, here is how I alias QuantTrio/MiniMax-M2.1-AWQ to "Keyper-Thinker" in my own setup:

    "models": [
      {
        "id": "Keyper-Thinker",
        "name": "Keyper Thinker",
        "reasoning": false,
        "input": [
          "text"
        ],
        "cost": {
          "input": 0,
          "output": 0,
          "cacheRead": 0,
          "cacheWrite": 0
        },
        "contextWindow": 165000,
        "maxTokens": 8192
      }
    ]
./launch-cluster.sh  \
exec vllm serve \
  QuantTrio/MiniMax-M2.1-AWQ \
  --served-model-name Keyper-Thinker \
  --port 8000 \
  --host 0.0.0.0 \
  --gpu-memory-utilization 0.85 \
  -tp 2 \
  --distributed-executor-backend ray \
  --max-model-len 165000 \
  --load-format fastsafetensors \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2 \
  --trust-remote-code

Looks like your browser autocorrected the double dash to a single one; the argument is --served-model-name.


Fixed. Thanks.


Hi @Keyper-AI and @eugr:
After adding the --served-model-name gpt-oss-120b argument and updating the OpenClaw ID, I no longer see the "gpt-oss-120b does not exist" failure. However, the vLLM server still returns a 400 Bad Request error.

I tested using Ollama with gpt-oss-120b, and it works normally. Is there any specific configuration I should double-check or pay attention to?

Thank you for your help.

  1. Add --served-model-name gpt-oss-120b to the spark-vllm-docker recipe

asus@gx10-9680:~/Desktop/test/eugr/spark-vllm-docker$ git diff
diff --git a/recipes/openai-gpt-oss-120b.yaml b/recipes/openai-gpt-oss-120b.yaml
index 09cfa52..88cbdd6 100644
--- a/recipes/openai-gpt-oss-120b.yaml
+++ b/recipes/openai-gpt-oss-120b.yaml
@@ -36,6 +36,7 @@ command: |
   vllm serve openai/gpt-oss-120b \
     --tool-call-parser openai \
     --reasoning-parser openai_gptoss \
+    --served-model-name gpt-oss-120b \
     --enable-auto-tool-choice \
     --tensor-parallel-size {tensor_parallel} \
     --distributed-executor-backend ray \
  2. Run the following command to start the server
    $ ./run-recipe.sh openai-gpt-oss-120b --solo --setup

  3. Modify the openclaw.json

"models": {
  "providers": {
    "openai": {
      "baseUrl": "http://127.0.0.1:8000/v1",
      "apiKey": "OPENAI_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "gpt-oss-120b",
          "name": "gpt oss 120b",
          "reasoning": false,
          "input": [
            "text"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 200000,
          "maxTokens": 8192
        }
      ]
    }
  }
},
"agents": {
  "defaults": {
    "model": {
      "primary": "openai/gpt-oss-120b"
    },
    "workspace": "/home/asus/.openclaw/workspace",
    "maxConcurrent": 4,
    "subagents": {
      "maxConcurrent": 8
    }
  }
},

  4. Chatting in OpenClaw fails with the following error

(APIServer pid=176) WARNING 02-12 02:05:35 [protocol.py:117] The following fields were present in the request but ignored: {'strict'}
[the warning above repeated 18 more times]
(APIServer pid=176) WARNING 02-12 02:05:35 [protocol.py:117] The following fields were present in the request but ignored: {'store'}
(APIServer pid=176) INFO: 127.0.0.1:60938 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

You still use openai/gpt-oss-120b in agents.defaults here, while using just gpt-oss-120b for the provider model id earlier.
And gpt-oss-120b is a reasoning model.

Also, gpt-oss-120b doesn't support a context window over 131072 tokens.

I don’t use openclaw/clawdbot, so the problem might be elsewhere, but I’d fix these things first.
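Assuming, as the reply above does, that the primary field must literally match a provider model id, a hypothetical sanity check (not part of OpenClaw) over this config shape would flag the mismatch:

```python
import json

# Hypothetical helper: verify that agents.defaults.model.primary appears
# among the ids declared under models.providers.*.models.
def check_primary_matches(config: dict) -> bool:
    provider_ids = {
        m["id"]
        for provider in config["models"]["providers"].values()
        for m in provider["models"]
    }
    return config["agents"]["defaults"]["model"]["primary"] in provider_ids

# Minimal slice of the configuration posted above.
config = json.loads("""
{
  "models": {"providers": {"openai": {"models": [{"id": "gpt-oss-120b"}]}}},
  "agents": {"defaults": {"model": {"primary": "openai/gpt-oss-120b"}}}
}
""")

print(check_primary_matches(config))  # False: the two ids disagree
```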


Hi @Keyper-AI, @eugr and @adg1
It is now working normally with OpenClaw in the Spark-VLLM-Docker + GPT-OSS-120B environment after applying the following steps and modifications. Thank you for your helpful suggestions and methods.

  1. Set up the vLLM + gpt-oss-120b server using the following command and link
    GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks
    a. Add --served-model-name gpt-oss-120b to openai-gpt-oss-120b.yaml

diff --git a/recipes/openai-gpt-oss-120b.yaml b/recipes/openai-gpt-oss-120b.yaml
index 09cfa52..88cbdd6 100644
--- a/recipes/openai-gpt-oss-120b.yaml
+++ b/recipes/openai-gpt-oss-120b.yaml
@@ -36,6 +36,7 @@ command: |
   vllm serve openai/gpt-oss-120b \
     --tool-call-parser openai \
     --reasoning-parser openai_gptoss \
+    --served-model-name gpt-oss-120b \
     --enable-auto-tool-choice \
     --tensor-parallel-size {tensor_parallel} \
     --distributed-executor-backend ray \
b. Full setup: build container + download model + run

./run-recipe.sh openai-gpt-oss-120b --solo --setup

  2. Set up OpenClaw
    a. Install and config
    Install: Install - OpenClaw
    Config for vllm: Local Models - OpenClaw
    My config file:
    openclaw_vllm_gpt-oss-120b_final.txt (3.5 KB)

"models": {
  "providers": {
    "openai": {
      "baseUrl": "http://127.0.0.1:8000/v1",
      "apiKey": "OPENAI_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "gpt-oss-120b",
          "name": "gpt-oss-120b",
          "reasoning": false,
          "input": [
            "text"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 200000,
          "maxTokens": 8192
        }
      ]
    }
  }
},
"agents": {
  "defaults": {
    "model": {
      "primary": "openai/gpt-oss-120b"
    },
    "workspace": "/home/asus/.openclaw/workspace",
    "compaction": {
      "mode": "safeguard"
    },
    "maxConcurrent": 4,
    "subagents": {
      "maxConcurrent": 8
    }
  }
},

  3. The vLLM gpt-oss-120b server now works normally

(APIServer pid=175) WARNING 02-12 06:09:14 [protocol.py:117] The following fields were present in the request but ignored: {'strict'}
[the warning above repeated 18 more times]
(APIServer pid=175) WARNING 02-12 06:09:14 [protocol.py:117] The following fields were present in the request but ignored: {'store'}
(APIServer pid=175) INFO: 127.0.0.1:47234 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=175) INFO 02-12 06:09:16 [loggers.py:257] Engine 000: Avg prompt throughput: 2138.2 tokens/s, Avg generation throughput: 51.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.2%, Prefix cache hit rate: 80.3%
(APIServer pid=175) INFO 02-12 06:09:26 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 80.3%
(APIServer pid=175) INFO 02-12 06:09:36 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 80.3%


BTW, you don't have to modify the recipe now. You can just add extra vLLM arguments after `--` in `run-recipe.sh`, like this:

./run-recipe.sh openai-gpt-oss-120b --solo --setup -- --served-model-name gpt-oss-120b
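A toy sketch of this widely used `--` convention (not run-recipe.sh's actual code): the wrapper consumes arguments up to `--` and forwards everything after it verbatim to the wrapped command, here vLLM:

```python
# Split an argument list at the first `--` separator: the left part is for
# the wrapper script itself, the right part is passed through untouched.
def split_passthrough(argv: list[str]) -> tuple[list[str], list[str]]:
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []

own, passthrough = split_passthrough(
    ["openai-gpt-oss-120b", "--solo", "--setup",
     "--", "--served-model-name", "gpt-oss-120b"]
)
print(own)          # ['openai-gpt-oss-120b', '--solo', '--setup']
print(passthrough)  # ['--served-model-name', 'gpt-oss-120b']
```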

Got it! I learned something new.
Thank you!

We added it a couple of days ago :)

Haha, I’m really lucky! It’s a great way to make the setup much more convenient.

Hi @eugr:
As you mentioned, as my chats get longer the context grows larger, and once it is sent to the gpt-oss-120b server the following error keeps occurring. I assume every model has this issue, right? Or, based on your experience, are there better models or approaches to handle this?
Thank you.

  • Error message:

400 max_tokens must be at least 1, got -17474. (parameter=max_tokens, value=-17474)

The contextPruning setting in OpenClaw might be a good way to reduce the amount of context sent as input.

Don't set "maxTokens": 8192; it's too low and prevents the model from outputting anything longer than that. Just don't set it at all. Also, reduce your context window parameter in the OpenClaw settings to 131072, the maximum that gpt-oss-120b supports.
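One plausible way the negative max_tokens arises (a sketch with illustrative numbers, not OpenClaw's or vLLM's actual code): the client believes the context window is larger than the server's limit, budgets max_tokens as window minus prompt length, and the budget goes negative once the prompt exceeds what the server actually supports:

```python
SERVER_MAX_MODEL_LEN = 131072   # gpt-oss-120b's real context limit
CLIENT_CONTEXT_WINDOW = 200000  # what the openclaw.json above claimed

def remaining_budget(prompt_tokens: int, window: int) -> int:
    """Tokens left for generation under a given context window."""
    return window - prompt_tokens

# Hypothetical prompt size: fits the client's 200k belief, not the server's.
prompt = 148546

print(remaining_budget(prompt, CLIENT_CONTEXT_WINDOW))  # 51454: client is happy
print(remaining_budget(prompt, SERVER_MAX_MODEL_LEN))   # -17474: server rejects
```

Setting contextWindow to 131072 keeps the client's arithmetic inside the server's real limit, so the budget stays positive until the context is genuinely full.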

Hi @eugr:
Thank you for your suggestion. You're absolutely right; as you mentioned, the context window should be aligned with GPT-OSS-120B's limit.

We can experiment later with OpenClaw plus multiple agents (Claude Code), combined with either a single or multiple DGX Spark systems. That would also be an interesting application scenario for DGX Spark edge devices.

Hello Team,

We have a two-node cluster and have been testing several models for use with OpenClaw agents, but we haven’t had much success so far.

Could you please recommend which model has been delivering the best overall performance and user experience?

Thank you.

Hi @rdariolemes
Haha, I'm just a user, but if you list your setup process clearly, that might make it easier for others to help.