I should have been more explicit. The name of the model is wrong.
Both would be nice, but automatic deployment would be preferable - that's what I meant by the mode or option. E.g. could you add a recipe option, such as --litellm, so that the recipe would also set up a LiteLLM proxy?
When configuring LiteLLM manually, for example, I initially specified only one model (Opus). But even though I selected Opus in Claude Code, Haiku and Sonnet were called too, and those calls failed because I had not specified them. So you always need to include all three.
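For reference, a minimal sketch of what such a LiteLLM `config.yaml` might look like. The alias names, model ID, and backend URL here are placeholders from my assumptions, not required values - the point is only that all three tiers Claude Code calls need an entry, even if they all route to the same local backend:

```yaml
model_list:
  # All three aliases point at the same OpenAI-compatible local backend;
  # without the haiku/sonnet entries those background calls fail.
  - model_name: local-opus            # placeholder alias
    litellm_params:
      model: openai/Qwen/Qwen3-Coder-Next-80B-FP8   # example model
      api_base: http://localhost:8000/v1            # assumed vLLM endpoint
      api_key: none
  - model_name: local-sonnet          # placeholder alias
    litellm_params:
      model: openai/Qwen/Qwen3-Coder-Next-80B-FP8
      api_base: http://localhost:8000/v1
      api_key: none
  - model_name: local-haiku           # placeholder alias
    litellm_params:
      model: openai/Qwen/Qwen3-Coder-Next-80B-FP8
      api_base: http://localhost:8000/v1
      api_key: none
```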
You could try my little helper for LiteLLM updates.
It was intended to run from a Docker Compose file together with the vLLM/llama.cpp process. Each time you start the whole stack, it waits until vLLM is up and then sends an update to a running LiteLLM instance in database mode. It terminates after the change.
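In case it helps, here is a rough sketch of the shape of such a helper (the `/health` and `/model/new` endpoint paths and the payload fields reflect my setup and are assumptions; adjust to yours):

```shell
#!/bin/sh
# One-shot sidecar sketch: wait for vLLM, register the model with a
# LiteLLM proxy running in database mode, then exit.

# Build the JSON body for LiteLLM's /model/new admin endpoint.
# $1 = model name, $2 = backend api_base (both placeholders).
payload() {
  cat <<EOF
{"model_name": "$1", "litellm_params": {"model": "openai/$1", "api_base": "$2", "api_key": "none"}}
EOF
}

# Block until the vLLM health endpoint answers. $1 = vLLM base URL.
wait_for_vllm() {
  until curl -sf "$1/health" >/dev/null; do
    echo "waiting for vLLM at $1 ..." >&2
    sleep 2
  done
}

# Register the model and terminate.
# $1 = LiteLLM URL, $2 = master key, $3 = model name, $4 = backend api_base.
register_model() {
  curl -sf -X POST "$1/model/new" \
    -H "Authorization: Bearer $2" \
    -H "Content-Type: application/json" \
    -d "$(payload "$3" "$4")"
}
```

From the compose entrypoint you would then run something like `wait_for_vllm http://vllm:8000 && register_model http://litellm:4000 "$LITELLM_MASTER_KEY" my-model http://vllm:8000/v1` (hostnames and model name are placeholders).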
BTW, you don't need to do any of that. Just set environment variables for your client; then all you need is a /messages-compatible endpoint. E.g. I have this helper script on my Mac:
#!/bin/bash
# Usage: claude_local.sh <model> [base-url]
# Points Claude Code at a local /messages-compatible endpoint.
DEFAULT_BASE_URL="http://spark:8888"

export LITELLM_API_KEY="none"
export ANTHROPIC_BASE_URL="${2:-$DEFAULT_BASE_URL}"
export ANTHROPIC_AUTH_TOKEN="$LITELLM_API_KEY"

# Route every model tier to the same model passed as the first argument.
export ANTHROPIC_MODEL="$1"
export ANTHROPIC_SMALL_FAST_MODEL="$1"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="$1"
export ANTHROPIC_DEFAULT_SONNET_MODEL="$1"
export ANTHROPIC_DEFAULT_OPUS_MODEL="$1"
export CLAUDE_CODE_ATTRIBUTION_HEADER=0

claude
I can call it like this:
~/claude_local.sh Qwen/Qwen3-Coder-Next-80B-FP8
And it will work. Or like this if I want to go through the LiteLLM proxy:
~/claude_local.sh Qwen/Qwen3-Coder-Next-80B-FP8 https://llm-proxy:4000
I've followed your lead and set up a simple LiteLLM proxy, and I'm currently running it with Qwen3.5-35B-A3B-FP8 just fine! I've also created a pull request for a new recipe, allowing others to run it easily too.