Hi everyone,
I would like to share an open-source project I created to make it easy to use Claude Code with a local or LAN-hosted vLLM server:
Repository: Nachetecabrera/claude-vllm
The project provides a lightweight compatibility proxy between Claude Code and vLLM, including vLLM servers running locally, on another machine in the LAN, or through NVIDIA NGC-based deployments.
Simple installation on Windows, Linux and macOS
A major goal of the project is to avoid complicated manual configuration.
The repository includes native installation scripts for all three major desktop platforms.
Windows
.\install.ps1
Linux and macOS
chmod +x install.sh && ./install.sh
The installation process automatically:
- Adds the project commands to the system
PATH. - Installs and configures both proxy implementations.
- Creates the
claudevllmandclaudevllmdcommands. - Creates the
spcvpandspcvnproxy startup commands. - Supports both the Python/FastAPI and Node.js/Fastify versions.
After restarting the terminal, the same commands can be used on Windows, Linux and macOS.
Basic configuration
The user only needs to copy the example configuration file and specify the vLLM server address.
For example:
{
"listen_ip": "0.0.0.0",
"listen_port": 8010,
"forward_url": "http://VLLM_HOST:8000",
"model": "qwen3-coder-next",
"log_level": "INFO",
"system_mode": "hoist",
"drop_top_level_fields": "context_management,output_config,thinking",
"drop_tool_fields": "strict,defer_loading",
"strip_cache_control": true
}
The important values are:
forward_url: address of the local, LAN, VPN, Tailscale, or NGC vLLM server.model: model name that the proxy should send to vLLM.system_mode: how Claude Code system and developer messages should be normalized.
Claude Code can then be pointed to the local proxy using environment variables.
Windows PowerShell
$env:ANTHROPIC_BASE_URL = "http://localhost:8010"
$env:ANTHROPIC_API_KEY = "dummy"
Linux and macOS
export ANTHROPIC_BASE_URL="http://localhost:8010"
export ANTHROPIC_API_KEY="dummy"
These configuration options and commands are documented in the repositoryβs quick-start section.
Starting the proxy
Once installed, the user does not need to navigate through directories or manually launch Python or Node.js files.
The following global commands are available:
spcvp
Starts the Python/FastAPI proxy.
spcvn
Starts the Node.js/Fastify proxy.
Then Claude Code can be launched through:
claudevllm
Or with permission bypass mode:
claudevllmd
The README specifies that these commands work across Windows, Linux, and macOS after installation.
Why the proxy is useful
Some Claude Code requests are not accepted directly by certain versions of vLLMβs Anthropic-compatible endpoint.
For example, Claude Code may include system or developer roles inside the messages array, while the endpoint expects message roles to be only user or assistant.
This can produce validation errors such as:
Input should be 'user' or 'assistant'
Claude Code may also include fields such as:
context_management
output_config
thinking
cache_control
The proxy normalizes these requests before forwarding them to vLLM.
It can:
- Move
systemanddevelopermessages into the top-levelsystemfield. - Convert system messages into tagged user messages when required.
- Remove unsupported top-level request fields.
- Remove unsupported properties from tool definitions.
- Recursively remove
cache_control. - Force a specific model name.
- Forward an upstream API key.
- Preserve streaming responses.
- Connect to a vLLM server running locally or elsewhere on the network.
The two supported system-message modes are hoist, which is recommended in the README, and user, which converts instructions to messages tagged with <system-update>.
Example architecture
Windows / Linux / macOS client
βββββββββββββββββββββββββββββββββ
β Claude Code β
β claude-vllm proxy β
β Python or Node.js β
βββββββββββββββββ¬ββββββββββββββββ
β
β LAN / VPN / Tailscale
βΌ
βββββββββββββββββββββββββββββββββ
β NVIDIA system β
β vLLM server β
β Local coding model β
βββββββββββββββββββββββββββββββββ
This makes it possible to run Claude Code on a regular Windows, Linux, or macOS computer while the model runs on a separate NVIDIA GPU workstation, server, or DGX system.
Health checks
The proxy also includes health endpoints:
curl http://localhost:8010/healthz
This verifies that the proxy is running.
curl http://localhost:8010/readyz
This verifies both the proxy and its connection to the upstream vLLM server.
Project scope
This is a compatibility proxy rather than a complete Anthropic-to-OpenAI translation layer.
It uses vLLMβs Anthropic-compatible interface and normalizes the portions of Claude Code requests that may otherwise cause validation errors.
The repository includes:
- Windows, Linux, and macOS installers.
- Python/FastAPI implementation.
- Node.js/Fastify implementation.
- Global startup commands.
- Configurable request normalization.
- Local and LAN vLLM support.
- Health and readiness endpoints.
- Environment-variable configuration.
- MIT license.
Feedback, testing, issues, and pull requests are welcome, especially from users running:
- NVIDIA DGX systems.
- NVIDIA GPU workstations.
- NVIDIA NGC vLLM containers.
- Remote vLLM instances over LAN, VPN, or Tailscale.
- Qwen, Nemotron, DeepSeek, and other coding models.
- Claude Code clients on Windows, Linux, or macOS.
Repository: github.com/Nachetecabrera/claude-vllm