Claude Code + VLLM on nvcr.io/nvidia/vllm

Hi everyone,

I would like to share an open-source project I created to make it easy to use Claude Code with a local or LAN-hosted vLLM server:

Repository: Nachetecabrera/claude-vllm

The project provides a lightweight compatibility proxy between Claude Code and vLLM, including vLLM servers running locally, on another machine in the LAN, or through NVIDIA NGC-based deployments.

Simple installation on Windows, Linux and macOS

A major goal of the project is to avoid complicated manual configuration.

The repository includes native installation scripts for all three major desktop platforms.

Windows

.\install.ps1

Linux and macOS

chmod +x install.sh && ./install.sh

The installation process automatically:

  • Adds the project commands to the system PATH.
  • Installs and configures both proxy implementations.
  • Creates the claudevllm and claudevllmd commands.
  • Creates the spcvp and spcvn proxy startup commands.
  • Supports both the Python/FastAPI and Node.js/Fastify versions.

After restarting the terminal, the same commands can be used on Windows, Linux and macOS.

Basic configuration

The user only needs to copy the example configuration file and specify the vLLM server address.

For example:

{
  "listen_ip": "0.0.0.0",
  "listen_port": 8010,
  "forward_url": "http://VLLM_HOST:8000",
  "model": "qwen3-coder-next",
  "log_level": "INFO",
  "system_mode": "hoist",
  "drop_top_level_fields": "context_management,output_config,thinking",
  "drop_tool_fields": "strict,defer_loading",
  "strip_cache_control": true
}

The important values are:

  • forward_url: address of the local, LAN, VPN, Tailscale, or NGC vLLM server.
  • model: model name that the proxy should send to vLLM.
  • system_mode: how Claude Code system and developer messages should be normalized.

Claude Code can then be pointed to the local proxy using environment variables.

Windows PowerShell

$env:ANTHROPIC_BASE_URL = "http://localhost:8010"
$env:ANTHROPIC_API_KEY = "dummy"

Linux and macOS

export ANTHROPIC_BASE_URL="http://localhost:8010"
export ANTHROPIC_API_KEY="dummy"

These configuration options and commands are documented in the repository’s quick-start section.

Starting the proxy

Once installed, the user does not need to navigate through directories or manually launch Python or Node.js files.

The following global commands are available:

spcvp

Starts the Python/FastAPI proxy.

spcvn

Starts the Node.js/Fastify proxy.

Then Claude Code can be launched through:

claudevllm

Or with permission bypass mode:

claudevllmd

The README specifies that these commands work across Windows, Linux, and macOS after installation.

Why the proxy is useful

Some Claude Code requests are not accepted directly by certain versions of vLLM’s Anthropic-compatible endpoint.

For example, Claude Code may include system or developer roles inside the messages array, while the endpoint expects message roles to be only user or assistant.

This can produce validation errors such as:

Input should be 'user' or 'assistant'

Claude Code may also include fields such as:

context_management
output_config
thinking
cache_control

The proxy normalizes these requests before forwarding them to vLLM.

It can:

  • Move system and developer messages into the top-level system field.
  • Convert system messages into tagged user messages when required.
  • Remove unsupported top-level request fields.
  • Remove unsupported properties from tool definitions.
  • Recursively remove cache_control.
  • Force a specific model name.
  • Forward an upstream API key.
  • Preserve streaming responses.
  • Connect to a vLLM server running locally or elsewhere on the network.

The two supported system-message modes are hoist, which is recommended in the README, and user, which converts instructions to messages tagged with <system-update>.

Example architecture

Windows / Linux / macOS client
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Claude Code                   β”‚
β”‚ claude-vllm proxy             β”‚
β”‚ Python or Node.js             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚
                β”‚ LAN / VPN / Tailscale
                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ NVIDIA system                 β”‚
β”‚ vLLM server                   β”‚
β”‚ Local coding model            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

This makes it possible to run Claude Code on a regular Windows, Linux, or macOS computer while the model runs on a separate NVIDIA GPU workstation, server, or DGX system.

Health checks

The proxy also includes health endpoints:

curl http://localhost:8010/healthz

This verifies that the proxy is running.

curl http://localhost:8010/readyz

This verifies both the proxy and its connection to the upstream vLLM server.

Project scope

This is a compatibility proxy rather than a complete Anthropic-to-OpenAI translation layer.

It uses vLLM’s Anthropic-compatible interface and normalizes the portions of Claude Code requests that may otherwise cause validation errors.

The repository includes:

  • Windows, Linux, and macOS installers.
  • Python/FastAPI implementation.
  • Node.js/Fastify implementation.
  • Global startup commands.
  • Configurable request normalization.
  • Local and LAN vLLM support.
  • Health and readiness endpoints.
  • Environment-variable configuration.
  • MIT license.

Feedback, testing, issues, and pull requests are welcome, especially from users running:

  • NVIDIA DGX systems.
  • NVIDIA GPU workstations.
  • NVIDIA NGC vLLM containers.
  • Remote vLLM instances over LAN, VPN, or Tailscale.
  • Qwen, Nemotron, DeepSeek, and other coding models.
  • Claude Code clients on Windows, Linux, or macOS.

Repository: github.com/Nachetecabrera/claude-vllm

Why not just submit PRs to fix the missing or broken pieces of the already existing anthropic messages API?