Connect coding agents to any model
A protocol-translating proxy that connects Claude Code, Codex, OpenCode, and Qwen Code to local and upstream models, injecting the tools you expect.
Connects agents to local models
Claude Code speaks Anthropic. Codex speaks Responses API. Your vLLM box speaks Chat Completions. The proxy translates automatically.
Multiplexes models
Route multiple models across multiple backends behind one endpoint. Name rewriting, per-model timeouts, per-key access control.
Adds tools backends lack
Web search via Tavily or Brave Search, image description, PDF text extraction, OCR for scanned documents — executed at the proxy, injected transparently.
Compatibility matrix
What works with each coding agent through the proxy.
| | Claude Code | Codex CLI | OpenCode | Qwen Code |
|---|---|---|---|---|
| **Protocol** | | | | |
| Native API | Anthropic Messages | OpenAI Responses | Chat Completions | Chat Completions |
| Translation needed | auto-translated | auto-translated | passthrough | passthrough |
| **Core features** | | | | |
| Text + streaming | ✓ | ✓ | ✓ | ✓ |
| Tool calling | ✓ | ✓ | ✓ | ✓ |
| Multi-turn tool loops | ✓ | ✓ | ✓ | ✓ |
| Reasoning display | ✓ | ✓ | N/A | N/A |
| **Server-side features** | | | | |
| Token usage tracking | ✓ | ✓ | ✓ | ✓ |
| Context compaction | ✓ | ✓ | N/A | N/A |
| Token counting endpoint | ✗ | N/A | N/A | N/A |
| Prompt caching | passthrough | N/A | N/A | N/A |
| Extended thinking | ✓ | ✓ | N/A | N/A |
| **Proxy-side processing** (details) | | | | |
| Web search (Tavily / Brave) | ✓ proxy | ✓ proxy | ✓ MCP | ✓ MCP |
| Image description | ✓ vision | ✓ vision | ✓ vision | ✓ vision |
| PDF text extraction | ✓ proxy | client-side | ✓ | ✓ |
| Scanned PDF / OCR | ✓ OCR model | ✓ OCR model | ✓ | ✓ |
| Conversation compaction | N/A | ✓ | N/A | N/A |
| Usage logging & reports | ✓ | ✓ | ✓ | ✓ |
| **Configuration** | | | | |
| Model slots | Sonnet / Opus / Haiku | Single model | Build / Plan agents | Multi-select |
| Config output | settings.json + script | config.toml + script | JSON config | settings.json |
| Setup guide | Claude Code | Codex CLI | OpenCode | Qwen Code |
Web search intercepts server-side search tool calls, executes them via Tavily or Brave Search, and injects results back into the conversation. Image description routes user-attached images to a vision model; tool-output images (PDF pages, screenshots) are routed to a dedicated OCR model for text extraction. PDF text extraction runs locally in pure Go; scanned PDFs fall back to the OCR model. All results are cached by content hash for instant follow-up turns. See the pipeline documentation for full details.
Quick start
Create a config.yaml with your models and keys:
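A minimal sketch of what such a file might look like. Field names other than `processors.web_search_key` and `serve_config_generator` (which appear elsewhere in this README) are illustrative assumptions; consult the configuration reference for the real schema.

```yaml
# Illustrative sketch — field names other than web_search_key and
# serve_config_generator are assumptions; see the configuration reference.
serve_config_generator: true

backends:
  - name: local-vllm
    base_url: http://127.0.0.1:8000/v1   # speaks Chat Completions

models:
  - name: qwen3-coder                    # name exposed to clients
    backend: local-vllm

processors:
  web_search_key: tvly-...               # Tavily or Brave; detected from key prefix

api_keys:
  - key: proxy-key-1
    models: [qwen3-coder]                # per-key model restriction
```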
Recommended processor models
| Role | Recommendation |
|---|---|
| Vision | Qwen3-VL-8B — best quality/speed balance for image description. Handles charts, screenshots, diagrams. |
| OCR | PaddleOCR-VL-1.5 (0.9B) — purpose-built for document parsing. 94.5% accuracy, 109 languages, ~2s/page. Tiny VRAM footprint. |
| Web search | Tavily (free: 1,000 req/month) or Brave Search (free: $5/month credit). Add your API key to processors.web_search_key — provider is auto-detected from the key prefix. |
Run it:
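The binary name below is an assumption; substitute the actual executable from your build or release.

```shell
# Hypothetical invocation — adjust the binary name to your install.
./proxy -config config.yaml
```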
Point any OpenAI- or Anthropic-compatible client at http://localhost:8080/v1. That's it.
The built-in config generator at `GET /` creates ready-to-use configs for each coding agent. Enable it with the `-serve-config-generator` flag or `serve_config_generator: true` in the config file.
Features
| Feature | Description |
|---|---|
| Protocol translation | Anthropic Messages API ↔ Chat Completions. OpenAI Responses API ↔ Chat Completions. Auto-detected per backend. |
| Model multiplexing | Multiple models across multiple backends. Name rewriting, per-model timeouts, per-key model restrictions. |
| Image description | Images sent to text-only models are described by a vision-capable model and replaced with text. Concurrent processing (up to 5 in parallel), cached by content hash for instant follow-up turns. |
| PDF processing | Text extraction for native PDFs. OCR via dedicated fast model for scanned documents and page images. Results cached across turns. |
| Web search | Server-side search tools (Claude Code, Codex) executed at the proxy via Tavily or Brave Search (auto-detected from key prefix). Results displayed in client UI. Streaming and non-streaming modes. |
| Context tracking | Token usage from backends reported to clients for context window tracking. Token counting endpoint planned for per-section breakdowns. |
| API key management | Issue proxy keys with per-key model restrictions. Backend credentials stay on the server. |
| Usage monitoring | Per-request logging to SQLite. Token counts, latency, per-user breakdowns. Web dashboard and CLI reports. |
| Config generator | Built-in web UI creates ready-to-use configs for Claude Code, Codex, OpenCode, and Qwen Code. |
| Hot reload | Config reloads on file save or SIGHUP. Add models or rotate keys without restarting. |
| Security | Constant-time auth, IP rate limiting, SSRF protection, sanitized error responses, path allowlisting. |
Supported backends
Anything that speaks the OpenAI Chat Completions or Anthropic Messages protocol works. Tested with:
How it works
The pipeline is optional. Without processors configured, the proxy just translates and routes. With processors enabled, images, PDFs, and search work transparently on any backend.
Documentation
| Guide | Covers |
|---|---|
| Configuration reference | All config fields, modes, and examples |
| Claude Code guide | Messages API translation, tool calling, web search |
| Codex CLI guide | Responses API translation, compaction, context windows |
| Processing pipeline | Image description, PDF OCR, web search — per-client behavior |
| Docker deployment | One config file, one command |
| Security | Hardening, rate limiting, deployment recommendations |