Connect coding agents to any model

A protocol-translating proxy that connects Claude Code, Codex, OpenCode, and Qwen Code to local and upstream models, injecting the tools you expect.

MIT License · Open Source · Free Forever
[Architecture: coding agents (Claude Code, Codex CLI, OpenCode, Qwen Code, OpenClaw, your app) → go-llm-proxy (with web search) → local backends (vLLM, llama-server, Ollama) or cloud backends via API key (OpenAI, Anthropic, MiniMax, Zhipu (GLM))]

Connects agents to local models

Claude Code speaks Anthropic. Codex speaks Responses API. Your vLLM box speaks Chat Completions. The proxy translates automatically.

Multiplexes models

Route multiple models across multiple backends behind one endpoint. Name rewriting, per-model timeouts, per-key access control.
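As an illustration, a multiplexed config might look like the sketch below. Note that `timeout` and the per-key `models` list are hypothetical field names used only for illustration; see the configuration reference for the real ones.

```yaml
models:
  - name: fast-model
    backend: http://192.168.1.10:8000/v1
    timeout: 120s            # hypothetical field name
  - name: big-model
    backend: http://192.168.1.20:8000/v1
    timeout: 600s            # hypothetical field name

keys:
  - key: sk-intern
    name: intern
    models: [fast-model]     # hypothetical per-key restriction field
```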

Adds tools backends lack

Web search via Tavily or Brave Search, image description, PDF text extraction, OCR for scanned documents — executed at the proxy, injected transparently.

Compatibility matrix

What works with each coding agent through the proxy.

Claude Code Codex CLI OpenCode Qwen Code
Protocol
Native API Anthropic Messages OpenAI Responses Chat Completions Chat Completions
Translation needed auto-translated auto-translated passthrough passthrough
Core features
Text + streaming
Tool calling
Multi-turn tool loops
Reasoning display N/A N/A
Server-side features
Token usage tracking
Context compaction N/A N/A
Token counting endpoint N/A N/A N/A
Prompt caching passthrough N/A N/A N/A
Extended thinking N/A N/A
Proxy-side processing (details)
Web search (Tavily / Brave) ✓ proxy ✓ proxy ✓ MCP ✓ MCP
Image description ✓ vision ✓ vision ✓ vision ✓ vision
PDF text extraction ✓ proxy client-side
Scanned PDF / OCR ✓ OCR model ✓ OCR model
Conversation compaction N/A N/A N/A
Usage logging & reports
Configuration
Model slots Sonnet / Opus / Haiku Single model Build / Plan agents Multi-select
Config output settings.json + script config.toml + script JSON config settings.json
Setup guide Claude Code Codex CLI OpenCode Qwen Code

Web search intercepts server-side search tool calls, executes them via Tavily, and injects results back into the conversation. Image description routes user-attached images to a vision model; tool output images (PDF pages, screenshots) are routed to a dedicated OCR model for text extraction. PDF text extraction runs locally in pure Go; scanned PDFs fall back to the OCR model. All results are cached by content hash for instant follow-up turns. See the pipeline documentation for full details.
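The content-hash caching described above can be pictured in a few lines of Go. This is an illustrative sketch under the assumption of a SHA-256 hash, not the proxy's actual implementation; the cached description string is invented for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a stable key from raw content bytes, so a PDF or image
// attached again on a follow-up turn hits the cache instead of the OCR or
// vision model. SHA-256 is an assumption made for this sketch.
func cacheKey(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

func main() {
	cache := map[string]string{}
	page := []byte("%PDF-1.7 ... fake page bytes ...")

	key := cacheKey(page)
	if desc, ok := cache[key]; ok {
		fmt.Println("cache hit:", desc)
	} else {
		// Stand-in for the result of an expensive OCR call.
		cache[key] = "Invoice, 2 pages, total due on last page"
		fmt.Println("cache miss, stored under", key[:8])
	}

	// Second turn with identical bytes: instant hit, no model call.
	_, hit := cache[cacheKey(page)]
	fmt.Println("second turn hit:", hit)
}
```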

Quick start

Create a config.yaml with your models and keys:

listen: ":8080"

models:
  # Your coding model (vLLM, llama-server, Ollama, etc.)
  - name: my-model
    backend: http://192.168.1.10:8000/v1
    responses_mode: translate   # recommended for vLLM backends with Codex

  # Cloud API — native Anthropic passthrough
  - name: MiniMax-M2.5
    backend: https://api.minimax.io/anthropic
    api_key: your-minimax-key
    type: anthropic

  # Cloud API — auto-translated from any protocol
  - name: glm-5.1
    backend: https://api.z.ai/api/coding/paas/v4
    api_key: your-zhipu-key

  # Vision model — describes images for text-only backends
  - name: Qwen3-VL-8B
    backend: http://192.168.1.10:8001/v1
    supports_vision: true

  # OCR model — fast text extraction from documents and scanned PDFs
  - name: paddleOCR
    backend: http://192.168.1.10:8002/v1
    supports_vision: true

# Processing pipeline — handles images, PDFs, and web search transparently
processors:
  vision: Qwen3-VL-8B         # any vision-capable model from above
  ocr: paddleOCR              # PaddleOCR-VL-1.5 (0.9B) — fast, accurate
  web_search_key: tvly-...    # Tavily or Brave Search key (auto-detected)

keys:
  - key: sk-your-secret-key
    name: admin

Recommended processor models

Vision: Qwen3-VL-8B — best quality/speed balance for image description. Handles charts, screenshots, diagrams.
OCR: PaddleOCR-VL-1.5 (0.9B) — purpose-built for document parsing. 94.5% accuracy, 109 languages, ~2s/page. Tiny VRAM footprint.
Web search: Tavily (free: 1,000 req/month) or Brave Search (free: $5/month credit). Add your API key to processors.web_search_key — provider is auto-detected from the key prefix.
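The prefix-based auto-detection can be sketched in Go. The "tvly-" prefix comes from the example key above; treating everything else as Brave Search is an assumption made for this sketch, and the proxy's real detection logic may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// detectSearchProvider guesses the web search provider from the API key
// prefix. Tavily keys start with "tvly-"; in this sketch anything else is
// assumed to be a Brave Search key.
func detectSearchProvider(key string) string {
	if strings.HasPrefix(key, "tvly-") {
		return "tavily"
	}
	return "brave"
}

func main() {
	fmt.Println(detectSearchProvider("tvly-abc123")) // tavily
	fmt.Println(detectSearchProvider("some-other-key"))
}
```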

Run it:

# Binary (recommended)
./go-llm-proxy -config config.yaml

# Docker (limited testing — ongoing)
docker compose -f docker/docker-compose.yml up -d

Point any OpenAI- or Anthropic-compatible client at http://localhost:8080/v1. That's it.

The built-in config generator at GET / creates ready-to-use configs for each coding agent. Enable it with -serve-config-generator or serve_config_generator: true in config.

Features

Protocol translation: Anthropic Messages API ↔ Chat Completions. OpenAI Responses API ↔ Chat Completions. Auto-detected per backend.
Model multiplexing: multiple models across multiple backends. Name rewriting, per-model timeouts, per-key model restrictions.
Image description: images sent to text-only models are described by a vision-capable model and replaced with text. Concurrent processing (up to 5 in parallel), cached by content hash for instant follow-up turns.
PDF processing: text extraction for native PDFs. OCR via a dedicated fast model for scanned documents and page images. Results cached across turns.
Web search: server-side search tools (Claude Code, Codex) executed at the proxy via Tavily or Brave Search (auto-detected from the key prefix). Results displayed in the client UI. Streaming and non-streaming modes.
Context tracking: token usage from backends is reported to clients for context-window tracking. A token counting endpoint for per-section breakdowns is planned.
API key management: issue proxy keys with per-key model restrictions. Backend credentials stay on the server.
Usage monitoring: per-request logging to SQLite. Token counts, latency, per-user breakdowns. Web dashboard and CLI reports.
Config generator: built-in web UI creates ready-to-use configs for Claude Code, Codex, OpenCode, and Qwen Code.
Hot reload: config reloads on file save or SIGHUP. Add models or rotate keys without restarting.
Security: constant-time auth, IP rate limiting, SSRF protection, sanitized error responses, path allowlisting.

Supported backends

Anything that speaks the OpenAI Chat Completions or Anthropic Messages protocol works. Tested with:

vLLM llama-server Ollama OpenAI API Anthropic API MiniMax Zhipu (GLM)

How it works

1. Client request arrives (any protocol)
2. Protocol handler parses the request
3. Translate to Chat Completions (if needed)
4. Pipeline: describe images, extract PDF text, OCR page images, inject the search tool
5. Send to backend
6. Pipeline: execute web search if called, re-send with results
7. Translate the response back to the client protocol
8. Stream to the client

The pipeline is optional. Without processors configured, the proxy just translates and routes. With processors enabled, images, PDFs, and search work transparently on any backend.
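The optional pipeline can be pictured as a chain of request-transforming stages; with no processors configured, the chain is empty and the request passes through unchanged. A toy Go sketch of that idea (not the proxy's real types):

```go
package main

import "fmt"

// A stage transforms an outgoing request; the proxy runs stages in order.
type stage func(string) string

// pipeline composes stages into a single stage. An empty pipeline is the
// identity: pure translation and routing, nothing else.
func pipeline(stages ...stage) stage {
	return func(req string) string {
		for _, s := range stages {
			req = s(req)
		}
		return req
	}
}

func main() {
	describeImages := func(r string) string { return r + " +image-descriptions" }
	extractPDFs := func(r string) string { return r + " +pdf-text" }
	injectSearch := func(r string) string { return r + " +search-tool" }

	run := pipeline(describeImages, extractPDFs, injectSearch)
	fmt.Println(run("user-request"))

	// No processors configured: request passes through unchanged.
	fmt.Println(pipeline()("user-request"))
}
```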

Documentation

Configuration reference: all config fields, modes, and examples
Claude Code guide: Messages API translation, tool calling, web search
Codex CLI guide: Responses API translation, compaction, context windows
Processing pipeline: image description, PDF OCR, web search — per-client behavior
Docker deployment: one config file, one command
Security: hardening, rate limiting, deployment recommendations