One endpoint, any LLM backend
A lightweight, secure API proxy that aggregates multiple OpenAI- and Anthropic-compatible backends behind a single endpoint. No external database or heavy dependencies: just a single binary and a YAML config.
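A minimal config might look like the sketch below. The field names here are illustrative, not the actual schema; consult the example config shipped with the project for the real keys.

```yaml
# Hypothetical sketch only — field names are illustrative, not the real schema.
listen: ":8080"

backends:
  - name: local-vllm
    type: openai                      # OpenAI-compatible Chat Completions server
    url: http://127.0.0.1:8000/v1
  - name: anthropic
    type: anthropic
    url: https://api.anthropic.com
    api_key: ${ANTHROPIC_API_KEY}

keys:
  - key: sk-team-alice
    models: [MiniMax-M2.5, claude-sonnet-4]   # per-key model allowlist
```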
What it's for
go-llm-proxy solves common LLM infrastructure challenges without the weight of full API management platforms.
🏠 Home Lab & Local Models
Run vLLM or llama-server on your local machine or home server, then expose it with a consistent OpenAI-compatible API for all your tools.
🔀 Multi-Provider Aggregation
Combine locally-hosted models with cloud subscriptions (OpenAI, Anthropic, MiniMax, Zhipu) behind one endpoint. Switch models by changing the request—no code changes.
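Because every backend sits behind one endpoint, moving a workload from a local model to a cloud one is just a different `model` value in the same request. The host, key, and payload below are illustrative:

```shell
# Local model served behind the proxy
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-team-alice" \
  -d '{"model": "qwen-3.5", "messages": [{"role": "user", "content": "hello"}]}'

# Same request shape, cloud-hosted model — no client code changes
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-team-alice" \
  -d '{"model": "claude-sonnet-4", "messages": [{"role": "user", "content": "hello"}]}'
```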
🛠️ CLI Agent Integration
Connect coding assistants like Claude Code, Codex, Qwen Code, and OpenCode to any backend. The built-in config generator makes setup trivial.
📊 Usage Tracking
Log per-request metrics to SQLite to see who is using which models, how many tokens they consume, and what error rates and latency look like—without external services.
Features
Protocol Translation
Automatically translates between the OpenAI Responses API and Chat Completions, so backends that only support Chat Completions still work with Codex and other Responses API clients.
API Key Management
Per-key model access control. Restrict users to specific models. Supports both Bearer tokens and x-api-key headers.
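Both header styles authenticate the same key; the host, key, and endpoint below are illustrative:

```shell
# Bearer token, as OpenAI-style clients send it
curl -H "Authorization: Bearer sk-team-alice" http://localhost:8080/v1/models

# x-api-key header, as Anthropic-style clients send it
curl -H "x-api-key: sk-team-alice" http://localhost:8080/v1/models
```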
Hot Reload
Config reloads automatically when you save the YAML file. Add models or adjust keys without restarting—active connections drain gracefully.
Usage Logging
SQLite-based logging with a web dashboard and CLI reports. Track per-user token consumption, model usage, and error rates.
Config Generator
Built-in UI that generates ready-to-use configs for Claude Code, Codex, Qwen Code, and OpenCode. Select your models, get working configs instantly.
Dual Protocol Support
Handles both the OpenAI and Anthropic APIs. Use the /v1/... prefix for OpenAI-style requests, or the /anthropic/... prefix to route explicitly to Anthropic-type backends.
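In practice that means two path prefixes on the same port. The exact paths under each prefix depend on the backend's API; these are illustrative:

```shell
# OpenAI-style route
curl http://localhost:8080/v1/chat/completions ...

# Explicitly target an Anthropic-type backend
curl http://localhost:8080/anthropic/v1/messages ...
```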
Coding Assistants
These tools work with go-llm-proxy out of the box. The config generator creates ready-to-use configurations for each.
Supported Backends
Plus any other OpenAI/Anthropic-compatible endpoint.
Config Generator
An interactive config generator is built right into the proxy. Enable it with --serve-config-generator, point your browser at http://localhost:8080/, and it reads your running config to show the available models. Select the coding assistant you want to configure, pick your models, and it generates:
- Configuration files (settings.json, config.toml, opencode.json)
- Downloadable startup scripts (.sh, .bat, .ps1)
- Environment setup with API keys
- MCP server integration (e.g., Tavily for web search)
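Getting there is one flag (the binary name is assumed from the project name):

```shell
./go-llm-proxy --serve-config-generator
# then open http://localhost:8080/ in a browser
```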
The generator shows the models available in your running config, for example:
| Model | Protocol | Data Safety |
|---|---|---|
| MiniMax-M2.5 | OpenAI | Safe for data |
| qwen-3.5 | OpenAI | Safe for data |
| glm-5.1 | OpenAI | 3rd party |
| claude-sonnet-4 | Anthropic | 3rd party |
Usage Logging
Enable per-request logging with --log-metrics or log_metrics: true in your config. Every request writes to a local SQLite database.
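The config-file form, as noted above:

```yaml
log_metrics: true   # same effect as the --log-metrics flag
```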
Example --model-report output:
| Model | Requests | Users | Tokens | Avg Latency |
|---|---|---|---|---|
| MiniMax-M2.5 | 8,234 | 4 | 32.1M | 1,523ms |
| qwen-3.5 | 3,412 | 3 | 12.8M | 2,341ms |
| glm-5.1 | 1,201 | 2 | 3.3M | 856ms |
Web Dashboard
Password-protected UI at /usage with daily breakdowns, per-user stats, model metrics, and error rates.
CLI Reports
Run --usage-report or --model-report to get terminal-based summaries without starting the server.
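For example (the binary name is assumed from the project name):

```shell
# Per-user and per-model summaries, read straight from the SQLite log
./go-llm-proxy --usage-report
./go-llm-proxy --model-report
```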
Privacy-First
API keys are never stored; only the first 16 hex characters of their SHA-256 hash are kept. That is enough to identify a user, but not enough to recover the key.
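A sketch of that scheme in Python—just the hashing step described above; the key value is made up:

```python
import hashlib

def key_fingerprint(api_key: str) -> str:
    """First 16 hex chars of the SHA-256 hash: identifies a key without storing it."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:16]

# The fingerprint is stable per key, but the key cannot be recovered from it.
print(key_fingerprint("sk-team-alice"))
```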
Security
go-llm-proxy is designed to be run behind a reverse proxy (like nginx) in production. It includes several hardening measures:
🔒 Path Allowlisting
Only documented endpoints are proxied. Arbitrary backend URLs cannot be accessed.
🛡️ SSRF Prevention
HTTP client does not follow redirects. Backend URLs are validated on config load (scheme, host, no embedded credentials).
⏱️ Request Limits
Request bodies are capped at 50 MB and response bodies at 100 MB. Timeouts protect against Slowloris-style attacks.
🔑 Timing-Safe Comparison
API keys are compared in constant time to prevent timing attacks.
🚫 Panic Recovery
Internal panics are caught and return generic 500 errors—stack traces never leak to clients.
📝 Config Page Note
The config generator (disabled by default) exposes model names but not backend URLs or API keys. Enable only behind a trusted proxy.
For public-facing deployments, always put go-llm-proxy behind nginx or another TLS-terminating reverse proxy, which can add rate limiting, request filtering, and an extra authentication layer.
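A minimal nginx front-end along those lines might look like this; names, paths, and limits are illustrative:

```nginx
# Illustrative sketch: TLS termination plus basic rate limiting in front of the proxy
limit_req_zone $binary_remote_addr zone=llm:10m rate=10r/s;

server {
    listen 443 ssl;
    server_name llm.example.com;
    ssl_certificate     /etc/ssl/certs/llm.example.com.pem;
    ssl_certificate_key /etc/ssl/private/llm.example.com.key;

    location / {
        limit_req zone=llm burst=20;
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_buffering off;        # let streamed tokens flow through
        proxy_read_timeout 300s;    # long generations need a generous timeout
    }
}
```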
Dependencies
go-llm-proxy has minimal external dependencies:
- fsnotify — for config file watching
- yaml — for config parsing
- modernc.org/sqlite — embedded SQLite (no server required)
That's it. No Redis, no external database, no complex runtime. One static binary, one config file.