One endpoint, any LLM backend

A lightweight, secure API proxy that aggregates multiple OpenAI- and Anthropic-compatible backends behind a single endpoint. No database or complex dependencies—just a single binary and a YAML config.
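As a sketch of what that looks like in practice—the field names below are illustrative assumptions for this page, not the proxy's actual config schema:

```yaml
# Hypothetical go-llm-proxy config sketch (field names are assumptions).
listen: ":8080"

backends:
  - name: local-vllm          # local model server
    type: openai
    url: http://127.0.0.1:8000/v1
  - name: anthropic           # cloud provider
    type: anthropic
    url: https://api.anthropic.com
    api_key: ${ANTHROPIC_API_KEY}

keys:
  - key: sk-proxy-alice
    models: [qwen-3.5, claude-sonnet-4]   # per-key model allowlist
```

Consult the project's own documentation for the real schema; the point is that one small YAML file is the entire configuration surface.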

What it's for

go-llm-proxy solves common LLM infrastructure challenges without the weight of full API management platforms.

🏠 Home Lab & Local Models

Run vLLM or llama-server on your local machine or home server, then expose it with a consistent OpenAI-compatible API for all your tools.

🔀 Multi-Provider Aggregation

Combine locally-hosted models with cloud subscriptions (OpenAI, Anthropic, MiniMax, Zhipu) behind one endpoint. Switch models by changing the request—no code changes.

🛠️ CLI Agent Integration

Connect coding assistants like Claude Code, Codex, Qwen Code, and OpenCode to any backend. The built-in config generator makes setup trivial.

📊 Usage Tracking

Log per-request metrics to SQLite. See who is using which models, along with token consumption, error rates, and latency—no external services required.

Features

🔄 Protocol Translation

Automatically translates between OpenAI Responses API and Chat Completions. Backends that only support Chat Completions work with Codex and other Responses API clients.
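To make the idea concrete, here is a minimal, simplified sketch of such a translation—an illustration of the mapping, not the proxy's actual code. It assumes the simplest case: a Responses API request whose `input` is a plain string or a list of message objects.

```python
def responses_to_chat_completions(req: dict) -> dict:
    """Map a minimal OpenAI Responses API request body onto a
    Chat Completions request body (simplified illustration)."""
    body = {"model": req["model"], "messages": []}
    if "instructions" in req:  # Responses-style system prompt
        body["messages"].append({"role": "system", "content": req["instructions"]})
    inp = req.get("input", "")
    if isinstance(inp, str):   # a bare string becomes one user message
        body["messages"].append({"role": "user", "content": inp})
    else:                      # already a list of message objects
        body["messages"].extend(inp)
    if "max_output_tokens" in req:  # the parameter name differs between APIs
        body["max_tokens"] = req["max_output_tokens"]
    return body

chat = responses_to_chat_completions(
    {"model": "qwen-3.5", "instructions": "Be terse.",
     "input": "hi", "max_output_tokens": 64}
)
```

The real translation also has to handle streaming events, tool calls, and multi-part content; the snippet only shows the basic request-shape mapping.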

🔑 API Key Management

Per-key model access control. Restrict users to specific models. Supports both Bearer tokens and x-api-key headers.
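The access-control rule amounts to a per-key allowlist lookup. A hypothetical sketch (key names and the `None` = "all models" convention are illustrative assumptions, not the proxy's implementation):

```python
# Illustrative per-key model allowlist (not the proxy's actual code).
KEY_MODELS = {
    "sk-alice": {"qwen-3.5", "MiniMax-M2.5"},  # restricted to two models
    "sk-bob": None,                            # None = all models allowed
}

def is_allowed(api_key: str, model: str) -> bool:
    if api_key not in KEY_MODELS:
        return False  # unknown key: reject
    allowed = KEY_MODELS[api_key]
    return allowed is None or model in allowed
```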

♻️ Hot Reload

Config reloads automatically when you save the YAML file. Add models or adjust keys without restarting—active connections drain gracefully.

📈 Usage Logging

SQLite-based logging with a web dashboard and CLI reports. Track per-user token consumption, model usage, and error rates.

🛠️ Config Generator

Built-in UI that generates ready-to-use configs for Claude Code, Codex, Qwen Code, and OpenCode. Select your models, get working configs instantly.

🔗 Dual Protocol Support

Handles both OpenAI and Anthropic APIs. Use /v1/... or /anthropic/... prefix to explicitly route to Anthropic-type backends.

Coding Assistants

These tools work with go-llm-proxy out of the box. The config generator creates ready-to-use configurations for each.

Claude Code
OpenAI Codex
Qwen Code
OpenCode
OpenAI Python SDK
Anthropic Python SDK

Supported Backends

vLLM
llama-server (llama.cpp)
Ollama
OpenAI API
Anthropic API
MiniMax
Zhipu (GLM)

Plus any other OpenAI/Anthropic-compatible endpoint.

Config Generator

The proxy ships with an interactive config generator. Enable it with --serve-config-generator, point your browser at http://localhost:8080/, and it reads your running config to show the available models. Select the coding assistant you want to configure, pick your models, and it generates:

  • Configuration files (settings.json, config.toml, opencode.json)
  • Downloadable startup scripts (.sh, .bat, .ps1)
  • Environment setup with API keys
  • MCP server integration (e.g., Tavily for web search)
Config generator UI (example):

  Available Models

  Model            Protocol    Data Safety
  MiniMax-M2.5     OpenAI      Safe for data
  qwen-3.5         OpenAI      Safe for data
  glm-5.1          OpenAI      3rd party
  claude-sonnet-4  Anthropic   3rd party

Choose a coding assistant, enter your proxy API key, and pick a model, reasoning effort, and context window (auto-detected; e.g. 192K tokens). Generating a Codex config produces a config.toml like:

  model = "MiniMax-M2.5"
  model_provider = "go-llm-proxy"
  model_reasoning_effort = "medium"
  model_context_window = 196608

  [model_providers.go-llm-proxy]
  name = "Go-LLM-Proxy"
  base_url = "https://llm.example.com/v1"
  wire_api = "responses"
  experimental_bearer_token = "sk-••••••"

Usage Logging

Enable per-request logging with --log-metrics or log_metrics: true in your config. Every request writes to a local SQLite database.
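Conceptually, each request becomes one row, and the reports are simple aggregations over that table. A sketch with an illustrative schema (the proxy's actual table layout may differ):

```python
import sqlite3

# Illustrative per-request metrics table (schema is an assumption).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE requests (
    ts TEXT, user TEXT, model TEXT,
    input_tokens INTEGER, output_tokens INTEGER,
    status INTEGER, latency_ms INTEGER)""")
db.execute("INSERT INTO requests VALUES (?,?,?,?,?,?,?)",
           ("2026-04-02T10:00:00Z", "admin", "MiniMax-M2.5", 1200, 300, 200, 1523))
db.execute("INSERT INTO requests VALUES (?,?,?,?,?,?,?)",
           ("2026-04-02T10:01:00Z", "derek", "qwen-3.5", 800, 150, 200, 2341))

# Per-user token totals — the kind of aggregation the reports run.
rows = db.execute("""SELECT user, SUM(input_tokens + output_tokens)
                     FROM requests GROUP BY user ORDER BY user""").fetchall()
```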

Usage Dashboard — /usage

  Requests: 12,847    Tokens: 48.2M    Users: 5    Error Rate: 0.3%

  Daily Requests (chart)

  Model          Requests   Users   Tokens   Avg Latency
  MiniMax-M2.5   8,234      4       32.1M    1,523ms
  qwen-3.5       3,412      3       12.8M    2,341ms
  glm-5.1        1,201      2       3.3M     856ms
$ ./go-llm-proxy -usage-report -report-days 7

DATE        USER   REQUESTS  INPUT TOK  OUTPUT TOK  TOTAL TOK
----        ----   --------  ---------  ----------  ---------
2026-04-02  admin  342       1,245,000  312,000     1,557,000
2026-04-02  derek  128       489,000    122,000     611,000
2026-04-01  admin  456       1,890,000  478,000     2,368,000
2026-04-01  derek  89        312,000    78,000      390,000

=== User Summary ===
USER   REQUESTS  TOTAL TOK   DAYS  LAST SEEN
----   --------  ---------   ----  ---------
admin  4,892     22,801,000  7     2026-04-02
derek  1,234     4,515,000   6     2026-04-02
📊 Web Dashboard

Password-protected UI at /usage with daily breakdowns, per-user stats, model metrics, and error rates.

📋 CLI Reports

Run --usage-report or --model-report to get terminal-based summaries without starting the server.

🔒 Privacy-First

API keys are never stored—only the first 16 hex characters of their SHA-256 hash. Enough to identify users, not reversible.
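The identifier scheme is easy to reproduce—hash the key with SHA-256 and keep the first 16 hex characters:

```python
import hashlib

def key_id(api_key: str) -> str:
    """Short identifier for a key: first 16 hex chars of its SHA-256.
    Stable per key, so it can group a user's requests in reports,
    but the key itself cannot be recovered from it."""
    return hashlib.sha256(api_key.encode()).hexdigest()[:16]

kid = key_id("sk-example-key")
```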

Security

go-llm-proxy is designed to be run behind a reverse proxy (like nginx) in production. It includes several hardening measures:

🔒 Path Allowlisting

Only documented endpoints are proxied. Arbitrary backend URLs cannot be accessed.

🛡️ SSRF Prevention

HTTP client does not follow redirects. Backend URLs are validated on config load (scheme, host, no embedded credentials).

⏱️ Request Limits

Request bodies capped at 50 MB, response bodies at 100 MB. Timeouts protect against slowloris attacks.

🔑 Timing-Safe Comparison

API keys are compared in constant time to prevent timing attacks.
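In Go this is typically done with crypto/subtle.ConstantTimeCompare; the same idea sketched in Python, using the standard library's hmac.compare_digest:

```python
import hmac

def keys_match(provided: str, expected: str) -> bool:
    # hmac.compare_digest takes time independent of where the inputs
    # first differ, so an attacker cannot recover a key byte-by-byte
    # by measuring response latency.
    return hmac.compare_digest(provided.encode(), expected.encode())
```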

🚫 Panic Recovery

Internal panics are caught and return generic 500 errors—stack traces never leak to clients.

📝 Config Page Note

The config generator (disabled by default) exposes model names but not backend URLs or API keys. Enable only behind a trusted proxy.

Deploy behind a reverse proxy

For public-facing deployments, always put go-llm-proxy behind nginx or another TLS-terminating proxy. This provides additional rate limiting, request filtering, and authentication layers.
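A minimal nginx location block for this setup might look like the following—an illustrative sketch assuming the proxy listens on localhost:8080, not a recommended production config:

```nginx
# Illustrative: TLS-terminating nginx in front of go-llm-proxy.
location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_http_version 1.1;
    proxy_buffering off;        # don't buffer streamed token responses
    proxy_read_timeout 300s;    # long-running generations
}
```

Disabling proxy buffering matters for streaming completions: with buffering on, nginx would hold tokens back instead of forwarding them as they arrive.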

Dependencies

go-llm-proxy has minimal external dependencies:

  • fsnotify — for config file watching
  • yaml — for config parsing
  • modernc.org/sqlite — embedded SQLite (no server required)

That's it. No Redis, no external database, no complex runtime. One static binary, one config file.