One endpoint, any LLM backend
A lightweight, secure API proxy that aggregates multiple OpenAI- and Anthropic-compatible backends behind a single endpoint. No external database or heavy dependencies: just a single binary and a YAML config.
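A minimal config might look like the sketch below. The field names here are illustrative, not the actual schema; consult the example config shipped with the project for the real keys.

```yaml
# Hypothetical sketch only — field names are illustrative, not the real schema.
listen: ":8080"

backends:
  - name: local-vllm
    type: openai                      # OpenAI-compatible Chat Completions server
    url: http://127.0.0.1:8000/v1
  - name: anthropic
    type: anthropic
    url: https://api.anthropic.com
    api_key: ${ANTHROPIC_API_KEY}

keys:
  - key: sk-team-alice
    models: [MiniMax-M2.5, claude-sonnet-4]   # per-key model allowlist
```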
What it's for
go-llm-proxy solves common LLM infrastructure challenges without the weight of full API management platforms.
🏠 Home Lab & Local Models
Run vLLM or llama-server on your local machine or home server, then expose it with a consistent OpenAI-compatible API for all your tools.
🔀 Multi-Provider Aggregation
Combine locally-hosted models with cloud subscriptions (OpenAI, Anthropic, MiniMax, Zhipu) behind one endpoint. Switch models by changing the request—no code changes.
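Because every backend sits behind one endpoint, moving a workload from a local model to a cloud one is just a different `model` value in the same request. The host, key, and payload below are illustrative:

```shell
# Local model served behind the proxy
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-team-alice" \
  -d '{"model": "qwen-3.5", "messages": [{"role": "user", "content": "hello"}]}'

# Same request shape, cloud-hosted model — no client code changes
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-team-alice" \
  -d '{"model": "claude-sonnet-4", "messages": [{"role": "user", "content": "hello"}]}'
```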
🛠️ CLI Agent Integration
Connect coding assistants like Claude Code, Codex, Qwen Code, and OpenCode to any backend. The built-in config generator makes setup trivial.
📊 Usage Tracking
Log per-request metrics to SQLite to see who is using which models, how many tokens they consume, and what error rates and latency look like—without external services.
Features
Protocol Translation
Automatically translates between the OpenAI Responses API and Chat Completions, so backends that only support Chat Completions still work with Codex and other Responses API clients.
API Key Management
Per-key model access control. Restrict users to specific models. Supports both Bearer tokens and x-api-key headers.
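Both header styles authenticate the same key; the host, key, and endpoint below are illustrative:

```shell
# Bearer token, as OpenAI-style clients send it
curl -H "Authorization: Bearer sk-team-alice" http://localhost:8080/v1/models

# x-api-key header, as Anthropic-style clients send it
curl -H "x-api-key: sk-team-alice" http://localhost:8080/v1/models
```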
Hot Reload
Config reloads automatically when you save the YAML file. Add models or adjust keys without restarting—active connections drain gracefully.
Usage Logging
SQLite-based logging with a web dashboard and CLI reports. Track per-user token consumption, model usage, and error rates.
Config Generator
Built-in UI that generates ready-to-use configs for Claude Code, Codex, Qwen Code, and OpenCode. Select your models, get working configs instantly.
Dual Protocol Support
Handles both the OpenAI and Anthropic APIs. Use the /v1/... prefix for OpenAI-style requests, or the /anthropic/... prefix to route explicitly to Anthropic-type backends.
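In practice that means two path prefixes on the same port. The exact paths under each prefix depend on the backend's API; these are illustrative:

```shell
# OpenAI-style route
curl http://localhost:8080/v1/chat/completions ...

# Explicitly target an Anthropic-type backend
curl http://localhost:8080/anthropic/v1/messages ...
```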
Coding Assistants
These tools work with go-llm-proxy out of the box. The config generator creates ready-to-use configurations for each.
Supported Backends
Plus any other OpenAI/Anthropic-compatible endpoint.
Config Generator
An interactive config generator is built right into the proxy. Enable it with --serve-config-generator, point your browser at http://localhost:8080/, and it reads your running config to show the available models. Select the coding assistant you want to configure, pick your models, and it generates:
- Configuration files (settings.json, config.toml, opencode.json)
- Downloadable startup scripts (.sh, .bat, .ps1)
- Environment setup with API keys
- MCP server integration (e.g., Tavily for web search)
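Getting there is one flag (the binary name is assumed from the project name):

```shell
./go-llm-proxy --serve-config-generator
# then open http://localhost:8080/ in a browser
```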
The generator shows the models available in your running config, for example:
| Model | Protocol | Data Safety |
|---|---|---|
| MiniMax-M2.5 | OpenAI | Safe for data |
| qwen-3.5 | OpenAI | Safe for data |
| glm-5.1 | OpenAI | 3rd party |
| claude-sonnet-4 | Anthropic | 3rd party |
Usage Logging
Enable per-request logging with --log-metrics or log_metrics: true in your config. Every request writes to a local SQLite database.
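The config-file form, as noted above:

```yaml
log_metrics: true   # same effect as the --log-metrics flag
```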
Example --model-report output:
| Model | Requests | Users | Tokens | Avg Latency |
|---|---|---|---|---|
| MiniMax-M2.5 | 8,234 | 4 | 32.1M | 1,523ms |
| qwen-3.5 | 3,412 | 3 | 12.8M | 2,341ms |
| glm-5.1 | 1,201 | 2 | 3.3M | 856ms |
Web Dashboard
Password-protected UI at /usage with daily breakdowns, per-user stats, model metrics, and error rates.
CLI Reports
Run --usage-report or --model-report to get terminal-based summaries without starting the server.
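For example (the binary name is assumed from the project name):

```shell
# Per-user and per-model summaries, read straight from the SQLite log
./go-llm-proxy --usage-report
./go-llm-proxy --model-report
```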
Privacy-First
API keys are never stored; only the first 16 hex characters of their SHA-256 hash are kept. That is enough to identify a user, but not enough to recover the key.
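A sketch of that scheme in Python—just the hashing step described above; the key value is made up:

```python
import hashlib

def key_fingerprint(api_key: str) -> str:
    """First 16 hex chars of the SHA-256 hash: identifies a key without storing it."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:16]

# The fingerprint is stable per key, but the key cannot be recovered from it.
print(key_fingerprint("sk-team-alice"))
```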
Security
go-llm-proxy is designed to be run behind a reverse proxy (like nginx) in production. It includes several hardening measures:
🔒 Path Allowlisting
Only documented endpoints are proxied. Arbitrary backend URLs cannot be accessed.
🛡️ SSRF Prevention
HTTP client does not follow redirects. Backend URLs are validated on config load (scheme, host, no embedded credentials).
⏱️ Request Limits
Request bodies are capped at 50 MB and response bodies at 100 MB. Timeouts protect against Slowloris-style attacks.
🔑 Timing-Safe Comparison
API keys are compared in constant time to prevent timing attacks.
🚫 Panic Recovery
Internal panics are caught and return generic 500 errors—stack traces never leak to clients.
📝 Config Page Note
The config generator (disabled by default) exposes model names but not backend URLs or API keys. Enable only behind a trusted proxy.
For public-facing deployments, always put go-llm-proxy behind nginx or another TLS-terminating reverse proxy, which can add rate limiting, request filtering, and an extra authentication layer.
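A minimal nginx front-end along those lines might look like this; names, paths, and limits are illustrative:

```nginx
# Illustrative sketch: TLS termination plus basic rate limiting in front of the proxy
limit_req_zone $binary_remote_addr zone=llm:10m rate=10r/s;

server {
    listen 443 ssl;
    server_name llm.example.com;
    ssl_certificate     /etc/ssl/certs/llm.example.com.pem;
    ssl_certificate_key /etc/ssl/private/llm.example.com.key;

    location / {
        limit_req zone=llm burst=20;
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_buffering off;        # let streamed tokens flow through
        proxy_read_timeout 300s;    # long generations need a generous timeout
    }
}
```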
Dependencies
go-llm-proxy has minimal external dependencies:
- fsnotify — for config file watching
- yaml — for config parsing
- modernc.org/sqlite — embedded SQLite (no server required)
That's it. No Redis, no external database, no complex runtime. One static binary, one config file.