# llmproxy

llmproxy is a lightweight Flask-based HTTP proxy that presents a unified OpenAI-compatible API surface in front of multiple large language model providers. Clients that speak the OpenAI API, including LangChain, LiteLLM, Open WebUI, and Cursor, connect to llmproxy without modification; the proxy routes each request to the correct upstream based on a provider prefix embedded in the model name. For example, a request for the model `openrouter/anthropic/claude-3.5-sonnet` is forwarded to OpenRouter with the provider prefix stripped, while a request for `ollama/llama3` is routed to a local Ollama instance.
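
Conceptually, the routing step is just a split on the first slash of the model string. A minimal sketch of the idea (the function and the provider map below are illustrative, not llmproxy's actual internals):

```python
# Illustrative sketch of prefix routing; names are hypothetical,
# not llmproxy's actual internals.
def route_model(model: str, providers: dict) -> tuple[str, str]:
    """Split 'provider/upstream_model' and return (base_url, upstream_model)."""
    prefix, _, upstream_model = model.partition("/")
    if prefix not in providers or not upstream_model:
        raise ValueError(f"unknown provider prefix in model: {model!r}")
    return providers[prefix]["base_url"], upstream_model

providers = {
    "openrouter": {"base_url": "https://openrouter.ai/api/v1"},
    "ollama": {"base_url": "http://localhost:11434/v1"},
}
# -> ('https://openrouter.ai/api/v1', 'anthropic/claude-3.5-sonnet')
print(route_model("openrouter/anthropic/claude-3.5-sonnet", providers))
```

Note that `partition` splits on the first slash only, so upstream model IDs that themselves contain slashes (as OpenRouter's do) pass through intact.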

The server hot-reloads its configuration on each request via a modification-time cache, so provider credentials and model filters can be updated without a restart. When gunicorn is installed, the server uses it automatically with threaded workers; otherwise it falls back to the Flask development server, which is sufficient for local use. The `/v1/models` endpoint queries all configured providers concurrently via a thread pool; an unreachable provider is logged as a warning and omitted from the aggregate response rather than failing the whole request. Streaming responses are relayed as raw SSE byte streams via `stream_with_context`, preserving upstream chunk boundaries.
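
The modification-time check follows a common caching pattern; a minimal sketch of the idea (names are illustrative, not the proxy's actual code):

```python
import json
import os

# Hypothetical sketch of an mtime-based config cache, not llmproxy's code.
CONFIG_PATH = os.path.expanduser("~/.config/llmproxy/config.json")
_cache = {"mtime": None, "config": None}

def load_config() -> dict:
    """Re-read config.json only when its modification time changes."""
    mtime = os.path.getmtime(CONFIG_PATH)
    if mtime != _cache["mtime"]:
        with open(CONFIG_PATH) as f:
            _cache["config"] = json.load(f)
        _cache["mtime"] = mtime
    return _cache["config"]
```

Calling this at the top of each request handler keeps the common case to a single `stat` call while still picking up edits immediately.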

A standalone test client (`llmproxy_test_client.py`) exercises all endpoints and requires only the `requests` package. It reports pass/fail/skip results per suite across health, error handling, model listing, chat completions, streaming, embeddings, and optional OpenAI SDK compatibility tests. A Docker image and docker-compose configuration are also provided for containerized deployment; configuration is stored in a named Docker volume shared between the setup and server containers.

The package is hosted on GitHub as `llmproxy`.

## Quick Start

Install dependencies and run the interactive setup wizard:

```bash
pip install flask requests
python run.py --setup
python run.py
```

The server binds to `0.0.0.0:8080` by default. Providers, API keys, and optional model filters are stored in `~/.config/llmproxy/config.json`.
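
The exact schema is whatever the setup wizard writes; purely as a hypothetical illustration of the kind of structure involved (all keys and values below are invented, not the real schema):

```json
{
  "providers": {
    "openrouter": {
      "base_url": "https://openrouter.ai/api/v1",
      "api_key": "sk-or-...",
      "model_filter": ["anthropic/*"]
    },
    "ollama": {
      "base_url": "http://localhost:11434/v1"
    }
  }
}
```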

## Model Naming Convention

All models exposed by llmproxy follow the pattern `<provider_name>/<upstream_model_id>`. The proxy strips the leading provider prefix before forwarding the request to the upstream base URL.

| Proxy model string | Provider | Upstream model |
| --- | --- | --- |
| `openrouter/anthropic/claude-3.5-sonnet` | `openrouter` | `anthropic/claude-3.5-sonnet` |
| `openai/gpt-4o` | `openai` | `gpt-4o` |
| `ollama/llama3` | `ollama` | `llama3` |

## API Endpoints

All endpoints mirror the OpenAI API.

| Method | Path | Description |
| --- | --- | --- |
| GET | `/health` | Health check; returns provider list |
| GET | `/v1/models` | Aggregate model list from all providers |
| POST | `/v1/chat/completions` | Chat completions (streaming supported) |
| POST | `/v1/completions` | Legacy text completions |
| POST | `/v1/embeddings` | Embeddings |
| * | `/v1/<anything>` | Pass-through to upstream |
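
Because every endpoint is OpenAI-shaped, plain HTTP clients work without the SDK. For example, an embeddings request using `requests` (the model name is illustrative; use one your configured providers actually expose):

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={
        "model": "openai/text-embedding-3-small",  # illustrative model name
        "input": "The quick brown fox",
    },
)
resp.raise_for_status()
# OpenAI-style response: embedding vectors live under data[i].embedding.
print(len(resp.json()["data"][0]["embedding"]))
```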

## Client Example

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-used",  # upstream keys live in the proxy's config
)

response = client.chat.completions.create(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
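
Streaming works the same way through the proxy; with the OpenAI SDK, pass `stream=True` and iterate over the chunks:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-used")

stream = client.chat.completions.create(
    model="openrouter/anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    # Streaming chunks carry incremental deltas; content can be None on
    # role/stop chunks, so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```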