# LLM Providers Guide

OpenFang ships with a comprehensive model catalog covering **3 native LLM drivers**, **20 providers**, **51 builtin models**, and **23 aliases**. Every provider uses one of three battle-tested drivers: the native **Anthropic** driver, the native **Gemini** driver, or the universal **OpenAI-compatible** driver. This guide is the single source of truth for configuring, selecting, and managing LLM providers in OpenFang.

---

## Table of Contents

1. [Quick Setup](#quick-setup)
2. [Provider Reference](#provider-reference)
3. [Model Catalog](#model-catalog)
4. [Model Aliases](#model-aliases)
5. [Per-Agent Model Override](#per-agent-model-override)
6. [Model Routing](#model-routing)
7. [Cost Tracking](#cost-tracking)
8. [Fallback Providers](#fallback-providers)
9. [API Endpoints](#api-endpoints)
10. [Channel Commands](#channel-commands)

---

## Quick Setup

The fastest path from zero to running:

```bash
# Pick ONE provider, set its env var, and you are done.
export GEMINI_API_KEY="your-key"      # Free tier available
# OR
export GROQ_API_KEY="your-key"        # Free tier available
# OR
export ANTHROPIC_API_KEY="your-key"
# OR
export OPENAI_API_KEY="your-key"
```

OpenFang auto-detects which providers have API keys configured at boot. Any model whose provider is authenticated becomes immediately available. Local providers (Ollama, vLLM, LM Studio) require no key at all.
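
As a rough illustration of the boot-time detection described above, the sketch below checks a handful of the env vars listed in this guide. It is not OpenFang's actual code: the name `detect_providers`, the `lookup` closure (standing in for `std::env::var(name).ok()`), and the rule that an empty value counts as unset are all assumptions made for the example.

```rust
/// (provider id, env vars that may hold its key). A subset of this guide's providers.
const KEYED_PROVIDERS: &[(&str, &[&str])] = &[
    ("anthropic", &["ANTHROPIC_API_KEY"]),
    ("openai", &["OPENAI_API_KEY"]),
    ("gemini", &["GEMINI_API_KEY", "GOOGLE_API_KEY"]),
    ("groq", &["GROQ_API_KEY"]),
];

/// Local providers need no key, so they are always reported.
const LOCAL_PROVIDERS: &[&str] = &["ollama", "vllm", "lmstudio"];

/// Returns the providers that would be usable given an environment lookup.
/// `lookup` is a hypothetical stand-in for `std::env::var(name).ok()`.
pub fn detect_providers<F>(lookup: F) -> Vec<&'static str>
where
    F: Fn(&str) -> Option<String>,
{
    let mut found = Vec::new();
    for &(id, vars) in KEYED_PROVIDERS {
        // Assumption: a key counts as configured when any listed var is non-empty.
        let has_key = vars
            .iter()
            .any(|&var| lookup(var).map_or(false, |v| !v.trim().is_empty()));
        if has_key {
            found.push(id);
        }
    }
    found.extend_from_slice(LOCAL_PROVIDERS);
    found
}
```

In a real program the closure would typically be `|name| std::env::var(name).ok()`; passing it in keeps the logic easy to test without touching the process environment.
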

For Gemini specifically, either `GEMINI_API_KEY` or `GOOGLE_API_KEY` will work.

---

## Provider Reference

### 1. Anthropic

| | |
|---|---|
| **Display Name** | Anthropic |
| **Driver** | Native Anthropic (Messages API) |
| **Env Var** | `ANTHROPIC_API_KEY` |
| **Base URL** | `https://api.anthropic.com` |
| **Key Required** | Yes |
| **Free Tier** | No |
| **Auth** | `x-api-key` header |
| **Models** | 3 |

**Available Models:**
- `claude-opus-4-20250514` (Frontier)
- `claude-sonnet-4-20250514` (Smart)
- `claude-haiku-4-5-20251001` (Fast)

**Setup:**
1. Sign up at [console.anthropic.com](https://console.anthropic.com)
2. Create an API key under Settings > API Keys
3. `export ANTHROPIC_API_KEY="sk-ant-..."`

---

### 2. OpenAI

| | |
|---|---|
| **Display Name** | OpenAI |
| **Driver** | OpenAI-compatible |
| **Env Var** | `OPENAI_API_KEY` |
| **Base URL** | `https://api.openai.com/v1` |
| **Key Required** | Yes |
| **Free Tier** | No |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 6 |

**Available Models:**
- `gpt-4.1` (Frontier)
- `gpt-4o` (Smart)
- `o3-mini` (Smart)
- `gpt-4.1-mini` (Balanced)
- `gpt-4o-mini` (Fast)
- `gpt-4.1-nano` (Fast)

**Setup:**
1. Sign up at [platform.openai.com](https://platform.openai.com)
2. Create an API key under API Keys
3. `export OPENAI_API_KEY="sk-..."`

---

### 3. Google Gemini

| | |
|---|---|
| **Display Name** | Google Gemini |
| **Driver** | Native Gemini (generateContent API) |
| **Env Var** | `GEMINI_API_KEY` (or `GOOGLE_API_KEY`) |
| **Base URL** | `https://generativelanguage.googleapis.com` |
| **Key Required** | Yes |
| **Free Tier** | Yes (generous free tier) |
| **Auth** | `x-goog-api-key` header |
| **Models** | 3 |

**Available Models:**
- `gemini-2.5-pro` (Frontier)
- `gemini-2.5-flash` (Smart)
- `gemini-2.0-flash` (Fast)

**Setup:**
1. Go to [aistudio.google.com](https://aistudio.google.com)
2. Get an API key (free tier included)
3. `export GEMINI_API_KEY="AIza..."` or `export GOOGLE_API_KEY="AIza..."`

**Notes:** The Gemini driver is a fully native implementation. It is not OpenAI-compatible. Model goes in the URL path, system prompt via `systemInstruction`, tools via `functionDeclarations`, streaming via `streamGenerateContent?alt=sse`.

---

### 4. DeepSeek

| | |
|---|---|
| **Display Name** | DeepSeek |
| **Driver** | OpenAI-compatible |
| **Env Var** | `DEEPSEEK_API_KEY` |
| **Base URL** | `https://api.deepseek.com/v1` |
| **Key Required** | Yes |
| **Free Tier** | No |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 2 |

**Available Models:**
- `deepseek-chat` (Smart) -- DeepSeek V3
- `deepseek-reasoner` (Smart) -- DeepSeek R1, no tool support

**Setup:**
1. Sign up at [platform.deepseek.com](https://platform.deepseek.com)
2. Create an API key
3. `export DEEPSEEK_API_KEY="sk-..."`

---

### 5. Groq

| | |
|---|---|
| **Display Name** | Groq |
| **Driver** | OpenAI-compatible |
| **Env Var** | `GROQ_API_KEY` |
| **Base URL** | `https://api.groq.com/openai/v1` |
| **Key Required** | Yes |
| **Free Tier** | Yes (rate-limited) |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 4 |

**Available Models:**
- `llama-3.3-70b-versatile` (Balanced)
- `mixtral-8x7b-32768` (Balanced)
- `llama-3.1-8b-instant` (Fast)
- `gemma2-9b-it` (Fast)

**Setup:**
1. Sign up at [console.groq.com](https://console.groq.com)
2. Create an API key
3. `export GROQ_API_KEY="gsk_..."`

**Notes:** Groq runs open-source models on custom LPU hardware, which makes inference very fast. The free tier is rate-limited but still very usable.

---

### 6. OpenRouter

| | |
|---|---|
| **Display Name** | OpenRouter |
| **Driver** | OpenAI-compatible |
| **Env Var** | `OPENROUTER_API_KEY` |
| **Base URL** | `https://openrouter.ai/api/v1` |
| **Key Required** | Yes |
| **Free Tier** | Yes (limited credits for some models) |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 3 |

**Available Models:**
- `openrouter/auto` (Smart) -- auto-selects best model
- `openrouter/optimus` (Balanced) -- cost-optimized
- `openrouter/nitro` (Fast) -- speed-optimized

**Setup:**
1. Sign up at [openrouter.ai](https://openrouter.ai)
2. Create an API key under Keys
3. `export OPENROUTER_API_KEY="sk-or-..."`

**Notes:** OpenRouter is a unified gateway to 200+ models from many providers. The three builtin entries are OpenRouter's smart-routing endpoints. You can also use any model ID from their catalog directly by specifying the full OpenRouter model path.

---

### 7. Mistral AI

| | |
|---|---|
| **Display Name** | Mistral AI |
| **Driver** | OpenAI-compatible |
| **Env Var** | `MISTRAL_API_KEY` |
| **Base URL** | `https://api.mistral.ai/v1` |
| **Key Required** | Yes |
| **Free Tier** | No |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 3 |

**Available Models:**
- `mistral-large-latest` (Smart)
- `codestral-latest` (Smart)
- `mistral-small-latest` (Fast)

**Setup:**
1. Sign up at [console.mistral.ai](https://console.mistral.ai)
2. Create an API key
3. `export MISTRAL_API_KEY="..."`

---

### 8. Together AI

| | |
|---|---|
| **Display Name** | Together AI |
| **Driver** | OpenAI-compatible |
| **Env Var** | `TOGETHER_API_KEY` |
| **Base URL** | `https://api.together.xyz/v1` |
| **Key Required** | Yes |
| **Free Tier** | Yes (limited credits on signup) |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 3 |

**Available Models:**
- `meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo` (Frontier)
- `Qwen/Qwen2.5-72B-Instruct-Turbo` (Smart)
- `mistralai/Mixtral-8x22B-Instruct-v0.1` (Balanced)

**Setup:**
1. Sign up at [api.together.ai](https://api.together.ai)
2. Create an API key
3. `export TOGETHER_API_KEY="..."`

---

### 9. Fireworks AI

| | |
|---|---|
| **Display Name** | Fireworks AI |
| **Driver** | OpenAI-compatible |
| **Env Var** | `FIREWORKS_API_KEY` |
| **Base URL** | `https://api.fireworks.ai/inference/v1` |
| **Key Required** | Yes |
| **Free Tier** | Yes (limited credits on signup) |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 2 |

**Available Models:**
- `accounts/fireworks/models/llama-v3p1-405b-instruct` (Frontier)
- `accounts/fireworks/models/mixtral-8x22b-instruct` (Balanced)

**Setup:**
1. Sign up at [fireworks.ai](https://fireworks.ai)
2. Create an API key
3. `export FIREWORKS_API_KEY="..."`

---

### 10. Ollama

| | |
|---|---|
| **Display Name** | Ollama |
| **Driver** | OpenAI-compatible |
| **Env Var** | `OLLAMA_API_KEY` (not required) |
| **Base URL** | `http://localhost:11434/v1` |
| **Key Required** | **No** |
| **Free Tier** | Free (local) |
| **Auth** | None (local) |
| **Models** | 3 builtin + auto-discovered |

**Available Models (builtin):**
- `llama3.2` (Local)
- `mistral:latest` (Local)
- `phi3` (Local)

**Setup:**
1. Install Ollama from [ollama.com](https://ollama.com)
2. Pull a model: `ollama pull llama3.2`
3. Start the server: `ollama serve`
4. No env var needed -- the Ollama provider is always enabled; it just needs the local server running

**Notes:** OpenFang auto-discovers models from a running Ollama instance and merges them into the catalog with `Local` tier and zero cost. Any model you pull becomes usable immediately.

---

### 11. vLLM

| | |
|---|---|
| **Display Name** | vLLM |
| **Driver** | OpenAI-compatible |
| **Env Var** | `VLLM_API_KEY` (not required) |
| **Base URL** | `http://localhost:8000/v1` |
| **Key Required** | **No** |
| **Free Tier** | Free (self-hosted) |
| **Auth** | None (local) |
| **Models** | 1 builtin + auto-discovered |

**Available Models (builtin):**
- `vllm-local` (Local)

**Setup:**
1. Install vLLM: `pip install vllm`
2. Start the server: `python -m vllm.entrypoints.openai.api_server --model <model-name>`
3. No env var needed

---

### 12. LM Studio

| | |
|---|---|
| **Display Name** | LM Studio |
| **Driver** | OpenAI-compatible |
| **Env Var** | `LMSTUDIO_API_KEY` (not required) |
| **Base URL** | `http://localhost:1234/v1` |
| **Key Required** | **No** |
| **Free Tier** | Free (local) |
| **Auth** | None (local) |
| **Models** | 1 builtin + auto-discovered |

**Available Models (builtin):**
- `lmstudio-local` (Local)

**Setup:**
1. Download LM Studio from [lmstudio.ai](https://lmstudio.ai)
2. Download a model from the built-in model browser
3. Start the local server from the "Local Server" tab
4. No env var needed

---

### 13. Perplexity AI

| | |
|---|---|
| **Display Name** | Perplexity AI |
| **Driver** | OpenAI-compatible |
| **Env Var** | `PERPLEXITY_API_KEY` |
| **Base URL** | `https://api.perplexity.ai` |
| **Key Required** | Yes |
| **Free Tier** | No |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 2 |

**Available Models:**
- `sonar-pro` (Smart) -- online search-augmented
- `sonar` (Balanced) -- online search-augmented

**Setup:**
1. Sign up at [perplexity.ai](https://www.perplexity.ai)
2. Go to API settings and generate a key
3. `export PERPLEXITY_API_KEY="pplx-..."`

**Notes:** Perplexity models have built-in web search. They do not support tool use.

---

### 14. Cohere

| | |
|---|---|
| **Display Name** | Cohere |
| **Driver** | OpenAI-compatible |
| **Env Var** | `COHERE_API_KEY` |
| **Base URL** | `https://api.cohere.com/v2` |
| **Key Required** | Yes |
| **Free Tier** | Yes (rate-limited trial) |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 2 |

**Available Models:**
- `command-r-plus` (Smart)
- `command-r` (Balanced)

**Setup:**
1. Sign up at [dashboard.cohere.com](https://dashboard.cohere.com)
2. Create an API key
3. `export COHERE_API_KEY="..."`

---

### 15. AI21 Labs

| | |
|---|---|
| **Display Name** | AI21 Labs |
| **Driver** | OpenAI-compatible |
| **Env Var** | `AI21_API_KEY` |
| **Base URL** | `https://api.ai21.com/studio/v1` |
| **Key Required** | Yes |
| **Free Tier** | Yes (limited credits) |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 1 |

**Available Models:**
- `jamba-1.5-large` (Smart)

**Setup:**
1. Sign up at [studio.ai21.com](https://studio.ai21.com)
2. Create an API key
3. `export AI21_API_KEY="..."`

---

### 16. Cerebras

| | |
|---|---|
| **Display Name** | Cerebras |
| **Driver** | OpenAI-compatible |
| **Env Var** | `CEREBRAS_API_KEY` |
| **Base URL** | `https://api.cerebras.ai/v1` |
| **Key Required** | Yes |
| **Free Tier** | Yes (generous free tier) |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 2 |

**Available Models:**
- `cerebras/llama3.3-70b` (Balanced)
- `cerebras/llama3.1-8b` (Fast)

**Setup:**
1. Sign up at [cloud.cerebras.ai](https://cloud.cerebras.ai)
2. Create an API key
3. `export CEREBRAS_API_KEY="..."`

**Notes:** Cerebras runs inference on wafer-scale chips. Inference is very fast and inexpensive ($0.06/M tokens for both input and output on the 70B model).

---

### 17. SambaNova

| | |
|---|---|
| **Display Name** | SambaNova |
| **Driver** | OpenAI-compatible |
| **Env Var** | `SAMBANOVA_API_KEY` |
| **Base URL** | `https://api.sambanova.ai/v1` |
| **Key Required** | Yes |
| **Free Tier** | Yes (limited credits) |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 1 |

**Available Models:**
- `sambanova/llama-3.3-70b` (Balanced)

**Setup:**
1. Sign up at [cloud.sambanova.ai](https://cloud.sambanova.ai)
2. Create an API key
3. `export SAMBANOVA_API_KEY="..."`

---

### 18. Hugging Face

| | |
|---|---|
| **Display Name** | Hugging Face |
| **Driver** | OpenAI-compatible |
| **Env Var** | `HF_API_KEY` |
| **Base URL** | `https://api-inference.huggingface.co/v1` |
| **Key Required** | Yes |
| **Free Tier** | Yes (rate-limited) |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 1 |

**Available Models:**
- `hf/meta-llama/Llama-3.3-70B-Instruct` (Balanced)

**Setup:**
1. Sign up at [huggingface.co](https://huggingface.co)
2. Create a token under Settings > Access Tokens
3. `export HF_API_KEY="hf_..."`

---

### 19. xAI

| | |
|---|---|
| **Display Name** | xAI |
| **Driver** | OpenAI-compatible |
| **Env Var** | `XAI_API_KEY` |
| **Base URL** | `https://api.x.ai/v1` |
| **Key Required** | Yes |
| **Free Tier** | Yes (limited free credits) |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 2 |

**Available Models:**
- `grok-2` (Smart) -- supports vision
- `grok-2-mini` (Fast)

**Setup:**
1. Sign up at [console.x.ai](https://console.x.ai)
2. Create an API key
3. `export XAI_API_KEY="xai-..."`

---

### 20. Replicate

| | |
|---|---|
| **Display Name** | Replicate |
| **Driver** | OpenAI-compatible |
| **Env Var** | `REPLICATE_API_TOKEN` |
| **Base URL** | `https://api.replicate.com/v1` |
| **Key Required** | Yes |
| **Free Tier** | No |
| **Auth** | `Authorization: Bearer` header |
| **Models** | 1 |

**Available Models:**
- `replicate/meta-llama-3.3-70b-instruct` (Balanced)

**Setup:**
1. Sign up at [replicate.com](https://replicate.com)
2. Go to Account > API Tokens
3. `export REPLICATE_API_TOKEN="r8_..."`

---

## Model Catalog

The table below lists the 46 builtin models, grouped by provider. Pricing is in USD per million tokens.

| # | Model ID | Display Name | Provider | Tier | Context Window | Max Output | Input $/M | Output $/M | Tools | Vision |
|---|----------|-------------|----------|------|---------------|------------|-----------|------------|-------|--------|
| 1 | `claude-opus-4-20250514` | Claude Opus 4 | anthropic | Frontier | 200,000 | 32,000 | $15.00 | $75.00 | Yes | Yes |
| 2 | `claude-sonnet-4-20250514` | Claude Sonnet 4 | anthropic | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes |
| 3 | `claude-haiku-4-5-20251001` | Claude Haiku 4.5 | anthropic | Fast | 200,000 | 8,192 | $0.25 | $1.25 | Yes | Yes |
| 4 | `gpt-4.1` | GPT-4.1 | openai | Frontier | 1,047,576 | 32,768 | $2.00 | $8.00 | Yes | Yes |
| 5 | `gpt-4o` | GPT-4o | openai | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes |
| 6 | `o3-mini` | o3-mini | openai | Smart | 200,000 | 100,000 | $1.10 | $4.40 | Yes | No |
| 7 | `gpt-4.1-mini` | GPT-4.1 Mini | openai | Balanced | 1,047,576 | 32,768 | $0.40 | $1.60 | Yes | Yes |
| 8 | `gpt-4o-mini` | GPT-4o Mini | openai | Fast | 128,000 | 16,384 | $0.15 | $0.60 | Yes | Yes |
| 9 | `gpt-4.1-nano` | GPT-4.1 Nano | openai | Fast | 1,047,576 | 32,768 | $0.10 | $0.40 | Yes | No |
| 10 | `gemini-2.5-pro` | Gemini 2.5 Pro | gemini | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes |
| 11 | `gemini-2.5-flash` | Gemini 2.5 Flash | gemini | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes |
| 12 | `gemini-2.0-flash` | Gemini 2.0 Flash | gemini | Fast | 1,048,576 | 8,192 | $0.10 | $0.40 | Yes | Yes |
| 13 | `deepseek-chat` | DeepSeek V3 | deepseek | Smart | 64,000 | 8,192 | $0.27 | $1.10 | Yes | No |
| 14 | `deepseek-reasoner` | DeepSeek R1 | deepseek | Smart | 64,000 | 8,192 | $0.55 | $2.19 | No | No |
| 15 | `llama-3.3-70b-versatile` | Llama 3.3 70B | groq | Balanced | 128,000 | 32,768 | $0.059 | $0.079 | Yes | No |
| 16 | `mixtral-8x7b-32768` | Mixtral 8x7B | groq | Balanced | 32,768 | 4,096 | $0.024 | $0.024 | Yes | No |
| 17 | `llama-3.1-8b-instant` | Llama 3.1 8B | groq | Fast | 128,000 | 8,192 | $0.05 | $0.08 | Yes | No |
| 18 | `gemma2-9b-it` | Gemma 2 9B | groq | Fast | 8,192 | 4,096 | $0.02 | $0.02 | No | No |
| 19 | `openrouter/auto` | OpenRouter Auto | openrouter | Smart | 200,000 | 32,000 | $1.00 | $3.00 | Yes | Yes |
| 20 | `openrouter/optimus` | OpenRouter Optimus | openrouter | Balanced | 200,000 | 32,000 | $0.50 | $1.50 | Yes | No |
| 21 | `openrouter/nitro` | OpenRouter Nitro | openrouter | Fast | 128,000 | 16,000 | $0.20 | $0.60 | Yes | No |
| 22 | `mistral-large-latest` | Mistral Large | mistral | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No |
| 23 | `codestral-latest` | Codestral | mistral | Smart | 32,000 | 8,192 | $0.30 | $0.90 | Yes | No |
| 24 | `mistral-small-latest` | Mistral Small | mistral | Fast | 128,000 | 8,192 | $0.10 | $0.30 | Yes | No |
| 25 | `meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo` | Llama 3.1 405B (Together) | together | Frontier | 130,000 | 4,096 | $3.50 | $3.50 | Yes | No |
| 26 | `Qwen/Qwen2.5-72B-Instruct-Turbo` | Qwen 2.5 72B (Together) | together | Smart | 32,768 | 4,096 | $0.20 | $0.60 | Yes | No |
| 27 | `mistralai/Mixtral-8x22B-Instruct-v0.1` | Mixtral 8x22B (Together) | together | Balanced | 65,536 | 4,096 | $0.60 | $0.60 | Yes | No |
| 28 | `accounts/fireworks/models/llama-v3p1-405b-instruct` | Llama 3.1 405B (Fireworks) | fireworks | Frontier | 131,072 | 16,384 | $3.00 | $3.00 | Yes | No |
| 29 | `accounts/fireworks/models/mixtral-8x22b-instruct` | Mixtral 8x22B (Fireworks) | fireworks | Balanced | 65,536 | 4,096 | $0.90 | $0.90 | Yes | No |
| 30 | `llama3.2` | Llama 3.2 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | Yes | No |
| 31 | `mistral:latest` | Mistral (Ollama) | ollama | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No |
| 32 | `phi3` | Phi-3 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | No | No |
| 33 | `vllm-local` | vLLM Local Model | vllm | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No |
| 34 | `lmstudio-local` | LM Studio Local Model | lmstudio | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No |
| 35 | `sonar-pro` | Sonar Pro | perplexity | Smart | 200,000 | 8,192 | $3.00 | $15.00 | No | No |
| 36 | `sonar` | Sonar | perplexity | Balanced | 128,000 | 8,192 | $1.00 | $5.00 | No | No |
| 37 | `command-r-plus` | Command R+ | cohere | Smart | 128,000 | 4,096 | $2.50 | $10.00 | Yes | No |
| 38 | `command-r` | Command R | cohere | Balanced | 128,000 | 4,096 | $0.15 | $0.60 | Yes | No |
| 39 | `jamba-1.5-large` | Jamba 1.5 Large | ai21 | Smart | 256,000 | 4,096 | $2.00 | $8.00 | Yes | No |
| 40 | `cerebras/llama3.3-70b` | Llama 3.3 70B (Cerebras) | cerebras | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No |
| 41 | `cerebras/llama3.1-8b` | Llama 3.1 8B (Cerebras) | cerebras | Fast | 128,000 | 8,192 | $0.01 | $0.01 | Yes | No |
| 42 | `sambanova/llama-3.3-70b` | Llama 3.3 70B (SambaNova) | sambanova | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No |
| 43 | `grok-2` | Grok 2 | xai | Smart | 131,072 | 32,768 | $2.00 | $10.00 | Yes | Yes |
| 44 | `grok-2-mini` | Grok 2 Mini | xai | Fast | 131,072 | 32,768 | $0.30 | $0.50 | Yes | No |
| 45 | `hf/meta-llama/Llama-3.3-70B-Instruct` | Llama 3.3 70B (HF) | huggingface | Balanced | 128,000 | 4,096 | $0.30 | $0.30 | No | No |
| 46 | `replicate/meta-llama-3.3-70b-instruct` | Llama 3.3 70B (Replicate) | replicate | Balanced | 128,000 | 4,096 | $0.40 | $0.40 | No | No |

**Model Tiers:**

| Tier | Description | Typical Use |
|------|------------|------------|
| **Frontier** | Most capable, highest cost | Orchestration, architecture, security audits |
| **Smart** | Strong reasoning, moderate cost | Coding, code review, research, analysis |
| **Balanced** | Good cost/quality tradeoff | Planning, writing, DevOps, day-to-day tasks |
| **Fast** | Cheapest cloud inference | Ops, translation, simple Q&A, health checks |
| **Local** | Self-hosted, zero cost | Privacy-first, offline, development |

**Notes:**
- Local providers (Ollama, vLLM, LM Studio) auto-discover models at runtime. Any model you download and serve will be merged into the catalog with `Local` tier and zero cost.
- The 46 entries above are the builtin models. The total of 51 referenced in the catalog includes runtime auto-discovered models that vary per installation.

---

## Model Aliases

All 23 aliases resolve to canonical model IDs. Aliases are case-insensitive.

| Alias | Resolves To |
|-------|------------|
| `sonnet` | `claude-sonnet-4-20250514` |
| `claude-sonnet` | `claude-sonnet-4-20250514` |
| `haiku` | `claude-haiku-4-5-20251001` |
| `claude-haiku` | `claude-haiku-4-5-20251001` |
| `opus` | `claude-opus-4-20250514` |
| `claude-opus` | `claude-opus-4-20250514` |
| `gpt4` | `gpt-4o` |
| `gpt4o` | `gpt-4o` |
| `gpt4-mini` | `gpt-4o-mini` |
| `flash` | `gemini-2.5-flash` |
| `gemini-flash` | `gemini-2.5-flash` |
| `gemini-pro` | `gemini-2.5-pro` |
| `deepseek` | `deepseek-chat` |
| `llama` | `llama-3.3-70b-versatile` |
| `llama-70b` | `llama-3.3-70b-versatile` |
| `mixtral` | `mixtral-8x7b-32768` |
| `mistral` | `mistral-large-latest` |
| `codestral` | `codestral-latest` |
| `grok` | `grok-2` |
| `grok-mini` | `grok-2-mini` |
| `sonar` | `sonar-pro` |
| `jamba` | `jamba-1.5-large` |
| `command-r` | `command-r-plus` |

You can use aliases anywhere a model ID is accepted: in config files, REST API calls, chat commands, and the model routing configuration.
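
For illustration only, case-insensitive alias resolution could look like the sketch below. The names `alias_table` and `resolve_alias` are hypothetical, the table holds just five of the 23 aliases, and passing unknown names through unchanged is an assumption rather than documented behavior. For example, `resolve_alias("Sonnet")` yields `claude-sonnet-4-20250514`.

```rust
use std::collections::HashMap;

/// A few alias pairs copied from the table above (not the full list of 23).
fn alias_table() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("sonnet", "claude-sonnet-4-20250514"),
        ("haiku", "claude-haiku-4-5-20251001"),
        ("opus", "claude-opus-4-20250514"),
        ("flash", "gemini-2.5-flash"),
        ("grok", "grok-2"),
    ])
}

/// Resolves `name` to a canonical model ID.
/// Aliases match case-insensitively; any other name is returned unchanged
/// (assumption: it is then treated as a canonical ID).
pub fn resolve_alias(name: &str) -> String {
    let key = name.to_ascii_lowercase();
    match alias_table().get(key.as_str()) {
        Some(canonical) => (*canonical).to_string(),
        None => name.to_string(),
    }
}
```
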

---

## Per-Agent Model Override

Each agent in your `config.toml` can specify its own model, overriding the global default:

```toml
# Global default model
[agents.defaults]
model = "claude-sonnet-4-20250514"

# Per-agent override: use an alias or full model ID
[[agents]]
name = "orchestrator"
model = "opus"                     # alias for claude-opus-4-20250514

[[agents]]
name = "ops"
model = "llama-3.3-70b-versatile"  # cheap Groq model for simple ops

[[agents]]
name = "coder"
model = "gemini-2.5-flash"         # fast + cheap + 1M context

[[agents]]
name = "researcher"
model = "sonar-pro"                # Perplexity with built-in web search

# You can also pin a model in the agent manifest TOML
[[agents]]
name = "production-bot"
pinned_model = "claude-sonnet-4-20250514"  # never auto-routed
```

When `pinned_model` is set on an agent manifest, that agent always uses the specified model regardless of routing configuration. This is used in **Stabilisation mode** (`KernelMode::Stable`) where the model is frozen for production reliability.
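
One way to picture the precedence is the sketch below. `AgentModelConfig` and `effective_model` are hypothetical names, and the ordering between a router suggestion and the agent's own `model` is an assumption; this guide only states that `pinned_model` overrides routing and that a per-agent `model` overrides the global default.

```rust
/// Hypothetical, trimmed-down view of an agent's model settings.
pub struct AgentModelConfig<'a> {
    pub pinned_model: Option<&'a str>,
    pub model: Option<&'a str>,
}

/// Picks the model for one request.
/// Order (partly assumed): pinned model, then the router's suggestion,
/// then the agent's own model, then the global default.
pub fn effective_model<'a>(
    agent: &AgentModelConfig<'a>,
    routed: Option<&'a str>,
    global_default: &'a str,
) -> &'a str {
    agent
        .pinned_model
        .or(routed)
        .or(agent.model)
        .unwrap_or(global_default)
}
```
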

---

## Model Routing

OpenFang can automatically select the cheapest model capable of handling each query. This is configured per-agent via `ModelRoutingConfig`.

### How It Works

1. The **ModelRouter** scores each incoming `CompletionRequest` based on heuristics
2. The score maps to a **TaskComplexity** tier: `Simple`, `Medium`, or `Complex`
3. Each tier has a pre-configured model

### Scoring Heuristics

| Signal | Weight | Logic |
|--------|--------|-------|
| Total message length | 1 point per ~4 chars | Rough token proxy |
| Tool availability | +20 per tool defined | Tools imply multi-step work |
| Code markers | +30 per marker found | Backticks, `fn`, `def`, `class`, `import`, `function`, `async`, `await`, `struct`, `impl`, `return` |
| Conversation depth | +15 per message beyond the 10th | Deep context = harder reasoning |
| System prompt length | +1 per 10 chars beyond 500 | Long system prompts imply complex tasks |
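
The heuristics in the table can be approximated with a few lines of arithmetic. The sketch below is an interpretation, not the router's real code: `Request` is a hypothetical stand-in for `CompletionRequest`, byte length is used as the character count, markers are matched as plain substrings, each marker is counted once no matter how often it appears, and "Backticks" is read as a triple-backtick fence.

```rust
/// Hypothetical, simplified stand-in for a completion request.
pub struct Request<'a> {
    pub system_prompt: &'a str,
    pub messages: Vec<&'a str>,
    pub tool_count: usize,
}

/// Marker list from the table; "```" is an assumed reading of "Backticks".
const CODE_MARKERS: &[&str] = &[
    "```", "fn", "def", "class", "import", "function", "async", "await", "struct", "impl",
    "return",
];

/// Scores a request using the weights from the table above.
pub fn complexity_score(req: &Request) -> usize {
    // Byte length serves as a rough character count (exact for ASCII text).
    let total_chars: usize = req.messages.iter().map(|m| m.len()).sum();
    let joined = req.messages.join("\n");
    // Assumption: each distinct marker counts once, matched as a substring.
    let markers = CODE_MARKERS.iter().filter(|m| joined.contains(**m)).count();
    let extra_messages = req.messages.len().saturating_sub(10);
    let extra_prompt_chars = req.system_prompt.len().saturating_sub(500);

    total_chars / 4           // 1 point per ~4 chars
        + 20 * req.tool_count // +20 per tool defined
        + 30 * markers        // +30 per marker found
        + 15 * extra_messages // +15 per message beyond the 10th
        + extra_prompt_chars / 10 // +1 per 10 chars beyond 500
}
```

Substring matching will over-count on ordinary prose (for example, "default" contains `def`), so a production scorer would likely need token-aware matching.
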

### Thresholds

| Complexity | Score Range | Default Model |
|-----------|-------------|---------------|
| Simple | score < 100 | `claude-haiku-4-5-20251001` |
| Medium | 100 <= score < 500 | `claude-sonnet-4-20250514` |
| Complex | score >= 500 | `claude-sonnet-4-20250514` |
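
Mapping a score to a tier is a pair of comparisons. The sketch below mirrors the table, with `RoutingThresholds`, `classify`, and `default_model` as illustrative names; only the `TaskComplexity` variants, the two threshold values, and the default model IDs come from this guide.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskComplexity {
    Simple,
    Medium,
    Complex,
}

/// Hypothetical holder for the two score boundaries.
pub struct RoutingThresholds {
    pub simple_threshold: usize,
    pub complex_threshold: usize,
}

impl Default for RoutingThresholds {
    fn default() -> Self {
        // Default boundaries from the table above.
        Self { simple_threshold: 100, complex_threshold: 500 }
    }
}

/// Maps a score to a tier: below `simple_threshold` is Simple,
/// at or above `complex_threshold` is Complex, anything between is Medium.
pub fn classify(score: usize, t: &RoutingThresholds) -> TaskComplexity {
    if score < t.simple_threshold {
        TaskComplexity::Simple
    } else if score < t.complex_threshold {
        TaskComplexity::Medium
    } else {
        TaskComplexity::Complex
    }
}

/// Default model per tier, as listed in the table above.
pub fn default_model(tier: TaskComplexity) -> &'static str {
    match tier {
        TaskComplexity::Simple => "claude-haiku-4-5-20251001",
        TaskComplexity::Medium | TaskComplexity::Complex => "claude-sonnet-4-20250514",
    }
}
```

With the defaults, a score of 99 is `Simple`, 100 through 499 is `Medium`, and 500 or more is `Complex`.
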

### Configuration

```toml
# In agent manifest or config.toml
[routing]
simple_model = "claude-haiku-4-5-20251001"
medium_model = "gemini-2.5-flash"
complex_model = "claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500
```

The router also integrates with the model catalog:
- **`validate_models()`** checks that all configured model IDs exist in the catalog
- **`resolve_aliases()`** expands aliases to canonical IDs (e.g., `"sonnet"` becomes `"claude-sonnet-4-20250514"`)

---

## Cost Tracking

OpenFang tracks the cost of every LLM call and can enforce per-agent spending quotas.

### Per-Response Cost Estimation

After each LLM call, cost is calculated as:

```
cost = (input_tokens / 1,000,000) * input_rate + (output_tokens / 1,000,000) * output_rate
```

The `MeteringEngine` first checks the **model catalog** for exact pricing. If the model is not found, it falls back to a pattern-matching heuristic.
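
The formula translates directly into code. The following is a minimal sketch with hypothetical names (`Pricing`, `estimate_cost`), not the `MeteringEngine` implementation itself; rates are in USD per million tokens.

```rust
/// Per-million-token rates in USD (hypothetical struct for illustration).
pub struct Pricing {
    pub input_per_m: f64,
    pub output_per_m: f64,
}

/// cost = (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate
pub fn estimate_cost(input_tokens: u64, output_tokens: u64, pricing: &Pricing) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * pricing.input_per_m
        + (output_tokens as f64 / 1_000_000.0) * pricing.output_per_m
}
```

With the catalog's Claude Sonnet 4 rates ($3.00 in, $15.00 out), a call with 1,200 input tokens and 340 output tokens costs 0.0036 + 0.0051 = $0.0087.
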

### Cost Rates (per million tokens)

| Model Pattern | Input $/M | Output $/M |
|--------------|-----------|------------|
| `*haiku*` | $0.25 | $1.25 |
| `*sonnet*` | $3.00 | $15.00 |
| `*opus*` | $15.00 | $75.00 |
| `gpt-4o-mini` | $0.15 | $0.60 |
| `gpt-4o` | $2.50 | $10.00 |
| `gpt-4.1-nano` | $0.10 | $0.40 |
| `gpt-4.1-mini` | $0.40 | $1.60 |
| `gpt-4.1` | $2.00 | $8.00 |
| `o3-mini` | $1.10 | $4.40 |
| `gemini-2.5-pro` | $1.25 | $10.00 |
| `gemini-2.5-flash` | $0.15 | $0.60 |
| `gemini-2.0-flash` | $0.10 | $0.40 |
| `deepseek-reasoner` / `deepseek-r1` | $0.55 | $2.19 |
| `*deepseek*` | $0.27 | $1.10 |
| `*cerebras*` | $0.06 | $0.06 |
| `*sambanova*` | $0.06 | $0.06 |
| `*replicate*` | $0.40 | $0.40 |
| `*llama*` / `*mixtral*` | $0.05 | $0.10 |
| `*qwen*` | $0.20 | $0.60 |
| `mistral-large*` | $2.00 | $6.00 |
| `*mistral*` (other) | $0.10 | $0.30 |
| `command-r-plus` | $2.50 | $10.00 |
| `command-r` | $0.15 | $0.60 |
| `sonar-pro` | $3.00 | $15.00 |
| `*sonar*` (other) | $1.00 | $5.00 |
| `grok-2-mini` / `grok-mini` | $0.30 | $0.50 |
| `*grok*` (other) | $2.00 | $10.00 |
| `*jamba*` | $2.00 | $8.00 |
| Default (unknown) | $1.00 | $3.00 |
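
A pattern table like this is order-sensitive: `gpt-4o-mini` has to be tested before `gpt-4o`, and `deepseek-reasoner` before the broader `*deepseek*`. The sketch below shows one way to get that behavior with a first-match-wins scan over a subset of the rows. Treating every pattern as a case-insensitive substring is an assumption, and `fallback_rate` is a hypothetical name.

```rust
/// (substring pattern, input $/M, output $/M).
/// First match wins, so specific patterns precede general ones.
const FALLBACK_RATES: &[(&str, f64, f64)] = &[
    ("haiku", 0.25, 1.25),
    ("sonnet", 3.00, 15.00),
    ("opus", 15.00, 75.00),
    ("gpt-4o-mini", 0.15, 0.60),
    ("gpt-4o", 2.50, 10.00),
    ("deepseek-reasoner", 0.55, 2.19),
    ("deepseek", 0.27, 1.10),
    ("mistral-large", 2.00, 6.00),
    ("mistral", 0.10, 0.30),
];

/// Rate used when nothing matches (the "Default (unknown)" row).
const DEFAULT_RATE: (f64, f64) = (1.00, 3.00);

/// Returns (input $/M, output $/M) for a model ID that is not in the catalog.
pub fn fallback_rate(model_id: &str) -> (f64, f64) {
    // Assumption: matching ignores ASCII case.
    let id = model_id.to_ascii_lowercase();
    FALLBACK_RATES
        .iter()
        .find(|(pattern, _, _)| id.contains(*pattern))
        .map(|&(_, input, output)| (input, output))
        .unwrap_or(DEFAULT_RATE)
}
```

Reordering the table so that `gpt-4o` comes first would silently price every `gpt-4o-mini` call at the larger model's rate, which is why the order of rows matters.
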

### Quota Enforcement

Quotas are checked on every LLM call. If the agent exceeds its hourly limit, the call is rejected with a `QuotaExceeded` error.

```toml
# Per-agent quota in config.toml
[[agents]]
name = "chatbot"
[agents.resources]
max_cost_per_hour_usd = 5.00  # cap at $5/hour
```
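
To make the hourly check concrete, here is a minimal sketch that keeps a sliding one-hour window of recorded costs. Whether OpenFang uses a sliding or a fixed window is not stated in this guide, so the window logic, the `HourlyQuota` type, and the fields on `QuotaExceeded` are assumptions.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Error returned when the hourly budget is exceeded (fields are illustrative).
#[derive(Debug, PartialEq)]
pub struct QuotaExceeded {
    pub spent_usd: f64,
    pub limit_usd: f64,
}

/// Tracks spend over a sliding one-hour window (assumed window type).
pub struct HourlyQuota {
    limit_usd: f64,
    spend: VecDeque<(Instant, f64)>,
}

impl HourlyQuota {
    pub fn new(limit_usd: f64) -> Self {
        Self { limit_usd, spend: VecDeque::new() }
    }

    /// Records the cost of a completed call.
    pub fn record(&mut self, at: Instant, cost_usd: f64) {
        self.spend.push_back((at, cost_usd));
    }

    /// Called before each LLM call; rejects once spend exceeds the limit.
    pub fn check(&mut self, now: Instant) -> Result<(), QuotaExceeded> {
        let window = Duration::from_secs(3600);
        // Drop entries that are an hour old or older.
        while let Some(&(at, _)) = self.spend.front() {
            if now.duration_since(at) >= window {
                self.spend.pop_front();
            } else {
                break;
            }
        }
        let spent_usd: f64 = self.spend.iter().map(|&(_, cost)| cost).sum();
        if spent_usd > self.limit_usd {
            Err(QuotaExceeded { spent_usd, limit_usd: self.limit_usd })
        } else {
            Ok(())
        }
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside `check`, keeps the window logic deterministic under test.
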

The usage footer (when enabled) appends cost information to each response:

```
> Cost: $0.0087 | Tokens: 1,200 in / 340 out | Model: claude-sonnet-4-20250514
```

---

## Fallback Providers

The `FallbackDriver` wraps multiple LLM drivers in a chain. If the primary driver fails, the next driver in the chain is tried automatically.

### Behavior

- On success: returns immediately
- On **rate limit / overload** errors (`429`, `529`): bubbles up for retry logic (does NOT fail over, because the primary should be retried after backoff)
- On **all other errors**: logs a warning and tries the next driver in the chain
- If all drivers fail: returns the last error
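
The four rules above fit in a short loop. The sketch below uses a hypothetical `Driver` trait, `LlmError` enum, and `FallbackChain` type to show the control flow; it does not reproduce the real `FallbackDriver`, and `StaticDriver` exists only so the example can be exercised.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum LlmError {
    /// HTTP 429 or 529: surfaced to the caller so the primary can be retried.
    RateLimited(u16),
    /// Any other failure: the next driver in the chain is tried.
    Other(String),
}

/// Hypothetical minimal driver interface.
pub trait Driver {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, LlmError>;
}

/// Stub driver that always returns the same canned result (illustration only).
pub struct StaticDriver {
    pub name: &'static str,
    pub result: Result<String, LlmError>,
}

impl Driver for StaticDriver {
    fn name(&self) -> &str {
        self.name
    }
    fn complete(&self, _prompt: &str) -> Result<String, LlmError> {
        self.result.clone()
    }
}

pub struct FallbackChain {
    pub drivers: Vec<Box<dyn Driver>>,
}

impl FallbackChain {
    pub fn complete(&self, prompt: &str) -> Result<String, LlmError> {
        let mut last_err = LlmError::Other("no drivers configured".to_string());
        for driver in &self.drivers {
            match driver.complete(prompt) {
                // Success: return immediately.
                Ok(text) => return Ok(text),
                // Rate limit or overload: bubble up, do not fail over.
                Err(err @ LlmError::RateLimited(_)) => return Err(err),
                // Anything else: warn and try the next driver.
                Err(err) => {
                    eprintln!("warn: driver {} failed: {:?}", driver.name(), err);
                    last_err = err;
                }
            }
        }
        // Every driver failed: return the last error seen.
        Err(last_err)
    }
}
```

Returning early on `RateLimited` is what keeps a briefly throttled primary from being abandoned in favor of a weaker fallback model.
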

### Configuration

Fallback chains are configured in your agent manifest or `config.toml`. The `FallbackDriver` is used automatically when an agent is in **Stabilisation mode** (`KernelMode::Stable`) or when multiple providers are configured for reliability.

```toml
# Example: primary Anthropic, fallback to Gemini, then Groq
[[agents]]
name = "production-bot"
model = "claude-sonnet-4-20250514"
fallback_models = ["gemini-2.5-flash", "llama-3.3-70b-versatile"]
```

The fallback driver creates a chain: `AnthropicDriver -> GeminiDriver -> OpenAIDriver(Groq)`.

---

## API Endpoints

### List All Models

```
GET /api/models
```

Returns the complete model catalog with metadata, pricing, and feature flags.

**Response:**
```json
[
  {
    "id": "claude-sonnet-4-20250514",
    "display_name": "Claude Sonnet 4",
    "provider": "anthropic",
    "tier": "Smart",
    "context_window": 200000,
    "max_output_tokens": 64000,
    "input_cost_per_m": 3.0,
    "output_cost_per_m": 15.0,
    "supports_tools": true,
    "supports_vision": true,
    "supports_streaming": true,
    "aliases": ["sonnet", "claude-sonnet"]
  }
]
```

### Get Specific Model
|
|
|
|
```
|
|
GET /api/models/{id}
|
|
```

Returns a single model entry. The `{id}` segment accepts either a canonical model ID or an alias, so both of the following requests return the same entry:

```
GET /api/models/sonnet
GET /api/models/claude-sonnet-4-20250514
```

### List Aliases

```
GET /api/models/aliases
```

Returns a map of all alias-to-canonical-ID mappings.

**Response:**

```json
{
  "sonnet": "claude-sonnet-4-20250514",
  "haiku": "claude-haiku-4-5-20251001",
  "flash": "gemini-2.5-flash",
  "grok": "grok-2"
}
```
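
A client that has fetched this map can resolve aliases locally. The following is a sketch of one way to do that, assuming exact, case-sensitive matching; `AliasTable` and its methods are hypothetical names, and OpenFang's own resolver may behave differently.

```rust
use std::collections::HashMap;

/// Hypothetical client-side alias table built from the alias map above.
pub struct AliasTable {
    aliases: HashMap<String, String>,
}

impl AliasTable {
    /// Builds the table from `(alias, canonical_id)` pairs.
    pub fn from_pairs(pairs: &[(&str, &str)]) -> Self {
        let aliases = pairs
            .iter()
            .map(|&(alias, canonical)| (alias.to_string(), canonical.to_string()))
            .collect();
        Self { aliases }
    }

    /// Returns the canonical ID for a known alias. Any other input is
    /// passed through unchanged, so canonical IDs resolve to themselves.
    pub fn resolve<'a>(&'a self, id_or_alias: &'a str) -> &'a str {
        self.aliases
            .get(id_or_alias)
            .map(String::as_str)
            .unwrap_or(id_or_alias)
    }
}
```

With the map above loaded, `resolve("sonnet")` and `resolve("claude-sonnet-4-20250514")` both yield `claude-sonnet-4-20250514`.
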
### List Providers

```
GET /api/providers
```

Returns all 20 providers with auth status and model counts.

**Response:**

```json
[
  {
    "id": "anthropic",
    "display_name": "Anthropic",
    "api_key_env": "ANTHROPIC_API_KEY",
    "base_url": "https://api.anthropic.com",
    "key_required": true,
    "auth_status": "Configured",
    "model_count": 3
  },
  {
    "id": "ollama",
    "display_name": "Ollama",
    "api_key_env": "OLLAMA_API_KEY",
    "base_url": "http://localhost:11434/v1",
    "key_required": false,
    "auth_status": "NotRequired",
    "model_count": 5
  }
]
```
Auth status values: `Configured`, `Missing`, `NotRequired`.
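
One plausible way these three values follow from `key_required` and the presence of the `api_key_env` variable is sketched below. The enum and both functions are hypothetical stand-ins, not OpenFang's `detect_auth()`. The sketch assumes that a provider with `key_required: false` always reports `NotRequired`, even when a key happens to be set, and that an empty but set variable counts as present.

```rust
/// The three auth status values reported by the providers endpoint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthStatus {
    Configured,
    Missing,
    NotRequired,
}

/// Pure decision logic, kept separate so it can be checked without
/// touching the process environment.
pub fn status_from_presence(key_required: bool, key_present: bool) -> AuthStatus {
    if !key_required {
        AuthStatus::NotRequired
    } else if key_present {
        AuthStatus::Configured
    } else {
        AuthStatus::Missing
    }
}

/// Checks only whether the environment variable is set. The value
/// returned by `std::env::var` is dropped immediately and never logged.
pub fn detect_auth_status(api_key_env: &str, key_required: bool) -> AuthStatus {
    let key_present = std::env::var(api_key_env).is_ok();
    status_from_presence(key_required, key_present)
}
```

Under these assumptions, `detect_auth_status("ANTHROPIC_API_KEY", true)` returns `Configured` when the variable is set and `Missing` otherwise, while `detect_auth_status("OLLAMA_API_KEY", false)` always returns `NotRequired`.
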
### Set Provider API Key

```
POST /api/providers/{name}/key
Content-Type: application/json

{ "api_key": "sk-..." }
```

Configures an API key for a provider at runtime (stored as a `Zeroizing<String>`, wiped from memory on drop).

### Remove Provider API Key

```
DELETE /api/providers/{name}/key
```

Removes the configured API key for a provider.

### Test Provider Connection

```
POST /api/providers/{name}/test
```

Sends a minimal test request to verify the provider is reachable and the API key is valid.

---

## Channel Commands

Two chat commands are available in any channel for inspecting models and providers:

### `/models`

Lists all available models with their tier, provider, and context window. Only shows models from providers that have authentication configured (or do not require it).

```
/models
```

Example output:

```
Available models (9):

Frontier:
  claude-opus-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-pro (Google Gemini) — 1M ctx

Smart:
  claude-sonnet-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-flash (Google Gemini) — 1M ctx
  deepseek-chat (DeepSeek) — 64K ctx

Balanced:
  llama-3.3-70b-versatile (Groq) — 128K ctx

Fast:
  claude-haiku-4-5-20251001 (Anthropic) — 200K ctx
  gemini-2.0-flash (Google Gemini) — 1M ctx

Local:
  llama3.2 (Ollama) — 128K ctx
```
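
The filtering and grouping behind this listing can be sketched as follows. This is an illustration only: `Tier`, `ModelEntry`, and `visible_by_tier` are hypothetical names, the tier order is inferred from the example output, and the provider ID `deepseek` in the note below is assumed from the display name.

```rust
use std::collections::BTreeMap;

/// Tiers in the order they appear in the example output. Deriving `Ord`
/// makes a `BTreeMap` keyed by `Tier` iterate in this declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Frontier,
    Smart,
    Balanced,
    Fast,
    Local,
}

/// Hypothetical, trimmed-down catalog entry.
pub struct ModelEntry {
    pub id: &'static str,
    pub provider: &'static str,
    pub tier: Tier,
}

/// Keeps only models whose provider is usable (key configured or not
/// required) and groups their IDs by tier.
pub fn visible_by_tier(
    models: &[ModelEntry],
    usable_providers: &[&str],
) -> BTreeMap<Tier, Vec<&'static str>> {
    let mut grouped: BTreeMap<Tier, Vec<&'static str>> = BTreeMap::new();
    for model in models {
        if usable_providers.contains(&model.provider) {
            grouped.entry(model.tier).or_default().push(model.id);
        }
    }
    grouped
}
```

Passing `["anthropic", "ollama"]` as the usable providers drops a `deepseek` entry from the result while keeping the Anthropic and Ollama models under their tiers.
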
### `/providers`

Lists all 20 providers with their authentication status.

```
/providers
```

Example output:

```
LLM Providers (20):

Anthropic      ANTHROPIC_API_KEY   Configured   3 models
OpenAI         OPENAI_API_KEY      Missing      6 models
Google Gemini  GEMINI_API_KEY      Configured   3 models
DeepSeek       DEEPSEEK_API_KEY    Missing      2 models
Groq           GROQ_API_KEY        Configured   4 models
Ollama         (no key needed)     Ready        3 models
vLLM           (no key needed)     Ready        1 model
LM Studio      (no key needed)     Ready        1 model
...
```

---

## Environment Variables Summary

Quick reference for all provider environment variables:

| Provider | Env Var | Required |
|----------|---------|----------|
| Anthropic | `ANTHROPIC_API_KEY` | Yes |
| OpenAI | `OPENAI_API_KEY` | Yes |
| Google Gemini | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | Yes |
| DeepSeek | `DEEPSEEK_API_KEY` | Yes |
| Groq | `GROQ_API_KEY` | Yes |
| OpenRouter | `OPENROUTER_API_KEY` | Yes |
| Mistral AI | `MISTRAL_API_KEY` | Yes |
| Together AI | `TOGETHER_API_KEY` | Yes |
| Fireworks AI | `FIREWORKS_API_KEY` | Yes |
| Ollama | `OLLAMA_API_KEY` | No |
| vLLM | `VLLM_API_KEY` | No |
| LM Studio | `LMSTUDIO_API_KEY` | No |
| Perplexity AI | `PERPLEXITY_API_KEY` | Yes |
| Cohere | `COHERE_API_KEY` | Yes |
| AI21 Labs | `AI21_API_KEY` | Yes |
| Cerebras | `CEREBRAS_API_KEY` | Yes |
| SambaNova | `SAMBANOVA_API_KEY` | Yes |
| Hugging Face | `HF_API_KEY` | Yes |
| xAI | `XAI_API_KEY` | Yes |
| Replicate | `REPLICATE_API_TOKEN` | Yes |

---

## Security Notes

- All API keys are stored as `Zeroizing<String>` -- the key material is automatically overwritten with zeros when the value is dropped from memory.
- Auth detection (`detect_auth()`) only checks `std::env::var()` for presence -- it never reads or logs the actual secret value.
- Provider API keys set via the REST API (`POST /api/providers/{name}/key`) follow the same zeroization policy.
- The health endpoint (`/api/health`) never exposes provider auth status or API keys. Detailed info is behind `/api/health/detail` which requires authentication.
- All `DriverConfig` and `KernelConfig` structs implement `Debug` with secret redaction -- API keys are printed as `"***"` in logs.
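
The redaction described in the last point can be illustrated with a hand-written `Debug` implementation. The struct below is a simplified stand-in: its fields are assumptions, and it stores the key as a plain `String` to stay within the standard library, whereas the real `DriverConfig` is described above as holding a `Zeroizing<String>`.

```rust
use std::fmt;

/// Simplified stand-in for a driver configuration.
/// The field names here are illustrative assumptions.
pub struct DriverConfig {
    pub provider: String,
    pub base_url: String,
    pub api_key: Option<String>,
}

impl fmt::Debug for DriverConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("DriverConfig")
            .field("provider", &self.provider)
            .field("base_url", &self.base_url)
            // Print a fixed placeholder in place of the key material.
            .field("api_key", &self.api_key.as_ref().map(|_| "***"))
            .finish()
    }
}
```

Because the placeholder replaces the value inside `fmt`, any `{:?}` formatting of the struct, including in log macros, shows `"***"` and never the key itself.
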