# API reference
Fred's proxy speaks the OpenAI Chat Completions API. Drop in any OpenAI client, point `base_url` at https://api.fredcode.net/v1, and use your `fred_live_*` key. No custom SDK, no glue layer — just the OpenAI shape with managed billing on top.

Bare API consumers — Fred is a streaming proxy, not a request/response one. Set `stream: true` on your client. Non-streaming requests work, but you'll lose the in-flight cancellation behavior the proxy relies on, and slow turns will hold a TCP connection longer than they need to.
Want managed billing without using the CLI? You can absolutely use the proxy directly with any OpenAI client. The dashboard at app.fredcode.net works the same — keys, usage rows, top-ups, all of it.
## Endpoints
| Method & path | Purpose | Auth |
|---|---|---|
| `POST /v1/chat/completions` | The main streaming endpoint. OpenAI Chat Completions shape, billed per call. | Bearer key |
| `GET /health` | Service probe. Returns 200 with a tiny JSON status. No upstream call. | None |
| `GET /api/pricing` | Public rate cards. Same JSON the Fred CLI consumes to render /cost. | None |
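The two unauthenticated endpoints are plain GETs, so you can probe them from anything. A minimal sketch with `httpx` (any HTTP client works; neither response shape is documented on this page, so the code just prints what comes back):

```python
import httpx

# No Authorization header needed for either of these.
health = httpx.get("https://api.fredcode.net/health")
print(health.status_code)  # 200 when the proxy is up
print(health.json())       # the tiny JSON status blob

pricing = httpx.get("https://api.fredcode.net/api/pricing")
print(pricing.json())      # the same rate-card JSON the CLI renders as /cost
```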
## Authentication
Every billed request needs a bearer token in the Authorization header:
```
Authorization: Bearer fred_live_...
```

Get a key by running `fred login` (which provisions one for the device) or by creating one in the dashboard at https://app.fredcode.net/api-keys. Keys are scoped to your account and revocable from that same page; once revoked, the proxy returns `401 revoked_api_key` immediately.
- Keys are prefixed `fred_live_` — no test/prod split, since billing is real on every call.
- The header is the only supported auth shape. Query-string keys, basic auth, and cookies are not accepted.
- `GET /health` and `GET /api/pricing` ignore the header entirely.
## Python (openai SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fredcode.net/v1",
    api_key="fred_live_...",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "say hi"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in resp:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:
        print(f"\n[usage] in={chunk.usage.prompt_tokens} out={chunk.usage.completion_tokens}")
```
## TypeScript (openai-node)

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.fredcode.net/v1",
  apiKey: process.env.FRED_API_KEY!,
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "say hi" }],
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  if (chunk.usage) {
    console.log(`\n[usage] in=${chunk.usage.prompt_tokens} out=${chunk.usage.completion_tokens}`);
  }
}
```
## curl

```bash
curl -N https://api.fredcode.net/v1/chat/completions \
  -H "Authorization: Bearer fred_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "say hi"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'
```
All three examples set `stream: true` and `stream_options: { include_usage: true }`. The proxy auto-injects `include_usage` on the upstream request so it can meter the call, but clients that send it explicitly are fine — it's idempotent. The final SSE chunk is the one with the `usage` object on it.
## Allowed models
Two SKUs are permitted. The proxy validates against this allowlist before opening the upstream socket — anything else returns `400 unknown_model`:
| Model | Use it for |
|---|---|
| `deepseek-v4-flash` | Default. Fast, cheap, good for ~95% of agent turns. |
| `deepseek-v4-pro` | Reasoning model. Use for hard debug, design, math-heavy work. |

The full breakdown lives at /docs/models. There is no `deepseek-v3`, no `-base`, no aliases — those return `400 unknown_model`.
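Because the allowlist is small and static, a client can mirror it and fail fast before spending a round trip. A tiny sketch (the helper is ours, not part of any SDK):

```python
ALLOWED_MODELS = {"deepseek-v4-flash", "deepseek-v4-pro"}

def check_model(model: str) -> str:
    # Mirrors the proxy's allowlist check; anything else would come back
    # from the server as 400 unknown_model anyway.
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model {model!r}; use one of {sorted(ALLOWED_MODELS)}")
    return model
```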
## Correlation headers (optional)
Two headers let you group calls into sessions and turns the same way the CLI does. The Fred CLI sends them automatically; bare API consumers can set them too if they want session-grouped views in the /usage dashboard.
| Header | Format | Purpose |
|---|---|---|
| `x-fred-session-id` | UUID v4 | Ties multiple requests to one CLI session. The dashboard's per-day breakdown groups by this column when you drill in. |
| `x-fred-turn-id` | Opaque, < 128 chars | Ties one model turn together — the agent loop bumps this for each user-visible response so retries within a turn share an ID. |

Both are optional. If you skip them the row still lands in `usage_events`; it just won't be groupable beyond the per-key/per-day rollup.
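With the openai Python SDK, the natural place for the session header is `default_headers` on the client, and for the turn header `extra_headers` on each call. A sketch, assuming one session per process and a fresh turn ID per user-visible response (the helper name is ours):

```python
import uuid
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fredcode.net/v1",
    api_key="fred_live_...",
    # One session ID for the life of the process.
    default_headers={"x-fred-session-id": str(uuid.uuid4())},
)

def run_turn(messages: list[dict]):
    # New turn ID per user-visible response; reuse it if you retry the turn.
    turn_id = uuid.uuid4().hex  # opaque and well under 128 chars
    return client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        stream=True,
        stream_options={"include_usage": True},
        extra_headers={"x-fred-turn-id": turn_id},
    )
```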
## Response headers
| Header | Value |
|---|---|
| `x-fred-request-id` | UUID, echoed back on every response (success or error). Match against `usage_events.request_id` in the /usage dashboard to find a specific call. |
| `Content-Type` | `text/event-stream` for streaming responses; `application/json` for non-streaming and for error bodies. |
| `Retry-After` | Seconds, on 429 only. Token-bucket refill estimate. |
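To capture `x-fred-request-id` without dropping down to raw HTTP, the openai Python SDK's `with_raw_response` wrapper exposes the headers and still hands back the chunk stream via `.parse()`. A sketch, reusing the `client` from the Python example above:

```python
raw = client.chat.completions.with_raw_response.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "say hi"}],
    stream=True,
    stream_options={"include_usage": True},
)

# Log this before consuming the stream; it matches usage_events.request_id
# in the /usage dashboard.
print("request id:", raw.headers.get("x-fred-request-id"))

for chunk in raw.parse():  # parse() returns the normal chunk iterator
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```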
## Errors
Errors are JSON in the OpenAI-compatible shape: `{ error: { code, message } }`. The HTTP status and the `error.code` string are the contract — the message text can change.
| Status | Code | Meaning |
|---|---|---|
| 400 | `invalid_json` | Body wasn't parseable JSON. Check `Content-Type` and trailing commas. |
| 400 | `missing_model` | No `model` field on the request body. |
| 400 | `unknown_model` | Model isn't on the allowlist. Use `deepseek-v4-flash` or `deepseek-v4-pro`. |
| 401 | `missing_bearer_token` | No Authorization header. |
| 401 | `invalid_api_key` | Header present, but the key doesn't exist or is malformed. |
| 401 | `revoked_api_key` | Key was valid but has been revoked. Mint a new one at app.fredcode.net/api-keys. |
| 402 | `insufficient_credits` | Account balance won't cover the call. Body includes `topup_url` for one-click checkout. |
| 429 | `rate_limited` | Per-IP or per-user limit. Honor `Retry-After`. |
| 502 | `upstream_unreachable` | DeepSeek connection failed (DNS, TLS, reset, timeout). Retry with backoff. |
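With the openai Python SDK these land as typed exceptions (`AuthenticationError` for the 401s, `RateLimitError` for 429, plain `APIStatusError` otherwise), with the decoded body attached. A sketch of the branches a bare client probably wants, reusing `client` from above; exactly where `topup_url` sits in the 402 body is an assumption here, so the code checks both plausible spots. 429 handling is covered in the next section.

```python
import openai

def error_code(e: openai.APIStatusError) -> str | None:
    # The SDK decodes the error body; accept either the full
    # {"error": {...}} envelope or the unwrapped inner object.
    body = e.body if isinstance(e.body, dict) else {}
    return body.get("error", body).get("code")

try:
    resp = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": "say hi"}],
        stream=True,
    )
except openai.AuthenticationError as e:
    # 401: missing_bearer_token / invalid_api_key / revoked_api_key
    raise SystemExit(f"auth failed ({error_code(e)}); mint a new key at app.fredcode.net/api-keys")
except openai.APIStatusError as e:
    body = e.body if isinstance(e.body, dict) else {}
    if e.status_code == 402:
        # insufficient_credits: the body carries a one-click checkout link.
        print("top up at:", body.get("topup_url") or body.get("error", {}).get("topup_url"))
    elif e.status_code == 502:
        print("upstream_unreachable; retry with backoff")
    else:
        raise
```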
## Rate limits
- Pre-auth (per IP) — 120 requests / minute, sliding window. Catches credential-stuffing and obvious abuse before the proxy spends cycles validating a key.
- Post-auth (per user) — 60 requests / minute, token-bucket with a 10-burst. The bucket refills at 1/sec; the burst lets you fire 10 parallel calls without immediately tripping the limit.
Hit the limit and you get `429 rate_limited` with a `Retry-After` value in seconds. The Fred CLI honors `Retry-After` automatically; bare clients should too.
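The openai SDK already retries 429s a couple of times by default; for a bare client, a small wrapper that sleeps for the server's estimate is enough. A sketch (the helper name and the backoff fallback are ours, not part of the API):

```python
import time
import openai

def create_with_retry(client, *, max_attempts: int = 5, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError as e:
            # 429 rate_limited: sleep for the token-bucket refill estimate,
            # falling back to exponential backoff if the header is missing.
            retry_after = e.response.headers.get("Retry-After")
            time.sleep(float(retry_after) if retry_after else 2 ** attempt)
    raise RuntimeError("still rate-limited after retries")
```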
## Streaming
Responses are Server-Sent Events — the same wire format OpenAI uses. Each `data:` line is a JSON chunk; the stream ends with `data: [DONE]`. For client-side parsing patterns, the OpenAI streaming docs apply verbatim — every official OpenAI SDK handles this for you.
```
data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant","content":""}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"hi"}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{},"finish_reason":"stop"}]}

data: {"id":"chatcmpl-...","choices":[],"usage":{
  "prompt_tokens": 12,
  "completion_tokens": 4,
  "total_tokens": 16,
  "prompt_cache_hit_tokens": 0,
  "prompt_cache_miss_tokens": 12,
  "completion_tokens_details": {"reasoning_tokens": 0}
}}

data: [DONE]
```
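If you're not using an SDK, hand-rolling the parse is a few lines. A sketch with `httpx` that prints the deltas and grabs the final usage chunk (the key and prompt are placeholders):

```python
import json
import httpx

payload = {
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "say hi"}],
    "stream": True,
    "stream_options": {"include_usage": True},
}

with httpx.stream(
    "POST",
    "https://api.fredcode.net/v1/chat/completions",
    headers={"Authorization": "Bearer fred_live_..."},
    json=payload,
    timeout=None,
) as r:
    usage = None
    for line in r.iter_lines():
        if not line.startswith("data: "):
            continue  # skip the blank separator lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            print(choice.get("delta", {}).get("content") or "", end="", flush=True)
        usage = chunk.get("usage") or usage  # only the final chunk carries it

print("\n[usage]", usage)
```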
The final usage chunk is the interesting one. DeepSeek's response includes three fields the proxy captures into the `usage_events` row:

- `prompt_cache_hit_tokens` — input tokens that hit DeepSeek's prompt cache. Billed at the much-lower cached rate.
- `prompt_cache_miss_tokens` — input tokens that didn't hit the cache. Billed at the uncached input rate.
- `completion_tokens_details.reasoning_tokens` — chain-of-thought tokens emitted before the visible answer (`deepseek-v4-pro` only). Counted toward the output total, billed at the output rate.

We capture all three into `usage_events`, so the per-row breakdown in the /usage dashboard matches exactly what DeepSeek reported. See /docs/billing for how the cached/uncached split shows up on your bill.
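If you want a rough client-side estimate of what a row will cost, the arithmetic from those fields is three multiplications. A sketch; the rate names and numbers below are placeholders, not real prices, so pull the actual rate card from `GET /api/pricing` (its JSON shape isn't documented on this page):

```python
# Placeholder per-million-token rates in USD; substitute the real
# rate card from GET /api/pricing before trusting any number here.
RATES = {
    "cached_input": 0.02,  # applied to prompt_cache_hit_tokens
    "input": 0.20,         # applied to prompt_cache_miss_tokens
    "output": 0.80,        # applied to completion_tokens
}

def estimate_cost_usd(usage: dict) -> float:
    # reasoning_tokens are already counted inside completion_tokens,
    # so they don't need a separate line item.
    return (
        usage["prompt_cache_hit_tokens"] * RATES["cached_input"]
        + usage["prompt_cache_miss_tokens"] * RATES["input"]
        + usage["completion_tokens"] * RATES["output"]
    ) / 1_000_000
```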