API reference

Fred's proxy speaks the OpenAI Chat Completions API. Drop in any OpenAI client, point base_url at https://api.fredcode.net/v1, and use your fred_live_* key. No custom SDK, no glue layer — just OpenAI shape with managed billing on top.

Heads up

Bare API consumers — Fred is a streaming proxy, not a request/response one. Set stream: true on your client. Non-streaming requests work but you'll lose the in-flight cancellation behavior the proxy relies on, and slow turns will hold a TCP connection longer than they need to.

Note

Want managed billing without using the CLI? You can absolutely use the proxy directly with any OpenAI client. The dashboard at app.fredcode.net works the same — keys, usage rows, top-ups, all of it.

Endpoints

| Method & path | Purpose | Auth |
| --- | --- | --- |
| POST /v1/chat/completions | The main streaming endpoint. OpenAI Chat Completions shape, billed per call. | Bearer key |
| GET /health | Service probe. Returns 200 with a tiny JSON status. No upstream call. | None |
| GET /api/pricing | Public rate cards. Same JSON the Fred CLI consumes to render /cost. | None |
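
Both unauthenticated endpoints are plain GETs; a quick probe, sketched with httpx and assuming they live on the same host as the proxy (no /v1 prefix):

import httpx

# No Authorization header needed on either endpoint.
print(httpx.get("https://api.fredcode.net/health").json())       # tiny JSON status, 200 when the service is up
print(httpx.get("https://api.fredcode.net/api/pricing").json())  # rate-card JSON, the same payload the CLI renders as /cost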

Authentication

Every billed request needs a bearer token in the Authorization header:

Authorization: Bearer fred_live_...

Get a key by running fred login (which provisions one for the device) or by creating one in the dashboard at https://app.fredcode.net/api-keys. Keys are scoped to your account and revocable from that same page; once revoked the proxy returns 401 revoked_api_key immediately.

  • Keys are prefixed fred_live_ — no test/prod split, since billing is real on every call.
  • The header is the only supported auth shape. Query-string keys, basic auth, and cookies are not accepted.
  • GET /health and GET /api/pricing ignore the header entirely.

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fredcode.net/v1",
    api_key="fred_live_...",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "say hi"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in resp:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:
        print(f"\n[usage] in={chunk.usage.prompt_tokens} out={chunk.usage.completion_tokens}")

TypeScript (openai-node)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.fredcode.net/v1",
  apiKey: process.env.FRED_API_KEY!,
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "say hi" }],
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  if (chunk.usage) {
    console.log(`\n[usage] in=${chunk.usage.prompt_tokens} out=${chunk.usage.completion_tokens}`);
  }
}

curl

curl -N https://api.fredcode.net/v1/chat/completions \
  -H "Authorization: Bearer fred_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "say hi"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

All three examples set stream: true and stream_options: { include_usage: true }. The proxy auto-injects include_usage on the upstream request so it can meter the call, but clients that send it explicitly are fine — it's idempotent. The final SSE chunk is the one with the usage object on it.

Allowed models

Two SKUs are permitted. The proxy validates against this allowlist before opening the upstream socket — anything else returns 400 unknown_model:

| Model | Use it for |
| --- | --- |
| deepseek-v4-flash | Default. Fast, cheap, good for ~95% of agent turns. |
| deepseek-v4-pro | Reasoning model. Use for hard debug, design, math-heavy work. |

The full breakdown lives at /docs/models. There is no deepseek-v3, no -base, no aliases — those return 400 unknown_model.
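
If you'd rather fail fast client-side than spend a round trip on 400 unknown_model, a tiny guard is enough. This is just a sketch; require_allowed is not part of any SDK:

ALLOWED_MODELS = {"deepseek-v4-flash", "deepseek-v4-pro"}

def require_allowed(model: str) -> str:
    # The proxy rejects anything outside this set with 400 unknown_model
    # before it opens the upstream socket, so check locally first.
    if model not in ALLOWED_MODELS:
        raise ValueError(f"{model!r} is not on the Fred allowlist")
    return model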

Correlation headers (optional)

Two headers let you group calls into sessions and turns the same way the CLI does. The Fred CLI sends them automatically; bare API consumers can set them too if they want session-grouped views in the /usage dashboard.

| Header | Format | Purpose |
| --- | --- | --- |
| x-fred-session-id | UUID v4 | Ties multiple requests to one CLI session. The dashboard's per-day breakdown groups by this column when you drill in. |
| x-fred-turn-id | Opaque, < 128 chars | Ties one model turn together; the agent loop bumps this for each user-visible response so retries within a turn share an ID. |

Both are optional. If you skip them the row still lands in usage_events; it just won't be groupable beyond the per-key/per-day rollup.
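
With the openai Python SDK, both headers ride along via extra_headers. A sketch of the shape the CLI uses; run_turn and the ID handling here are illustrative, not a Fred API:

import uuid
from openai import OpenAI

client = OpenAI(base_url="https://api.fredcode.net/v1", api_key="fred_live_...")
session_id = str(uuid.uuid4())  # one UUID v4 per session, reused across requests

def run_turn(turn_id: str, messages: list[dict]):
    # extra_headers applies to this single request; use default_headers on the
    # OpenAI(...) constructor if every call should carry the session ID.
    return client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        stream=True,
        stream_options={"include_usage": True},
        extra_headers={
            "x-fred-session-id": session_id,
            "x-fred-turn-id": turn_id,  # opaque, keep it under 128 chars
        },
    )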

Response headers

| Header | Value |
| --- | --- |
| x-fred-request-id | UUID, echoed back on every response (success or error). Match against usage_events.request_id in the /usage dashboard to find a specific call. |
| Content-Type | text/event-stream for streaming responses; application/json for non-streaming and for error bodies. |
| Retry-After | Seconds, on 429 only. Token-bucket refill estimate. |
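
To read x-fred-request-id from the openai Python SDK, the with_raw_response wrapper exposes the headers before you consume the stream. A sketch, assuming a recent openai-python release:

from openai import OpenAI

client = OpenAI(base_url="https://api.fredcode.net/v1", api_key="fred_live_...")

raw = client.chat.completions.with_raw_response.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "say hi"}],
    stream=True,
    stream_options={"include_usage": True},
)
print("request id:", raw.headers.get("x-fred-request-id"))  # matches usage_events.request_id

for chunk in raw.parse():  # parse() hands back the usual chunk stream
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)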

Errors

Errors are JSON, OpenAI-compatible shape: { error: { code, message } }. The HTTP status and the error.code string are the contract — the message text can change.

| Status | Code | Meaning |
| --- | --- | --- |
| 400 | invalid_json | Body wasn't parseable JSON. Check Content-Type and trailing commas. |
| 400 | missing_model | No model field on the request body. |
| 400 | unknown_model | Model isn't on the allowlist. Use deepseek-v4-flash or deepseek-v4-pro. |
| 401 | missing_bearer_token | No Authorization header. |
| 401 | invalid_api_key | Header present, but the key doesn't exist or is malformed. |
| 401 | revoked_api_key | Key was valid but has been revoked. Mint a new one at app.fredcode.net/api-keys. |
| 402 | insufficient_credits | Account balance won't cover the call. Body includes topup_url for one-click checkout. |
| 429 | rate_limited | Per-IP or per-user limit. Honor Retry-After. |
| 502 | upstream_unreachable | DeepSeek connection failed (DNS, TLS, reset, timeout). Retry with backoff. |
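
Mapped onto the openai Python SDK's exceptions, the 401 family surfaces as AuthenticationError and the 402 (which the SDK has no dedicated class for) as a generic APIStatusError. A sketch; exactly where topup_url sits inside the error body is an assumption:

import openai
from openai import OpenAI

client = OpenAI(base_url="https://api.fredcode.net/v1", api_key="fred_live_...")

def ask(messages: list[dict]):
    try:
        return client.chat.completions.create(
            model="deepseek-v4-flash",
            messages=messages,
            stream=True,
            stream_options={"include_usage": True},
        )
    except openai.AuthenticationError as err:
        # 401 family: missing_bearer_token / invalid_api_key / revoked_api_key.
        raise SystemExit(f"auth failed: {err.body} -- mint a new key at app.fredcode.net/api-keys")
    except openai.APIStatusError as err:
        if err.status_code == 402:
            # insufficient_credits: the response body carries a topup_url for checkout.
            raise SystemExit(f"out of credits: {err.body}")
        raise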

Rate limits

  • Pre-auth (per IP) — 120 requests / minute, sliding window. Catches credential-stuffing and obvious abuse before the proxy spends cycles validating a key.
  • Post-auth (per user) — 60 requests / minute, token-bucket with a 10-burst. The bucket refills at 1/sec; the burst lets you fire 10 parallel calls without immediately tripping the limit.

Hit the limit and you get 429 rate_limited with Retry-After seconds. The Fred CLI honors Retry-After automatically; bare clients should too.
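
The official SDKs already retry 429s and respect Retry-After; for bare clients the loop looks like this. A sketch using the openai SDK with its built-in retries disabled so the loop owns the behavior; create_with_retry is just an illustrative helper:

import time
import openai
from openai import OpenAI

# max_retries=0 turns off the SDK's own retry logic so this loop handles 429s.
client = OpenAI(base_url="https://api.fredcode.net/v1", api_key="fred_live_...", max_retries=0)

def create_with_retry(max_attempts: int = 5, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError as err:
            if attempt == max_attempts - 1:
                raise
            # Retry-After is the proxy's token-bucket refill estimate, in seconds.
            time.sleep(float(err.response.headers.get("Retry-After", "1")))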

Streaming

Responses are Server-Sent Events — the same wire format OpenAI uses. Each data: line is a JSON chunk; the stream ends with data: [DONE]. For client-side parsing patterns, the OpenAI streaming docs apply verbatim — every official OpenAI SDK handles this for you.

data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant","content":""}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"hi"}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{},"finish_reason":"stop"}]}

data: {"id":"chatcmpl-...","choices":[],"usage":{
  "prompt_tokens": 12,
  "completion_tokens": 4,
  "total_tokens": 16,
  "prompt_cache_hit_tokens": 0,
  "prompt_cache_miss_tokens": 12,
  "completion_tokens_details": {"reasoning_tokens": 0}
}}

data: [DONE]
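
If you aren't going through an SDK, parsing the stream yourself is a few lines. A sketch with httpx; the field names mirror the chunks above:

import json
import httpx

payload = {
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "say hi"}],
    "stream": True,
    "stream_options": {"include_usage": True},
}

with httpx.stream(
    "POST",
    "https://api.fredcode.net/v1/chat/completions",
    headers={"Authorization": "Bearer fred_live_..."},
    json=payload,
    timeout=None,
) as resp:
    for line in resp.iter_lines():
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            print(choice.get("delta", {}).get("content") or "", end="", flush=True)
        if chunk.get("usage"):
            print("\n[usage]", chunk["usage"])  # the final chunk, used for metering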

The final usage chunk is the interesting one. DeepSeek reports three fields there on top of the standard token counts:

  • prompt_cache_hit_tokens — input tokens that hit DeepSeek's prompt cache. Billed at the much-lower cached rate.
  • prompt_cache_miss_tokens — input tokens that didn't hit the cache. Billed at the uncached input rate.
  • completion_tokens_details.reasoning_tokens — chain-of-thought tokens emitted before the visible answer (deepseek-v4-pro only). Counted toward the output total, billed at the output rate.

We capture all three into usage_events, so the per-row breakdown in the /usage dashboard matches exactly what DeepSeek reported. See /docs/billing for how the cached/uncached split shows up on your bill.
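
For intuition only, the three fields combine like this. The rates below are placeholders, not Fred's real pricing; pull the actual rate card from GET /api/pricing or /docs/models:

# Placeholder rates in USD per 1M tokens -- NOT real Fred pricing, illustration only.
RATES = {"cached_input": 0.07, "input": 0.27, "output": 1.10}

def estimate_cost(usage: dict) -> float:
    # reasoning_tokens are already included in completion_tokens, so they add no
    # extra term; they just explain why pro turns have larger output counts.
    return (
        usage["prompt_cache_hit_tokens"] / 1e6 * RATES["cached_input"]
        + usage["prompt_cache_miss_tokens"] / 1e6 * RATES["input"]
        + usage["completion_tokens"] / 1e6 * RATES["output"]
    )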