API reference

Fred's proxy speaks the OpenAI Chat Completions API. Drop in any OpenAI client, point base_url at https://api.fredcode.net/v1, and use your fred_live_* key. No custom SDK, no glue layer — just OpenAI shape with managed billing on top.

Heads up

Bare API consumers — Fred is a streaming proxy, not a request/response one. Set stream: true on your client. Non-streaming requests work but you'll lose the in-flight cancellation behavior the proxy relies on, and slow turns will hold a TCP connection longer than they need to.

Note

Want managed billing without using the CLI? You can absolutely use the proxy directly with any OpenAI client. The dashboard at app.fredcode.net works the same — keys, usage rows, top-ups, all of it.

Endpoints

| Method & path | Purpose | Auth |
| --- | --- | --- |
| POST /v1/chat/completions | The main streaming endpoint. OpenAI Chat Completions shape, billed per call. | Bearer key |
| GET /health | Service probe. Returns 200 with a tiny JSON status. No upstream call. | None |
| GET /api/pricing | Public rate cards. Same JSON the Fred CLI consumes to render /cost. | None |
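
Both unauthenticated endpoints are plain GETs; a quick probe, sketched with httpx and assuming they live on the same host as the proxy (no /v1 prefix):

import httpx

# No Authorization header needed on either endpoint.
print(httpx.get("https://api.fredcode.net/health").json())       # tiny JSON status, 200 when the service is up
print(httpx.get("https://api.fredcode.net/api/pricing").json())  # rate-card JSON, the same payload the CLI renders as /cost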

Authentication

Every billed request needs a bearer token in the Authorization header:

Authorization: Bearer fred_live_...

Get a key by running fred login (which provisions one for the device) or by creating one in the dashboard at https://app.fredcode.net/api-keys. Keys are scoped to your account and revocable from that same page; once revoked the proxy returns 401 revoked_api_key immediately.

  • Keys are prefixed fred_live_ — no test/prod split, since billing is real on every call.
  • The header is the only supported auth shape. Query-string keys, basic auth, and cookies are not accepted.
  • GET /health and GET /api/pricing ignore the header entirely.

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fredcode.net/v1",
    api_key="fred_live_...",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "say hi"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in resp:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:
        print(f"\n[usage] in={chunk.usage.prompt_tokens} out={chunk.usage.completion_tokens}")

TypeScript (openai-node)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.fredcode.net/v1",
  apiKey: process.env.FRED_API_KEY!,
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "say hi" }],
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  if (chunk.usage) {
    console.log(`\n[usage] in=${chunk.usage.prompt_tokens} out=${chunk.usage.completion_tokens}`);
  }
}

curl

curl -N https://api.fredcode.net/v1/chat/completions \
  -H "Authorization: Bearer fred_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "say hi"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

All three examples set stream: true and stream_options: { include_usage: true }. The proxy auto-injects include_usage on the upstream request so it can meter the call, but clients that send it explicitly are fine — it's idempotent. The final SSE chunk is the one with the usage object on it.

Allowed models

Two SKUs are permitted. The proxy validates against this allowlist before opening the upstream socket — anything else returns 400 unknown_model:

| Model | Use it for |
| --- | --- |
| deepseek-v4-flash | Default. Fast, cheap, good for ~95% of agent turns. |
| deepseek-v4-pro | Reasoning model. Use for hard debug, design, math-heavy work. |

The full breakdown lives at /docs/models. There is no deepseek-v3, no -base, no aliases — those return 400 unknown_model.
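
If you'd rather fail fast client-side than spend a round trip on 400 unknown_model, a tiny guard is enough. This is just a sketch; require_allowed is not part of any SDK:

ALLOWED_MODELS = {"deepseek-v4-flash", "deepseek-v4-pro"}

def require_allowed(model: str) -> str:
    # The proxy rejects anything outside this set with 400 unknown_model
    # before it opens the upstream socket, so check locally first.
    if model not in ALLOWED_MODELS:
        raise ValueError(f"{model!r} is not on the Fred allowlist")
    return model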

Correlation headers (optional)

Two headers let you group calls into sessions and turns the same way the CLI does. The Fred CLI sends them automatically; bare API consumers can set them too if they want session-grouped views in the /usage dashboard.

| Header | Format | Purpose |
| --- | --- | --- |
| x-fred-session-id | UUID v4 | Ties multiple requests to one CLI session. The dashboard's per-day breakdown groups by this column when you drill in. |
| x-fred-turn-id | Opaque, < 128 chars | Ties one model turn together; the agent loop bumps this for each user-visible response so retries within a turn share an ID. |

Both are optional. If you skip them the row still lands in usage_events; it just won't be groupable beyond the per-key/per-day rollup.
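
With the openai Python SDK, both headers ride along via extra_headers. A sketch of the shape the CLI uses; run_turn and the ID handling here are illustrative, not a Fred API:

import uuid
from openai import OpenAI

client = OpenAI(base_url="https://api.fredcode.net/v1", api_key="fred_live_...")
session_id = str(uuid.uuid4())  # one UUID v4 per session, reused across requests

def run_turn(turn_id: str, messages: list[dict]):
    # extra_headers applies to this single request; use default_headers on the
    # OpenAI(...) constructor if every call should carry the session ID.
    return client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        stream=True,
        stream_options={"include_usage": True},
        extra_headers={
            "x-fred-session-id": session_id,
            "x-fred-turn-id": turn_id,  # opaque, keep it under 128 chars
        },
    )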

Response headers

| Header | Value |
| --- | --- |
| x-fred-request-id | UUID, echoed back on every response (success or error). Match against usage_events.request_id in the /usage dashboard to find a specific call. |
| Content-Type | text/event-stream for streaming responses; application/json for non-streaming and for error bodies. |
| Retry-After | Seconds, on 429 only. Token-bucket refill estimate. |
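
To read x-fred-request-id from the openai Python SDK, the with_raw_response wrapper exposes the headers before you consume the stream. A sketch, assuming a recent openai-python release:

from openai import OpenAI

client = OpenAI(base_url="https://api.fredcode.net/v1", api_key="fred_live_...")

raw = client.chat.completions.with_raw_response.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "say hi"}],
    stream=True,
    stream_options={"include_usage": True},
)
print("request id:", raw.headers.get("x-fred-request-id"))  # matches usage_events.request_id

for chunk in raw.parse():  # parse() hands back the usual chunk stream
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)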

Errors

Errors are JSON, OpenAI-compatible shape: { error: { code, message } }. The HTTP status and the error.code string are the contract — the message text can change.

| Status | Code | Meaning |
| --- | --- | --- |
| 400 | invalid_json | Body wasn't parseable JSON. Check Content-Type and trailing commas. |
| 400 | missing_model | No model field on the request body. |
| 400 | unknown_model | Model isn't on the allowlist. Use deepseek-v4-flash or deepseek-v4-pro. |
| 401 | missing_bearer_token | No Authorization header. |
| 401 | invalid_api_key | Header present, but the key doesn't exist or is malformed. |
| 401 | revoked_api_key | Key was valid but has been revoked. Mint a new one at app.fredcode.net/api-keys. |
| 402 | insufficient_credits | Account balance won't cover the call. Body includes topup_url for one-click checkout. |
| 429 | rate_limited | Per-IP or per-user limit. Honor Retry-After. |
| 502 | upstream_unreachable | DeepSeek connection failed (DNS, TLS, reset, timeout). Retry with backoff. |
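
Mapped onto the openai Python SDK's exceptions, the 401 family surfaces as AuthenticationError and the 402 (which the SDK has no dedicated class for) as a generic APIStatusError. A sketch; exactly where topup_url sits inside the error body is an assumption:

import openai
from openai import OpenAI

client = OpenAI(base_url="https://api.fredcode.net/v1", api_key="fred_live_...")

def ask(messages: list[dict]):
    try:
        return client.chat.completions.create(
            model="deepseek-v4-flash",
            messages=messages,
            stream=True,
            stream_options={"include_usage": True},
        )
    except openai.AuthenticationError as err:
        # 401 family: missing_bearer_token / invalid_api_key / revoked_api_key.
        raise SystemExit(f"auth failed: {err.body} -- mint a new key at app.fredcode.net/api-keys")
    except openai.APIStatusError as err:
        if err.status_code == 402:
            # insufficient_credits: the response body carries a topup_url for checkout.
            raise SystemExit(f"out of credits: {err.body}")
        raise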

Rate limits

  • Pre-auth (per IP) — 120 requests / minute, sliding window. Catches credential-stuffing and obvious abuse before the proxy spends cycles validating a key.
  • Post-auth (per user) — 60 requests / minute, token-bucket with a 10-burst. The bucket refills at 1/sec; the burst lets you fire 10 parallel calls without immediately tripping the limit.

Hit the limit and you get 429 rate_limited with Retry-After seconds. The Fred CLI honors Retry-After automatically; bare clients should too.
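
The official SDKs already retry 429s and respect Retry-After; for bare clients the loop looks like this. A sketch using the openai SDK with its built-in retries disabled so the loop owns the behavior; create_with_retry is just an illustrative helper:

import time
import openai
from openai import OpenAI

# max_retries=0 turns off the SDK's own retry logic so this loop handles 429s.
client = OpenAI(base_url="https://api.fredcode.net/v1", api_key="fred_live_...", max_retries=0)

def create_with_retry(max_attempts: int = 5, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError as err:
            if attempt == max_attempts - 1:
                raise
            # Retry-After is the proxy's token-bucket refill estimate, in seconds.
            time.sleep(float(err.response.headers.get("Retry-After", "1")))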

Streaming

Responses are Server-Sent Events — the same wire format OpenAI uses. Each data: line is a JSON chunk; the stream ends with data: [DONE]. For client-side parsing patterns, the OpenAI streaming docs apply verbatim — every official OpenAI SDK handles this for you.

data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant","content":""}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"hi"}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{},"finish_reason":"stop"}]}

data: {"id":"chatcmpl-...","choices":[],"usage":{
  "prompt_tokens": 12,
  "completion_tokens": 4,
  "total_tokens": 16,
  "prompt_cache_hit_tokens": 0,
  "prompt_cache_miss_tokens": 12,
  "completion_tokens_details": {"reasoning_tokens": 0}
}}

data: [DONE]
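
If you aren't going through an SDK, parsing the stream yourself is a few lines. A sketch with httpx; the field names mirror the chunks above:

import json
import httpx

payload = {
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "say hi"}],
    "stream": True,
    "stream_options": {"include_usage": True},
}

with httpx.stream(
    "POST",
    "https://api.fredcode.net/v1/chat/completions",
    headers={"Authorization": "Bearer fred_live_..."},
    json=payload,
    timeout=None,
) as resp:
    for line in resp.iter_lines():
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            print(choice.get("delta", {}).get("content") or "", end="", flush=True)
        if chunk.get("usage"):
            print("\n[usage]", chunk["usage"])  # the final chunk, used for metering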

The final usage chunk is the interesting one. DeepSeek reports three fields there on top of the standard token counts:

  • prompt_cache_hit_tokens — input tokens that hit DeepSeek's prompt cache. Billed at the much-lower cached rate.
  • prompt_cache_miss_tokens — input tokens that didn't hit the cache. Billed at the uncached input rate.
  • completion_tokens_details.reasoning_tokens — chain-of-thought tokens emitted before the visible answer (deepseek-v4-pro only). Counted toward the output total, billed at the output rate.

We capture all three into usage_events, so the per-row breakdown in the /usage dashboard matches exactly what DeepSeek reported. See /docs/billing for how the cached/uncached split shows up on your bill.
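
For intuition only, the three fields combine like this. The rates below are placeholders, not Fred's real pricing; pull the actual rate card from GET /api/pricing or /docs/models:

# Placeholder rates in USD per 1M tokens -- NOT real Fred pricing, illustration only.
RATES = {"cached_input": 0.07, "input": 0.27, "output": 1.10}

def estimate_cost(usage: dict) -> float:
    # reasoning_tokens are already included in completion_tokens, so they add no
    # extra term; they just explain why pro turns have larger output counts.
    return (
        usage["prompt_cache_hit_tokens"] / 1e6 * RATES["cached_input"]
        + usage["prompt_cache_miss_tokens"] / 1e6 * RATES["input"]
        + usage["completion_tokens"] / 1e6 * RATES["output"]
    )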