Models

Fred ships with two DeepSeek SKUs. Default to flash; switch to pro for the hard turns. That's the whole strategy.

deepseek-v4-flash (default)

Fast, cheap, surprisingly capable. This is what every session starts on, and it's the right choice for the overwhelming majority of coding work — refactors, file edits, search, test scaffolding, lint fixes, anything where the bottleneck is "look at the code and apply the obvious change."

| Bucket | Upstream rate | What you pay (2.5× margin) |
| --- | --- | --- |
| Input (cache miss) | $0.14 / M tokens | $0.35 / M tokens |
| Input (cache hit) | $0.0028 / M tokens | $0.007 / M tokens |
| Output | $0.28 / M tokens | $0.70 / M tokens |
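The "what you pay" column is just the upstream rate times the 2.5× margin. A trivial sketch, with rates copied from the table above (the helper name is mine, not Fred's):

```python
MARGIN = 2.5  # Fred's markup over the upstream DeepSeek rate

# Flash upstream rates, $ / M tokens, from the table above.
FLASH_UPSTREAM = {"input_miss": 0.14, "input_hit": 0.0028, "output": 0.28}

def billed_rate(upstream: float) -> float:
    """What you pay: the upstream rate times the 2.5x margin."""
    return upstream * MARGIN

for bucket, rate in FLASH_UPSTREAM.items():
    print(f"{bucket}: ${billed_rate(rate):.4f} / M tokens")
```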

deepseek-v4-pro

The reasoning SKU. Slower, much more expensive, and noticeably smarter on hard problems. Worth it for: hairy debugging where the bug spans files, architectural design from a blank page, math/algorithms work, or anytime flash gets stuck in a loop and you can feel it guessing.

| Bucket | Upstream rate | What you pay (2.5× margin) |
| --- | --- | --- |
| Input (cache miss) | $1.74 / M tokens | $4.35 / M tokens |
| Input (cache hit) | $0.0145 / M tokens | $0.0363 / M tokens |
| Output | $3.48 / M tokens | $8.70 / M tokens |

Output is roughly 12× the cost of flash. Pro is only worth running on the turn that actually needs the reasoning — flip back to flash as soon as the gnarly part is done.

Switching with /flash and /pro

Inside the REPL, two slash commands swap your main slot:

> /flash    # → deepseek-v4-flash
> /pro      # → deepseek-v4-pro

The first /pro in a session prompts you to confirm, with a quick reminder of the cost ratio. Subsequent uses don't re-prompt. Your choice persists to ~/.config/fred/preferences.json so it survives across restarts — start a new Fred next morning and you're still on whatever you last picked.

If you're scripting Fred (CI, batch jobs, fred -p pipelines) and don't want the confirmation prompt, set:

export FRED_NO_CONFIRM_PRO=1

Mid-turn switching

You can /pro halfway through a session, ask the hard question, then /flash to keep going on cleanup. The slot swap takes effect on the very next turn — there's no re-warming, no session reset.

Slot architecture

Fred has three model slots, each with its own job:

  • main — does the actual reasoning. Reads your code, writes the diffs, makes the decisions. This is the one /pro swaps.
  • editor — formats and validates diffs. (TODO: not yet wired up — currently inherits whatever main is on. Once landed, will run on flash regardless of the main slot.)
  • weak — auto-compaction, summarization of old turns, side-channel work that doesn't need reasoning. Always flash by default.
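The slot behavior above can be pictured as a plain mapping, where /pro rewrites only the main entry. A sketch, not Fred's actual internals:

```python
# Default assignments; the editor slot currently just inherits main (see TODO above).
slots = {"main": "deepseek-v4-flash", "weak": "deepseek-v4-flash"}

def slash_pro(slots: dict) -> dict:
    """Sketch of /pro: swap the main slot only, leaving weak on flash."""
    return {**slots, "main": "deepseek-v4-pro"}

after = slash_pro(slots)
# weak stays on flash, so compaction summaries keep billing at flash rates.
assert after == {"main": "deepseek-v4-pro", "weak": "deepseek-v4-flash"}
```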

/pro deliberately only swaps main. Paying 12× output rates for a 3,000-token compaction summary is wasted spend — those summaries are exactly the work weak is for. If you actually want every slot on pro (and you have a reason), set them individually:

> /model main pro
> /model weak pro
> /model editor pro

/model with no args prints the current slot assignments.

Persistence and precedence

Several things can set a model. When they conflict, this is the order — top wins:

  1. CLI flag --model <name> (per-invocation, applies to main).
  2. Environment variables: FRED_MAIN_MODEL, FRED_EDITOR_MODEL, FRED_WEAK_MODEL. Legacy DSC_* names are still honored.
  3. ~/.config/fred/preferences.json — what /model, /flash, and /pro write to.
  4. The compiled-in default (DEFAULT_MODEL = deepseek-v4-flash).
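That precedence order, sketched as a resolver for the main slot. Function names are mine, and the prefs-file schema (a top-level "main" key) is a guess; the lookup order comes straight from the list above:

```python
import json
import os

DEFAULT_MODEL = "deepseek-v4-flash"

def resolve_main_model(cli_flag=None, prefs_path="~/.config/fred/preferences.json"):
    """Top-most source wins: CLI flag, then env, then prefs file, then default."""
    if cli_flag:                                        # 1. --model <name>
        return cli_flag
    env = os.environ.get("FRED_MAIN_MODEL") or os.environ.get("DSC_MAIN_MODEL")
    if env:                                             # 2. env vars (legacy DSC_* honored)
        return env
    try:                                                # 3. what /model, /flash, /pro saved
        with open(os.path.expanduser(prefs_path)) as f:
            saved = json.load(f).get("main")
        if saved:
            return saved
    except (OSError, json.JSONDecodeError):
        pass
    return DEFAULT_MODEL                                # 4. compiled-in default
```

Per the text, a --model deepseek-v4-pro flag wins even when the prefs file says flash, which is what makes one-off pro runs cheap to script.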

So a --model deepseek-v4-pro on the command line beats whatever your preferences file says, which beats the default. Useful for one-off pro runs without polluting your saved preference.

How rates appear in the CLI

After every turn, Fred prints a status line that ends with the live cost:

· deepseek-v4-flash  in=12,431  cached=8,200  out=412  $0.0058

Those numbers come from the rate-card cache at ~/.config/fred/rate-card.json, refreshed on every fred login from https://api.fredcode.net/api/pricing. There's a bundled fallback rate card baked into the package for offline first-runs, so a brand-new install can still print accurate costs before its first network round-trip.
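The fallback behavior described above, cached rate card if present and bundled rates otherwise, might look roughly like this. The paths come from the text; the rate-card shape and function names are assumptions:

```python
import json
from pathlib import Path

# Bundled fallback so a fresh install can price turns before its first
# network round-trip (flash billed rates from the table above, $ / M tokens).
BUNDLED_RATES = {"deepseek-v4-flash": {"in": 0.35, "cached": 0.007, "out": 0.70}}

def load_rate_card(path="~/.config/fred/rate-card.json"):
    """Prefer the on-disk cache refreshed at login; fall back to bundled rates."""
    p = Path(path).expanduser()
    try:
        return json.loads(p.read_text())
    except (OSError, json.JSONDecodeError):
        return BUNDLED_RATES

def price_turn(rates, model, miss, cached, out):
    """Dollar cost of one turn from per-bucket token counts."""
    r = rates[model]
    return (miss * r["in"] + cached * r["cached"] + out * r["out"]) / 1_000_000
```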

Adding new models

DeepSeek occasionally ships new SKUs. When that happens, we bump the Fred package — the CLI's allowlist updates and you can pin the new model with /model main <new-name> after upgrading.

Until a model is in the allowlist, the proxy returns HTTP 400 with unknown_model rather than silently passing it through. If you try a model name Fred doesn't know about and get that error, run fred update and try again.

Pro adds up fast

At the billed rates above, a 50,000-input (all cache-miss), 1,000-output turn costs about $0.0182 on flash and $0.2262 on pro, roughly 12.4× end to end. Pro's premium is about 12.4× on cache-miss input and on output alike; the one place the gap narrows is cache hits, where pro is only ~5.2× flash ($0.0363 vs $0.007 per M tokens). Long sessions that replay a warm cached prefix therefore land somewhere below the headline ratio.
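Worked out against the billed ("what you pay") rates in the tables above:

```python
# Billed rates, $ / M tokens, from the two pricing tables above.
RATES = {
    "flash": {"in_miss": 0.35, "in_hit": 0.007, "out": 0.70},
    "pro":   {"in_miss": 4.35, "in_hit": 0.0363, "out": 8.70},
}

def turn_cost(model, in_miss=0, in_hit=0, out=0):
    """Dollar cost of one turn from per-bucket token counts."""
    r = RATES[model]
    return (in_miss * r["in_miss"] + in_hit * r["in_hit"] + out * r["out"]) / 1e6

flash = turn_cost("flash", in_miss=50_000, out=1_000)  # ~ $0.0182
pro = turn_cost("pro", in_miss=50_000, out=1_000)      # ~ $0.2262
print(f"flash ${flash:.4f}  pro ${pro:.4f}  ratio {pro / flash:.1f}x")
```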

If you've been running on pro for a while, verify your spend at /usage.

Next

Slash commands — every command available in the REPL. CLI reference — flags, env vars, subcommands.