The pipeline

Cascade is built around one pipeline. The input changes depending on which on-ramp you use, but every approved story flows through the same stages in the same order.

At a glance

INPUT                  HUMAN GATE              AGENT WORK              HUMAN GATE
=====                  ==========              ==========              ==========

audio file
   │
   ▼
[ ingest ] ──► [ extract ] ──► [ review ] ──► [ plan ] ──► [ code ]
                                   │                            │
                                   ▼                            ▼
                              approve /                  [ apply ] ──► [ install ] ──► [ test ]
                              reject                                                        │
                                                                                            ▼
                                                                              [ commit ] ──► [ push ] ──► [ pr ]
                                                                                                              │
                                                                                                              ▼
                                                                                                       review + merge

Every LLM-backed stage reads from the team memory layer (conventions, decisions, glossary, prior work, constraints). That is what makes the output fit your codebase instead of reading like generic AI output. See team memory for what to put in it.

On-ramps: where the pipeline starts

You enter the pipeline at one of four stages (Ingest, Extract, Review, or Plan), depending on where the work originated. Six commands map onto those entry points:

On-ramp	Starts at	When to use
`cascade ingest`	Ingest	You have a meeting recording (sprint planning, requirements call, etc.)
`cascade ticket`	Plan	The requirement is already a Jira / Linear / GitHub / Azure DevOps ticket
`cascade prompt`	Plan	You just want to type one line: “Add cursor pagination to /api/users”
`cascade extract`	Extract	You already have a transcript YAML (manual transcription, or a re-run)
`cascade review`	Review	A previous `extract` produced stories; you want to triage them
`cascade build`	Plan	You have an approved story batch ready to ship
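
The table reads as a lookup from command to entry stage. A minimal Python sketch of that lookup, using the stage order from the diagram above. The names `STAGES`, `ON_RAMPS`, and `stages_downstream` are illustrative and not part of Cascade's API, and the sketch only lists the stages that sit downstream of an entry point; it does not claim that a single command runs all of them.

```python
# Stage order as drawn in the "At a glance" diagram.
STAGES = [
    "ingest", "extract", "review", "plan", "code", "apply",
    "install", "test", "commit", "push", "pr",
]

# Command -> stage it starts at, copied from the on-ramp table.
ON_RAMPS = {
    "ingest": "ingest",
    "extract": "extract",
    "review": "review",
    "ticket": "plan",
    "prompt": "plan",
    "build": "plan",
}


def stages_downstream(command: str) -> list[str]:
    """Return the entry stage for a command plus every stage after it."""
    start = ON_RAMPS[command]
    return STAGES[STAGES.index(start):]


print(stages_downstream("prompt")[:3])  # ['plan', 'code', 'apply']
```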

What each stage does

1. Ingest

Turns audio (or video) into a structured transcript YAML.

Input: an mp3, mp4, wav, m4a, or similar file
Output: transcripts/{meeting_id}.yaml with speakers, timestamps, and text per turn
Backends: faster-whisper (recommended, fastest CPU option), openai-whisper (well-tested baseline), or openai-api (cloud, requires opt-in). Cascade picks the best available automatically.
Diarization: if pyannote.audio is installed, turns are labeled with speaker IDs. Without it, everything is labeled “Speaker” and the rest of the pipeline still works.
Local-first: the default backend keeps audio on your machine.
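
Automatic backend selection can be pictured as "first installed backend wins". The sketch below is an assumption about how that might work, not Cascade's actual logic: the preference order simply follows the order the backends are listed above, and `pick_backend` and its `allow_cloud` parameter are hypothetical names. `faster_whisper` and `whisper` are the module names the two packages install.

```python
from importlib.util import find_spec

# Assumed preference order: local backends first, in the order listed above.
LOCAL_BACKENDS = [
    ("faster-whisper", "faster_whisper"),
    ("openai-whisper", "whisper"),
]


def _module_installed(module: str) -> bool:
    """True if the top-level module can be imported in this environment."""
    return find_spec(module) is not None


def pick_backend(allow_cloud: bool = False, is_installed=_module_installed) -> str:
    """Return the first usable backend, falling back to the cloud only on opt-in."""
    for name, module in LOCAL_BACKENDS:
        if is_installed(module):
            return name
    if allow_cloud:
        return "openai-api"
    raise RuntimeError("no transcription backend available")
```

Passing `is_installed` explicitly keeps the function easy to test without installing any of the backends.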

2. Extract

Reads the transcript and a budgeted slice of team memory, and produces a list of user stories with full acceptance criteria.

Input: a transcript YAML and your team-memory/glossary.md (heavily weighted) plus other memory files
Output: stories/{meeting_id}.yaml containing one StoryBatch with N stories
Each story has: title, description, Given/When/Then acceptance criteria, T-shirt size estimate (XS/S/M/L/XL), and a confidence score 0-100
Low-confidence stories are flagged. A score below 60 means the extractor is uncertain about scope or feasibility, so give those stories extra attention during review.
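
To make the story shape concrete, here is a hedged Python sketch of one story and its low-confidence flag. The field names are guesses based on the list above; the real StoryBatch schema may differ.

```python
from dataclasses import dataclass

SIZES = ("XS", "S", "M", "L", "XL")
LOW_CONFIDENCE_THRESHOLD = 60  # scores below this are flagged for review


@dataclass
class Story:
    title: str
    description: str
    acceptance_criteria: list[str]  # Given/When/Then statements
    size: str                       # one of SIZES
    confidence: int                 # 0-100

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        if self.size not in SIZES:
            raise ValueError(f"unknown size: {self.size}")
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")

    @property
    def low_confidence(self) -> bool:
        return self.confidence < LOW_CONFIDENCE_THRESHOLD
```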

3. Review

Walks you through each story in the batch interactively. This is your first human gate.

Per story: accept, edit (opens in $EDITOR), reject, or skip for later
State is saved as you go. Quit any time with Ctrl-C and resume with cascade review stories/{file}.yaml
Or use the web UI: cascade ui opens the Cascade Studio dashboard at http://localhost:8000 with a more visual review board
Why this matters: the cheapest place to fix a misunderstood requirement is here, before any code gets generated.

4. Plan

For each approved story, an LLM produces a file-level implementation plan.

Input: the story, the relevant team memory (decisions and prior-work heavily weighted), a structured summary of your repo (top-level files, language profile, key directories)
Output: a Plan listing every file to create, modify, or delete, with the intent for each change in plain English
The plan also captures: explicit risks (“this might break the existing /users endpoint”) and out-of-scope notes (“the migration to TypeScript is not part of this story”)
Why a separate plan stage: it lets the model reason about the structure of the change before writing any code, which reduces hallucination and scope creep.

5. Code

Reads the plan and generates the full file contents for every listed change.

Input: the plan, the current contents of any files being modified, team memory (conventions and constraints heavily weighted)
Output: a CodeChange with full new file content for each entry
For modified files: current contents are included in the prompt so the LLM can preserve unrelated code in the same file
Validation: Cascade checks the generated changes against the plan. If the LLM tried to create a file the plan did not list, or used a different action than planned, the run fails before anything touches disk.
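
The validation rule can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: the plan and the generated changes are each reduced to a mapping from file path to action, and `validate_against_plan` is a hypothetical name. It checks only the two conditions described above.

```python
def validate_against_plan(plan: dict[str, str], changes: dict[str, str]) -> list[str]:
    """Compare generated changes with the plan.

    Both arguments map a file path to an action ("create", "modify", "delete").
    Returns a list of problems; an empty list means the changes match the plan.
    """
    problems = []
    for path, action in changes.items():
        if path not in plan:
            problems.append(f"{path}: not listed in the plan")
        elif plan[path] != action:
            problems.append(f"{path}: plan says {plan[path]}, got {action}")
    return problems


plan = {"src/api/users.py": "modify", "tests/test_users_pagination.py": "create"}
changes = {"src/api/users.py": "create", "src/extra.py": "create"}
for problem in validate_against_plan(plan, changes):
    print(problem)
```

A caller would fail the run when the returned list is non-empty, before writing anything to disk.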

6. Apply

Writes the generated files to disk. Pure I/O, no LLM involved.

Creates, modifies, or deletes each file per the CodeChange
Tracks which files were touched (for the next stages)
If apply fails partway through, the branch is left in a dirty state for you to inspect

7. Install

Best-effort dependency install. Skipped if the language’s install tool is not available.

Python: pip install -e . if pyproject.toml is present
TypeScript / JavaScript: npm install, pnpm install, or yarn install (auto-detected)
Go: go mod tidy
Rust: cargo build (just to populate dependencies)
Other languages: skipped
An install failure does not abort the pipeline. Tests run with whatever is already in the environment.
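
The per-language rules above amount to a lookup on the files at the repo root. A hedged Python sketch follows; the lockfile-based choice between npm, pnpm, and yarn and the order of the checks are assumptions, since this page only says the tool is auto-detected, and `detect_install_command` is a hypothetical name.

```python
from typing import Optional


def detect_install_command(root_files: set[str]) -> Optional[list[str]]:
    """Pick an install command from the file names found at the repo root.

    Returns None when no known manifest is present (the stage is skipped).
    """
    if "pyproject.toml" in root_files:
        return ["pip", "install", "-e", "."]
    if "package.json" in root_files:
        # Assumption: the lockfile identifies the package manager.
        if "pnpm-lock.yaml" in root_files:
            return ["pnpm", "install"]
        if "yarn.lock" in root_files:
            return ["yarn", "install"]
        return ["npm", "install"]
    if "go.mod" in root_files:
        return ["go", "mod", "tidy"]
    if "Cargo.toml" in root_files:
        return ["cargo", "build"]
    return None
```

A caller could additionally check `shutil.which(command[0])` and skip the stage when the tool is missing, matching the "skipped if not available" behavior.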

8. Test

Runs the project’s test command in a subprocess and captures the result.

Auto-detects the command from the language profile (e.g., pytest, vitest, go test ./..., cargo test)
Overridable via test_command in cascade.yaml
Captures: pass/fail, exit code, full stdout/stderr, duration
A test failure does not abort the commit. The failure is recorded in the PR body so a human can see it and decide whether to fix or revert.
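
Running a test command and capturing those four pieces of information can be sketched with the standard library. `RunResult` and `run_tests` are illustrative names, and the real implementation may add timeouts or stream output instead of buffering it.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class RunResult:
    passed: bool
    exit_code: int
    stdout: str
    stderr: str
    duration_s: float


def run_tests(command: list[str], cwd: str = ".") -> RunResult:
    """Run the test command in a subprocess and capture its outcome."""
    start = time.monotonic()
    proc = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    return RunResult(
        passed=proc.returncode == 0,  # exit code 0 is treated as a pass
        exit_code=proc.returncode,
        stdout=proc.stdout,
        stderr=proc.stderr,
        duration_s=time.monotonic() - start,
    )


# Stand-in for a real test command such as ["pytest"].
result = run_tests([sys.executable, "-c", "print('1 passed')"])
print(result.passed, result.exit_code)
```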

9. Commit

Stages the touched files and creates a commit with a conventional-style message.

feat: Add cursor pagination to /api/users

Story: story-prompt-20260925-103045
Meeting: cascade-try-builtin

Files changed:
  - modify: src/api/users.py
  - create: tests/test_users_pagination.py

Generated by Cascade. Human review required before merge.
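
A message with this layout can be assembled from the story metadata and the list of touched files. The Python sketch below reproduces the example above. `format_commit_message` is a hypothetical helper, and it hard-codes the `feat` type, which is an assumption; Cascade may choose other commit types.

```python
def format_commit_message(
    title: str,
    story_id: str,
    meeting_id: str,
    files: list[tuple[str, str]],
) -> str:
    """Build the commit message; `files` holds (action, path) pairs."""
    lines = [
        f"feat: {title}",
        "",
        f"Story: {story_id}",
        f"Meeting: {meeting_id}",
        "",
        "Files changed:",
    ]
    lines += [f"  - {action}: {path}" for action, path in files]
    lines += ["", "Generated by Cascade. Human review required before merge."]
    return "\n".join(lines)


print(format_commit_message(
    "Add cursor pagination to /api/users",
    "story-prompt-20260925-103045",
    "cascade-try-builtin",
    [("modify", "src/api/users.py"), ("create", "tests/test_users_pagination.py")],
))
```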

10. Push

Creates a feature branch (cascade/{story-id}/{slugified-title}) and pushes it to the remote.

Branch name encodes the story so you can correlate runs to PRs later
Push uses the VCS provider you configured (GitHub, GitLab, Bitbucket, or Azure DevOps)
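
The branch naming pattern can be sketched as follows. The slug rules here (lowercase, runs of non-alphanumeric characters collapsed to a single hyphen) are an assumption; this page does not specify how Cascade slugifies titles, and `slugify` and `branch_name` are illustrative names.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse anything non-alphanumeric into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def branch_name(story_id: str, title: str) -> str:
    """Build a branch name in the cascade/{story-id}/{slugified-title} pattern."""
    return f"cascade/{story_id}/{slugify(title)}"


print(branch_name("story-prompt-20260925-103045", "Add cursor pagination to /api/users"))
# cascade/story-prompt-20260925-103045/add-cursor-pagination-to-api-users
```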

11. PR

Opens a pull request via the VCS provider’s API.

Title: conventional commit style, capped at 140 chars
Body: the story, the acceptance criteria, the list of files changed, the test result, and a Cascade attribution
Reviewers: none auto-assigned; configure that in your VCS if you want it
Labels: Cascade can be configured to add labels like agent or needs-review; off by default

Where humans approve

Two gates, both enforced:

Story review (stage 3) before any code is generated. You see what Cascade plans to build and can edit or reject before it touches your repo. This is the cheapest place to course-correct.
PR review (after stage 11) before any code is merged. Cascade never has merge permissions. A human always approves the final change.

Where team memory enters

Every LLM call across every stage receives a budgeted slice of team memory as grounding context:

Stage	Files heavily weighted
Extract	`glossary.md` (domain terms)
Plan	`decisions.md`, `prior-work.md`, `constraints.md`
Code	`conventions.md`, `constraints.md`

The total memory sent per call is capped at 20,000 characters by default. When the selected files together exceed the cap, they are truncated proportionally. See team memory for how to populate the files for best results.
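
One way to read "truncated proportionally" is that each file keeps the same fraction of its length. The Python sketch below implements that reading as an assumption; it ignores the per-stage weighting in the table above, and `truncate_proportionally` is a hypothetical name.

```python
def truncate_proportionally(files: dict[str, str], cap: int = 20_000) -> dict[str, str]:
    """Shrink every file by the same ratio so the combined length fits the cap."""
    total = sum(len(text) for text in files.values())
    if total <= cap:
        return dict(files)  # already within budget, nothing to cut
    # Integer floor division keeps the combined length at or under the cap.
    return {name: text[: len(text) * cap // total] for name, text in files.items()}


memory = {"conventions.md": "x" * 30_000, "constraints.md": "y" * 10_000}
trimmed = truncate_proportionally(memory)
print({name: len(text) for name, text in trimmed.items()})
# {'conventions.md': 15000, 'constraints.md': 5000}
```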

Cost and progress

Every LLM call surfaces its estimated cost in the CLI:

  cost: $0.12 (8,234 in / 2,156 out tokens, anthropic/claude-opus-4-7)

For multi-story builds, a session total prints at the end. cascade build --max-cost 5.00 aborts between stories if cumulative cost would exceed your budget.
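
A between-stories budget check might look like the Python sketch below. It is an assumption about the behavior, not the actual implementation: the check runs before each story starts and compares spend so far against the limit, so a story already in progress is never interrupted. `run_with_budget` is a hypothetical name, and the per-story costs stand in for real builds.

```python
def run_with_budget(
    story_costs: list[tuple[str, float]],
    max_cost: float,
) -> tuple[list[str], float]:
    """Process stories in order, stopping before the next one once the budget is spent.

    `story_costs` holds (story_id, cost_usd) pairs. Returns the completed
    story ids and the cumulative cost.
    """
    completed: list[str] = []
    total = 0.0
    for story_id, cost in story_costs:
        if total >= max_cost:
            break  # abort between stories, never mid-story
        total += cost
        completed.append(story_id)
    return completed, total


print(run_with_budget([("a", 3.0), ("b", 2.5), ("c", 1.0)], max_cost=5.00))
# (['a', 'b'], 5.5)
```

Under this reading the session can overshoot the budget by at most one story's cost.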

During long runs, Cascade prints animated per-stage spinners with checkmarks as each one completes. Pass -q / --quiet to suppress the animation (useful in CI).

Failure handling

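The pipeline is fail-fast, with two exceptions: an install failure (stage 7) or a test failure (stage 8) is recorded but does not stop the run. Any other error aborts the remaining stages and surfaces a structured error message with: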

What stage failed
What went wrong
A specific suggestion for how to fix it
A link to deeper docs if relevant

The branch and any applied files are left in place so you can inspect them. Run git status and git diff to see what Cascade did before the failure.
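
The failure behavior can be sketched as a loop that stops at the first fatal error while recording failures from the install and test stages, which the stage descriptions above mark as non-aborting. `StageError`, `NON_FATAL`, and `run_pipeline` are illustrative names, and the fixed suggestion text is a placeholder.

```python
from typing import Callable, Optional

# Stages whose failures are recorded but do not abort the run.
NON_FATAL = {"install", "test"}


class StageError(Exception):
    """Structured error: failing stage, what went wrong, how to fix it, docs link."""

    def __init__(self, stage: str, problem: str, suggestion: str,
                 docs_url: Optional[str] = None) -> None:
        super().__init__(f"[{stage}] {problem}")
        self.stage = stage
        self.problem = problem
        self.suggestion = suggestion
        self.docs_url = docs_url


def run_pipeline(stages: list[tuple[str, Callable[[], None]]]) -> list[str]:
    """Run stages in order and return the failures recorded by non-fatal stages."""
    recorded: list[str] = []
    for name, step in stages:
        try:
            step()
        except Exception as exc:
            if name in NON_FATAL:
                recorded.append(f"{name}: {exc}")
                continue
            # Fail fast: later stages never run.
            raise StageError(
                name, str(exc), "inspect the branch with git status and git diff"
            ) from exc
    return recorded
```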

What is next

Team memory: the grounding layer that improves the output of every LLM-backed stage
Security model: what the agent can and cannot do
Languages: per-language behavior for test commands, dependency install, and file layout
CLI reference: every command and flag