The pipeline
Cascade is built around one pipeline. The input changes depending on which on-ramp you use, but every approved story flows through the same stages in the same order.
At a glance

    INPUT                          HUMAN GATE     AGENT WORK                  HUMAN GATE
    =====                          ==========     ==========                  ==========

    audio file
        │
        ▼
    [ ingest ] ──► [ extract ] ──► [ review ] ──► [ plan ] ──► [ code ]
                                       │                          │
                                       ▼                          ▼
                                   approve /      [ apply ] ──► [ install ] ──► [ test ]
                                   reject                                          │
                                                                                   ▼
                                                  [ commit ] ──► [ push ] ──► [ pr ]
                                                                                │
                                                                                ▼
                                                                         review + merge

Every LLM-backed stage (extract, plan, code) reads from the team memory layer (conventions, decisions, glossary, prior work, constraints). That is what makes the output fit your codebase instead of reading like generic AI output. See team memory for what to put in it.
On-ramps: where the pipeline starts
Six commands enter the pipeline at four different stages, depending on where the work originated:
| On-ramp | Starts at | When to use |
|---|---|---|
| `cascade ingest` | Ingest | You have a meeting recording (sprint planning, requirements call, etc.) |
| `cascade ticket` | Plan | The requirement is already a Jira / Linear / GitHub / Azure DevOps ticket |
| `cascade prompt` | Plan | You just want to type one line: “Add cursor pagination to /api/users” |
| `cascade extract` | Extract | You already have a transcript YAML (manual transcription, or a re-run) |
| `cascade review` | Review | A previous extract produced stories; you want to triage them |
| `cascade build` | Plan | You have an approved story batch ready to ship |
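
The table can be read as a lookup from command to entry stage. The sketch below shows which stages lie downstream of each entry point (not which stages a single command runs). The names `STAGES`, `ON_RAMPS`, and `stages_from` are illustrative and not part of Cascade.

```python
# Illustrative only: stage names follow the "At a glance" diagram;
# none of these identifiers come from Cascade itself.
STAGES = [
    "ingest", "extract", "review", "plan", "code", "apply",
    "install", "test", "commit", "push", "pr",
]

# Command -> stage where that command enters the pipeline (from the table).
ON_RAMPS = {
    "cascade ingest": "ingest",
    "cascade extract": "extract",
    "cascade review": "review",
    "cascade ticket": "plan",
    "cascade prompt": "plan",
    "cascade build": "plan",
}


def stages_from(command: str) -> list[str]:
    """Return the entry stage for a command and every stage after it."""
    start = ON_RAMPS[command]
    return STAGES[STAGES.index(start):]


print(stages_from("cascade ticket")[:3])  # ['plan', 'code', 'apply']
```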
What each stage does
1. Ingest
Turns audio (or video) into a structured transcript YAML.
- Input: an mp3, mp4, wav, m4a, or similar file
- Output: `transcripts/{meeting_id}.yaml` with speakers, timestamps, and text per turn
- Backends: `faster-whisper` (recommended, fastest CPU option), `openai-whisper` (well-tested baseline), or `openai-api` (cloud, requires opt-in). Cascade picks the best available automatically.
- Diarization: if `pyannote.audio` is installed, turns are labeled with speaker IDs. Without it, everything is labeled “Speaker” and the rest of the pipeline still works.
- Local-first: the default backend keeps audio on your machine.
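
How "picks the best available" might work is easiest to see in code. This is a minimal sketch under assumptions: it checks whether each local backend's Python module is importable, in the preference order listed above, and falls back to the cloud backend only when explicitly allowed. `pick_backend` and `module_available` are hypothetical names; Cascade's real selection logic may differ.

```python
import importlib.util

# Preference order taken from the list above. The second item in each pair
# is the module name the package is imported under.
LOCAL_BACKENDS = [
    ("faster-whisper", "faster_whisper"),
    ("openai-whisper", "whisper"),
]


def module_available(module_name: str) -> bool:
    """True if the module can be imported in the current environment."""
    return importlib.util.find_spec(module_name) is not None


def pick_backend(allow_cloud: bool = False, available=module_available) -> str:
    """Pick the first usable backend, preferring local ones (assumed logic)."""
    for backend, module_name in LOCAL_BACKENDS:
        if available(module_name):
            return backend
    if allow_cloud:
        # The cloud backend is opt-in, so it is only a last resort here.
        return "openai-api"
    raise RuntimeError("no transcription backend available")
```

Passing `available` as a parameter keeps the sketch testable without installing any of the backends.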
2. Extract
Reads the transcript and a budgeted slice of team memory, and produces a list of user stories with full acceptance criteria.
- Input: a transcript YAML and your `team-memory/glossary.md` (heavily weighted) plus other memory files
- Output: `stories/{meeting_id}.yaml` containing one `StoryBatch` with N stories
- Each story has: title, description, Given/When/Then acceptance criteria, T-shirt size estimate (XS/S/M/L/XL), and a confidence score 0-100
- Low-confidence stories are flagged. A score below 60 means the extractor is uncertain about scope or feasibility; pay extra attention to these during review.
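
To make the story shape concrete, here is a sketch of the fields described above as Python dataclasses. Only `StoryBatch` is named in this page; the field names, the `Story` class, and the `flagged` helper are assumptions for illustration, not Cascade's actual schema.

```python
from dataclasses import dataclass, field

SIZES = ("XS", "S", "M", "L", "XL")
LOW_CONFIDENCE_THRESHOLD = 60  # scores below this are flagged


@dataclass
class Story:
    title: str
    description: str
    acceptance_criteria: list[str]  # Given/When/Then statements
    size: str                       # one of SIZES
    confidence: int                 # 0-100

    def __post_init__(self) -> None:
        if self.size not in SIZES:
            raise ValueError(f"unknown size: {self.size}")
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")

    @property
    def needs_attention(self) -> bool:
        return self.confidence < LOW_CONFIDENCE_THRESHOLD


@dataclass
class StoryBatch:
    meeting_id: str
    stories: list[Story] = field(default_factory=list)

    def flagged(self) -> list[Story]:
        """Stories the reviewer should look at first."""
        return [s for s in self.stories if s.needs_attention]
```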
3. Review
Walks you through each story in the batch interactively. This is your first human gate.
- Per story: accept, edit (opens in `$EDITOR`), reject, or skip for later
- State is saved as you go. Quit any time with Ctrl-C and resume with `cascade review stories/{file}.yaml`
- Or use the web UI: `cascade ui` opens the Cascade Studio dashboard at http://localhost:8000 with a more visual review board
- Why this matters: the cheapest place to fix a misunderstood requirement is here, before any code gets generated.
4. Plan
For each approved story, an LLM produces a file-level implementation plan.
- Input: the story, the relevant team memory (decisions and prior-work heavily weighted), and a structured summary of your repo (top-level files, language profile, key directories)
- Output: a `Plan` listing every file to create, modify, or delete, with the intent for each change in plain English
- The plan also captures: explicit risks (“this might break the existing /users endpoint”) and out-of-scope notes (“the migration to TypeScript is not part of this story”)
- Why a separate plan stage: it lets the model reason about the structure of the change before writing code, which is meant to reduce hallucination and scope creep.
5. Code
Reads the plan and generates the full file contents for every listed change.
- Input: the plan, the current contents of any files being modified, and team memory (conventions and constraints heavily weighted)
- Output: a `CodeChange` with full new file content for each entry
- For modified files: current contents are included in the prompt so the LLM can preserve unrelated code in the same file
- Validation: Cascade checks the generated changes against the plan. If the LLM tried to create a file the plan did not list, or used a different action than planned, the run fails before anything touches disk.
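
The validation rule above comes down to two checks per generated change: is the path in the plan, and does the action match. A minimal sketch, assuming simplified stand-ins for the plan and code-change entries (`PlannedChange`, `GeneratedChange`, and `validate_against_plan` are hypothetical names):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlannedChange:
    path: str
    action: str  # "create", "modify", or "delete"


@dataclass(frozen=True)
class GeneratedChange:
    path: str
    action: str
    content: Optional[str] = None  # None for deletions


def validate_against_plan(plan, generated):
    """Return a list of problems; an empty list means the changes match the plan."""
    planned = {p.path: p.action for p in plan}
    problems = []
    for change in generated:
        if change.path not in planned:
            problems.append(f"{change.path}: not listed in the plan")
        elif change.action != planned[change.path]:
            problems.append(
                f"{change.path}: plan says {planned[change.path]}, got {change.action}"
            )
    return problems
```

Running a check like this before the apply stage is what lets a mismatch fail the run while the working tree is still clean.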
6. Apply
Writes the generated files to disk. Pure I/O, no LLM involved.
- Creates, modifies, or deletes each file per the `CodeChange`
- Tracks which files were touched (for the next stages)
- If apply fails partway through, the branch is left in a dirty state for you to inspect
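
As a rough illustration of the three bullets above, the sketch below applies a list of changes and returns the paths it touched. The tuple format and the `apply_changes` name are assumptions. It deliberately attempts no rollback, mirroring the "left in a dirty state" behaviour.

```python
from pathlib import Path


def apply_changes(repo_root, changes):
    """Apply (action, relative_path, content) tuples and return touched paths.

    No rollback is attempted: if a step raises partway through, earlier
    changes stay on disk so they can be inspected.
    """
    touched = []
    for action, rel_path, content in changes:
        target = Path(repo_root) / rel_path
        if action in ("create", "modify"):
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content, encoding="utf-8")
        elif action == "delete":
            target.unlink()
        else:
            raise ValueError(f"unknown action: {action}")
        touched.append(rel_path)
    return touched
```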
7. Install
Best-effort dependency install. Skipped if the language’s install tool is not available.
- Python: `pip install -e .` if `pyproject.toml` is present
- TypeScript / JavaScript: `npm install`, `pnpm install`, or `yarn install` (auto-detected)
- Go: `go mod tidy`
- Rust: `cargo build` (just to populate dependencies)
- Other languages: skipped
- Install failure does not abort the pipeline. Tests just run with whatever is in the environment.
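
One plausible way to implement the detection above is to look for each ecosystem's manifest file. This sketch makes assumptions the page does not confirm: the JavaScript package manager is chosen from its lockfile, the first matching manifest wins, and the function names are hypothetical.

```python
import shutil
from pathlib import Path


def detect_install_command(repo_root):
    """Return the install command as an argv list, or None to skip.

    Assumptions: first matching manifest wins, and the JavaScript package
    manager is inferred from its lockfile.
    """
    root = Path(repo_root)
    if (root / "pyproject.toml").exists():
        return ["pip", "install", "-e", "."]
    if (root / "package.json").exists():
        if (root / "pnpm-lock.yaml").exists():
            return ["pnpm", "install"]
        if (root / "yarn.lock").exists():
            return ["yarn", "install"]
        return ["npm", "install"]
    if (root / "go.mod").exists():
        return ["go", "mod", "tidy"]
    if (root / "Cargo.toml").exists():
        return ["cargo", "build"]
    return None  # other languages: skipped


def install_tool_available(command):
    """True if the command's executable is on PATH (otherwise skip install)."""
    return command is not None and shutil.which(command[0]) is not None
```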
8. Test
Runs the project’s test command in a subprocess and captures the result.
- Auto-detects the command from the language profile (e.g., `pytest`, `vitest`, `go test ./...`, `cargo test`)
- Overridable via `test_command` in `cascade.yaml`
- Captures: pass/fail, exit code, full stdout/stderr, duration
- A test failure does not abort the commit. The failure is recorded in the PR body so a human can see it and decide whether to fix or revert.
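
Capturing those four pieces of information takes only a few lines with the standard library. A minimal sketch, assuming the test command arrives as an argv list; `SuiteResult` and `run_tests` are illustrative names, not Cascade's.

```python
import subprocess
import time
from dataclasses import dataclass


@dataclass
class SuiteResult:
    passed: bool
    exit_code: int
    stdout: str
    stderr: str
    duration_seconds: float


def run_tests(command, cwd=None):
    """Run the test command and capture everything the PR body needs.

    A non-zero exit code is returned as data rather than raised, so a
    failing suite does not stop the caller.
    """
    started = time.monotonic()
    proc = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    return SuiteResult(
        passed=proc.returncode == 0,
        exit_code=proc.returncode,
        stdout=proc.stdout,
        stderr=proc.stderr,
        duration_seconds=time.monotonic() - started,
    )
```

For example, `run_tests(["pytest"])` would return a `SuiteResult` whose `passed` field reflects pytest's exit code.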
9. Commit
Stages the touched files and creates a commit with a conventional-style message.

    feat: Add cursor pagination to /api/users

    Story: story-prompt-20260925-103045
    Meeting: cascade-try-builtin

    Files changed:
      - modify: src/api/users.py
      - create: tests/test_users_pagination.py

    Generated by Cascade. Human review required before merge.

10. Push
Creates a feature branch (`cascade/{story-id}/{slugified-title}`) and pushes it to the remote.
- Branch name encodes the story so you can correlate runs to PRs later
- Push uses the VCS provider you configured (GitHub, GitLab, Bitbucket, or Azure DevOps)
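
The branch naming scheme can be sketched as follows. The exact slug rules are an assumption (lowercase, with runs of non-alphanumeric characters collapsed to hyphens); Cascade's own slugifier may treat edge cases differently.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def branch_name(story_id: str, title: str) -> str:
    """Build a branch name in the cascade/{story-id}/{slugified-title} shape."""
    return f"cascade/{story_id}/{slugify(title)}"


print(branch_name("story-prompt-20260925-103045",
                  "Add cursor pagination to /api/users"))
# cascade/story-prompt-20260925-103045/add-cursor-pagination-to-api-users
```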
11. PR
Opens a pull request via the VCS provider’s API.
- Title: conventional commit style, capped at 140 chars
- Body: the story, the acceptance criteria, the list of files changed, the test result, and a Cascade attribution
- Reviewers: none auto-assigned; configure that in your VCS if you want it
- Labels: Cascade can be configured to add labels like `agent` or `needs-review`; off by default
Where humans approve
Two gates, both enforced:
- Story review (stage 3) before any code is generated. You see what Cascade plans to build and can edit or reject before it touches your repo. This is the cheapest place to course-correct.
- PR review (after stage 11) before any code is merged. Cascade never has merge permissions. A human always approves the final change.
Where team memory enters
Every LLM call across every stage receives a budgeted slice of team memory as grounding context:
| Stage | Files heavily weighted |
|---|---|
| Extract | glossary.md (domain terms) |
| Plan | decisions.md, prior-work.md, constraints.md |
| Code | conventions.md, constraints.md |
The total memory sent per call is capped at 20,000 characters by default. Files exceeding the cap are truncated proportionally. See team memory for how to populate the files for best results.
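
"Truncated proportionally" can be read in more than one way. The sketch below takes one reading as an assumption: when the combined length is over the cap, each file keeps a share of the cap proportional to its own length. Stage-specific weighting is left out, and `budget_memory` is a hypothetical name.

```python
DEFAULT_MEMORY_CAP = 20_000  # characters per LLM call (the default above)


def budget_memory(files, cap=DEFAULT_MEMORY_CAP):
    """Trim memory files so their combined length fits within the cap.

    Assumed reading of "truncated proportionally": each file keeps
    cap * len(file) / total characters, rounded down.
    """
    total = sum(len(text) for text in files.values())
    if total <= cap:
        return dict(files)
    return {
        name: text[: cap * len(text) // total]
        for name, text in files.items()
    }
```

With a 30,000-character `conventions.md` and a 10,000-character `glossary.md`, this reading would keep 15,000 and 5,000 characters respectively.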
Cost and progress
Every LLM call surfaces its estimated cost in the CLI:

    cost: $0.12 (8,234 in / 2,156 out tokens, anthropic/claude-opus-4-7)

For multi-story builds, a session total prints at the end. `cascade build --max-cost 5.00` aborts between stories if cumulative cost would exceed your budget.
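
A budget check that only fires between stories might look like the sketch below. This page does not say how Cascade predicts the next story's cost, so the sketch uses a simpler rule as an assumption: it stops before starting a new story once the running total has reached the budget. `build_within_budget` and `build_story` are hypothetical names.

```python
def build_within_budget(stories, build_story, max_cost):
    """Build stories in order, stopping between stories once the budget is spent.

    build_story is a hypothetical callable that builds one story and
    returns its cost in dollars.
    """
    spent = 0.0
    built = []
    for story in stories:
        if spent >= max_cost:
            break  # abort between stories, never mid-story
        spent += build_story(story)
        built.append(story)
    return built, spent
```

Because the check runs only at story boundaries, a story that is already in progress always finishes, and the final total can overshoot the budget by up to one story's cost.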
During long runs, Cascade prints animated per-stage spinners with checkmarks as each one completes. Pass -q / --quiet to suppress the animation (useful in CI).
Failure handling
The pipeline is fail-fast. Apart from install and test failures, which are recorded without stopping the run (see stages 7 and 8), any error in any stage aborts the rest and surfaces a structured error message with:
- What stage failed
- What went wrong
- A specific suggestion for how to fix it
- A link to deeper docs if relevant
The branch and any applied files are left in place so you can inspect them. Run git status and git diff to see what Cascade did before the failure.
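
The four fields of the structured error, and the way the first aborting failure stops later stages, can be sketched like this. `StageError` and `run_fail_fast` are illustrative names and not Cascade's actual error type.

```python
class StageError(Exception):
    """Hypothetical error carrying the four fields listed above."""

    def __init__(self, stage, message, suggestion, docs_url=None):
        super().__init__(message)
        self.stage = stage
        self.message = message
        self.suggestion = suggestion
        self.docs_url = docs_url

    def __str__(self):
        lines = [
            f"stage: {self.stage}",
            f"error: {self.message}",
            f"fix:   {self.suggestion}",
        ]
        if self.docs_url:
            lines.append(f"docs:  {self.docs_url}")
        return "\n".join(lines)


def run_fail_fast(stages):
    """Run (name, callable) pairs in order and stop at the first StageError.

    Returns the names of the stages that completed and the error, if any.
    Nothing is cleaned up, so partial work stays available for inspection.
    """
    completed = []
    for name, run in stages:
        try:
            run()
        except StageError as err:
            return completed, err
        completed.append(name)
    return completed, None
```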
What is next
- Team memory: the grounding layer that improves the output of every stage
- Security model: what the agent can and cannot do
- Languages: per-language behavior for test commands, dependency install, and file layout
- CLI reference: every command and flag