The pipeline

Cascade is built around one pipeline. The input changes depending on which on-ramp you use, but every approved story flows through the same stages in the same order.

INPUT                          HUMAN GATE     AGENT WORK                                                         HUMAN GATE
=====                          ==========     ==========                                                         ==========
audio file
[ ingest ] ──► [ extract ] ──► [ review ] ──► [ plan ] ──► [ code ]
                                   │                          │
                                   ▼                          ▼
                               approve /                  [ apply ] ──► [ install ] ──► [ test ]
                               reject                                                      │
                                                                                           ▼
                                                                                       [ commit ] ──► [ push ] ──► [ pr ]
                                                                                                               review + merge

Every stage reads from the team memory layer (conventions, decisions, glossary, prior work, constraints). That is what makes the output fit your codebase instead of generic AI output. See team memory for what to put in it.

Where you enter the pipeline depends on where the work originated. Each on-ramp command below starts the pipeline at the stage shown:

On-ramp          Starts at   When to use
cascade ingest   Ingest      You have a meeting recording (sprint planning, requirements call, etc.)
cascade ticket   Plan        The requirement is already a Jira / Linear / GitHub / Azure DevOps ticket
cascade prompt   Plan        You just want to type one line: “Add cursor pagination to /api/users”
cascade extract  Extract     You already have a transcript YAML (manual transcription, or a re-run)
cascade review   Review      A previous extract produced stories; you want to triage them
cascade build    Plan        You have an approved story batch ready to ship

Stage 1: Ingest

Turns audio (or video) into a structured transcript YAML.

  • Input: an mp3, mp4, wav, m4a, or similar file
  • Output: transcripts/{meeting_id}.yaml with speakers, timestamps, and text per turn
  • Backends: faster-whisper (recommended, fastest CPU option), openai-whisper (well-tested baseline), or openai-api (cloud, requires opt-in). Cascade picks the best available automatically.
  • Diarization: if pyannote.audio is installed, turns are labeled with speaker IDs. Without it, everything is labeled “Speaker” and the rest of the pipeline still works.
  • Local-first: the default backend keeps audio on your machine.
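The automatic backend choice described above could look roughly like the sketch below. The preference order, the opt-in flag, and the function names are assumptions made for illustration, not Cascade's actual code; only the import names of the two local packages (`faster_whisper` and `whisper`) are taken from those libraries.

```python
from importlib.util import find_spec
from typing import Callable, Optional

# Assumed preference order: local backends first, as (backend name, import name).
LOCAL_BACKENDS = [
    ("faster-whisper", "faster_whisper"),
    ("openai-whisper", "whisper"),
]


def is_installed(module: str) -> bool:
    """True if the top-level module can be imported in this environment."""
    return find_spec(module) is not None


def pick_backend(
    allow_cloud: bool = False,
    installed: Callable[[str], bool] = is_installed,
) -> Optional[str]:
    """Return the first usable backend name, or None if nothing is available."""
    for name, module in LOCAL_BACKENDS:
        if installed(module):
            return name
    # The cloud backend is only considered after an explicit opt-in.
    return "openai-api" if allow_cloud else None
```

Passing the `installed` check as a parameter keeps the selection logic testable without installing any speech library.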

Stage 2: Extract

Reads the transcript and a budgeted slice of team memory, and produces a list of user stories with full acceptance criteria.

  • Input: a transcript YAML and your team-memory/glossary.md (heavily weighted) plus other memory files
  • Output: stories/{meeting_id}.yaml containing one StoryBatch with N stories
  • Each story has: title, description, Given/When/Then acceptance criteria, T-shirt size estimate (XS/S/M/L/XL), and a confidence score from 0 to 100
  • Low-confidence stories are flagged. A score below 60 means the extractor is uncertain about scope or feasibility, so give those stories extra attention during review.
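As a rough picture of the data this stage produces, the sketch below models a story batch in Python. The class and field names are hypothetical (the real YAML schema is not shown here); the size labels and the threshold of 60 come from the list above.

```python
from dataclasses import dataclass, field
from typing import List

LOW_CONFIDENCE_THRESHOLD = 60  # scores below this are flagged
SIZES = ("XS", "S", "M", "L", "XL")


@dataclass
class Story:
    title: str
    description: str
    acceptance_criteria: List[str]  # Given/When/Then statements
    size: str
    confidence: int  # 0 to 100

    def __post_init__(self) -> None:
        if self.size not in SIZES:
            raise ValueError(f"unknown size: {self.size}")
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")

    @property
    def needs_attention(self) -> bool:
        return self.confidence < LOW_CONFIDENCE_THRESHOLD


@dataclass
class StoryBatch:
    meeting_id: str
    stories: List[Story] = field(default_factory=list)

    def flagged(self) -> List[Story]:
        """Stories the reviewer should look at most carefully."""
        return [s for s in self.stories if s.needs_attention]
```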

Stage 3: Review

Walks you through each story in the batch interactively. This is your first human gate.

  • Per story: accept, edit (opens in $EDITOR), reject, or skip for later
  • State is saved as you go. Quit any time with Ctrl-C and resume with cascade review stories/{file}.yaml
  • Or use the web UI: cascade ui opens the Cascade Studio dashboard at http://localhost:8000 with a more visual review board
  • Why this matters: the cheapest place to fix a misunderstood requirement is here, before any code gets generated.

Stage 4: Plan

For each approved story, an LLM produces a file-level implementation plan.

  • Input: the story, the relevant team memory (decisions and prior-work heavily weighted), a structured summary of your repo (top-level files, language profile, key directories)
  • Output: a Plan listing every file to create, modify, or delete, with the intent for each change in plain English
  • The plan also captures: explicit risks (“this might break the existing /users endpoint”) and out-of-scope notes (“the migration to TypeScript is not part of this story”)
  • Why a separate plan stage: it makes the model settle the structure of the change before writing any code, which reduces hallucination and scope creep.

Stage 5: Code

Reads the plan and generates the full file contents for every listed change.

  • Input: the plan, the current contents of any files being modified, team memory (conventions and constraints heavily weighted)
  • Output: a CodeChange with full new file content for each entry
  • For modified files: current contents are included in the prompt so the LLM can preserve unrelated code in the same file
  • Validation: Cascade checks the generated changes against the plan. If the LLM tried to create a file the plan did not list, or used a different action than planned, the run fails before anything touches disk.
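The plan check in the last bullet can be sketched as a comparison of two lists. The type and function names below are hypothetical stand-ins for Cascade's Plan and CodeChange; the two failure conditions are the ones described above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PlannedChange:
    path: str
    action: str  # "create", "modify", or "delete"
    intent: str  # plain-English reason for the change


@dataclass(frozen=True)
class GeneratedFile:
    path: str
    action: str
    content: str  # full new file content; empty for deletes


def validate_against_plan(
    plan: List[PlannedChange], generated: List[GeneratedFile]
) -> List[str]:
    """Return a list of problems; an empty list means the changes match the plan."""
    planned: Dict[str, str] = {p.path: p.action for p in plan}
    problems: List[str] = []
    for g in generated:
        if g.path not in planned:
            problems.append(f"{g.path}: not listed in the plan")
        elif g.action != planned[g.path]:
            problems.append(f"{g.path}: plan says {planned[g.path]}, got {g.action}")
    return problems
```

A caller would abort the run whenever the returned list is non-empty, which keeps the check ahead of any write to disk.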

Stage 6: Apply

Writes the generated files to disk. Pure I/O, no LLM involved.

  • Creates, modifies, or deletes each file per the CodeChange
  • Tracks which files were touched (for the next stages)
  • If apply fails partway through, the branch is left in a dirty state for you to inspect
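A minimal sketch of such an apply step follows, assuming each change arrives as an (action, relative path, content) tuple. That tuple shape and the function name are illustrative only.

```python
from pathlib import Path
from typing import List, Tuple

# (action, relative path, full new content); content is ignored for deletes.
Change = Tuple[str, str, str]


def apply_changes(repo_root: Path, changes: List[Change]) -> List[str]:
    """Write each change to disk and return the relative paths that were touched."""
    touched: List[str] = []
    for action, rel_path, content in changes:
        target = repo_root / rel_path
        if action in ("create", "modify"):
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content, encoding="utf-8")
        elif action == "delete":
            target.unlink()
        else:
            raise ValueError(f"unknown action: {action}")
        touched.append(rel_path)  # recorded once the operation has succeeded
    return touched
```

There is no rollback here: an exception partway through leaves the earlier writes in place, which matches the dirty-branch behavior described above.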

Stage 7: Install

Best-effort dependency install. Skipped if the language’s install tool is not available.

  • Python: pip install -e . if pyproject.toml is present
  • TypeScript / JavaScript: npm install, pnpm install, or yarn install (auto-detected)
  • Go: go mod tidy
  • Rust: cargo build (just to populate dependencies)
  • Other languages: skipped
  • Install failure does not abort the pipeline. The tests run against whatever is already installed in the environment.
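One way to picture the detection logic is the sketch below. The marker files for Go and Rust, the lockfile heuristic for choosing between npm, pnpm, and yarn, and the order of the checks are assumptions; the source only says the package manager is auto-detected.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional


def detect_install_command(
    repo_root: Path,
    tool_available: Callable[[str], bool] = lambda tool: shutil.which(tool) is not None,
) -> Optional[List[str]]:
    """Return an install command for the repo, or None to skip the stage."""
    if (repo_root / "pyproject.toml").exists():
        cmd = ["pip", "install", "-e", "."]
    elif (repo_root / "package.json").exists():
        # Assumed heuristic: pick the package manager by its lockfile.
        if (repo_root / "pnpm-lock.yaml").exists():
            cmd = ["pnpm", "install"]
        elif (repo_root / "yarn.lock").exists():
            cmd = ["yarn", "install"]
        else:
            cmd = ["npm", "install"]
    elif (repo_root / "go.mod").exists():
        cmd = ["go", "mod", "tidy"]
    elif (repo_root / "Cargo.toml").exists():
        cmd = ["cargo", "build"]
    else:
        return None  # other languages: skipped
    # Skip when the install tool itself is not on PATH.
    return cmd if tool_available(cmd[0]) else None
```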

Stage 8: Test

Runs the project’s test command in a subprocess and captures the result.

  • Auto-detects the command from the language profile (e.g., pytest, vitest, go test ./..., cargo test)
  • Overridable via test_command in cascade.yaml
  • Captures: pass/fail, exit code, full stdout/stderr, duration
  • A test failure does not abort the commit. The failure is recorded in the PR body so a human can see it and decide whether to fix or revert.
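The captured fields map naturally onto Python's `subprocess.run`. The following is a sketch under that assumption; `RunResult` and `run_tests` are hypothetical names, not Cascade's API.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import List


@dataclass
class RunResult:
    passed: bool
    exit_code: int
    stdout: str
    stderr: str
    duration_s: float


def run_tests(command: List[str], cwd: str = ".") -> RunResult:
    """Run the test command and record the outcome without raising on failure."""
    start = time.monotonic()
    proc = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    return RunResult(
        passed=proc.returncode == 0,
        exit_code=proc.returncode,
        stdout=proc.stdout,
        stderr=proc.stderr,
        duration_s=time.monotonic() - start,
    )
```

Because a non-zero exit code is returned as data rather than raised, a later stage can still commit and report the failure in the PR body.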

Stage 9: Commit

Stages the touched files and creates a commit with a conventional-style message.

feat: Add cursor pagination to /api/users
Story: story-prompt-20260925-103045
Meeting: cascade-try-builtin
Files changed:
- modify: src/api/users.py
- create: tests/test_users_pagination.py
Generated by Cascade. Human review required before merge.
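A message of this shape could be assembled as in the sketch below. It mirrors the sample line for line; whether the real message puts blank lines between the subject and the body is not visible in the sample, and the function name and parameters are hypothetical.

```python
from typing import List, Tuple


def build_commit_message(
    title: str,
    story_id: str,
    meeting_id: str,
    files: List[Tuple[str, str]],  # (action, path) pairs
    kind: str = "feat",
) -> str:
    """Assemble the commit message in the same line order as the sample."""
    lines = [
        f"{kind}: {title}",
        f"Story: {story_id}",
        f"Meeting: {meeting_id}",
        "Files changed:",
    ]
    lines += [f"- {action}: {path}" for action, path in files]
    lines.append("Generated by Cascade. Human review required before merge.")
    return "\n".join(lines)
```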

Stage 10: Push

Creates a feature branch (cascade/{story-id}/{slugified-title}) and pushes it to the remote.

  • Branch name encodes the story so you can correlate runs to PRs later
  • Push uses the VCS provider you configured (GitHub, GitLab, Bitbucket, or Azure DevOps)
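The branch naming pattern can be illustrated with a small helper. The exact slug rules Cascade applies are not documented here, so the lowercase-and-dashes rule below is an assumption.

```python
import re


def slugify(title: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with '-', trim outer dashes."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def branch_name(story_id: str, title: str) -> str:
    """Build a branch name of the form cascade/{story-id}/{slugified-title}."""
    return f"cascade/{story_id}/{slugify(title)}"


print(branch_name("story-prompt-20260925-103045", "Add cursor pagination to /api/users"))
# prints: cascade/story-prompt-20260925-103045/add-cursor-pagination-to-api-users
```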

Stage 11: PR

Opens a pull request via the VCS provider’s API.

  • Title: conventional commit style, capped at 140 chars
  • Body: the story, the acceptance criteria, the list of files changed, the test result, and a Cascade attribution
  • Reviewers: none auto-assigned; configure that in your VCS if you want it
  • Labels: Cascade can be configured to add labels like agent or needs-review; off by default

Two gates, both enforced:

  1. Story review (stage 3) before any code is generated. You see what Cascade plans to build and can edit or reject before it touches your repo. This is the cheapest place to course-correct.
  2. PR review (after stage 11) before any code is merged. Cascade never has merge permissions. A human always approves the final change.

Every LLM call across every stage receives a budgeted slice of team memory as grounding context:

Stage    Files heavily weighted
Extract  glossary.md (domain terms)
Plan     decisions.md, prior-work.md, constraints.md
Code     conventions.md, constraints.md

The total memory sent per call is capped at 20,000 characters by default. Files exceeding the cap are truncated proportionally. See team memory for how to populate the files for best results.
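One plausible reading of "truncated proportionally" is that every file is cut by the same ratio once the combined size exceeds the cap. The sketch below implements that reading only; it ignores the per-stage weighting in the table, and the function name is hypothetical.

```python
from typing import Dict

DEFAULT_CAP = 20_000  # characters, the documented default


def budget_memory(files: Dict[str, str], cap: int = DEFAULT_CAP) -> Dict[str, str]:
    """Trim every file by the same ratio so the combined length fits within cap."""
    total = sum(len(text) for text in files.values())
    if total <= cap:
        return dict(files)
    # Integer floor division keeps the combined length at or below the cap.
    return {name: text[: len(text) * cap // total] for name, text in files.items()}
```

With a 30,000-character glossary and a 10,000-character conventions file, this keeps 15,000 and 5,000 characters respectively.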

Every LLM call surfaces its estimated cost in the CLI:

cost: $0.12 (8,234 in / 2,156 out tokens, anthropic/claude-opus-4-7)

For multi-story builds, a session total prints at the end. cascade build --max-cost 5.00 aborts between stories if cumulative cost would exceed your budget.
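The budget check between stories might work along these lines. The sketch assumes a cost estimate is available for each story before it starts, which the source does not state; the function name and the input shape are illustrative.

```python
from typing import Iterable, List, Tuple


def run_with_budget(
    stories: Iterable[Tuple[str, float]],  # (story id, estimated cost in USD)
    max_cost: float,
) -> Tuple[List[str], float]:
    """Process stories in order, stopping before one that would exceed max_cost."""
    built: List[str] = []
    spent = 0.0
    for story_id, cost in stories:
        if spent + cost > max_cost:
            break  # abort between stories; no story is left half-built
        built.append(story_id)
        spent += cost
    return built, spent
```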

During long runs, Cascade prints animated per-stage spinners with checkmarks as each one completes. Pass -q / --quiet to suppress the animation (useful in CI).

The pipeline is fail-fast. Apart from install and test failures, which are recorded without stopping the run, any error in any stage aborts the remaining stages and surfaces a structured error message with:

  • What stage failed
  • What went wrong
  • A specific suggestion for how to fix it
  • A link to deeper docs if relevant

The branch and any applied files are left in place so you can inspect them. Run git status and git diff to see what Cascade did before the failure.
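The fail-fast behavior and the four-part error message can be pictured with the sketch below. `StageError`, its fields, and `run_pipeline` are hypothetical names chosen to match the bullet list above, not Cascade's actual classes.

```python
from typing import Callable, List, Optional, Tuple


class StageError(Exception):
    """Carries the four pieces of information listed above."""

    def __init__(self, stage: str, message: str, suggestion: str,
                 docs_url: Optional[str] = None) -> None:
        super().__init__(message)
        self.stage = stage
        self.message = message
        self.suggestion = suggestion
        self.docs_url = docs_url

    def render(self) -> str:
        lines = [
            f"Stage failed: {self.stage}",
            f"Problem: {self.message}",
            f"Suggestion: {self.suggestion}",
        ]
        if self.docs_url:
            lines.append(f"Docs: {self.docs_url}")
        return "\n".join(lines)


def run_pipeline(stages: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run stages in order; the first StageError propagates and skips the rest."""
    completed: List[str] = []
    for name, stage in stages:
        stage()
        completed.append(name)
    return completed
```

Nothing is cleaned up when the error propagates, so whatever earlier stages wrote stays on disk for inspection.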

  • Team memory: the grounding layer that makes every stage output better
  • Security model: what the agent can and cannot do
  • Languages: per-language behavior for test commands, dependency install, and file layout
  • CLI reference: every command and flag