Team memory

Team memory is the difference between “another AI agent” and “an AI agent that gets your codebase.”

When a developer uses Copilot, Cursor, or Claude individually, each session is isolated. The architectural decisions made in last week’s meeting aren’t in anyone’s AI context. Coding conventions discovered by one dev aren’t shared with anyone else’s AI. The “why we chose X over Y” reasoning is forgotten the moment the conversation ends.

Teams collaborate in human channels (meetings, Slack, Teams). But the AI side of the modern dev workflow is single-player. Cascade fixes that with team memory.

When you run cascade init, Cascade creates a team-memory/ directory containing a README.md and five starter files:

team-memory/
├── README.md
├── conventions.md ← coding style, file layout, naming
├── decisions.md ← architectural decisions and why
├── glossary.md ← domain terms unique to your business
├── constraints.md ← performance, security, compliance
└── prior-work.md ← summaries of recently shipped stories

Every Cascade stage reads relevant excerpts from these files as grounding context for its LLM calls. So:

  • The story extractor knows your domain glossary and won’t invent new names for existing concepts.
  • The planner knows what’s already been built and avoids duplicating it.
  • The coder knows your conventions and produces code that fits your style, not generic textbook style.
  • The tester knows your test patterns and produces tests that match.

In conventions.md, be specific. Examples beat abstractions.

- Python: snake_case for functions, PascalCase for classes
- File layout: API routes in src/api/routes/, models in src/models/
- Database: singular table names (user, not users)
- Error handling: raise specific exceptions, never bare except
- No global mutable state
- Type hints required on all public functions
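To make the sample conventions above concrete, here is a small illustrative sketch of code that follows them. The names `UserNotFoundError`, `User`, and `find_user` are hypothetical and exist only for this example; they are not part of Cascade.

```python
from dataclasses import dataclass


class UserNotFoundError(LookupError):
    """Specific exception type rather than a generic error."""


@dataclass(frozen=True)
class User:  # PascalCase for classes
    user_id: int
    email: str


def find_user(users: dict[int, User], user_id: int) -> User:
    """Look up a user in an explicitly passed mapping (no global mutable state)."""
    try:
        return users[user_id]
    except KeyError as exc:  # catch a specific exception, never a bare except
        raise UserNotFoundError(f"no user with id {user_id}") from exc
```

A reviewer, human or AI, can check each line of the conventions list against a snippet like this, which is why concrete rules tend to be more useful than general advice.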

decisions.md is an ADR-style (architecture decision record) log: what you chose, why, and which alternatives you rejected.

## [2026-01-15] Postgres over MongoDB
We needed schema enforcement and relational guarantees. Postgres
won. All new persistence uses SQLAlchemy + Alembic migrations.
We don't introduce MongoDB anywhere new.

glossary.md defines terms unique to your product or codebase.

**Workspace**: A user's top-level container. Each user can have many
workspaces. Not a folder, not a UI panel.
**Run**: One execution of a workflow. Has start time, end time, status,
and produces artifacts.

constraints.md lists the non-functional requirements that affect every design decision.

## Performance
- API response time p99 under 200ms
- Max 5 DB queries per request
- No N+1 queries
## Security
- All API endpoints require auth except /health
- No PII in logs
- All secrets via env vars, never in code

prior-work.md holds brief summaries of shipped work so Cascade doesn’t duplicate it.

## [2026-04-22] Cursor pagination on /api/users
Added cursor-based pagination with ?limit and ?after.
Pattern lives in src/api/pagination.py. Extending to other
endpoints? Use the same pattern. Don't introduce offset-based.
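For readers unfamiliar with the pattern this sample entry refers to, the following is a minimal sketch of cursor-based pagination with `limit` and `after`. It is not the content of src/api/pagination.py, which this page does not show. The function name `paginate`, the row shape, and the choice of an ascending integer `id` as the cursor are all assumptions made for illustration.

```python
from typing import Optional


def paginate(rows: list[dict], limit: int = 20, after: Optional[int] = None) -> dict:
    """Return one page of rows whose id is greater than the `after` cursor.

    Assumes `rows` is sorted by ascending integer "id" (an illustrative choice).
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    remaining = [r for r in rows if after is None or r["id"] > after]
    page = remaining[:limit]
    # A next cursor exists only when more rows remain beyond this page.
    next_cursor = page[-1]["id"] if len(remaining) > limit else None
    return {"items": page, "next_after": next_cursor}
```

Because the cursor is a position in the data rather than a row count, pages stay stable when rows are inserted or deleted ahead of the current position, which is the usual reason to prefer this over offset-based pagination.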

Aim for at least 50 substantive lines across all five files before relying on Cascade for real work. Below that threshold, the LLM is mostly working from generic best practices. Above it, the output starts to reflect your team’s own conventions, decisions, and vocabulary.
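One rough way to check where you stand against the 50-line target is a short script like the sketch below. This page does not define "substantive", so the script assumes it means any non-blank line that is not a Markdown heading; adjust `is_substantive` if your team uses a different definition. The script is a hypothetical helper, not a Cascade command.

```python
from pathlib import Path

# The five memory files from the starter layout (README.md is not counted).
MEMORY_FILES = ("conventions.md", "decisions.md", "glossary.md",
                "constraints.md", "prior-work.md")


def is_substantive(line: str) -> bool:
    """Assumed definition: non-blank and not a Markdown heading."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def count_substantive_lines(memory_dir: str = "team-memory") -> int:
    """Sum substantive lines over whichever of the five files exist."""
    total = 0
    for name in MEMORY_FILES:
        path = Path(memory_dir) / name
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            total += sum(is_substantive(line) for line in text.splitlines())
    return total


if __name__ == "__main__":
    count = count_substantive_lines()
    print(f"{count} substantive lines (target: at least 50)")
```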

In v0.1, you maintain team memory manually. After every architecture meeting, add an entry to decisions.md. After every PR merges, summarize it in prior-work.md. Whenever a new domain term emerges, add it to glossary.md.

In v0.2 and beyond, Cascade will suggest updates automatically based on the meetings it processes and the PRs that merge. That’s on the roadmap; for now, treat team memory as living documentation that you tend to.

Naively dumping every team-memory file into every LLM call would bloat prompts, slow responses, raise costs, and dilute the model’s attention across irrelevant material. Cascade avoids that by treating team memory as a bounded, ranked context budget rather than a raw blob.

The rules:

  • Hard character budget. Team memory is capped at 20,000 characters per LLM call by default (roughly 5,000 tokens, a small fraction of any modern model’s context window). Configurable via memory.max_chars_per_call in cascade.yaml.
  • Per-file proportional truncation. When your files exceed the budget, each file is truncated to its proportional share so one bloated file can’t eat the whole window.
  • Stage-aware selection. Different pipeline stages prioritize different files. The extractor weights glossary.md; the planner reaches for decisions.md and prior-work.md; the coder leans on conventions.md and constraints.md. No stage receives the entire library.
  • Empty files are skipped. A starter template with no real content costs zero context.
  • Structured grounding, not raw paste. Cascade sends the files as a labeled grounding block (headings, bullets, ADR entries) so the model sees signal, not markdown formatting noise.
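The budgeting rules above can be sketched roughly as follows. This is not Cascade's actual implementation. The function name `build_grounding` and the `STAGE_FILES` mapping are hypothetical, stage prioritization is simplified to plain selection, and "proportional share" is assumed to mean proportional to each file's length (an equal split per file would be another plausible reading).

```python
DEFAULT_MAX_CHARS = 20_000  # default budget stated in the rules above

# Simplified stage-to-file mapping taken from the rules above (illustrative only).
STAGE_FILES = {
    "extractor": ["glossary.md"],
    "planner": ["decisions.md", "prior-work.md"],
    "coder": ["conventions.md", "constraints.md"],
}


def build_grounding(stage: str, files: dict[str, str],
                    max_chars: int = DEFAULT_MAX_CHARS) -> dict[str, str]:
    """Pick the stage's files, skip empty ones, and truncate proportionally."""
    selected = {
        name: files[name].strip()
        for name in STAGE_FILES.get(stage, [])
        if files.get(name, "").strip()  # empty files cost zero context
    }
    total = sum(len(text) for text in selected.values())
    if total <= max_chars:
        return selected
    # Over budget: each file keeps a share proportional to its own length,
    # so the combined result never exceeds max_chars.
    return {
        name: text[: len(text) * max_chars // total]
        for name, text in selected.items()
    }
```

For example, with a 30,000-character decisions.md and a 10,000-character prior-work.md, the planner stage in this sketch would keep 15,000 and 5,000 characters respectively, which together fill the default 20,000-character budget.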

The result: team memory grows over time without prompts growing with it. A team with 200 KB of accumulated decisions and conventions still produces a tight, focused prompt on every call.