
Split Value, Not Teams: How AI Can Turn Big Stories into Shippable Slices

7 min read

Most backlogs are full of pseudo-stories—tickets like “Frontend login page,” “Create user table,” or “Backend API for auth.” They feel tidy, but they aren’t stories. They’re fragments that force waterfall handoffs inside a sprint, inflate WIP, and make it impossible to answer: what user outcome will be done this week?

This post lays out a practical way to spot oversized, fuzzy stories and use AI to split them into vertical, end-to-end slices that ship value—without turning your board into a graveyard of chores. The examples assume Battra, but the approach applies anywhere.


The Smell Test: “Big Story” vs. “Non-Story”

A real story should pass a simple 5-point sniff test:

  1. User-observable (demoable behind a flag counts).

  2. Valuable (changes a user behavior or outcome).

  3. Independent enough to complete without cross-team choreography.

  4. Small (1–3 days of elapsed work for a pair/small swarm).

  5. Testable (clear acceptance criteria).

Fail any two of these and you’re likely holding a big story or a non-story (role/tech slice).
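The sniff test is mechanical enough to automate. Here is a minimal sketch of it as a scoring function—the criterion names mirror the list above, and the pass/fail answers are assumed to come from a human (or an AI classifier) reviewing the ticket:

```python
# The five sniff-test criteria from the list above.
CRITERIA = ["user_observable", "valuable", "independent", "small", "testable"]

def sniff_test(checks: dict) -> str:
    """Verdict: 'story' if at most one criterion fails, else flag it for splitting."""
    failures = [c for c in CRITERIA if not checks.get(c, False)]
    if len(failures) <= 1:
        return "story"
    return f"split or rewrite (failed: {', '.join(failures)})"

# "Backend API for auth" fails user-observable, valuable, and independent:
sniff_test({"small": True, "testable": True})
```

A hook like this in your story template won't replace judgment, but it makes the "fail any two" rule impossible to skip.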

Common anti-patterns:

  • Team splits: “Frontend,” “Backend,” “QA,” “Docs.”

  • Architecture splits: “DB migration,” “API endpoint,” “React component.”

  • Burndown theater: Lots of movement on tiny tasks, nothing demoable.


The Reframe: Split by Outcomes, Not Org Charts

Good splits are thin, end-to-end verticals. Think “walking skeleton”: deliver one narrow path that works from UI through storage and back, even if it’s rough and behind a flag. Then add more narrow paths.

Useful axes for splitting by outcome:

  • Persona / Role: admin vs. employee vs. contractor

  • Trigger / Context: web vs. mobile, first-time vs. returning

  • Variant of the capability: Google login vs. email/password vs. magic link

  • Workflow stage: happy path → error handling → edge cases

  • Risk first: de-risk an unknown (external API, compliance) as a separate slice

  • Data slice: one product line/region/currency before all

  • Rollout stage: internal users → beta cohort → general availability

Pick one or two axes to start. If a split doesn’t produce something demoable, it’s not a story—merge it back.


Example: The “Login” Mega-Story

Big story: “As a user, I want to log in so I can perform privileged activities.”

Bad (non-story) splits

  • “Create login page (frontend)”

  • “Auth endpoints (backend)”

  • “User table (database)”

  • “QA test login”

Each depends on the others; none is user-complete.

Better, vertical splits

  1. Google login (internal users only, behind a flag)
    AC: Internal users can authenticate with Google; success leads to a basic “Timesheet” page; failures show friendly error; event logged; telemetry visible.

  2. Email/password happy path (flagged)
    AC: Existing seeded users can log in; can log out; p95 < 200ms; audit log writes.

  3. Magic link (edge case + a11y copy)
    AC: One-time links expire in 15 minutes; keyboard-only flow passes.

  4. Error handling & rate limit
    AC: Lockout after N attempts with reset instructions; 429 surfaces; alarm on brute-force signal.

  5. Rollout to external cohort
    AC: Feature flag to 10% external users; kill switch; docs updated.

Each slice is demoable, testable, and safe to ship behind a flag.
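"Behind a flag" in these slices means deterministic cohort assignment, so a user stays in or out of the rollout across requests. A minimal sketch of hash-based bucketing—the flag name and percentages are illustrative, not a real Battra API:

```python
import hashlib

def in_rollout(user_id: str, flag: str, percent: int, kill_switch: bool = False) -> bool:
    """Stable percentage rollout: the same (flag, user) pair always lands in the same bucket."""
    if kill_switch or percent <= 0:
        return False
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket 0-99 per (flag, user)
    return bucket < percent

# Slice 5: enable for 10% of external users, with a kill switch available.
enabled = in_rollout("user-42", "auth.google_login", percent=10)
```

Because bucketing is derived from the flag and user id rather than stored state, raising the percentage from 10% to 100% never kicks anyone out of the cohort.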


Why Humans Struggle (and Where AI Helps)

Humans tend to mirror their org chart: “I do backend, so the story is an API.” Under pressure, we also over-optimize for motion (lots of tickets moving) instead of flow (value finishing).

AI can help by:

  • Recognizing non-story language (“create,” “set up,” “implement”) vs. outcome language (“user can…”).

  • Extracting persona, capability, value from a big story and proposing vertical splits along meaningful axes.

  • Generating acceptance criteria and test notes per slice.

  • Flagging risk hotspots (e.g., “SSO provider unknown; propose capability probe slice”).

  • Replacing the big story with an epic + linked end-to-end stories (or a single story with a checklist if that’s truly better).
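The first of those capabilities—recognizing non-story language—is simple enough to sketch. The word lists below are an illustrative starting point, not an exhaustive classifier:

```python
import re

# Implementation verbs and role prefixes signal a non-story; "X can ..." signals an outcome.
NON_STORY = re.compile(r"^(create|set up|implement|build|add)\b", re.IGNORECASE)
ROLE_SLICE = re.compile(r"^(frontend|backend|qa|docs|database)\b[:\s]", re.IGNORECASE)
OUTCOME = re.compile(r"\b(user|admin|employee|contractor)s? can\b", re.IGNORECASE)

def classify(title: str) -> str:
    if NON_STORY.search(title) or ROLE_SLICE.search(title):
        return "non-story"
    if OUTCOME.search(title):
        return "story"
    return "review"  # ambiguous: let a human (or a bigger model) decide

classify("Create user table")                      # non-story
classify("Internal users can log in with Google")  # story
```

In practice you would back this up with an LLM pass for the "review" bucket, but the regexes alone catch most of the anti-patterns listed earlier.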


What AI-Assisted Splitting Looks Like (in practice)

Input (the “too big” story):

“As a user, I want to log in so that I can perform privileged activities.”

AI proposes (example output):

Epic: User Authentication (MVP)
Goal: Enable secure authentication for employee timesheets with staged rollout.

Stories:

  1. Login with Google (internal, flagged)

    • AC: Internal user can authenticate via Google; redirected to Timesheet; errors are actionable; event auth.login.google.success/failure recorded; feature off by default.

    • Test notes: Seed one internal Google account; synthetic telemetry check.

  2. Timesheet access gated by session

    • AC: Timesheet page redirects to login if session missing/expired; logout works; session TTL 24h; audit trail recorded.

    • Test notes: Simulate expiry; verify redirect chain.

  3. Email/Password happy path (seeded users only, flagged)

    • AC: Seeded users authenticate; password policy enforced; basic lockout after N failures (no UX polish yet).

    • Test notes: Brute force simulator; ensure alarms.

  4. Error handling & a11y

    • AC: Tokens expire gracefully; screen reader labels present; keyboard flow passes.

    • Test notes: VoiceOver/NVDA pass.

  5. Rollout control

    • AC: Gradual enablement (0% → 10% → 100% for internal; beta cohort for external); kill switch; on-call playbook.

AI also suggests:

  • Reject these non-stories if created: “Create user table,” “Implement OAuth client” (fold them into the vertical slices).

  • Risk slice: “SSO capability probe” if identity provider is unknown.

This is the difference between shipping and assembling parts.


Heuristics the AI Uses (transparent rules you can teach)

  1. INVEST guardrails: If a candidate slice isn’t Independent, Valuable, or Testable, merge it back.

  2. Demo rule: Every slice must be demoable to a real or fake persona (internal is fine) behind a flag.

  3. Two-axis max: Prefer splitting on one primary axis (e.g., capability variant) and one safety axis (e.g., rollout stage). More axes → explosion of tickets.

  4. Risk-first slice allowed: If something is unknown (3rd-party API limits, compliance, perf), create a capability probe slice whose output is knowledge + a thin working path.

  5. Checklist, not child tickets: Sub-activities (migrations, copy polish) become PR/story checklists unless they genuinely require separate validation.
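Heuristics 1–3 compose naturally into a single slice validator. The field names (`demo_persona`, `flag`, `axes`) are assumptions about how a slice might be represented, not an existing schema:

```python
def validate_slice(slice_: dict) -> list:
    """Return a list of problems; an empty list means the slice passes heuristics 1-3."""
    problems = []
    for invest in ("independent", "valuable", "testable"):   # heuristic 1: INVEST guardrails
        if not slice_.get(invest):
            problems.append(f"not {invest}: merge it back")
    if not (slice_.get("demo_persona") and slice_.get("flag")):  # heuristic 2: demo rule
        problems.append("no demo path behind a flag")
    if len(slice_.get("axes", [])) > 2:                      # heuristic 3: two-axis max
        problems.append("more than two split axes")
    return problems
```

The point of writing the rules this explicitly is the heading above: they are transparent, so the team can inspect and tune them rather than trust a black box.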


Replace Child Tickets with Smart Checklists

Keep flow tight by embedding detail inside the story/PR:

### Acceptance Criteria
- [ ] Internal users can log in with Google and reach the Timesheet
- [ ] Failure states provide actionable guidance (no vague “Error”)
- [ ] Telemetry: auth.login.google.{success,failure} captured
- [ ] Feature flag default OFF; kill switch documented

### Done Definition
- [ ] Unit + integration tests for happy path
- [ ] Dashboard panel shows success/failure rates
- [ ] Alert on 5-minute failure rate > 5%
- [ ] a11y: focus order and labels verified

Export these checklists to your audit system after merge if you need paper trails. Don’t fragment development to create them.
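The export step can be as simple as parsing checkbox state out of the merged PR body. A sketch, assuming GitHub-style `- [ ]` / `- [x]` markdown:

```python
import re

CHECKBOX = re.compile(r"^- \[([ x])\] (.+)$", re.MULTILINE)

def extract_checklist(pr_body: str) -> list:
    """Return (done, text) pairs for every checkbox in the PR body."""
    return [(mark == "x", text) for mark, text in CHECKBOX.findall(pr_body)]

body = (
    "### Acceptance Criteria\n"
    "- [x] Internal users can log in with Google and reach the Timesheet\n"
    "- [ ] Feature flag default OFF; kill switch documented"
)
extract_checklist(body)
```

Run this from a post-merge hook and push the pairs into your audit system; the development flow never sees an extra ticket.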


Prompt-First Workflow (so splitting happens at the right moment)

  1. Declare intent once (CLI, editor, or chat):

     Split the big story "As a user, I want to log in so I can access timesheets"
    
  2. AI proposes an epic and 3–5 vertical slices with AC, risk notes, and rollout.

  3. You prune (merge or delete) and accept the slices.

  4. Tool replaces the big story with the epic and linked stories, seeds PR templates, and sets flags.

  5. Board policy: Only stories move columns. Checklists live in the PR; no role-sliced tickets.

This keeps humans in control of intent, with the AI doing the clerical heavy lifting.


Metrics to Prove It Worked

Expect to see improvements in:

  • Lead time per story (request → production behind a flag)

  • Review latency (PR open → first review)

  • Aging WIP (stories > N days in progress)

  • Change failure rate (thin, end-to-end changes tend to fail less)

  • % of stories demoable per sprint (no more “done backend” without a path)

Stop counting subtasks; start measuring flow.
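Computing the headline metric is straightforward once you have timestamps. A sketch of lead time as a median over finished stories—the event shape (story dict with ISO timestamps) is an assumption:

```python
from datetime import datetime
from statistics import median

def hours_between(start: str, end: str) -> float:
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 3600

def lead_time_hours(stories: list) -> float:
    """Median request -> production time across finished stories."""
    return median(hours_between(s["requested_at"], s["shipped_at"]) for s in stories)

stories = [
    {"requested_at": "2024-05-01T09:00", "shipped_at": "2024-05-03T09:00"},  # 48h
    {"requested_at": "2024-05-02T09:00", "shipped_at": "2024-05-03T09:00"},  # 24h
]
lead_time_hours(stories)  # median of 48h and 24h -> 36.0
```

Median (not mean) keeps one stuck story from masking the trend; review latency is the same computation over PR-open and first-review timestamps.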


Pitfalls (and how the AI avoids them)

  • Explosion of tiny stories: Use the two-axis max heuristic and combine micro-splits into a single outcome slice.

  • Hidden dependencies: The AI tags slices with “shared concerns” (e.g., session store, email provider) and warns if a slice can’t be demoed without them.

  • Gold-plating early: Early slices bias to happy path behind a flag; polish and a11y get their own slice once the path exists.

  • Reintroducing team splits: The bot gently blocks story titles that start with “Backend/Frontend/QA” and suggests a vertical rewrite.


A 30-Minute Starter Plan

  1. Add the sniff test to your story template (“Demoable? Valuable? Independent? Small? Testable?”).

  2. Pilot AI-assisted splitting on your next obviously-big story (e.g., “Login,” “Invoicing,” “Search”). Accept 3–5 slices max.

  3. Change board policy: Only stories move columns; checklists live in PRs; ban role-sliced tickets.

  4. Feature-flag everything so thin slices can ship safely.

  5. Measure two sprints of lead time and review latency. Share the deltas at retro.


TL;DR

Most “splitting” turns stories into role-based chores that don’t ship value. Teach your system (and your team) to split by user-visible outcomes along a couple of sensible axes—persona, capability variant, workflow stage, risk—so each slice is end-to-end, demoable, and testable. Let AI do the mechanical work: detect non-stories, propose vertical slices with AC, replace the mega-story with an epic + linked stories, and keep the board focused on flow.

That’s how you make splitting useful: fewer chores, more value, shipped sooner.