The Guild

Every change should answer one question: "Can I ship this with confidence?"

This is the system that answers it. Not a phase at the end, not a checkbox before release -- a continuous set of practices woven into how you build. These practices were good before AI and they'll be good with AI. If your team uses PRDs, design docs, or technical specs, those are structured inputs that help you execute this process faster. The thinking still matters. The tooling just got quicker.

Small batches, fast feedback, low risk.

Big plans shipped all at once are where risk hides. Break the work down, ship a slice, get feedback. When the batch is small, every practice here just works -- TDD loops stay tight, CI catches problems early, and when something breaks you know exactly what changed.

If you want to explore why batch size matters, check out Value Stream World -- an interactive simulation of how work flows through systems.

Core Beliefs

Build quality in.

Testing starts before code -- the moment you think about how the system should behave. Every commit should tell you something about quality.

Layered confidence.

No single layer catches everything. Unit, service integration, end-to-end, and production monitoring each cover blind spots the others miss.

Automate with intent.

Automate what adds reliability and speed. If a test doesn't protect a real scenario, delete it.

Test data is infrastructure.

Flaky data produces flaky results. Treat test data lifecycle -- seeding, isolation, cleanup -- as seriously as the tests themselves.

Fast feedback, always.

If a test takes too long to run, it won't get run. Optimize for tight loops: seconds for unit, minutes for integration, single-digit minutes for E2E.

The Layers

Defense in depth. Each layer adds confidence at a different speed and scope. No single layer catches everything.

Layer	What it covers	When	Speed
Unit	Logic and functions in isolation. Everything else is simulated.	Every commit	Seconds
Service Integration	Your code + real dependencies (DB, HTTP, serialization). External systems simulated.	Every commit	Seconds
Contract	Each side validates the API agreement independently. The other service doesn't need to be running.	Every build (CI)	Minutes
Security	Dependency vulnerabilities, insecure patterns, misconfigurations.	Every build (CI)	Minutes
End-to-end	User flows. Full stack, real UI, real infrastructure. Nothing simulated.	Post-deploy	Minutes
Exploratory	Edge cases, UX issues, things automation misses.	Pre-release	Manual

"Every build (CI)" means either a pull request check on a feature branch or a pipeline run on your commit to main.

Approaches

There are many approaches to testing -- they're techniques, not ideologies.

Test-Driven Development (TDD)

Write a test first, watch it fail, make it pass. TDD is a workflow, not a test type -- the test lives at whatever layer fits the work.

Behavior-Driven Development (BDD)

Describe behavior in plain language (Given/When/Then), then automate it. The scenarios are the spec. Gherkin is just a format -- the test runner is tooling.

Testing in the Delivery Flow

Testing is woven into delivery, not bolted on at the end.

Pre-commit

Unit and component tests run on commit or pre-push
Linting, formatting, and type checks enforce consistency
Local test data via lightweight fixtures

Build (CI)

Service integration tests validate module boundaries
Test environment provisioned automatically
Failures gate the pipeline. No exceptions.

Pre-release (CD)

E2E suites run against staging
Regression suites on critical user workflows
Synthetic data seeded and cleaned up

Post-deploy

Smoke tests confirm availability
Monitoring validates real traffic
Gaps feed back into test coverage

Test Data

Every test run starts from a known state. No shared mutable data between runs.

Seed before, clean after.

Automated seed scripts provision baseline data. Teardown scripts or ephemeral environments destroy it post-test.

Isolate by default.

Each test run uses unique identifiers or isolated tenants. No collisions between parallel runs.
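
One way to get collision-free data is to stamp everything a run creates with a run-scoped identifier. The sketch below is illustrative only: `createRunScope`, `uniqueEmail`, the `test-tenant-` prefix, and the `example.com` domain are hypothetical names, not part of any framework mentioned here.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical helper: everything created through one scope shares a run ID,
// so two parallel runs do not produce the same email or tenant name.
interface RunScope {
  runId: string;
  tenant: string;
  uniqueEmail(localPart: string): string;
}

function createRunScope(): RunScope {
  // First 8 hex characters of a UUID, trimmed for readable logs.
  // Assumption: 32 bits of randomness is enough here; keep the full UUID if it is not.
  const runId = randomUUID().slice(0, 8);
  return {
    runId,
    tenant: `test-tenant-${runId}`,
    uniqueEmail: (localPart) => `${localPart}+${runId}@example.com`,
  };
}
```

A test would call `createRunScope()` once in its setup and build all emails and tenant names through it, which also makes cleanup easier because every record from the run carries the same `runId`.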

Synthetic only.

All test data is generated or anonymized. Production data never leaves production.

Environment parity.

Local dev, CI, and staging all use the same seeding approach -- only the backend differs (in-memory vs hosted).
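
To make "same seeding approach, different backend" concrete, here is a minimal sketch in which the seed function depends only on an interface. The `UserStore` interface, `InMemoryUserStore` class, `seedBaseline` function, and the baseline users are all hypothetical; a hosted backend would implement the same interface around a real database client.

```typescript
interface User {
  email: string;
  name: string;
}

// Hypothetical storage interface shared by every environment.
interface UserStore {
  insert(user: User): Promise<void>;
  findByEmail(email: string): Promise<User | undefined>;
  clear(): Promise<void>;
}

// In-memory backend for local development and fast tests.
class InMemoryUserStore implements UserStore {
  private users = new Map<string, User>();

  async insert(user: User): Promise<void> {
    this.users.set(user.email, user);
  }

  async findByEmail(email: string): Promise<User | undefined> {
    return this.users.get(email);
  }

  async clear(): Promise<void> {
    this.users.clear();
  }
}

const BASELINE_USERS: User[] = [
  { email: 'alice@example.com', name: 'Alice' },
  { email: 'bob@example.com', name: 'Bob' },
];

// The same seed function runs everywhere; only the store passed in changes.
async function seedBaseline(store: UserStore): Promise<void> {
  await store.clear(); // start from a known state
  for (const user of BASELINE_USERS) {
    await store.insert(user);
  }
}
```

Because `seedBaseline` clears the store first, running it twice leaves the same known state rather than accumulating rows.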

What to Measure

Measure what changes behavior, not what fills dashboards.

Metric	Why it matters
Defect escape rate	Are bugs reaching production? Trend should go down.
Test cycle time	How fast is the feedback loop? If it's slow, developers will skip it.
Release pass rate	What percentage of releases clear all quality gates on first attempt?
Automated vs manual ratio	Are you still doing manually what a machine could validate?
Flakiness rate	Flaky tests erode trust. Track and fix or delete them.
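
Two of these metrics reduce to simple ratios. The sketch below shows one possible way to compute them; the `TestRun` shape and both function names are hypothetical, and the definition of "flaky" (a test that both passed and failed on the same commit) is an assumption, since the table does not define it.

```typescript
// Hypothetical record of one test execution in CI.
interface TestRun {
  testName: string;
  commit: string;
  passed: boolean;
}

// Assumption: a test counts as flaky if it both passed and failed on the same commit.
// Returns flaky tests as a fraction of all distinct tests seen.
function flakinessRate(runs: TestRun[]): number {
  const outcomes = new Map<string, Set<boolean>>(); // key: test name + commit
  const allTests = new Set<string>();
  const flakyTests = new Set<string>();

  for (const run of runs) {
    allTests.add(run.testName);
    const key = `${run.testName}@${run.commit}`;
    const seen = outcomes.get(key) ?? new Set<boolean>();
    seen.add(run.passed);
    outcomes.set(key, seen);
    if (seen.size === 2) flakyTests.add(run.testName);
  }

  return allTests.size === 0 ? 0 : flakyTests.size / allTests.size;
}

// Share of all known defects that were found in production rather than before release.
function defectEscapeRate(foundInProduction: number, foundBeforeRelease: number): number {
  const total = foundInProduction + foundBeforeRelease;
  return total === 0 ? 0 : foundInProduction / total;
}
```

Either number matters mostly as a trend: computing it per sprint or per release and watching the direction is more useful than any single value.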

Ownership

The team that writes the code owns its quality. There's no QA wall to throw things over, no LLM that takes responsibility for what gets merged.

Automated tests are the first line of defense.

They hold the standard for everything you can codify -- regressions, contracts, critical paths. They run on every commit, they gate the pipeline, they don't get tired.

Pairing is live code review.

Whether you're pairing with another engineer or using an LLM as a collaborator, you get feedback while the context is fresh. It's the fastest human feedback loop you have.

Code review catches what automation hasn't learned yet.

Design decisions, readability, architectural drift, edge cases nobody thought to test -- this is the human catch point for things that aren't yet codified into automated checks.

Retros close the long loop.

Days or weeks after the code ships, the team reflects on what broke, what was missed, and what to automate next. Slow feedback, but it's where patterns emerge.

However you work -- solo, pairing, ensemble, with an LLM -- you're still responsible for what lands. Review test health at retros. If a test suite is ignored, it's worse than having no tests -- it's false confidence.

What tests actually look like

Concrete examples of each testing layer. Written in TypeScript/JS, but the patterns apply to any language. The point isn't to teach the syntax -- it's to make the abstract concrete.

Unit Test TypeScript

Tests a single function in isolation. No database, no network, no filesystem. If it needs external dependencies, they're mocked.

// calculateDiscount.test.ts
import { calculateDiscount } from './calculateDiscount';

describe('calculateDiscount', () => {
  it('gives 20% discount for 5+ year members', () => {
    expect(calculateDiscount(100, 5)).toBe(20);
  });

  it('gives 10% discount for 2-4 year members', () => {
    expect(calculateDiscount(100, 3)).toBe(10);
  });

  it('gives no discount for new members', () => {
    expect(calculateDiscount(100, 1)).toBe(0);
  });
});

$ npx jest calculateDiscount.test.ts

PASS  calculateDiscount.test.ts
  calculateDiscount
    ✓ gives 20% discount for 5+ year members (2ms)
    ✓ gives 10% discount for 2-4 year members (1ms)
    ✓ gives no discount for new members

Tests:  3 passed, 3 total
Time:   0.4s
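
The function under test is not shown above. As a rough sketch, an implementation that would satisfy these three tests could look like the following; the parameter names `price` and `membershipYears` and the exact tier boundaries are inferred from the test descriptions, not taken from real code.

```typescript
// calculateDiscount.ts (hypothetical implementation inferred from the tests)
export function calculateDiscount(price: number, membershipYears: number): number {
  // Discount percentage by membership tier, as described in the test names.
  let percent = 0;
  if (membershipYears >= 5) {
    percent = 20; // 5+ year members
  } else if (membershipYears >= 2) {
    percent = 10; // 2-4 year members
  }
  // Returns the discount amount, not the discounted price.
  return (price * percent) / 100;
}
```

Note that it is pure: no database, network, or clock, which is what lets the unit test run in milliseconds.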

Unit Test (TDD Style) TypeScript

Step 1 of 4 — Write the test first

// slugify.test.ts — Write the test FIRST. The function doesn't exist yet.
import { slugify } from './slugify';

describe('slugify', () => {
  it('converts spaces to hyphens', () => {
    expect(slugify('hello world')).toBe('hello-world');
  });

  it('lowercases everything', () => {
    expect(slugify('Hello World')).toBe('hello-world');
  });

  it('strips non-alphanumeric characters', () => {
    expect(slugify('hello, world!')).toBe('hello-world');
  });
});
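
Following the loop described under Test-Driven Development (write a test, watch it fail, make it pass), a minimal implementation that would turn these three tests green might look like this. It is one possible sketch, not the original author's code, and it only handles ASCII letters and digits.

```typescript
// slugify.ts (hypothetical implementation written to satisfy the tests above)
export function slugify(input: string): string {
  return input
    .toLowerCase()
    // Strip anything that is not a lowercase letter, digit, whitespace, or hyphen.
    .replace(/[^a-z0-9\s-]/g, '')
    .trim()
    // Collapse each run of whitespace or hyphens into a single hyphen.
    .replace(/[\s-]+/g, '-');
}
```

With the tests passing, any refactoring can happen safely because the three assertions pin the behavior down.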


Service Integration Test TypeScript

Your code wired to real dependencies -- a real database, real HTTP handlers, real serialization. Tests that the seams between your code and its dependencies work.

// users.integration.test.ts
import { app } from './app';
import { db } from './database';
import request from 'supertest';

beforeEach(async () => {
  await db.migrate.latest();
  await db.seed.run();
});

afterEach(async () => {
  await db('users').truncate();
});

describe('POST /users', () => {
  it('creates a user and persists to database', async () => {
    const res = await request(app)
      .post('/users')
      .send({ name: 'Alice', email: 'alice@example.com' });

    expect(res.status).toBe(201);
    expect(res.body.name).toBe('Alice');

    const row = await db('users')
      .where({ email: 'alice@example.com' }).first();
    expect(row).toBeDefined();
    expect(row.name).toBe('Alice');
  });

  it('rejects duplicate emails', async () => {
    await request(app)
      .post('/users')
      .send({ name: 'Alice', email: 'alice@example.com' });

    const res = await request(app)
      .post('/users')
      .send({ name: 'Bob', email: 'alice@example.com' });

    expect(res.status).toBe(409);
  });
});

$ npx jest --testPathPattern=integration

  Setting up test database...
  Running migrations...
  Seeding test data...

PASS  users.integration.test.ts
  POST /users
    ✓ creates a user and persists to database (45ms)
    ✓ rejects duplicate emails (12ms)

  Cleaning up test database...

Tests:  2 passed, 2 total
Time:   1.2s

Contract Test TypeScript

Each service validates against a shared agreement of what the API looks like. Neither service needs the other running.

// Provider side: user-provider.contract.test.ts
import { Verifier } from '@pact-foundation/pact';

describe('User API provider contract', () => {
  it('satisfies the consumer contract', async () => {
    await new Verifier({
      providerBaseUrl: 'http://localhost:3000',
      pactUrls: ['./pacts/order-service-user-service.json'],
    }).verifyProvider();
  });
});

// Consumer side: user-consumer.contract.test.ts
import { Pact } from '@pact-foundation/pact';

const provider = new Pact({
  consumer: 'OrderService',
  provider: 'UserService',
});

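describe('User API consumer contract', () => {
  beforeAll(() => provider.setup());    // start the local Pact mock server
  afterEach(() => provider.verify());   // fail if a registered interaction was never requested
  afterAll(() => provider.finalize());  // write the pact file and stop the mock server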

  it('expects user response with id, name, email', async () => {
    await provider.addInteraction({
      state: 'a user with id 1 exists',
      uponReceiving: 'a request for user 1',
      withRequest: { method: 'GET', path: '/users/1' },
      willRespondWith: {
        status: 200,
        body: { id: 1, name: 'Alice',
          email: 'alice@example.com' },
      },
    });

    const res = await fetch(
      `${provider.mockService.baseUrl}/users/1`);
    const user = await res.json();

    expect(user.id).toBe(1);
    expect(user.name).toBeDefined();
    expect(user.email).toBeDefined();
  });
});

$ npx jest --testPathPattern=contract

  Pact verification started

PASS  user-provider.contract.test.ts
  User API provider contract
    ✓ satisfies the consumer contract (120ms)

PASS  user-consumer.contract.test.ts
  User API consumer contract
    ✓ expects user response with id, name, email (85ms)

  Pact file written: pacts/order-service-user-service.json

Tests:  2 passed, 2 total
Time:   0.8s

BDD / End-to-End Test Gherkin + TypeScript

Behavior described in plain language, automated through the real UI. The Gherkin scenarios are the spec.

Feature: Checkout

  Scenario: Registered user completes purchase
    Given I am logged in as "alice@example.com"
    And I have 2 items in my cart
    When I proceed to checkout
    And I confirm my payment method
    Then I should see "Order confirmed"
    And I should receive a confirmation email

$ npx cucumber-js --require features/steps

Feature: Checkout

  Scenario: Registered user completes purchase
    ✓ Given I am logged in as "alice@example.com"
    ✓ And I have 2 items in my cart
    ✓ When I proceed to checkout
    ✓ And I confirm my payment method
    ✓ Then I should see "Order confirmed"
    ✓ And I should receive a confirmation email

1 scenario (1 passed)
6 steps (6 passed)
Time:   4.2s

E2E Test (Without BDD) TypeScript

You don't need Gherkin to write E2E tests. Playwright or Cypress can test full workflows directly.

// tests/e2e/checkout.spec.ts
import { test, expect } from '@playwright/test';

test('registered user can complete a purchase', async ({ page }) => {
  // Login
  await page.goto('/login');
  await page.fill('[name="email"]', 'alice@example.com');
  await page.fill('[name="password"]', 'testpassword');
  await page.click('button[type="submit"]');

  // Add items
  await page.goto('/products/1');
  await page.click('[data-testid="add-to-cart"]');

  // Checkout
  await page.goto('/cart');
  await page.click('[data-testid="checkout"]');
  await page.click('[data-testid="confirm-payment"]');

  // Verify
  await expect(page.locator('body'))
    .toContainText('Order confirmed');
});

$ npx playwright test checkout.spec.ts

Running 1 test using 1 worker

  ✓ registered user can complete a purchase (3.8s)

  1 passed (4.1s)
