Andrew Khoury
Engineering leader & maker

The Guild

Assemble your crew · Audit your code · Get the report

Audit your codebase with AI — your LLM, your data, your report.

The Guild gives you expert auditor personas — security, UX, performance, code quality, testing — each with calibrated prompts and a distinct voice. Pick a crew, feed the prompt to your LLM, and get back a structured JSON report. Everything runs in your browser. No backend, no database, no data leaves your machine. The JSON is yours to keep and use however you want.

53 PERSONAS · 6 WORLDS · 6 FORMATS

1. Generate or load your audit report

Generate with AI

Pick a crew, get a prompt, run it in your LLM

Upload a report

Drop or browse for a JSON report file

Paste JSON

Paste the raw JSON output from your LLM

2. Choose your experience

Persona Library

Every change should answer one question: "Can I ship this with confidence?"

This is the system that answers it. Not a phase at the end, not a checkbox before release -- a continuous set of practices woven into how you build. These practices were good before AI and they'll be good with AI. If your team uses PRDs, design docs, or technical specs, those are structured inputs that help you execute this process faster. The thinking still matters. The tooling just got quicker.

Small batches, fast feedback, low risk.

Shipping a big plan all at once is where risk hides. Break the work down, ship a slice, get feedback. When the batch is small, every practice here just works -- TDD loops stay tight, CI catches problems early, and when something breaks you know exactly what changed.

If you want to explore why batch size matters, check out Value Stream World -- an interactive simulation of how work flows through systems.

Core Beliefs

Build quality in.
Testing starts before code -- the moment you think about how the system should behave. Every commit should tell you something about quality.
Layered confidence.
No single layer catches everything. Unit, service integration, end-to-end, and production monitoring each cover blind spots the others miss.
Automate with intent.
Automate what adds reliability and speed. If a test doesn't protect a real scenario, delete it.
Test data is infrastructure.
Flaky data produces flaky results. Treat test data lifecycle -- seeding, isolation, cleanup -- as seriously as the tests themselves.
Fast feedback, always.
If a test takes too long to run, it won't get run. Optimize for tight loops: seconds for unit, minutes for integration, single-digit minutes for E2E.

The Layers

Defence in depth. Each layer adds confidence at a different speed and scope. No single layer catches everything.

Layer | What it covers | When | Speed
Unit | Logic and functions in isolation. Everything else is simulated. | Every commit | Seconds
Service Integration | Your code + real dependencies (DB, HTTP, serialization). External systems simulated. | Every commit | Seconds
Contract | Each side validates the API agreement independently. The other service doesn't need to be running. | Every build (CI) | Minutes
Security | Dependency vulnerabilities, insecure patterns, misconfigurations. | Every build (CI) | Minutes
End-to-end | User flows. Full stack, real UI, real infrastructure. Nothing simulated. | Post-deploy | Minutes
Exploratory | Edge cases, UX issues, things automation misses. | Pre-release | Manual

Every build (CI): whether that's a pull request check on a feature branch or a pipeline run on your commit to main.

Approaches

There are many approaches to testing -- they're techniques, not ideologies.

Test-Driven Development (TDD)
Write a test first, watch it fail, make it pass. TDD is a workflow, not a test type -- the test lives at whatever layer fits the work.
Behavior-Driven Development (BDD)
Describe behavior in plain language (Given/When/Then), then automate it. The scenarios are the spec. Gherkin is just a format -- the test runner is tooling.

Testing in the Delivery Flow

Testing is woven into delivery, not bolted on at the end.

Pre-commit

  • Unit and component tests run on commit or pre-push
  • Linting, formatting, and type checks enforce consistency
  • Local test data via lightweight fixtures

Build (CI)

  • Service integration tests validate module boundaries
  • Test environment provisioned automatically
  • Failures gate the pipeline. No exceptions.

Pre-release (CD)

  • E2E suites run against staging
  • Regression suites on critical user workflows
  • Synthetic data seeded and cleaned up

Post-deploy

  • Smoke tests confirm availability
  • Monitoring validates real traffic
  • Gaps feed back into test coverage
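
To make "failures gate the pipeline" concrete, here is a minimal sketch of a gate: stages run in order, and the first failure stops everything after it. The `Stage` and `PipelineResult` types and the `runPipeline` function are hypothetical names invented for illustration; a real pipeline would be configured in your CI system rather than written like this.

```typescript
// Hypothetical model of a gated pipeline; not tied to any CI product.
interface Stage {
  name: string;
  run: () => boolean; // true = stage passed
}

interface PipelineResult {
  passed: string[];
  failedAt: string | null;
  skipped: string[];
}

function runPipeline(stages: Stage[]): PipelineResult {
  const result: PipelineResult = { passed: [], failedAt: null, skipped: [] };
  for (const stage of stages) {
    // Once a stage has failed, later stages never run.
    if (result.failedAt !== null) {
      result.skipped.push(stage.name);
      continue;
    }
    if (stage.run()) {
      result.passed.push(stage.name);
    } else {
      result.failedAt = stage.name;
    }
  }
  return result;
}
```

With four stages named after the phases above, a failure in the build stage leaves pre-release and post-deploy in `skipped`, which is the whole point of the gate: nothing downstream runs on a broken build.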

Test Data

Every test run starts from a known state. No shared mutable data between runs.

Seed before, clean after.
Automated seed scripts provision baseline data. Teardown scripts or ephemeral environments destroy it post-test.
Isolate by default.
Each test run uses unique identifiers or isolated tenants. No collisions between parallel runs.
Synthetic only.
All test data is generated or anonymized. Production data never leaves production.
Environment parity.
Local dev, CI, and staging all use the same seeding approach -- only the backend differs (in-memory vs hosted).
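
The seed, isolate, and clean-up rules above can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed implementation: the in-memory `usersTable` stands in for a real database, and `newRunId`, `seedUsers`, and `cleanup` are hypothetical helper names.

```typescript
// In-memory stand-in for a real table (assumption for illustration).
interface UserRow {
  id: string;
  email: string;
  runId: string;
}

const usersTable: UserRow[] = [];

let runCounter = 0;

// Each test run gets its own identifier so parallel runs never collide.
function newRunId(): string {
  runCounter += 1;
  return `run-${Date.now()}-${runCounter}`;
}

// Seed synthetic baseline data, tagged with the run that owns it.
function seedUsers(runId: string, count: number): UserRow[] {
  const rows: UserRow[] = Array.from({ length: count }, (_, i) => ({
    id: `${runId}-user-${i}`,
    email: `user${i}+${runId}@example.com`,
    runId,
  }));
  usersTable.push(...rows);
  return rows;
}

// Remove only the rows that belong to this run.
function cleanup(runId: string): void {
  for (let i = usersTable.length - 1; i >= 0; i--) {
    if (usersTable[i]?.runId === runId) {
      usersTable.splice(i, 1);
    }
  }
}
```

Because every row carries its run identifier, two runs can seed and clean up at the same time without touching each other's data. Against a hosted database the same shape would apply, with the array operations replaced by inserts and deletes.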

What to Measure

Measure what changes behavior, not what fills dashboards.

Metric | Why it matters
Defect escape rate | Are bugs reaching production? Trend should go down.
Test cycle time | How fast is the feedback loop? If it's slow, developers will skip it.
Release pass rate | What percentage of releases clear all quality gates on first attempt?
Automated vs manual ratio | Are you still doing manually what a machine could validate?
Flakiness rate | Flaky tests erode trust. Track and fix or delete them.
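
The text does not define these metrics precisely, so the sketch below picks one plausible definition for two of them. It assumes a test is flaky when it both passes and fails on the same commit, and that defect escape rate is the share of all known defects that were found in production. The `TestRun` type and both function names are hypothetical.

```typescript
interface TestRun {
  testName: string;
  commit: string;
  passed: boolean;
}

// Share of tests that produced both a pass and a fail on the same commit.
function flakinessRate(runs: TestRun[]): number {
  const outcomes = new Map<string, Set<boolean>>(); // key: test@commit
  const tests = new Set<string>();
  const flaky = new Set<string>();
  for (const run of runs) {
    tests.add(run.testName);
    const key = `${run.testName}@${run.commit}`;
    const seen = outcomes.get(key) ?? new Set<boolean>();
    seen.add(run.passed);
    outcomes.set(key, seen);
    if (seen.size === 2) {
      flaky.add(run.testName);
    }
  }
  return tests.size === 0 ? 0 : flaky.size / tests.size;
}

// Share of all known defects that were only found in production.
function defectEscapeRate(foundInProduction: number, foundBeforeRelease: number): number {
  const total = foundInProduction + foundBeforeRelease;
  return total === 0 ? 0 : foundInProduction / total;
}
```

Whatever definition you choose, keep it stable: the trend over time is what changes behavior, and a shifting definition hides the trend.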

Ownership

The team that writes the code owns its quality. There's no QA wall to throw things over, no LLM that takes responsibility for what gets merged.

Automated tests are the first line of defense.
They hold the standard for everything you can codify -- regressions, contracts, critical paths. They run on every commit, they gate the pipeline, they don't get tired.
Pairing is live code review.
Whether you're pairing with another engineer or using an LLM as a collaborator, you get feedback while the context is fresh. It's the fastest human feedback loop you have.
Code review catches what automation hasn't learned yet.
Design decisions, readability, architectural drift, edge cases nobody thought to test -- this is the human catch point for things that aren't yet codified into automated checks.
Retros close the long loop.
Days or weeks after the code ships, the team reflects on what broke, what was missed, and what to automate next. Slow feedback, but it's where patterns emerge.

However you work -- solo, pairing, ensemble, with an LLM -- you're still responsible for what lands. Review test health at retros. If a test suite is ignored, it's worse than having no tests -- it's false confidence.

What tests actually look like

Concrete examples of each testing layer. Written in TypeScript/JS, but the patterns apply to any language. The point isn't to teach the syntax -- it's to make the abstract concrete.

Unit Test (TypeScript)
Tests a single function in isolation. No database, no network, no filesystem. If it needs external dependencies, they're mocked.
// calculateDiscount.test.ts
import { calculateDiscount } from './calculateDiscount';

describe('calculateDiscount', () => {
  it('gives 20% discount for 5+ year members', () => {
    expect(calculateDiscount(100, 5)).toBe(20);
  });

  it('gives 10% discount for 2-4 year members', () => {
    expect(calculateDiscount(100, 3)).toBe(10);
  });

  it('gives no discount for new members', () => {
    expect(calculateDiscount(100, 1)).toBe(0);
  });
});
$ npx jest calculateDiscount.test.ts

PASS  calculateDiscount.test.ts
  calculateDiscount
    ✓ gives 20% discount for 5+ year members (2ms)
    ✓ gives 10% discount for 2-4 year members (1ms)
    ✓ gives no discount for new members

Tests:  3 passed, 3 total
Time:   0.4s
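
The test above imports `calculateDiscount` without showing it. One implementation that would satisfy those three tests might look like the sketch below. It assumes the second argument is membership length in whole years and that the return value is the discount amount, both inferred from the test names rather than stated in the text.

```typescript
// calculateDiscount.ts
// Hypothetical implementation inferred from the tests above.
export function calculateDiscount(price: number, membershipYears: number): number {
  // 5+ years: 20%, 2-4 years: 10%, otherwise no discount.
  let percent = 0;
  if (membershipYears >= 5) {
    percent = 20;
  } else if (membershipYears >= 2) {
    percent = 10;
  }
  // Whole-number percentages keep the arithmetic exact for round prices.
  return (price * percent) / 100;
}
```

A pure function like this has no database, network, or filesystem dependency, which is what keeps the unit test fast.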
Unit Test, TDD Style (TypeScript)
Step 1 of 4 — Write the test first
// slugify.test.ts — Write the test FIRST. The function doesn't exist yet.
import { slugify } from './slugify';

describe('slugify', () => {
  it('converts spaces to hyphens', () => {
    expect(slugify('hello world')).toBe('hello-world');
  });

  it('lowercases everything', () => {
    expect(slugify('Hello World')).toBe('hello-world');
  });

  it('strips non-alphanumeric characters', () => {
    expect(slugify('hello, world!')).toBe('hello-world');
  });
});
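
Only the first step is shown here. As a hedged sketch of what the "make it pass" step might look like, the function below is one implementation that would turn those three tests green. It is an assumption about the later steps, not a reproduction of them.

```typescript
// slugify.ts
// A minimal implementation written only to satisfy the three tests above.
export function slugify(input: string): string {
  return input
    .toLowerCase()                  // 'Hello World'   -> 'hello world'
    .replace(/[^a-z0-9\s-]/g, '')   // 'hello, world!' -> 'hello world'
    .trim()
    .replace(/[\s-]+/g, '-');       // 'hello world'   -> 'hello-world'
}
```

In a TDD loop you would write only as much of this as the currently failing test demands, then refactor once all three pass.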
Service Integration Test (TypeScript)
Your code wired to real dependencies -- a real database, real HTTP handlers, real serialization. Tests that the seams between your code and its dependencies work.
// users.integration.test.ts
import { app } from './app';
import { db } from './database';
import request from 'supertest';

beforeEach(async () => {
  await db.migrate.latest();
  await db.seed.run();
});

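afterEach(async () => {
  await db('users').truncate();
});

// Close the connection pool so the test process can exit cleanly
// (assumes `db` is a Knex instance, as the calls above suggest).
afterAll(async () => {
  await db.destroy();
});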

describe('POST /users', () => {
  it('creates a user and persists to database', async () => {
    const res = await request(app)
      .post('/users')
      .send({ name: 'Alice', email: 'alice@example.com' });

    expect(res.status).toBe(201);
    expect(res.body.name).toBe('Alice');

    const row = await db('users')
      .where({ email: 'alice@example.com' }).first();
    expect(row).toBeDefined();
    expect(row.name).toBe('Alice');
  });

  it('rejects duplicate emails', async () => {
    await request(app)
      .post('/users')
      .send({ name: 'Alice', email: 'alice@example.com' });

    const res = await request(app)
      .post('/users')
      .send({ name: 'Bob', email: 'alice@example.com' });

    expect(res.status).toBe(409);
  });
});
$ npx jest --testPathPattern=integration

  Setting up test database...
  Running migrations...
  Seeding test data...

PASS  users.integration.test.ts
  POST /users
    ✓ creates a user and persists to database (45ms)
    ✓ rejects duplicate emails (12ms)

  Cleaning up test database...

Tests:  2 passed, 2 total
Time:   1.2s
Contract Test (TypeScript)
Each service validates against a shared agreement of what the API looks like. Neither service needs the other running.
// Provider side: user-provider.contract.test.ts
import { Verifier } from '@pact-foundation/pact';

describe('User API provider contract', () => {
  it('satisfies the consumer contract', async () => {
    await new Verifier({
      providerBaseUrl: 'http://localhost:3000',
      pactUrls: ['./pacts/order-service-user-service.json'],
    }).verifyProvider();
  });
});

// Consumer side: user-consumer.contract.test.ts
import { Pact } from '@pact-foundation/pact';

const provider = new Pact({
  consumer: 'OrderService',
  provider: 'UserService',
});

describe('User API consumer contract', () => {
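  beforeAll(() => provider.setup());
  // Fail the test if the registered interaction was never exercised.
  afterEach(() => provider.verify());
  afterAll(() => provider.finalize());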

  it('expects user response with id, name, email', async () => {
    await provider.addInteraction({
      state: 'a user with id 1 exists',
      uponReceiving: 'a request for user 1',
      withRequest: { method: 'GET', path: '/users/1' },
      willRespondWith: {
        status: 200,
        body: { id: 1, name: 'Alice',
          email: 'alice@example.com' },
      },
    });

    const res = await fetch(
      `${provider.mockService.baseUrl}/users/1`);
    const user = await res.json();

    expect(user.id).toBe(1);
    expect(user.name).toBeDefined();
    expect(user.email).toBeDefined();
  });
});
$ npx jest --testPathPattern=contract

  Pact verification started

PASS  user-provider.contract.test.ts
  User API provider contract
    ✓ satisfies the consumer contract (120ms)

PASS  user-consumer.contract.test.ts
  User API consumer contract
    ✓ expects user response with id, name, email (85ms)

  Pact file written: pacts/order-service-user-service.json

Tests:  2 passed, 2 total
Time:   0.8s
BDD / End-to-End Test (Gherkin + TypeScript)
Behavior described in plain language, automated through the real UI. The Gherkin scenarios are the spec.
Feature: Checkout

  Scenario: Registered user completes purchase
    Given I am logged in as "alice@example.com"
    And I have 2 items in my cart
    When I proceed to checkout
    And I confirm my payment method
    Then I should see "Order confirmed"
    And I should receive a confirmation email
$ npx cucumber-js --require features/steps

Feature: Checkout

  Scenario: Registered user completes purchase
    ✓ Given I am logged in as "alice@example.com"
    ✓ And I have 2 items in my cart
    ✓ When I proceed to checkout
    ✓ And I confirm my payment method
    ✓ Then I should see "Order confirmed"
    ✓ And I should receive a confirmation email

1 scenario (1 passed)
6 steps (6 passed)
Time:   4.2s
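
The Gherkin file is only text. Something has to map each line to code. The sketch below is a toy step registry that shows the idea. It is not how cucumber-js is implemented or used, and `defineStep`, `runStep`, and `world` are hypothetical names. A real suite would register steps through the runner's own API and drive a browser instead of updating an in-memory object.

```typescript
type StepFn = (...args: string[]) => void;

interface StepDef {
  pattern: RegExp;
  run: StepFn;
}

const steps: StepDef[] = [];

function defineStep(pattern: RegExp, run: StepFn): void {
  steps.push({ pattern, run });
}

// Shared scenario state (stand-in for a browser session).
const world = { user: '', cartItems: 0, page: '' };

defineStep(/^I am logged in as "(.+)"$/, (email) => {
  world.user = email;
});
defineStep(/^I have (\d+) items in my cart$/, (count) => {
  world.cartItems = Number(count);
});
defineStep(/^I proceed to checkout$/, () => {
  world.page = 'checkout';
});

// Strip the Gherkin keyword, find the first matching pattern, and call it
// with the captured groups as arguments.
function runStep(line: string): void {
  const text = line.trim().replace(/^(Given|When|Then|And|But)\s+/, '');
  for (const step of steps) {
    const match = step.pattern.exec(text);
    if (match) {
      step.run(...match.slice(1));
      return;
    }
  }
  throw new Error(`Undefined step: ${text}`);
}
```

Feeding the first three lines of the scenario through `runStep` fills in `world`. A line with no matching definition throws, which is roughly how a runner reports an undefined step.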
E2E Test Without BDD (TypeScript)
You don't need Gherkin to write E2E tests. Playwright or Cypress can test full workflows directly.
// tests/e2e/checkout.spec.ts
import { test, expect } from '@playwright/test';

test('registered user can complete a purchase', async ({ page }) => {
  // Login
  await page.goto('/login');
  await page.fill('[name="email"]', 'alice@example.com');
  await page.fill('[name="password"]', 'testpassword');
  await page.click('button[type="submit"]');

  // Add items
  await page.goto('/products/1');
  await page.click('[data-testid="add-to-cart"]');

  // Checkout
  await page.goto('/cart');
  await page.click('[data-testid="checkout"]');
  await page.click('[data-testid="confirm-payment"]');

  // Verify
  await expect(page.locator('body'))
    .toContainText('Order confirmed');
});
$ npx playwright test checkout.spec.ts

Running 1 test using 1 worker

  ✓ registered user can complete a purchase (3.8s)

  1 passed (4.1s)
