Everyone agrees testing is important. Few teams do it well at scale. Here’s what separates test suites that help from ones that slow everyone down.

The Testing Pyramid (And Why It’s Still Right)

        /\
       /  \        E2E (few)
      /----\
     /      \      Integration (some)
    /--------\
   /          \    Unit (many)
  --------------

This isn’t new, but teams still get it backwards:

  • Too many slow E2E tests
  • Too few fast unit tests
  • Flaky tests everywhere

The pyramid works because it optimizes for feedback speed and failure isolation.

Unit Tests: The Foundation

Fast, isolated, deterministic.

# Good: Pure function, easy to test
def calculate_discount(price: float, tier: str) -> float:
    rates = {"bronze": 0.05, "silver": 0.10, "gold": 0.15}
    return price * rates.get(tier, 0)

def test_calculate_discount():
    assert calculate_discount(100, "gold") == 15.0
    assert calculate_discount(100, "unknown") == 0.0
# Bad: Depends on database, time, external service
def get_user_discount(user_id):
    user = db.get(user_id)  # DB dependency
    if user.signup_date < datetime.now() - timedelta(days=365):  # Time dependency
        bonus = loyalty_service.get_bonus(user_id)  # External dependency
        return bonus * 1.5
    return 0

Fix the bad example by injecting dependencies:

from datetime import datetime, timedelta

def get_user_discount(user, current_time, loyalty_bonus):
    if user.signup_date < current_time - timedelta(days=365):
        return loyalty_bonus * 1.5
    return 0

def test_loyal_user_gets_bonus():
    user = User(signup_date=datetime(2024, 1, 1))
    result = get_user_discount(user, datetime(2026, 3, 1), 10.0)
    assert result == 15.0

What Makes Unit Tests Fast

Target: < 10ms per test, full suite < 30 seconds.
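To find the tests that blow that budget, pytest's built-in `--durations` flag reports the slowest test phases after a run:

```shell
# Print the 10 slowest test phases (setup/call/teardown) after the run
pytest tests/unit --durations=10
```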

Avoid:

  • Database calls (use fakes/mocks)
  • File system (use in-memory)
  • Network (mock HTTP clients)
  • Sleep/time delays
# Mock external calls (patch comes from the standard library)
from unittest.mock import patch

@patch('myapp.services.external_api.get_data')
def test_with_mock(mock_get):
    mock_get.return_value = {"status": "ok"}
    result = process_data()
    assert result.status == "ok"
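The same idea covers the database bullet: instead of mocking every call, a small in-memory fake can stand in for the real repository. A minimal sketch, assuming a hypothetical `UserRepository`-style interface:

```python
# Hypothetical in-memory fake with the same interface as a real repository,
# backed by a plain dict instead of a database.
class FakeUserRepository:
    def __init__(self):
        self._users = {}

    def create(self, user_id, name):
        self._users[user_id] = {"id": user_id, "name": name}
        return self._users[user_id]

    def get(self, user_id):
        return self._users.get(user_id)

def test_lookup_with_fake():
    repo = FakeUserRepository()
    repo.create(1, "Alice")
    assert repo.get(1)["name"] == "Alice"
    assert repo.get(2) is None  # no DB, no network, runs in microseconds
```

Fakes tend to age better than per-call mocks because they verify behavior rather than call sequences.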

Integration Tests: The Middle Ground

Test that components work together, but scope carefully.

# Database integration test
@pytest.fixture
def db():
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    yield Session(engine)

def test_user_repository(db):
    repo = UserRepository(db)
    user = repo.create(name="Alice", email="alice@test.com")
    found = repo.get_by_email("alice@test.com")
    assert found.name == "Alice"
# API integration test
def test_create_and_fetch_user(test_client):
    # Create
    response = test_client.post("/users", json={"name": "Bob"})
    assert response.status_code == 201
    user_id = response.json()["id"]
    
    # Fetch
    response = test_client.get(f"/users/{user_id}")
    assert response.json()["name"] == "Bob"

Containers for Integration Tests

Use testcontainers for real dependencies:

from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")
def postgres():
    with PostgresContainer("postgres:15") as pg:
        yield pg.get_connection_url()

def test_with_real_postgres(postgres):
    engine = create_engine(postgres)
    # Test against real Postgres

E2E Tests: Use Sparingly

E2E tests are slow, flaky, and expensive. Use them for critical paths only.

# Playwright E2E test
def test_user_signup_flow(page):
    page.goto("/signup")
    page.fill("#email", "test@example.com")
    page.fill("#password", "SecurePass123!")
    page.click("button[type=submit]")
    
    # Wait for redirect
    expect(page).to_have_url("/dashboard")
    expect(page.locator("h1")).to_contain_text("Welcome")

When to Write E2E Tests

Yes:

  • User signup/login flow
  • Checkout/payment
  • Core business workflows
  • Smoke tests (app loads, critical pages work)

No:

  • Every feature
  • Edge cases (unit test those)
  • Performance testing (use dedicated tools)

Fighting Flaky Tests

Flaky tests destroy confidence. Fix or delete them.

Common Causes

Race conditions:

# Bad: Assumes immediate consistency
user_service.update(user_id, name="New Name")
user = user_service.get(user_id)
assert user.name == "New Name"  # Might fail if async

# Good: Wait for condition or use synchronous path
user_service.update_sync(user_id, name="New Name")
user = user_service.get(user_id)
assert user.name == "New Name"
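When no synchronous path exists, the other option is to poll for the condition with a deadline rather than sleeping a fixed amount. A minimal sketch (the service calls in the usage comment are illustrative):

```python
import time

def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns truthy or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Usage against an eventually-consistent service (illustrative names):
# user_service.update(user_id, name="New Name")
# assert wait_for(lambda: user_service.get(user_id).name == "New Name")
```

Unlike a fixed sleep, this returns as soon as the condition holds, so the happy path stays fast.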

Time dependencies:

# Bad: Depends on wall clock
def test_expires_after_24h():
    token = create_token()
    time.sleep(86401)  # Don't do this
    assert token.is_expired()

# Good: Control the clock (freezer fixture from pytest-freezegun)
def test_expires_after_24h(freezer):
    token = create_token()
    freezer.move_to(datetime.now() + timedelta(hours=25))
    assert token.is_expired()

Order dependencies:

# Bad: Test depends on another test's state
def test_create_user():
    create_user("alice")
    
def test_get_user():
    user = get_user("alice")  # Fails if test_create_user didn't run first
    
# Good: Each test sets up its own state
def test_get_user():
    create_user("bob")  # Own setup
    user = get_user("bob")

Quarantine Flaky Tests

Don’t let flaky tests block CI:

# pytest.ini
[pytest]
markers =
    flaky: mark test as flaky (deselect with '-m "not flaky"')

# test_something.py
@pytest.mark.flaky
def test_sometimes_fails():
    ...

Run flaky tests separately, fix them, or delete them.
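With the marker in place, the main pipeline can deselect quarantined tests while a separate (e.g. nightly) job keeps exercising them:

```shell
# Main CI: quarantined tests can't block merges
pytest -m "not flaky"

# Nightly job: run only the quarantined tests and track their pass rate
pytest -m flaky
```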

Test Organization

By Type

tests/
  unit/
    test_models.py
  integration/
    test_api.py
    test_database.py
  e2e/
    test_flows.py

By Feature

tests/
  users/
    test_user_model.py
    test_user_api.py
    test_user_flow.py
  orders/
    test_order_model.py
    test_order_api.py

Feature-based scales better for large codebases.

CI Pipeline Structure

# Option A: run all three suites in parallel (GitLab CI matrix)
test:
  stage: test
  parallel:
    matrix:
      - TYPE: unit
      - TYPE: integration
      - TYPE: e2e
  script:
    - pytest tests/$TYPE -v --junitxml=results-$TYPE.xml

# Option B: staged pipeline, unit tests first (fast feedback)
unit_tests:
  stage: test
  script:
    - pytest tests/unit --timeout=10
  timeout: 5 minutes

# Integration tests after unit pass
integration_tests:
  stage: test
  needs: [unit_tests]
  services:
    - postgres:15
  script:
    - pytest tests/integration
  timeout: 15 minutes

# E2E last, only on main branch
e2e_tests:
  stage: test
  needs: [integration_tests]
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    - pytest tests/e2e
  timeout: 30 minutes

Coverage: Quality Over Quantity

80% coverage doesn’t mean 80% tested.

# 100% coverage, 0% useful
def add(a, b):
    return a + b

def test_add():
    assert add(1, 1) == 2  # Covers the line, tests nothing interesting

Focus on:

  • Branch coverage over line coverage
  • Critical paths over everything
  • Edge cases that actually break
def test_add_handles_overflow():
    # This is what actually matters
    assert add(sys.maxsize, 1) == ...  # Define expected behavior
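A sketch of why branch coverage beats line coverage: the test below executes every line of `normalize` (100% line coverage) yet never takes the False branch, where the function crashes on `None`. Branch coverage (`--cov-branch` with pytest-cov) reports that missed branch; line coverage stays silent.

```python
def normalize(s):
    if s:
        s = s.strip()
    return s.lower()  # AttributeError when s is None

def test_normalize():
    # Every line runs, so line coverage reads 100%...
    assert normalize("  Hi ") == "hi"
    # ...but the untaken `if` branch hides a crash:
    # normalize(None) raises AttributeError, and only
    # branch coverage flags that path as untested.
```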

The Test Quality Checklist

Before merging tests, verify:

  • Test names describe behavior (test_user_gets_discount_after_one_year)
  • No hardcoded waits (time.sleep)
  • No order dependencies
  • Each test can run in isolation
  • Failures are actionable (good error messages)
  • Test runs in < 1 second (unit) or < 30 seconds (integration)

Good tests are an asset. Bad tests are a liability. Write fewer, better tests.


The goal isn’t to have tests. It’s to have confidence. Fast, reliable tests give you confidence to ship.