Logging seems simple until you’re debugging production at 2 AM, scrolling through millions of lines trying to find the one that matters. Good logging practices make that experience less painful. Here’s how to think about log levels.

The Levels

Most logging frameworks use these standard levels:

DEBUG < INFO < WARN < ERROR < FATAL

In production, you typically run at INFO or WARN. The level you set is a threshold: a logger set to INFO emits INFO, WARN, ERROR, and FATAL, and drops DEBUG.
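In Python's standard logging module, for instance, the threshold is set once and everything below it is dropped:

```python
import logging

# Set the threshold to INFO: DEBUG messages are dropped,
# INFO and above pass through to the handlers.
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("app")

logger.debug("cache internals")  # suppressed at INFO
logger.info("user logged in")    # emitted
logger.warning("retrying call")  # emitted
```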

DEBUG: Developer Breadcrumbs

Use for: Detailed information useful during development and troubleshooting.

logger.debug(f"Processing user {user_id}, cart contains {len(items)} items")
logger.debug(f"Cache lookup for key {cache_key}: {'hit' if cached else 'miss'}")
logger.debug(f"SQL query: {query} with params {params}")

Guidelines:

  • Assume these won’t be seen in production (too noisy)
  • Include variable values, state, and decision points
  • No sensitive data (passwords, tokens, PII)
  • Useful for reproducing issues locally

Enable in production when: Actively debugging a specific issue, then disable.
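One way to keep that toggle safe (a minimal sketch using only the standard library; `debug_logging` is a name invented here) is a context manager that restores the previous level on exit:

```python
import logging
from contextlib import contextmanager

@contextmanager
def debug_logging(logger):
    """Temporarily lower a logger to DEBUG, restoring the old level on exit."""
    previous = logger.level
    logger.setLevel(logging.DEBUG)
    try:
        yield logger
    finally:
        logger.setLevel(previous)

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)

with debug_logging(logger):
    logger.debug("visible only while investigating")
# Back at INFO here; DEBUG is suppressed again.
```

The try/finally matters: even if the code under investigation raises, the logger goes back to INFO instead of staying noisy.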

INFO: The Audit Trail

Use for: Normal operations that someone might care about later.

logger.info(f"User {user_id} logged in from {ip_address}")
logger.info(f"Order {order_id} placed: ${total:.2f}")
logger.info(f"Background job {job_name} completed in {duration:.2f}s")
logger.info("Application started on port 8080")

Guidelines:

  • One log per significant business event
  • Include enough context to understand what happened
  • Should tell a story when read in sequence
  • Not too noisy—if everything is INFO, nothing is

Good test: Would someone reviewing an incident want to see this?

WARN: Something’s Wrong But We’re Handling It

Use for: Unexpected situations that the system recovered from.

logger.warning(f"Cache miss for user {user_id}, falling back to database")
logger.warning(f"Retry {attempt}/3 for external API call to {endpoint}")
logger.warning(f"Configuration {key} not set, using default: {default}")
logger.warning(f"Request took {duration:.2f}s, exceeding threshold of {threshold}s")

Guidelines:

  • The system is still working, but something unexpected happened
  • Someone should eventually look at this
  • Often indicates potential problems before they become errors
  • Don’t overuse—if every request logs warnings, they become noise

Good test: Would you want an alert if this happened 100 times in an hour?

ERROR: Something Failed

Use for: Failures that affected a specific operation but didn’t crash the system.

logger.error(f"Failed to send email to {email}: {error}")
logger.error(f"Payment processing failed for order {order_id}: {error}")
logger.error(f"Database query failed: {error}", exc_info=True)

Guidelines:

  • A user-facing operation failed
  • Include the error/exception details
  • Include context to reproduce (IDs, inputs)
  • Don’t log and raise—pick one or the other

Good test: Should someone be woken up if this happens?
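The "don't log and raise" guideline, sketched with hypothetical names (`charge` and `PaymentError` are illustrative, not from any real payment library):

```python
import logging

logger = logging.getLogger("payments")

class PaymentError(Exception):
    pass

def charge(order_id):
    # Hypothetical payment call that always fails, for this sketch.
    raise PaymentError("card declined")

# BAD: log *and* re-raise; every layer that catches it logs the same failure
def process_bad(order_id):
    try:
        charge(order_id)
    except PaymentError as e:
        logger.error(f"Payment failed for order {order_id}: {e}")
        raise

# GOOD: this layer owns the failure, so it logs once and returns a result
def process_good(order_id):
    try:
        charge(order_id)
        return True
    except PaymentError:
        logger.error(f"Payment failed for order {order_id}", exc_info=True)
        return False
```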

FATAL/CRITICAL: The System Is Down

Use for: Unrecoverable errors that require immediate intervention.

logger.critical("Database connection pool exhausted, cannot serve requests")
logger.critical("Out of disk space, shutting down")
logger.critical(f"Required service {service_name} unreachable after all retries")

Guidelines:

  • The application cannot continue normally
  • Someone needs to act NOW
  • Should be rare—if you see these often, you have bigger problems
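In Python the top level is called CRITICAL, and it usually accompanies an orderly shutdown. A minimal sketch (`ensure_disk_space` and its thresholds are illustrative):

```python
import logging
import sys

logger = logging.getLogger("app")

def ensure_disk_space(free_bytes, required_bytes):
    """Illustrative startup check: log CRITICAL and exit if the app cannot run."""
    if free_bytes < required_bytes:
        logger.critical(
            f"Out of disk space: {free_bytes} bytes free, "
            f"{required_bytes} required; shutting down"
        )
        sys.exit(1)

ensure_disk_space(free_bytes=10**9, required_bytes=10**8)  # enough space, continues
```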

Common Mistakes

Over-logging

# BAD: Too much noise
def process_item(item):
    logger.info(f"Starting to process item {item.id}")
    logger.info(f"Validating item {item.id}")
    logger.info(f"Item {item.id} validated successfully")
    logger.info(f"Saving item {item.id}")
    logger.info(f"Item {item.id} saved successfully")
    logger.info(f"Finished processing item {item.id}")

# GOOD: One meaningful log
def process_item(item):
    logger.debug(f"Processing item {item.id}")
    # ... do work ...
    logger.info(f"Processed item {item.id} in {duration:.2f}s")

Wrong Levels

# BAD: This isn't debug, it's info
logger.debug(f"User {user_id} completed checkout")

# BAD: This isn't an error, it's expected
logger.error(f"User {user_id} entered wrong password")  # Use INFO

# BAD: This isn't a warning, it's an error
logger.warning(f"Database connection failed: {error}")  # Use ERROR

Missing Context

# BAD: Useless without context
logger.error("Failed to process request")

# GOOD: Actionable
logger.error(
    f"Failed to process request: path={request.path} "
    f"user={user_id} error={error}",
    exc_info=True
)

Logging Sensitive Data

# BAD: Leaking credentials
logger.debug(f"Connecting with password {password}")
logger.info(f"User data: {user.__dict__}")  # Might include email, SSN, etc.

# GOOD: Mask or omit sensitive fields
logger.debug(f"Connecting as user {username}")
logger.info(f"User {user.id} updated profile")

Structured Logging

Plain text logs are hard to parse. Use structured logging:

import structlog

logger = structlog.get_logger()

# Instead of string interpolation
logger.info(
    "order_placed",
    order_id=order.id,
    user_id=user.id,
    total=order.total,
    items=len(order.items)
)

Output:

{
  "event": "order_placed",
  "order_id": "ord_123",
  "user_id": "usr_456",
  "total": 99.99,
  "items": 3,
  "timestamp": "2024-03-01T09:30:00Z",
  "level": "info"
}

Why structured?

  • Easy to search: jq 'select(.order_id == "ord_123")'
  • Easy to aggregate: count events by type, sum totals
  • Easy to alert: trigger on specific field values

Quick Reference

Level   Use When                       Production Visibility
DEBUG   Tracing code execution         Off by default
INFO    Business events, audit trail   Always on
WARN    Recoverable problems           Always on
ERROR   Operation failures             Always on + alerts
FATAL   System failures                Always on + pages

The rule: Log at the level where someone would want to see it. If you’re unsure, start at DEBUG and promote to INFO once you know it matters.