    # DEBUG: Detailed diagnostic info (disabled in production)
    logger.debug("cache_lookup", key=cache_key, hit=True)

    # INFO: Normal operations worth recording
    logger.info("user_login", user_id=user_id, method="oauth")

    # WARNING: Something unexpected but handled
    logger.warning("rate_limit_approaching", user_id=user_id, remaining=5)

    # ERROR: Something failed but the system continues
    logger.error("payment_failed", user_id=user_id, error="card_declined")

    # CRITICAL: System is in trouble, immediate attention needed
    logger.critical("database_unavailable", host=db_host, retry_count=10)
The rule: if you’d page someone for it, it’s CRITICAL. If it needs investigation tomorrow, it’s ERROR. If it’s just interesting, it’s INFO.
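As a rough illustration, the triage rule can be written down as a tiny helper. This is only a sketch: `pick_level` and its two flags are hypothetical names, and it uses the standard library's level constants rather than anything specific to structlog.

```python
import logging


def pick_level(pages_someone: bool, needs_investigation: bool) -> int:
    """Map the triage rule onto stdlib logging levels (hypothetical helper)."""
    if pages_someone:
        # Someone gets woken up for this: CRITICAL
        return logging.CRITICAL
    if needs_investigation:
        # Can wait until tomorrow, but must be looked at: ERROR
        return logging.ERROR
    # Merely interesting: INFO
    return logging.INFO


level = pick_level(pages_someone=False, needs_investigation=True)
print(logging.getLevelName(level))  # prints "ERROR"
```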
    # Request boundaries
    logger.info("request_started", path=path, method=method)
    logger.info("request_completed", status=status, duration_ms=duration)

    # Authentication events
    logger.info("auth_success", user_id=user_id, method=method)
    logger.warning("auth_failure", reason=reason, ip=ip)

    # Business events
    logger.info("order_placed", order_id=order_id, amount=amount)
    logger.info("subscription_changed", user_id=user_id, plan=new_plan)

    # Errors with context
    logger.error(
        "operation_failed",
        operation="send_email",
        error=str(e),
        user_id=user_id,
        retry_count=retry_count,
    )
    # Passwords, tokens, API keys
    logger.info("login", password=password)  # NEVER

    # Full credit card numbers
    logger.info("payment", card_number=card)  # NEVER

    # Personal data that could identify individuals
    logger.info("user_data", ssn=ssn, medical_records=records)  # NEVER

    # Large payloads that bloat storage
    logger.debug("request_body", body=megabyte_payload)  # AVOID
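One way to guard against the mistakes above is to mask sensitive fields before they are rendered. The sketch below is an assumption, not part of the original setup: `redact_sensitive` and the `SENSITIVE_KEYS` deny-list are hypothetical names, and the function simply follows the `(logger, method_name, event_dict)` signature that structlog processors use.

```python
from typing import Any

# Hypothetical deny-list; adjust to the fields your services actually emit.
SENSITIVE_KEYS = {"password", "token", "api_key", "card_number", "ssn"}


def redact_sensitive(logger: Any, method_name: str, event_dict: dict) -> dict:
    """Replace the values of sensitive keys with a fixed marker."""
    for key in event_dict:
        if key.lower() in SENSITIVE_KEYS:
            # Only values are overwritten, so iterating the dict here is safe
            event_dict[key] = "[REDACTED]"
    return event_dict


print(redact_sensitive(None, "info", {"event": "login", "user_id": 42, "password": "hunter2"}))
# prints {'event': 'login', 'user_id': 42, 'password': '[REDACTED]'}
```

If you use structlog, a function like this could be placed in the `processors` list passed to `structlog.configure`, ahead of the renderer. A deny-list only catches keys you anticipated, so it complements the rule of never passing secrets to the logger and does not replace it.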
    try:
        ...  # the payment call that may raise
    except Exception as e:
        logger.error(
            "payment_processing_failed",
            error=str(e),
            error_type=type(e).__name__,
            user_id=user_id,
            order_id=order_id,
            amount=amount,
            payment_method=payment_method,
            exc_info=True,  # Include stack trace
        )
Include everything needed to reproduce and debug. The goal: someone reading this log at 3 AM can understand what happened without looking at code.
    import time
    from contextlib import contextmanager

    @contextmanager
    def log_duration(operation: str, **context):
        start = time.perf_counter()
        try:
            yield
        finally:
            duration_ms = (time.perf_counter() - start) * 1000
            logger.info(f"{operation}_completed", duration_ms=round(duration_ms, 2), **context)

    # Usage
    with log_duration("database_query", table="users", query_type="select"):
        results = await db.fetch("SELECT * FROM users WHERE ...")
When requests span multiple services, you need to trace them end-to-end:
    import uuid

    import structlog

    # Service A: Generate correlation ID
    correlation_id = str(uuid.uuid4())
    logger.info("calling_service_b", correlation_id=correlation_id)
    response = await http_client.post(
        "http://service-b/api/process",
        headers={"X-Correlation-ID": correlation_id},
        json=payload,
    )

    # Service B: Extract and log correlation ID
    @app.middleware("http")
    async def correlation_middleware(request, call_next):
        # Reuse the caller's ID, or generate one if the header is missing
        correlation_id = request.headers.get("X-Correlation-ID", str(uuid.uuid4()))
        # Drop context left over from a previous request before binding
        structlog.contextvars.clear_contextvars()
        structlog.contextvars.bind_contextvars(correlation_id=correlation_id)
        return await call_next(request)
Search by correlation_id to see the entire distributed flow.
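In practice that search usually happens in a log aggregation tool, but the idea can be sketched in a few lines. The example below assumes each service writes one JSON object per line; `filter_by_correlation_id`, the `service` field, and the sample records are hypothetical and only illustrate the lookup.

```python
import json
from typing import Iterable, Iterator


def filter_by_correlation_id(lines: Iterable[str], correlation_id: str) -> Iterator[dict]:
    """Yield parsed log records whose correlation_id matches."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(record, dict) and record.get("correlation_id") == correlation_id:
            yield record


# Made-up log lines from two services, merged into one stream
sample_logs = [
    '{"event": "calling_service_b", "service": "a", "correlation_id": "abc-123"}',
    '{"event": "request_started", "service": "b", "correlation_id": "abc-123"}',
    '{"event": "request_started", "service": "b", "correlation_id": "zzz-999"}',
    "not json at all",
]

matches = list(filter_by_correlation_id(sample_logs, "abc-123"))
for record in matches:
    print(record["service"], record["event"])
# prints:
# a calling_service_b
# b request_started
```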
Structured logging is infrastructure that pays off every time something breaks. Invest the time upfront, and debugging becomes searching instead of guessing.