Background Job Patterns: Processing Work Outside the Request Cycle
Not everything belongs in a web request. Here's how to design reliable background job systems.
February 24, 2026 · 7 min · 1300 words · Rob Washington
Some work doesn’t belong in a web request. Sending emails, processing uploads, generating reports, syncing with external APIs — these tasks are too slow, too unreliable, or too resource-intensive to run while a user waits.
Background jobs solve this by moving work out of the request cycle and into a separate processing system.
```python
# Celery with Redis
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def send_email(user_id, template):
    user = get_user(user_id)
    email_service.send(user.email, template)
```
Pros: Fast, simple, good ecosystem
Cons: Not durable by default (can lose jobs on crash)
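The durability gap can be narrowed with configuration. A minimal sketch using two standard Celery settings — they shrink the window for losing in-flight work on a worker crash, though a Redis broker can still lose acknowledged state in some failure modes:

```python
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

# Acknowledge the message only after the task finishes, so tasks
# in flight when a worker crashes are redelivered rather than dropped.
app.conf.task_acks_late = True

# Also requeue tasks whose worker process was killed mid-run
# (e.g. OOM-killed), instead of treating them as acknowledged.
app.conf.task_reject_on_worker_lost = True
```

With `task_acks_late`, tasks must be idempotent, since redelivery after a crash means they can run twice.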
```python
# Simple polling approach
def enqueue(job_type, payload):
    db.execute("""
        INSERT INTO jobs (type, payload, status, created_at)
        VALUES (%s, %s, 'pending', NOW())
    """, (job_type, json.dumps(payload)))

def fetch_job():
    return db.execute("""
        UPDATE jobs SET status = 'processing', started_at = NOW()
        WHERE id = (
            SELECT id FROM jobs
            WHERE status = 'pending'
            ORDER BY created_at
            FOR UPDATE SKIP LOCKED
            LIMIT 1
        )
        RETURNING *
    """).fetchone()
```
Pros: Durable, transactional with your data, no extra infrastructure
Cons: Slower, requires careful locking
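The worker side of this pattern is a poll loop around `fetch_job()`. A self-contained sketch using SQLite so it runs anywhere — SQLite has no `FOR UPDATE SKIP LOCKED`, so a guarded `UPDATE ... WHERE status = 'pending'` stands in for the single-statement claim you'd get in Postgres; `handle` is a stand-in dispatch function:

```python
import json
import sqlite3
import time

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        type TEXT,
        payload TEXT,
        status TEXT DEFAULT 'pending',
        created_at REAL
    )
""")

def enqueue(job_type, payload):
    conn.execute(
        "INSERT INTO jobs (type, payload, created_at) VALUES (?, ?, ?)",
        (job_type, json.dumps(payload), time.time()),
    )
    conn.commit()

def fetch_job():
    # Find the oldest pending job, then claim it. The status check in
    # the UPDATE keeps the claim safe if another worker got there first;
    # Postgres's FOR UPDATE SKIP LOCKED does both steps in one statement.
    row = conn.execute(
        "SELECT id, type, payload FROM jobs "
        "WHERE status = 'pending' ORDER BY created_at LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    claimed = conn.execute(
        "UPDATE jobs SET status = 'processing' "
        "WHERE id = ? AND status = 'pending'",
        (row[0],),
    ).rowcount
    conn.commit()
    return row if claimed else None

def handle(job_type, payload):
    # Stand-in dispatch; route to real task functions in practice.
    print(f"{job_type}: {payload}")

def work_once():
    """Process one job if available; return whether any work was done."""
    job = fetch_job()
    if job is None:
        return False
    job_id, job_type, payload = job
    handle(job_type, json.loads(payload))
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    conn.commit()
    return True
```

A production loop wraps `work_once()` in `while True`, sleeping briefly when the queue is empty — that sleep is the latency cost of polling.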
```python
# AWS SQS
import boto3

sqs = boto3.client('sqs')

def enqueue(job):
    sqs.send_message(
        QueueUrl='https://sqs.../my-queue',
        MessageBody=json.dumps(job),
        MessageGroupId='default'  # For FIFO queues
    )

def process_messages():
    while True:
        response = sqs.receive_message(
            QueueUrl='https://sqs.../my-queue',
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20  # Long polling
        )
        for message in response.get('Messages', []):
            process(json.loads(message['Body']))
            sqs.delete_message(
                QueueUrl='https://sqs.../my-queue',
                ReceiptHandle=message['ReceiptHandle']
            )
```
Pros: Highly durable, scales independently, managed options available
Cons: More infrastructure, eventual consistency
Jobs may run multiple times (retries, duplicates). Design for it:
```python
@app.task(bind=True)
def charge_customer(self, order_id):
    order = get_order(order_id)

    # Check if already processed
    if order.payment_id:
        logger.info(f"Order {order_id} already charged")
        return order.payment_id

    # Process with idempotency key
    payment = stripe.PaymentIntent.create(
        amount=order.total,
        idempotency_key=f"order-{order_id}"
    )
    order.payment_id = payment.id
    order.save()
    return payment.id
```
```python
@app.task(
    bind=True,
    max_retries=5,
    default_retry_delay=60,  # Base delay
    retry_backoff=True,      # Exponential backoff
    retry_jitter=True        # Add randomness
)
def sync_to_external_api(self, record_id):
    try:
        record = get_record(record_id)
        external_api.sync(record)
    except ExternalAPIError as e:
        # Retry with exponential backoff
        raise self.retry(exc=e)
    except PermanentError as e:
        # Don't retry, move to dead letter
        log_permanent_failure(record_id, e)
        raise
```
```python
@app.task(bind=True, max_retries=3)
def process_upload(self, upload_id):
    try:
        # Process...
        pass
    except Exception as e:
        if self.request.retries >= self.max_retries:
            # Move to dead letter queue for manual review
            dead_letter_queue.send({
                'task': 'process_upload',
                'args': [upload_id],
                'error': str(e),
                'traceback': traceback.format_exc(),
                'failed_at': datetime.utcnow().isoformat()
            })
            return  # Don't raise, we've handled it
        raise self.retry(exc=e)
```
```python
def enqueue_unique(job_type, payload, unique_key):
    # Use Redis SET NX for uniqueness
    lock_key = f"job_lock:{job_type}:{unique_key}"
    if redis.set(lock_key, "1", nx=True, ex=3600):
        # Lock acquired, safe to enqueue
        queue.enqueue(job_type, payload)
        return True
    else:
        # Job already queued
        return False

# Usage
enqueue_unique("send_welcome_email", {"user_id": 123}, unique_key="user:123")
```
```python
@app.task(bind=True)
def generate_report(self, report_id):
    report = get_report(report_id)
    items = get_items_for_report(report_id)

    for i, item in enumerate(items):
        process_item(item)
        # Update progress
        progress = (i + 1) / len(items) * 100
        self.update_state(
            state='PROGRESS',
            meta={'progress': progress, 'current': i + 1, 'total': len(items)}
        )

    return {'status': 'complete', 'url': report.url}

# Check progress from web app
result = generate_report.AsyncResult(task_id)
if result.state == 'PROGRESS':
    print(f"Progress: {result.info['progress']}%")
```
```python
# Run in 1 hour
send_reminder.apply_async(args=[user_id], countdown=3600)

# Run at specific time
send_report.apply_async(args=[report_id], eta=datetime(2024, 2, 5, 9, 0, 0))
```
```python
from celery import Celery

app = Celery()

@app.task(rate_limit='10/m')  # Max 10 per minute
def call_external_api(record_id):
    # This task will be throttled
    pass
```

Note that Celery enforces `rate_limit` per worker process, not globally across the cluster.
```shell
# Celery: 4 concurrent worker processes
celery -A tasks worker --concurrency=4

# For I/O-bound tasks, use gevent/eventlet
celery -A tasks worker --pool=gevent --concurrency=100
```
```python
# Define queues with priorities
app.conf.task_routes = {
    'tasks.critical_*': {'queue': 'high'},
    'tasks.batch_*': {'queue': 'low'},
}

# Run workers for specific queues
# celery -A tasks worker -Q high --concurrency=4
# celery -A tasks worker -Q low --concurrency=2
```
```python
# `app` here is the web application; `celery_app` is the Celery instance.
@app.get("/health/workers")
def worker_health():
    # Check if workers are responsive
    inspector = celery_app.control.inspect()
    active = inspector.active()
    if not active:
        raise HTTPException(503, "No active workers")

    # Check queue depth
    queue_length = redis.llen('celery')
    if queue_length > 10000:
        raise HTTPException(503, f"Queue backlog: {queue_length}")

    return {"workers": len(active), "queue_length": queue_length}
```
Not handling job failures: Always have retry logic and dead letter handling
Blocking workers with long tasks: Use dedicated queues for slow jobs
Forgetting idempotency: Jobs will run multiple times
Queue as database: Don’t store important state only in the queue
No visibility: If you can’t see what’s happening, you can’t fix it
Unbounded queues: Set limits, apply backpressure when overwhelmed
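The "queue as database" pitfall is worth a sketch: persist the state first, then enqueue only a reference, so the database stays the source of truth and a lost or duplicated message is recoverable. `InMemoryDB` and `InMemoryQueue` below are hypothetical stand-ins for your real storage and broker:

```python
class InMemoryDB:
    """Stand-in for a real database."""
    def __init__(self):
        self.invoices = {}
        self._next_id = 1

    def insert_invoice(self, customer_id, total, status):
        invoice_id = self._next_id
        self._next_id += 1
        self.invoices[invoice_id] = {
            "customer_id": customer_id, "total": total, "status": status,
        }
        return invoice_id

class InMemoryQueue:
    """Stand-in for a real broker."""
    def __init__(self):
        self.messages = []

    def enqueue(self, job_type, payload):
        self.messages.append((job_type, payload))

def create_invoice(db, queue, customer_id, total):
    # Persist the state first; the message carries only a reference.
    # If the message is lost, the 'pending' row can be re-enqueued;
    # if it's duplicated, the worker re-reads the same row.
    invoice_id = db.insert_invoice(customer_id, total, status="pending")
    queue.enqueue("process_invoice", {"invoice_id": invoice_id})
    return invoice_id
```

The anti-pattern is the reverse order: serializing the whole record into the message and writing nothing down, which makes the broker the only copy of your data.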
Background jobs are how you build responsive applications that handle real-world complexity. Start simple (PostgreSQL + polling works fine), add sophistication as needed, and always design for failure — because in distributed systems, failure is the norm.