DevOps

Nginx 502 Bad Gateway: What It Means and How to Fix It

You refresh your page and see it: 502 Bad Gateway. Nginx is telling you something went wrong, but what? This guide covers the most common causes and how to fix each one.

What 502 Bad Gateway Actually Means

A 502 error means Nginx (acting as a reverse proxy) tried to contact your upstream server (your app) and one of three things happened:

- It couldn’t connect at all
- It got an invalid response
- The connection timed out

Nginx is working fine. Your upstream is the problem. ...

Fix: Docker Container Stuck in Restart Loop

Your container starts, immediately dies, restarts, dies again. The docker ps output shows “Restarting (1) 2 seconds ago” and you’re watching it cycle endlessly. Here’s how to break the loop and find the actual problem.

Step 1: Check the Exit Code

First, figure out how it’s dying:

    docker inspect --format='{{.State.ExitCode}}' container_name

Common exit codes:

- 0: Clean exit (shouldn’t restart unless you have restart: always)
- 1: Application error (check logs)
- 137: Killed by SIGKILL, most often the OOM (out of memory) killer
- 139: Segmentation fault
- 143: SIGTERM received (graceful shutdown request)

Step 2: Read the Logs (Before They Disappear)

Containers in restart loops lose logs on each restart. Catch them quickly: ...
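The exit codes listed under Step 1 can also be looked up from a script. What follows is only a sketch: describe_exit_code is a hypothetical helper name, not something Docker provides, and the fallback branch relies on the general shell convention that codes above 128 usually mean "terminated by signal (code minus 128)".

```shell
# Hypothetical helper (not part of Docker): explain a container exit code.
describe_exit_code() {
  case "$1" in
    0)   echo "clean exit" ;;
    1)   echo "application error (check logs)" ;;
    137) echo "SIGKILL (often the OOM killer)" ;;
    139) echo "SIGSEGV (segmentation fault)" ;;
    143) echo "SIGTERM (graceful shutdown request)" ;;
    *)
      # Convention: codes above 128 usually mean "killed by signal (code - 128)".
      if [ "$1" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $(( $1 - 128 ))"
      else
        echo "exit code $1 (application-specific, check logs)"
      fi
      ;;
  esac
}

# Example (container_name is a placeholder):
#   describe_exit_code "$(docker inspect --format='{{.State.ExitCode}}' container_name)"
```

The lookup only restates the list above; it does not replace reading the logs, since an exit code of 1 can mean almost anything.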

Fix: Docker Container Can't Resolve DNS (And Why It Happens)

Your container builds fine, starts fine, then fails with Could not resolve host or Temporary failure in name resolution. Here’s how to fix it.

Quick Diagnosis

First, confirm it’s actually DNS and not a network issue:

    # Get a shell in the container
    docker exec -it <container_name> sh

    # Test DNS specifically
    nslookup google.com
    # or inspect the resolver configuration
    cat /etc/resolv.conf
    ping -c 3 8.8.8.8  # If this works but nslookup fails, it's DNS

If ping 8.8.8.8 works but nslookup google.com fails, you have a DNS problem. If both fail, it’s a broader network issue. ...
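The decision rule from the Quick Diagnosis can be written down as a small function. This is a sketch under assumptions: classify_dns_failure is a hypothetical name, and it assumes both ping and nslookup exit non-zero on failure in your image, which is worth confirming for minimal base images.

```shell
# Hypothetical helper: turn two probe results into the diagnosis described above.
# $1 = exit status of a ping to a raw IP, $2 = exit status of a name lookup.
classify_dns_failure() {
  ip_status="$1"
  lookup_status="$2"
  if [ "$lookup_status" -eq 0 ]; then
    echo "DNS resolution works"
  elif [ "$ip_status" -eq 0 ]; then
    # The IP is reachable but the name lookup failed.
    echo "DNS problem"
  else
    # Neither probe succeeded.
    echo "broader network issue"
  fi
}

# Example, run inside the container:
#   ping -c 1 8.8.8.8 >/dev/null 2>&1; ip_status=$?
#   nslookup google.com >/dev/null 2>&1; lookup_status=$?
#   classify_dns_failure "$ip_status" "$lookup_status"
```

The lookup result is checked first because some networks block ICMP, so a failed ping alone does not prove that name resolution is broken.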

Why Your Health Check Didn't Catch the Outage

You wake up to angry messages. Your service has been down for hours. You check your monitoring dashboard: all green. What happened?

The answer is almost always the same: your health check died with the thing it was checking.

The Problem: Shared Failure Domains

Here’s a common setup that looks correct but isn’t:

[Diagram: Internet → Tunnel (localtunnel / cloudflared) → Your Server, a single box containing both the Service (port 80) and the Health Check (cron job)]

The health check runs on the same server, uses the same tunnel, and sends alerts through… the same tunnel. When the tunnel dies, both the service AND the alerting die together. ...

Why Your Cron Job Isn't Running: A Troubleshooting Guide

You’ve added your cron job, the syntax looks right, but nothing happens. No output, no errors, just silence. This is one of the most frustrating debugging experiences in Linux administration. Here’s how to systematically find and fix the problem.

Check If Cron Is Actually Running

First, verify the cron daemon is running:

    # systemd systems (Ubuntu 16.04+, CentOS 7+, Debian 8+)
    systemctl status cron
    # or on some systems
    systemctl status crond

    # Older init systems
    service cron status

If cron isn’t running, start it: ...
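Because the unit is called cron on some distributions and crond on others, the status check above can be wrapped so that it tries both names. This is a minimal sketch for systemd hosts only: cron_service_status is a made-up function name, and the only real command it depends on is systemctl is-active --quiet.

```shell
# Hypothetical helper: report which cron unit, if any, is active under systemd.
cron_service_status() {
  for unit in cron crond; do
    # is-active --quiet prints nothing and exits 0 only when the unit is active.
    if systemctl is-active --quiet "$unit" 2>/dev/null; then
      echo "$unit is running"
      return 0
    fi
  done
  echo "no active cron unit found"
  return 1
}

# Example:
#   cron_service_status || echo "start the daemon before debugging the job itself"
```

On a host without systemd the function reports that no active unit was found, so the older service cron status check is still needed there.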

Ansible Automation Patterns: Beyond the Basics

Ansible’s learning curve is gentle until it isn’t. Simple playbooks work great, then suddenly you’re debugging variable precedence at midnight. Here are patterns that keep automation maintainable as it scales.

Directory Structure That Scales

Forget the flat playbook approach. Use roles from day one:

    ansible/
    ├── inventory/
    │   ├── production/
    │   │   ├── hosts.yml
    │   │   └── group_vars/
    │   │       ├── all.yml
    │   │       ├── webservers.yml
    │   │       └── databases.yml
    │   └── staging/
    │       └── ...
    ├── roles/
    │   ├── common/
    │   ├── nginx/
    │   ├── postgresql/
    │   └── app/
    ├── playbooks/
    │   ├── site.yml
    │   ├── webservers.yml
    │   └── databases.yml
    └── ansible.cfg

Each role follows the standard structure: ...

Terraform State Management: Don't Learn This the Hard Way

Terraform state is both the source of its power and the cause of most Terraform disasters. Get it wrong and you’re recreating production resources at 2 AM. Get it right and infrastructure changes become boring (the good kind).

What State Actually Is

Terraform state is a JSON file that maps your configuration to real resources. When you write aws_instance.web, Terraform needs to know which actual EC2 instance that refers to. State is that mapping. ...

Observability Without Noise: Monitoring That Actually Helps

Most monitoring systems fail the same way: they’re either too noisy (you ignore them) or too quiet (you miss real problems). The goal isn’t more data; it’s better signal.

The Alert Fatigue Problem

I run infrastructure health checks every few hours. Here’s what I learned: the moment you start ignoring alerts, your monitoring is broken. Doesn’t matter how comprehensive it is.

The failure mode isn’t technical. It’s human psychology. After the third false alarm at 3 AM, your brain learns to dismiss the notification sound. Real problems slip through because they look like everything else. ...

SSH Config Tips That Save Hours

Your ~/.ssh/config file is the most underused productivity tool in your terminal. Here’s how to make SSH work for you.

Basic Structure

    Host myserver
        HostName 192.168.1.100
        User admin
        Port 22

Now ssh myserver replaces ssh admin@192.168.1.100. ...

Systemd Service Management: A Practical Guide

Systemd is the init system for most modern Linux distributions. Love it or hate it, you need to know it. Here’s how to manage services effectively.

Basic Commands

    # Start/stop/restart
    sudo systemctl start nginx
    sudo systemctl stop nginx
    sudo systemctl restart nginx

    # Reload config without restart
    sudo systemctl reload nginx

    # Enable/disable at boot
    sudo systemctl enable nginx
    sudo systemctl disable nginx

    # Check status
    systemctl status nginx

    # List all services
    systemctl list-units --type=service

    # List failed services
    systemctl --failed

Writing a Service Unit

Create /etc/systemd/system/myapp.service: ...