Linux

Bash Scripting Best Practices: Writing Scripts That Don't Bite

Bash scripts start simple and grow complex. A quick automation becomes critical infrastructure. Here’s how to write scripts that don’t become maintenance nightmares. Start Every Script Right 1 2 3 #!/usr/bin/env bash set -euo pipefail IFS=$'\n\t' What these do: #!/usr/bin/env bash — Portable shebang, finds bash in PATH set -e — Exit on any error set -u — Error on undefined variables set -o pipefail — Pipeline fails if any command fails IFS=$'\n\t' — Safer word splitting (no spaces) The Debugging Version 1 2 #!/usr/bin/env bash set -euxo pipefail The -x prints each command before execution. Remove for production. ...

Systemd Service Hardening: Running Services Securely

Most systemd services run with full system access by default. That’s fine until one gets compromised. Systemd provides powerful sandboxing options that most people never use. The Basics: User and Group Never run services as root if they don’t need it: 1 2 3 [Service] User=myapp Group=myapp Create a dedicated user: 1 sudo useradd --system --no-create-home --shell /usr/sbin/nologin myapp Filesystem Restrictions Read-Only Root Make the entire filesystem read-only: ...

SSH Tips and Tricks: Beyond Basic Connections

SSH is the workhorse of remote access. But most people only scratch the surface. Here’s how to use it like a pro. SSH Config: Stop Typing So Much Instead of: 1 ssh -i ~/.ssh/prod-key.pem -p 2222 ubuntu@ec2-54-123-45-67.compute-1.amazonaws.com Create ~/.ssh/config: H H H o o o s s s t t t H U P I H U I F P U p o s o d s o s d o r s r s e r e t s e e r . o e o t r t n a t r n w i x r d N t g N t a n y a u 2 i i a d i r t J a m b 2 t n m e t d e u d e u 2 y g e p y A r m m n 2 F l F g n p i e t i s o i e a n c u l t y l n l b 2 e a e t a - g s 5 ~ i ~ y t 4 / n / e i - . g . s o 1 s . s n 2 s e s 3 h x h - / a / 4 p m s 5 r p t - o l a 6 d e g 7 - . i . k c n c e o g o y m - m . k p p e u e y t m . e p - e 1 m . a m a z o n a w s . c o m Now just: ...

Systemd Service Management: Running Applications Reliably

Systemd is how modern Linux manages services. It starts your applications at boot, restarts them when they crash, and handles dependencies between services. Understanding systemd transforms “it works when I run it manually” into “it runs reliably in production.” Basic Service Unit 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 # /etc/systemd/system/myapp.service [Unit] Description=My Application After=network.target [Service] Type=simple User=appuser WorkingDirectory=/opt/myapp ExecStart=/opt/myapp/bin/server Restart=always RestartSec=5 [Install] WantedBy=multi-user.target 1 2 3 4 # Enable and start sudo systemctl daemon-reload sudo systemctl enable myapp sudo systemctl start myapp Service Types simple (default): Process started by ExecStart is the main process. ...

SSH Security Hardening: Protecting Your Most Critical Access Point

SSH is the front door to your infrastructure. Every server, every cloud instance, every piece of automation relies on it. A compromised SSH setup means game over. These hardening steps dramatically reduce your attack surface. Disable Password Authentication Passwords can be brute-forced. Keys can’t (practically): 1 2 3 4 # /etc/ssh/sshd_config PasswordAuthentication no ChallengeResponseAuthentication no UsePAM no 1 2 # Restart SSH (keep your current session open!) sudo systemctl restart sshd Before doing this, ensure your key is in ~/.ssh/authorized_keys. ...

Linux Performance Debugging: Finding What's Slow and Why

Something’s slow. Users are complaining. Your monitoring shows high latency but not why. You SSH into the server and need to figure out what’s wrong — fast. This is a systematic approach to Linux performance debugging. Start with the Big Picture Before diving deep, get an overview: 1 2 3 4 5 6 # System load and uptime uptime # 17:00:00 up 45 days, load average: 8.52, 4.23, 2.15 # Load average: 1/5/15 minute averages # Compare to CPU count: load 8 on 4 CPUs = overloaded 1 2 # Quick health check top -bn1 | head -20 CPU: Who’s Using It? 1 2 3 4 5 6 # Real-time CPU usage top # Sort by CPU: press 'P' # Sort by memory: press 'M' # Show threads: press 'H' 1 2 3 4 5 # CPU usage by process (snapshot) ps aux --sort=-%cpu | head -10 # CPU time accumulated ps aux --sort=-time | head -10 1 2 3 4 # Per-CPU breakdown mpstat -P ALL 1 5 # Shows each CPU core's utilization # Look for: one core at 100% (single-threaded bottleneck) 1 2 3 4 5 6 7 8 9 10 11 # What's the CPU actually doing? vmstat 1 5 # r b swpd free buff cache si so bi bo in cs us sy id wa st # 3 0 0 245612 128940 2985432 0 0 8 24 312 892 45 12 40 3 0 # r: processes waiting for CPU (high = CPU-bound) # b: processes blocked on I/O (high = I/O-bound) # us: user CPU time # sy: system CPU time # wa: waiting for I/O # id: idle Memory: Running Out? 1 2 3 4 5 6 7 # Memory overview free -h # total used free shared buff/cache available # Mem: 31Gi 12Gi 2.1Gi 1.2Gi 17Gi 17Gi # "available" is what matters, not "free" # Linux uses free memory for cache — that's good 1 2 3 4 5 6 7 8 # Top memory consumers ps aux --sort=-%mem | head -10 # Memory details per process pmap -x <pid> # Or from /proc cat /proc/<pid>/status | grep -i mem 1 2 3 # Check for OOM killer activity dmesg | grep -i "out of memory" journalctl -k | grep -i oom 1 2 3 # Swap usage (if swapping, you're in trouble) swapon -s vmstat 1 5 # si/so columns show swap in/out Disk I/O: The Hidden Bottleneck 1 2 3 4 5 6 7 8 9 10 11 # I/O wait in top top # Look at %wa in the CPU line # Detailed I/O stats iostat -xz 1 5 # Device r/s w/s rkB/s wkB/s await %util # sda 150.00 200.00 4800 8000 12.5 85.0 # await: average I/O wait time (ms) - high = slow disk # %util: disk utilization - 100% = saturated 1 2 3 4 5 6 # Which processes are doing I/O? iotop -o # Shows only processes actively doing I/O # Or without iotop pidstat -d 1 5 1 2 3 # Check for disk errors dmesg | grep -i error smartctl -a /dev/sda # SMART data for disk health Network: Connections and Throughput 1 2 3 4 5 6 7 8 # Network interface stats ip -s link show # Watch for errors, drops, overruns # Real-time bandwidth iftop # Or nload 1 2 3 4 5 6 7 # Connection states ss -s # Total: 1523 (kernel 1847) # TCP: 892 (estab 654, closed 89, orphaned 12, timewait 127) # Many TIME_WAIT: connection churn # Many ESTABLISHED: legitimate load or connection leak 1 2 3 4 5 # Connections per state ss -tan | awk '{print $1}' | sort | uniq -c | sort -rn # Connections by remote IP ss -tn | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head 1 2 3 4 5 6 # Check for packet loss ping -c 100 <gateway> # Any loss = network problem # TCP retransmits netstat -s | grep -i retrans Process-Level Debugging 1 2 3 4 5 6 # What's a specific process doing? strace -p <pid> -c # Summary of system calls strace -p <pid> -f -e trace=network # Network-related calls only 1 2 3 4 5 6 7 8 # Open files and connections lsof -p <pid> # Just network connections lsof -i -p <pid> # Files in a directory lsof +D /var/log 1 2 3 4 5 # Process threads ps -T -p <pid> # Thread CPU usage top -H -p <pid> System-Wide Tracing 1 2 3 4 5 6 7 # What's happening system-wide? perf top # Real-time view of where CPU cycles go # Record and analyze perf record -g -a sleep 10 perf report 1 2 3 4 5 6 7 8 9 # BPF-based tools (if available) # CPU usage by function profile-bpfcc 10 # Disk latency histogram biolatency-bpfcc # TCP connection latency tcpconnlat-bpfcc The Checklist Approach When debugging, go through systematically: ...