When a server is slow and people are yelling, you need a systematic approach. Here’s what to run in the first five minutes.
The Checklist
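- uptime
- dmesg | tail
- vmstat 1 5
- mpstat -P ALL 1 3
- pidstat 1 3
- iostat -xz 1 3
- free -h
- sar -n DEV 1 3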
Let’s break down what each tells you.
1. uptime
Load averages: 1-minute, 5-minute, 15-minute.
- Load increasing (1-min > 5-min > 15-min, e.g. 8 > 6 > 5): problem is recent
- Load decreasing: problem may be resolving
- Load equals CPU count: fully utilized
- Load >> CPU count: tasks are queuing for CPU or, on Linux, blocked in uninterruptible I/O
Quick rule: if 1-min load > (2 × CPU cores), investigate immediately.
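The quick rule is easy to script. The sketch below assumes a hypothetical helper named load_verdict that takes the 1-minute load and the core count as arguments; the name and the "investigate"/"ok" labels are illustrative, not part of any tool.

```shell
# load_verdict LOAD CORES
# Prints "investigate" when LOAD exceeds 2 x CORES, otherwise "ok".
# (Hypothetical helper; only the 2 x cores threshold comes from the rule above.)
load_verdict() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load > 2 * cores) print "investigate"; else print "ok"
    }'
}

# Example with literal values: a 1-minute load of 9.5 on a 4-core box.
load_verdict 9.5 4
```

On a live Linux system the arguments could come from `cut -d' ' -f1 /proc/loadavg` and `nproc`.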
2. dmesg | tail
Kernel messages reveal:
- OOM kills
- Hardware errors
- Network issues
- Disk problems
If you see OOM kills, you have very likely found your problem.
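Scanning for OOM kills by eye is error-prone on a noisy log, so a small filter can help. This is a minimal sketch: count_oom is a hypothetical helper, and the pattern only covers the common "Out of memory" and "oom-kill" wording, so other kernel versions may phrase things differently.

```shell
# count_oom: reads kernel log text on stdin and prints how many lines
# mention the OOM killer. (Hypothetical helper; the pattern is an assumption.)
count_oom() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -Eic 'out of memory|oom-kill' || true
}

# Example on canned input (made-up lines in the usual kernel log shape).
printf '%s\n' \
    '[1234.5] java invoked oom-killer: gfp_mask=0x100cca, order=0' \
    '[1234.6] Out of memory: Killed process 4242 (java)' \
    '[1300.0] eth0: link up' | count_oom
```

On a real host this would be fed from `dmesg | count_oom`, which may need root where kernel log access is restricted.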
3. vmstat 1 5
Key columns:
- r: runnable processes, running or waiting for CPU (> CPU count = saturated)
- b: processes blocked in uninterruptible sleep, usually on I/O
- si/so: swap in/out (should be 0; sustained nonzero values = memory pressure)
- us: user CPU %
- sy: system CPU %
- wa: I/O wait % (high = disk bottleneck)
- id: idle %
If wa is high, it’s disk. If us+sy is high, it’s CPU.
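To pull just these columns out of vmstat output, one option is a short awk filter that looks columns up by header name, since positions can shift between vmstat versions. The helper name vmstat_summary and the sample numbers are made up for illustration.

```shell
# vmstat_summary: reads vmstat output on stdin and prints the key columns
# of the last sample, located by header name rather than by position.
vmstat_summary() {
    awk '
        $1 == "r" && $2 == "b" {            # the column-name header line
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("r" in col) && $1 ~ /^[0-9]+$/ {   # a data row; remember the latest
            r = $col["r"]; b = $col["b"]
            si = $col["si"]; so = $col["so"]
            cpu = $col["us"] + $col["sy"]; wa = $col["wa"]
            seen = 1
        }
        END {
            if (seen) printf "r=%d b=%d si=%d so=%d us+sy=%d wa=%d\n", r, b, si, so, cpu, wa
        }'
}

# Example on canned output (made-up numbers).
vmstat_summary <<'EOF'
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 811234  10240 402332    0    0    12    30  220  410 10  3 85  2  0
 9  3      0 800100  10240 402400    0    0  5200  8100  900 1500 12  6 30 52  0
EOF
# prints: r=9 b=3 si=0 so=0 us+sy=18 wa=52
```

Live, the same filter would sit at the end of `vmstat 1 5 | vmstat_summary`. Taking the last row matters because the first vmstat sample reports averages since boot.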
4. mpstat -P ALL 1 3
Per-CPU breakdown reveals:
- Single-threaded bottleneck (one core maxed)
- Kernel/interrupt storms (high %sys, %irq, or %soft on one core)
- Even distribution (good parallelization)
5. pidstat 1 3
Which process is eating resources? Now you have a target.
Add -d for disk I/O per process: pidstat -d 1 3
6. iostat -xz 1 3
Key columns:
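- await: average time per I/O request in ms, queue time included (>10ms = slow for most workloads; newer sysstat versions split it into r_await and w_await)
- %util: share of time the device was busy (>80% = likely saturated; devices that serve requests in parallel, such as SSDs and RAID arrays, can have headroom even near 100%)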
- r/s, w/s: read/write operations per second
If %util is high and await is high, disk is the bottleneck.
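The two-condition rule can be written down as a tiny check. In this sketch, disk_verdict is a hypothetical helper that takes an await value in milliseconds and a %util value, and applies the 10 ms and 80% thresholds from the list above; the output labels are illustrative.

```shell
# disk_verdict AWAIT_MS UTIL_PCT
# Prints "disk bottleneck" only when both thresholds are exceeded.
# (Hypothetical helper; thresholds are the 10 ms / 80% figures quoted above.)
disk_verdict() {
    awk -v wait_ms="$1" -v util="$2" 'BEGIN {
        if (wait_ms > 10 && util > 80) print "disk bottleneck"
        else                           print "inconclusive"
    }'
}

# Example with literal values: 35.2 ms await at 97.5% utilization.
disk_verdict 35.2 97.5
```

Requiring both conditions avoids flagging a device that is busy but still answering quickly.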
7. free -h
Focus on available, not free. Linux uses free memory for cache.
If available is low AND swap is being used, you need more RAM or a smaller memory footprint.
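Both halves of that condition can be read straight from /proc/meminfo. The sketch below assumes a hypothetical helper, mem_pressure, that parses meminfo-style text on stdin; what counts as "low" is left to the reader, since no threshold is given here.

```shell
# mem_pressure: reads /proc/meminfo-style text on stdin and prints
# MemAvailable as a percentage of MemTotal plus the swap currently in use.
mem_pressure() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapTotal:"    { stotal = $2 }
        $1 == "SwapFree:"     { sfree = $2 }
        END {
            if (total > 0)
                printf "available=%d%% swap_used_kB=%d\n", avail * 100 / total, stotal - sfree
        }'
}

# Example on canned input (made-up values, in kB as in /proc/meminfo).
printf '%s\n' \
    'MemTotal:       16000000 kB' \
    'MemAvailable:    1600000 kB' \
    'SwapTotal:       2097152 kB' \
    'SwapFree:        1572864 kB' | mem_pressure
# prints: available=10% swap_used_kB=524288
```

On a live system: `mem_pressure < /proc/meminfo`.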
8. sar -n DEV 1 3
Network saturation check:
- Compare to link speed (1Gbps ≈ 125,000 kB/s)
- Look for packet drops:
sar -n EDEV 1 3
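Comparing throughput to link speed is simple arithmetic, sketched here as a hypothetical helper, link_util_pct. It takes a kB/s figure (as in sar's rxkB/s or txkB/s columns) and the link speed in Mbit/s, and uses the same approximation as above of 1 kB = 1000 bytes.

```shell
# link_util_pct KB_PER_SEC LINK_MBPS
# Prints throughput as a percentage of link capacity.
# (Hypothetical helper; treats 1 kB as 1000 bytes, so 1 Gbps = 125,000 kB/s.)
link_util_pct() {
    awk -v kbps="$1" -v mbps="$2" 'BEGIN {
        printf "%.1f\n", (kbps * 8) / (mbps * 1000) * 100
    }'
}

# Example: 62,500 kB/s on a 1 Gbps link.
link_util_pct 62500 1000
# prints: 50.0
```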
Quick Diagnosis Tree
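One way to express the decision order is a small function that applies the rules from the sections above in sequence. This is a sketch, not a definitive tree: diagnose is a hypothetical helper, the order of checks is an assumption, and so are the 20% (wa) and 80% (us+sy) cutoffs, since "high" is not given a number in the text.

```shell
# diagnose SWAP_ACTIVITY WA_PCT CPU_PCT RUNQ CORES
#   SWAP_ACTIVITY  si + so from vmstat
#   WA_PCT         wa from vmstat
#   CPU_PCT        us + sy from vmstat
#   RUNQ           r from vmstat
#   CORES          CPU count
# Assumed cutoffs: wa >= 20 and us+sy >= 80 count as "high".
diagnose() {
    awk -v swap="$1" -v wa="$2" -v cpu="$3" -v runq="$4" -v cores="$5" 'BEGIN {
        if (swap > 0)                       print "memory"
        else if (wa >= 20)                  print "disk"
        else if (cpu >= 80 || runq > cores) print "cpu"
        else                                print "none obvious; check network"
    }'
}

# Example: no swapping, 52% I/O wait, 18% CPU, run queue 9 on 4 cores.
diagnose 0 52 18 9 4
# prints: disk
```

Swap is checked first on the reasoning that swapping also inflates I/O wait, so a memory shortage can otherwise masquerade as a disk problem.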
Going Deeper
Once you’ve identified the bottleneck, dig into that resource with tools specific to it: CPU, memory, disk, or network.
The One-Liner
If you only have 30 seconds, check five things in a single pass: load, CPU, memory, disk, and swap. That covers roughly 90% of cases.
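A single pass over those five can be wrapped in a small function. This is one possible sketch built only from commands in the checklist; quick_check and run_if_present are hypothetical names, and the choice of commands is an assumption.

```shell
# run_if_present CMD [ARGS...]
# Runs CMD only if it is installed, and never fails the caller.
run_if_present() {
    command -v "$1" >/dev/null 2>&1 || return 0
    "$@" || true
}

# quick_check: load, CPU and swap activity, memory, then disk.
quick_check() {
    run_if_present uptime            # load averages
    run_if_present vmstat 1 2        # CPU, run queue, swap in/out
    run_if_present free -h           # memory and swap totals
    run_if_present iostat -xz 1 2    # per-device disk stats
}
```

Calling `quick_check` takes a couple of seconds because vmstat and iostat each wait one interval. In both cases the second sample is the one to read, since the first reports averages since boot.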
The goal isn’t to memorize everything—it’s to have a systematic approach. Start broad, identify the resource under pressure, then dig in.