Understanding processes is fundamental to Linux troubleshooting. These tools and techniques will help you find what’s running, what’s stuck, and what needs to die.

Viewing Processes

ps - Process Snapshot

# All processes (BSD style)
ps aux

# All processes (Unix style)
ps -ef

# Process tree
ps auxf

# Specific columns
ps -eo pid,ppid,user,%cpu,%mem,stat,cmd

# Find specific process (the [n] bracket stops grep from matching its own command line)
ps aux | grep '[n]ginx'

# By exact name (no grep needed)
ps -C nginx

# By user
ps -u www-data

Understanding ps Output

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1 169936 13256 ?        Ss   Feb24   0:03 /sbin/init
www-data  1234  2.5  1.2 456789 98765 ?        Sl   10:00   5:23 nginx: worker
  • PID: Process ID
  • %CPU: CPU usage
  • %MEM: Memory usage
  • VSZ: Virtual memory size
  • RSS: Resident set size (actual RAM)
  • STAT: Process state
  • TIME: CPU time consumed
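To make the column meanings concrete, here is a minimal sketch that reads `ps aux`-style text on stdin and prints each PID with its RSS converted from KiB to whole MiB. The function name `rss_in_mib` is hypothetical. It assumes the usual `ps aux` column order: PID in field 2, RSS in field 6, command starting at field 11.

```shell
# Hypothetical helper: read `ps aux`-style output on stdin and print
# "PID RSS-in-MiB COMMAND" per process. Assumes PID is field 2, RSS
# (reported by ps in KiB) is field 6, and the command starts at field 11.
rss_in_mib() {
    awk 'NR > 1 { printf "%s %d MiB %s\n", $2, $6 / 1024, $11 }'
}

# Example with a canned line, so it runs without touching live processes:
printf '%s\n' \
    'USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND' \
    'www-data 1234 2.5 1.2 456789 98765 ? Sl 10:00 5:23 nginx: worker' |
    rss_in_mib
```

On a live system the same function could be fed with `ps aux | rss_in_mib`. Only the first word of the command is printed, which is usually enough to identify the process.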

Process States (STAT)

R    Running
S    Sleeping (interruptible)
D    Sleeping (uninterruptible, usually I/O)
Z    Zombie
T    Stopped
<    High priority
N    Low priority
s    Session leader
l    Multi-threaded
+    Foreground process
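A quick way to see how these states are distributed is to count processes per state letter. The sketch below is one possible approach; `count_states` is a hypothetical name, and it treats only the first character of STAT as the state, with the remaining characters as modifier flags.

```shell
# Hypothetical helper: count processes per primary state letter.
# Reads one STAT value per line on stdin; only the first character is the
# state itself, the rest are modifier flags.
count_states() {
    awk 'NF { n[substr($1, 1, 1)]++ } END { for (s in n) print s, n[s] }' | sort
}

# Canned input so the example is reproducible:
printf '%s\n' Ss S R+ Z Sl | count_states
```

Against live data this would be `ps -eo stat= | count_states`. A growing D or Z count is usually the first thing worth investigating.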

top - Real-time View

# Basic
top

# Sort by memory
top -o %MEM

# Specific user
top -u www-data

# Batch mode (for scripts)
top -b -n 1

Inside top:

  • M - Sort by memory
  • P - Sort by CPU
  • k - Kill process
  • r - Renice process
  • c - Show full command
  • H - Show threads
  • q - Quit

htop - Better top

htop

# Filter by user
htop -u www-data

htop features:

  • Mouse support
  • Horizontal/vertical scrolling
  • Tree view (F5)
  • Search (F3)
  • Filter (F4)
  • Kill (F9)

pgrep - Find by Name

# PIDs matching name
pgrep nginx

# With full command line match
pgrep -f "nginx -g"

# List with process names
pgrep -l nginx

# Count matches
pgrep -c nginx

# Newest/oldest match
pgrep --newest nginx
pgrep --oldest nginx

Process Trees

# Tree view
pstree

# With PIDs
pstree -p

# For specific process
pstree -p 1234

# Show arguments
pstree -a

# Highlight specific PID
pstree -H 1234

Controlling Processes

Signals

# List all signals
kill -l

# Common signals:
# SIGHUP  (1)  - Hangup, often triggers config reload
# SIGINT  (2)  - Interrupt (Ctrl+C)
# SIGQUIT (3)  - Quit with core dump
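# SIGKILL (9)  - Force kill (cannot be caught, blocked, or ignored)
# SIGTERM (15) - Graceful termination (default)
# SIGCONT (18) - Resume paused process
# SIGSTOP (19) - Pause process (cannot be caught or ignored)
# SIGCONT/SIGSTOP numbers are the x86/ARM values; prefer names over numbers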
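The difference between a catchable and an uncatchable signal is easiest to see from the receiving side. In this small sketch a child shell installs a SIGTERM handler with `trap` and then signals itself. `graceful_demo` is a hypothetical name used only for illustration.

```shell
# Sketch: a child shell installs a SIGTERM handler, then signals itself.
# The handler runs, which is why SIGTERM allows cleanup and SIGKILL does not.
graceful_demo() {
    sh -c '
        trap "echo cleaning up; exit 0" TERM
        kill -TERM $$
        echo "not reached"
    '
}

graceful_demo
```

Replacing `-TERM` with `-KILL` in the inner command would end the child immediately without running the handler, since SIGKILL never reaches user code.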

kill - Send Signals

# Graceful termination (default SIGTERM)
kill 1234

# Force kill
kill -9 1234
kill -KILL 1234

# Reload config (many daemons)
kill -HUP 1234

# By name
pkill nginx
pkill -f "python myapp.py"

# Kill all by name
killall nginx

# Kill all by user
pkill -u baduser
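A common way to combine the graceful and forced variants is to send SIGTERM, wait a bounded time, and escalate to SIGKILL only if the process is still there. This is a sketch under assumptions: the function name `stop_gracefully` and the 10-second default are made up here, and `kill -0` is used purely as an existence check.

```shell
# Hypothetical helper: stop_gracefully PID [TIMEOUT_SECONDS]
# Sends SIGTERM, polls once per second, and sends SIGKILL after the timeout.
# Note: `kill -0` also succeeds for a zombie that has not been reaped yet.
stop_gracefully() {
    sg_pid=$1
    sg_left=${2:-10}
    kill -TERM "$sg_pid" 2>/dev/null || return 0   # gone already (or not ours)
    while [ "$sg_left" -gt 0 ]; do
        kill -0 "$sg_pid" 2>/dev/null || return 0  # exited after SIGTERM
        sleep 1
        sg_left=$((sg_left - 1))
    done
    kill -KILL "$sg_pid" 2>/dev/null               # last resort
    return 0
}

# Demo against a throwaway background process:
sleep 30 &
stop_gracefully "$!" 5
```

Giving the process a chance to exit on SIGTERM first lets it flush buffers and remove lock files, which `kill -9` never allows.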

Process Priority (nice/renice)

# Run with lower priority (higher nice = lower priority)
nice -n 10 ./cpu-intensive-script.sh

# Run with higher priority (requires root)
nice -n -10 ./important-process

# Change priority of running process
renice 10 -p 1234

# Renice by user
renice 5 -u www-data
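One detail that is easy to miss is that `nice -n` adds an increment to the niceness the command inherits rather than setting an absolute value. The sketch below shows this. It assumes the GNU coreutils behaviour where `nice` with no command prints the current niceness.

```shell
# With no command, `nice` prints the niceness of the current shell
# (assumption: GNU coreutils or a compatible implementation).
base=$(nice)

# -n adds an increment to the inherited niceness; it is not an absolute value.
lowered=$(nice -n 5 nice)

echo "base niceness: $base, after 'nice -n 5': $lowered"
```

Niceness is capped at 19, so the second value can be smaller than `base + 5` when the shell is already running at low priority.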

Background and Foreground

# Run in background
./script.sh &

# List background jobs
jobs

# Bring to foreground
fg %1

# Send to background
bg %1

# Suspend the foreground process (a keypress, not a command; sends SIGTSTP)
Ctrl+Z

# Disown (detach from terminal)
disown %1

# Run immune to hangups
nohup ./script.sh &

# Or use screen/tmux for persistent sessions

Finding Resource Hogs

CPU

# Top CPU consumers
ps aux --sort=-%cpu | head -10

# Real-time
top -o %CPU

Memory

# Top memory consumers
ps aux --sort=-%mem | head -10

# Detailed memory info
ps -eo pid,ppid,cmd,%mem,rss --sort=-rss | head -10

# Memory by process name
ps -C nginx -o pid,rss,cmd

# System memory overview
free -h
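Services that fork many workers rarely show up as a single large entry, so it can help to total RSS per command name. The following is a minimal sketch; `sum_rss_by_name` is a hypothetical helper, and it assumes command names contain no spaces.

```shell
# Hypothetical helper: read "RSS_KIB NAME" lines on stdin and print the total
# RSS per name in whole MiB, largest first. Shared pages are counted once per
# process, so the totals overstate real memory use for forked workers.
sum_rss_by_name() {
    awk '{ kib[$2] += $1 }
         END { for (name in kib) printf "%d %s\n", kib[name] / 1024, name }' |
        sort -rn
}

# Canned input so the example is reproducible:
printf '%s\n' '2048 nginx' '1024 nginx' '10240 postgres' | sum_rss_by_name
```

With live data the input would come from `ps -eo rss=,comm=`, where the trailing `=` suppresses the header line.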

I/O

# I/O statistics (usually needs root)
iotop

# Only show processes actually doing I/O
iotop -o

# Accumulated I/O
iotop -a

Process Details

/proc Filesystem

# Process info
ls /proc/1234/

# Command line
cat /proc/1234/cmdline | tr '\0' ' '

# Environment
cat /proc/1234/environ | tr '\0' '\n'

# File descriptors
ls -la /proc/1234/fd/

# Memory maps
cat /proc/1234/maps

# Current working directory
ls -la /proc/1234/cwd

# Executable
ls -la /proc/1234/exe

# Limits
cat /proc/1234/limits

# Status summary
cat /proc/1234/status
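The status file is plain `Key:<tab>value` text, so single fields are easy to pull out in scripts. Here is a small sketch; `proc_status_field` is a hypothetical helper name, and the sample lines are canned rather than taken from a real process.

```shell
# Hypothetical helper: print the value of one field from /proc/<pid>/status
# text supplied on stdin (lines look like "VmRSS:<tab>    6936 kB").
proc_status_field() {
    sed -n "s/^$1:[[:space:]]*//p"
}

# Canned sample so the example does not depend on a live PID:
printf 'Name:\tnginx\nState:\tS (sleeping)\nVmRSS:\t    6936 kB\n' |
    proc_status_field VmRSS
```

Against a real process this would be `proc_status_field State < /proc/1234/status`.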

lsof - List Open Files

# All open files by process
lsof -p 1234

# Files in directory
lsof +D /var/log

# Network connections by process (-a ANDs the filters; without it lsof ORs them)
lsof -a -i -p 1234

# Who's using a port
lsof -i :80

# Who's using a file
lsof /var/log/syslog

# Files by user
lsof -u www-data

strace - System Calls

# Trace running process
strace -p 1234

# Trace new process
strace ./myapp

# Summary of calls
strace -c ./myapp

# Follow forks
strace -f ./myapp

# Filter specific calls (modern glibc opens files via openat rather than open)
strace -e trace=openat,read,write ./myapp

# With timestamps
strace -t ./myapp

# Output to file
strace -o trace.log ./myapp

Zombie Processes

Zombies are processes that have exited but whose parent has not yet collected their exit status:

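# Find zombies (STAT starts with Z; a bare grep 'Z' matches any line containing a Z)
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'

# Count zombies (also catches variants such as Z+ and Zs)
ps -eo stat= | grep -c '^Z'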

To fix:

  1. Signal the parent to collect the child: kill -s CHLD <parent_pid> (helps only if the parent handles SIGCHLD)
  2. Kill the parent (the zombies are re-parented to PID 1, which reaps them)
  3. Reboot (last resort)
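Since every fix in this list targets the parent, the useful piece of information is which parent each zombie belongs to. The sketch below is one way to extract it; `zombie_parents` is a hypothetical name, and it expects "PID PPID STAT COMMAND" lines on stdin.

```shell
# Hypothetical helper: from "PID PPID STAT COMMAND" lines on stdin, report
# each zombie together with the parent that has not reaped it.
zombie_parents() {
    awk '$3 ~ /^Z/ { print "zombie", $1, "parent", $2 }'
}

# Canned input so the example is reproducible:
printf '%s\n' \
    '101 1 Ss systemd-journal' \
    '2001 1500 Z worker' \
    '2002 1500 Z+ worker' |
    zombie_parents
```

On a live system the input would be `ps -eo pid=,ppid=,stat=,comm=`. Many zombies under the same parent usually point to a bug in that parent's child handling.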

Orphan Processes

Orphans are adopted by init/systemd (PID 1):

# Find processes whose parent is PID 1 (includes ordinary daemons, not only orphans)
ps -ef | awk '$3==1'

Process Limits

# View limits
ulimit -a

# Max open files
ulimit -n

# Set for session
ulimit -n 65535

# Permanent (in /etc/security/limits.conf)
www-data soft nofile 65535
www-data hard nofile 65535
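The soft and hard values in limits.conf correspond to `ulimit -S` and `ulimit -H` in the shell. The short sketch below shows the two values and demonstrates that a limit changed in a subshell does not leak back into the parent. The value 64 is arbitrary, and the sketch assumes the hard limit is at least that high.

```shell
# Soft limit: what the process is actually held to. Hard limit: the ceiling
# an unprivileged process may raise its soft limit to.
echo "soft nofile: $(ulimit -Sn)"
echo "hard nofile: $(ulimit -Hn)"

# Lower the soft limit inside a subshell; the parent shell is unaffected.
inner=$( (ulimit -Sn 64 && ulimit -Sn) )
echo "inside subshell: $inner"
echo "parent shell still: $(ulimit -Sn)"
```

Limits are inherited by child processes, which is why a `ulimit` call in a service's start script affects the daemon but not other sessions.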

Cgroups (Resource Limits)

# View cgroup of process
cat /proc/1234/cgroup

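# Using systemd-run for limits (MemoryLimit= is the legacy cgroup v1 name;
# on cgroup v2 systems use MemoryMax= instead)
systemd-run --scope -p MemoryLimit=500M ./memory-hungry-app

# Create cgroup manually (cgroup v1 layout, needs root)
mkdir /sys/fs/cgroup/memory/mygroup
echo 500M > /sys/fs/cgroup/memory/mygroup/memory.limit_in_bytes
echo 1234 > /sys/fs/cgroup/memory/mygroup/cgroup.procs

# cgroup v2 equivalent (unified hierarchy, memory controller enabled in the parent)
mkdir /sys/fs/cgroup/mygroup
echo 500M > /sys/fs/cgroup/mygroup/memory.max
echo 1234 > /sys/fs/cgroup/mygroup/cgroup.procs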

Common Patterns

Find and Kill

# Find process using port 8080 and kill it
kill $(lsof -t -i:8080)

# Kill all processes matching pattern
pkill -f "python.*myapp"

# List the matches first; pkill runs only if something matched
pgrep -l myapp && pkill myapp

Monitor Process

# Watch process count (pgrep -c avoids counting the grep process itself)
watch -n 1 'pgrep -c nginx'

# Monitor specific PID
watch -n 1 'ps -p 1234 -o %cpu,%mem,etime'

Wait for Process

# Wait for specific PID
while kill -0 1234 2>/dev/null; do sleep 1; done
echo "Process finished"

# Wait for process by name
while pgrep -x nginx > /dev/null; do sleep 1; done
echo "nginx stopped"
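The loops above wait forever if the process never exits. A bounded variant is often safer in scripts. This is a sketch under assumptions: the name `wait_for_exit` and the 30-second default are invented for illustration.

```shell
# Hypothetical helper: wait_for_exit PID [TIMEOUT_SECONDS]
# Returns 0 once the PID no longer exists, 1 if it is still there after the
# timeout. Uses `kill -0`, which only tests for existence and sends nothing.
wait_for_exit() {
    wfe_pid=$1
    wfe_left=${2:-30}
    while kill -0 "$wfe_pid" 2>/dev/null; do
        [ "$wfe_left" -le 0 ] && return 1
        sleep 1
        wfe_left=$((wfe_left - 1))
    done
    return 0
}
```

A caller might use it as `wait_for_exit 1234 60 || echo "still running after 60s"`. For children of the current shell, the built-in `wait` is the better tool because it also returns the exit status.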

Log Process Output

# Capture output of running process (if you have strace)
strace -p 1234 -e write -s 1000 2>&1 | grep "write(1"

Quick Reference

# View
ps aux                    # All processes
top / htop               # Real-time
pstree -p                # Tree with PIDs
pgrep -l name            # Find by name

# Control
kill PID                 # Graceful stop
kill -9 PID              # Force kill
pkill name               # Kill by name
killall name             # Kill all matching

# Info
lsof -p PID              # Open files
strace -p PID            # System calls
cat /proc/PID/status     # Process status

# Resources
nice -n 10 cmd           # Run with low priority
renice 10 -p PID         # Change priority
ulimit -a                # View limits

Process management is detective work. Start with the overview (ps, top), drill down to specifics (lsof, /proc), and trace execution when needed (strace).

The goal isn’t to memorize every flag—it’s to know which tool answers which question. “What’s using all my CPU?” → top. “What files does this process have open?” → lsof. “Why is this process hanging?” → strace.