You can grep for lines and cut for columns. But what about “show me the third column of lines containing ERROR, but only if the second column is greater than 100”?
That’s awk territory.
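Spelled out in awk, it is a one-liner (a sketch against made-up sample data; the column layout here is purely for illustration):

```shell
# Hypothetical log lines: time, numeric code, message, level
printf '%s\n' \
  '10:01 42 boot-ok INFO' \
  '10:02 503 disk-full ERROR' \
  '10:03 99 retry ERROR' |
  awk '$2 > 100 && /ERROR/ {print $3}'
# disk-full
```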
## The Basics
awk processes text line by line, splitting each into fields:
```shell
# Print second column (space-delimited by default)
echo "hello world" | awk '{print $2}'
# world

# Print first and third columns
awk '{print $1, $3}' data.txt

# Print entire line
awk '{print $0}' file.txt
```
$1, $2, etc. are fields. $0 is the whole line. NF is the number of fields. NR is the line number.
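A quick demonstration of NR, NF, and the $NF idiom (standard awk):

```shell
# NR: line number; NF: field count; $NF: last field
printf 'a b c\nd e\n' | awk '{print NR, NF, $NF}'
# 1 3 c
# 2 2 e
```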
## Custom Delimiters
```shell
# Colon-separated (like /etc/passwd)
awk -F: '{print $1, $3}' /etc/passwd

# CSV (careful with quoted fields)
awk -F, '{print $2}' data.csv

# Multiple delimiters
awk -F'[,;:]' '{print $1}' mixed.txt

# Tab-delimited
awk -F'\t' '{print $1}' data.tsv
```
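The quoted-field caveat matters: `-F,` happily splits inside quotes. If you have GNU awk, its FPAT variable (a gawk extension, not POSIX) defines what a field looks like rather than what separates fields:

```shell
# gawk only: a field is either a comma-free run or a quoted string
echo '1,"Smith, Jane",42' |
  gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $2 }'
# "Smith, Jane"
```

For fully general CSV (escaped quotes, newlines inside fields), reach for csvkit or Python's csv module instead.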
## Conditions
```shell
# Lines where column 3 > 100
awk '$3 > 100' data.txt

# Lines containing "ERROR"
awk '/ERROR/' logfile.txt

# Combine conditions
awk '$3 > 100 && /ERROR/' data.txt

# Column equals specific value
awk '$2 == "active"' status.txt

# Regex match on column
awk '$1 ~ /^server/' config.txt
```
## BEGIN and END Blocks
```shell
# Header and footer
awk 'BEGIN {print "Name\tScore"} {print $1, $2} END {print "---done---"}' data.txt

# Initialize variables
awk 'BEGIN {count=0} /ERROR/ {count++} END {print count " errors"}' log.txt

# Set delimiter in BEGIN
awk 'BEGIN {FS=":"} {print $1}' /etc/passwd
```
## Built-in Variables
| Variable | Meaning |
|---|---|
| $0 | Entire line |
| $1, $2... | Fields |
| NF | Number of fields |
| NR | Line number (cumulative across all files) |
| FNR | Line number (current file) |
| FS | Field separator |
| OFS | Output field separator |
| RS | Record separator |
```shell
# Print line numbers
awk '{print NR, $0}' file.txt

# Print last column
awk '{print $NF}' file.txt

# Print second-to-last column
awk '{print $(NF-1)}' file.txt

# Skip header line
awk 'NR > 1 {print $1}' data.csv
```
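Two table entries the examples above don't exercise: OFS only shows up when print joins multiple fields, and setting RS to the empty string switches awk into paragraph mode, where blank lines separate records (both standard awk):

```shell
# OFS: join printed fields with a comma
printf 'a b\nc d\n' | awk 'BEGIN { OFS="," } { print $1, $2 }'
# a,b
# c,d

# RS="": blank-line-separated records; newline also splits fields
printf 'name: a\nrole: x\n\nname: b\nrole: y\n' |
  awk 'BEGIN { RS="" } { print "record " NR ": " $2 }'
# record 1: a
# record 2: b
```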
## Calculations
```shell
# Sum a column
awk '{sum += $3} END {print sum}' sales.txt

# Average
awk '{sum += $3; count++} END {print sum/count}' data.txt

# Max (for min, seed the same way and flip the comparison)
awk 'NR==1 {max=$3} $3>max {max=$3} END {print max}' data.txt

# Running total
awk '{sum += $2; print $1, sum}' transactions.txt
```
## String Functions
```shell
# Length of field
awk '{print $1, length($1)}' names.txt

# Substring: first 3 characters
awk '{print substr($1, 1, 3)}' data.txt

# Find position
awk '{print index($0, "error")}' log.txt

# Split into array
awk '{split($0, parts, ":"); print parts[1]}' data.txt

# Uppercase/lowercase (POSIX awk)
awk '{print toupper($1)}' names.txt
```
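Two more that belong on this list: sub() replaces the first match in place, and gsub() replaces every match and returns the count (both POSIX awk):

```shell
# gsub replaces all matches and returns how many it made
echo 'foo bar foo' | awk '{ n = gsub(/foo/, "baz"); print n, $0 }'
# 2 baz bar baz

# sub replaces only the first match
echo 'foo bar foo' | awk '{ sub(/foo/, "baz"); print }'
# baz bar foo
```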
## Formatted Output
```shell
# Aligned columns
awk '{printf "%-20s %10.2f\n", $1, $2}' data.txt

# Fixed-width output
awk '{printf "%05d %s\n", NR, $0}' file.txt
```
Format specifiers: %s (string), %d (integer), %f (float). A - flag, as in %-20s, left-aligns; a width like %10.2f pads to 10 characters with 2 decimal places.
## Real-World Patterns
### Log Analysis
```shell
# Count HTTP status codes
awk '{print $9}' access.log | sort | uniq -c | sort -rn

# Better: do it all in awk
awk '{codes[$9]++} END {for (c in codes) print codes[c], c}' access.log | sort -rn

# Average response time by endpoint
awk '{times[$7] += $10; counts[$7]++} END {for (e in times) print e, times[e]/counts[e]}' access.log
```
### Process Monitoring
```shell
# Memory usage by process name
ps aux | awk '{mem[$11] += $6} END {for (p in mem) print mem[p], p}' | sort -rn | head

# CPU > 50%
ps aux | awk '$3 > 50 {print $11, $3"%"}'
```
### CSV Processing
```shell
# Sum column 3 where column 1 is "sales"
awk -F, '$1 == "sales" {sum += $3} END {print sum}' data.csv

# Filter and reformat
awk -F, 'NR > 1 && $4 > 1000 {print $1 ": $" $4}' transactions.csv
```
### Disk Usage
```shell
# Count files by extension
find . -type f -name "*.*" | awk -F. '{ext[$NF]++} END {for (e in ext) print ext[e], e}' | sort -rn

# Or sum actual sizes (in MB)
ls -l *.log | awk '{sum += $5} END {print sum/1024/1024 " MB"}'
```
### Configuration Parsing
```shell
# Extract values from key=value files
awk -F= '/^database_host/ {print $2}' config.ini

# Ignore comments and blank lines
awk '!/^#/ && !/^$/ {print}' config.txt
```
## Multi-file Processing
```shell
# Print filename with each line
awk '{print FILENAME, $0}' *.log

# Reset line count per file
awk 'FNR == 1 {print "--- " FILENAME " ---"} {print}' file1.txt file2.txt

# Compare files: lines of file2 whose first field appears in file1
awk 'NR==FNR {a[$1]; next} $1 in a' file1.txt file2.txt
```
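That last NR==FNR trick is worth unpacking: NR equals FNR only while awk reads the first file, so the first action loads its keys into an array and next skips the rest of the program; for the second file, only the `$1 in a` test runs. A sketch with hypothetical file names:

```shell
# Build a lookup set from the first file, filter the second with it
dir=$(mktemp -d)
printf '1\n3\n' > "$dir/ids.txt"
printf '1 apple\n2 banana\n3 cherry\n' > "$dir/records.txt"

awk 'NR==FNR { want[$1]; next } $1 in want' "$dir/ids.txt" "$dir/records.txt"
# 1 apple
# 3 cherry
```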
## Arrays
```shell
# Count occurrences
awk '{counts[$1]++} END {for (k in counts) print k, counts[k]}' data.txt

# Deduplicate
awk '!seen[$0]++' file.txt

# Join lines by key
awk '{data[$1] = data[$1] " " $2} END {for (k in data) print k, data[k]}' pairs.txt
```
## One-Liners Worth Memorizing
```shell
# Print unique lines (preserving order)
awk '!seen[$0]++' file.txt

# Print lines between patterns
awk '/START/,/END/' file.txt

# Remove blank lines
awk 'NF' file.txt

# Print every nth line
awk 'NR % 5 == 0' file.txt

# Reverse columns
awk '{for (i=NF; i>0; i--) printf "%s ", $i; print ""}' file.txt

# Sum column and print with total
awk '{sum += $2; print} END {print "Total:", sum}' data.txt
```
## When to Use awk vs. Alternatives
| Task | Best Tool |
|---|---|
| Simple pattern search | grep |
| Extract single column | cut |
| Column operations with conditions | awk |
| Complex transformations | awk or Python |
| JSON processing | jq |
| CSV with proper parsing | csvkit or Python |
awk hits the sweet spot: more powerful than grep/cut, simpler than a full script.
## Quick Reference
```shell
# Structure
awk 'pattern {action}' file

# Common patterns
/regex/            # Lines matching regex
$1 == "value"      # Field equals value
$2 > 100           # Numeric comparison
NR > 1             # Skip first line
NR == 1, NR == 10  # Lines 1-10

# Common actions
{print $1, $2}     # Print fields
{sum += $1}        # Accumulate
{count++}          # Count
{arr[$1]++}        # Count by key
```
awk is a complete programming language disguised as a CLI tool. You don’t need to master all of it — just enough to solve the problem grep can’t.
Computing Arts is CLI fluency for practitioners. More at computingarts.com.