You have a list of files. You need to process each one. The naive approach:
```shell
for file in $(cat files.txt); do
    process "$file"
done
```
This works until it doesn’t — filenames with spaces break it, and it’s sequential. Enter xargs.
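To see the breakage concretely, here is a small sketch that counts how many items the naive loop sees. The temporary directory and the two filenames are made up for the demonstration.

```shell
# Work in a throwaway directory so nothing real is touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Two filenames, one of which contains a space (hypothetical names).
printf '%s\n' "plain.txt" "my report.txt" > files.txt

# Naive loop: word splitting turns 2 filenames into 3 items.
naive_count=0
for file in $(cat files.txt); do
    naive_count=$((naive_count + 1))
done

# The list itself only has 2 lines.
line_count=$(wc -l < files.txt | tr -d ' ')

echo "loop saw $naive_count items, list has $line_count lines"
```

The loop receives `my` and `report.txt` as separate items, so any command run on them would target files that do not exist.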
The Basics
xargs reads input and converts it into arguments for a command:
```shell
# Delete files listed in a file
cat files.txt | xargs rm
# Same thing, without the extra cat process
xargs rm < files.txt
```
Without xargs, you’d need a loop. With xargs, one line.
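A harmless way to watch the conversion happen is to use echo as the command. This is only an illustration with made-up input words.

```shell
# Three input lines become three arguments to a single echo.
joined=$(printf '%s\n' alpha beta gamma | xargs echo)
echo "$joined"

# With no command given, xargs runs echo by default.
default_cmd=$(printf '%s\n' alpha beta gamma | xargs)
echo "$default_cmd"
```

Both commands print `alpha beta gamma` on one line: the newlines in the input are gone because each line was handed to echo as a separate argument.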
Handling Special Characters
Filenames have spaces? Newlines? Use null delimiters:
```shell
# Find + xargs with null separator
find . -name "*.log" -print0 | xargs -0 rm
# Convert a newline-separated list to null-terminated input
cat files.txt | tr '\n' '\0' | xargs -0 process
```
The -print0 and -0 combo is bulletproof. Always use it with find.
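The difference is easy to measure. The sketch below assumes a find and xargs that support -print0 and -0 (GNU and BSD both do), and uses a temporary directory with two hypothetical log files.

```shell
tmp=$(mktemp -d)
touch "$tmp/a b.log" "$tmp/c.log"

# Default parsing splits on whitespace: "a b.log" becomes two arguments.
split_count=$(find "$tmp" -name '*.log' | xargs -n 1 echo | wc -l | tr -d ' ')

# Null-delimited: each path arrives as exactly one argument.
null_count=$(find "$tmp" -name '*.log' -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "whitespace-split: $split_count arguments, null-delimited: $null_count"
rm -r "$tmp"
```

Two files produce three arguments without null delimiters and two with them.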
Controlling Argument Placement
By default, xargs appends arguments at the end. Use -I to place them anywhere:
```shell
# Rename files with a prefix
ls *.txt | xargs -I {} mv {} backup_{}
# Copy files to a directory
find . -name "*.conf" -print0 | xargs -0 -I {} cp {} /backup/configs/
# Use with curl
cat urls.txt | xargs -I {} curl -O {}
```
The {} is a placeholder: with -I, xargs runs the command once per input line and substitutes that line for every {}.
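A quick sketch of the substitution, using echo so nothing is modified. The input names are invented for the example.

```shell
# One command per input line; {} is replaced everywhere it appears.
out=$(printf '%s\n' report notes | xargs -I {} echo "backup_{}.txt <- {}.txt")
echo "$out"

# With -I, a line containing spaces stays a single item.
spaced=$(printf '%s\n' "my file" | xargs -I {} echo "[{}]")
echo "$spaced"
```

The first command prints `backup_report.txt <- report.txt` and `backup_notes.txt <- notes.txt`. The second prints `[my file]`.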
Batch Size Control
Process arguments in batches with -n:
```shell
# Delete 10 files at a time
find . -name "*.tmp" -print0 | xargs -0 -n 10 rm
# Echo shows the batching
echo {1..20} | xargs -n 5 echo
# Output:
# 1 2 3 4 5
# 6 7 8 9 10
# 11 12 13 14 15
# 16 17 18 19 20
```
Useful when commands have argument limits or you want progress visibility.
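Parallel Execution
This is where xargs shines. Use -P for parallel processes:
```shell
# Process 4 files simultaneously
# (find prints paths like ./a.jpg, so build the output name from dirname and basename)
find . -name "*.jpg" -print0 | xargs -0 -P 4 -I {} sh -c 'convert "$1" -resize 50% "$(dirname "$1")/resized_$(basename "$1")"' _ {}
# Download URLs in parallel
cat urls.txt | xargs -P 8 -I {} curl -sO {}
# Compress files with all cores
# (-n 1 gives each gzip one file; without -n, xargs may hand every file to a single gzip)
find . -name "*.log" -print0 | xargs -0 -n 1 -P "$(nproc)" gzip
```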
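-P 0 runs as many processes at once as xargs can (use with caution). Parallelism only applies when xargs makes several invocations, so combine -P with -n or -I.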
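A rough way to confirm the speedup is to time a few sleep commands. This sketch assumes an xargs with -P (GNU and BSD) and measures in whole seconds, so treat the numbers as approximate.

```shell
# Four 1-second sleeps, run one at a time.
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n 1 sleep
serial=$(( $(date +%s) - start ))

# The same four sleeps, up to four at once.
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 4 sleep
parallel=$(( $(date +%s) - start ))

echo "serial: ${serial}s, parallel: ${parallel}s"
```

The serial run takes at least four seconds, while the parallel run finishes in roughly one.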
Confirmation Before Execution
Ask before each command with -p:
```shell
# Confirm each deletion (-n 1 prompts once per file instead of once per batch)
find . -name "*.bak" -print0 | xargs -0 -n 1 -p rm
# rm ./old.bak ?...y
```
For dry runs, use echo:
```shell
# See what would run
find . -name "*.tmp" -print0 | xargs -0 echo rm
```
With grep
```shell
# List files containing "TODO"
find . -name "*.py" -print0 | xargs -0 grep -l "TODO"
# Count matching lines per file, hiding files with none
# (-H keeps the filename prefix even when grep receives a single file)
find . -name "*.py" -print0 | xargs -0 grep -Hc "TODO" | grep -v ":0$"
```
With sed
```shell
# Replace text in multiple files (GNU sed syntax; BSD/macOS sed needs -i '')
find . -name "*.txt" -print0 | xargs -0 sed -i 's/old/new/g'
```
With ssh
```shell
# Run command on multiple hosts
echo "host1 host2 host3" | tr ' ' '\n' | xargs -P 3 -I {} ssh {} "uptime"
```
With docker
```shell
# Stop all running containers (-r skips the command when the list is empty)
docker ps -q | xargs -r docker stop
# Remove old images
docker images -q --filter "dangling=true" | xargs -r docker rmi
```
Building Complex Pipelines
```shell
# Find large files, sort by size, take top 10, show details
find /var/log -type f -size +10M -print0 | \
    xargs -0 ls -lhS | \
    head -10
# Process CSV: extract column, dedupe, count lines containing each value
# (the value is passed as $1 rather than pasted into the shell command)
cut -d',' -f2 data.csv | sort -u | xargs -I {} sh -c 'printf "%s: " "$1"; grep -c -- "$1" data.csv' _ {}
```
Handling Command Failures
By default, xargs continues after failures. Change this:
```shell
# Stop on first failure: xargs aborts as soon as a command exits with status 255
find . -name "*.sh" -print0 | xargs -0 -n 1 sh -c 'shellcheck "$1" || exit 255' _
```
Or record which inputs failed:
```shell
# Process and track failures
find . -name "*.test" -print0 | xargs -0 -I {} sh -c './run_test "$1" || echo "FAILED: $1"' _ {}
```
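The two behaviors can be checked with a small experiment. In this sketch the LOG variable and the inputs a, b, c are made up. The exit codes mentioned in the comments are what GNU xargs documents, and other implementations may report a different nonzero status.

```shell
LOG=$(mktemp)
export LOG

# Ordinary failure: all three items are still attempted, and xargs
# reports failure only at the end (status 123 on GNU xargs).
continue_status=0
printf '%s\n' a b c | xargs -n 1 sh -c 'echo "$1" >> "$LOG"; exit 1' _ 2>/dev/null || continue_status=$?
continue_runs=$(wc -l < "$LOG" | tr -d ' ')

# Exit status 255: xargs stops after the first item (status 124 on GNU xargs).
: > "$LOG"
halt_status=0
printf '%s\n' a b c | xargs -n 1 sh -c 'echo "$1" >> "$LOG"; exit 255' _ 2>/dev/null || halt_status=$?
halt_runs=$(wc -l < "$LOG" | tr -d ' ')

echo "exit 1: $continue_runs runs, exit 255: $halt_runs run"
rm -f "$LOG"
```

With an ordinary failure all three commands run. With status 255 only the first one does.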
Reduce Process Spawning
```shell
# Bad: spawns 'echo' for each file
find . -name "*.txt" -print0 | xargs -0 -n 1 echo "Processing:"
# Good: batches into fewer commands
find . -name "*.txt" -print0 | xargs -0 echo "Processing:"
```
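To put numbers on the difference, the following sketch counts invocations for 100 generated inputs. It assumes seq is available and uses a tiny sh -c command that prints one line each time it is started.

```shell
# -n 1 forces one process per input item: 100 items, 100 invocations.
per_item=$(seq 1 100 | xargs -n 1 sh -c 'echo invoked' _ | wc -l | tr -d ' ')

# Default batching packs as many arguments as fit into each invocation.
batched=$(seq 1 100 | xargs sh -c 'echo invoked' _ | wc -l | tr -d ' ')

echo "with -n 1: $per_item invocations, batched: $batched"
```

For short inputs like these, the batched form needs a single invocation.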
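Limit Line Length
Every system caps the total length of a command line. xargs stays under the system limit on its own; use -s to set a lower cap, in characters, on each command line it builds:
```shell
find . -name "*.log" -print0 | xargs -0 -s 10000 rm
```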
Use Built-in Parallelism
Some tools parallelize internally, which can beat spawning many separate processes. gzip itself is single-threaded, but pigz is a parallel implementation of it:
```shell
# Instead of: find ... | xargs -n 1 -P 4 gzip
# Use pigz, which compresses with multiple threads
find . -name "*.log" -print0 | xargs -0 pigz -p 4
```
Common Patterns
Backup Before Modify
```shell
find . -name "*.conf" -print0 | xargs -0 -I {} sh -c 'cp "$1" "$1.bak" && process "$1"' _ {}
```
Process With Index
```shell
# nl prefixes each path with its line number; works only for names without whitespace
find . -name "*.jpg" | nl | xargs -n 2 sh -c 'mv "$2" "image_$1.jpg"' _
```
Conditional Execution
```shell
# Only process if target doesn't exist
find . -name "*.md" -print0 | xargs -0 -I {} sh -c '[ -f "$1.html" ] || pandoc "$1" -o "$1.html"' _ {}
```
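When a pattern wraps the command in sh -c, it matters whether the input is pasted into the script text or passed as a positional parameter. The sketch below uses a deliberately hostile, made-up name to show the difference. It assumes an xargs that supports -0.

```shell
# A hostile (but legal) filename.
name='x; echo INJECTED'

# Unsafe: {} is pasted into the script text, so the shell parses the name as code.
unsafe=$(printf '%s\0' "$name" | xargs -0 -I {} sh -c 'echo {}')

# Safer: the name travels as the positional parameter $1 and is never parsed.
safe=$(printf '%s\0' "$name" | xargs -0 -I {} sh -c 'echo "$1"' _ {})

printf 'unsafe output:\n%s\n' "$unsafe"
printf 'safe output:\n%s\n' "$safe"
```

The unsafe form runs the embedded `echo INJECTED` as a second command. The safer form prints the name unchanged.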
Quick Reference
| Flag | Purpose | Example |
|---|---|---|
| -0 | Null-delimited input | find -print0 \| xargs -0 |
| -I {} | Placeholder for arguments | xargs -I {} mv {} /dest/ |
| -n N | N arguments per command | xargs -n 2 |
| -P N | N parallel processes | xargs -P 4 |
| -p | Prompt before execution | xargs -p rm |
| -t | Print commands before running | xargs -t |
| -r | Don’t run if input is empty | xargs -r rm |
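The last two flags in the table are easy to try safely with echo standing in for rm. This sketch assumes an xargs that accepts -r (a GNU extension that BSD xargs also accepts), and the filenames are invented.

```shell
# -r: with empty input the command is never run, so nothing is printed.
empty=$(printf '' | xargs -r echo "would delete:")

# -t: the assembled command line is written to stderr before it runs.
# Capture stderr only; the command's own stdout is discarded.
trace=$(printf '%s\n' a.tmp b.tmp | xargs -t echo rm 2>&1 >/dev/null)

echo "empty-input output: '$empty'"
echo "traced command: $trace"
```

The trace shows the exact command xargs built, `echo rm a.tmp b.tmp`, which makes -t a useful companion to a dry run.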
When Not to Use xargs
- Simple cases: find . -name "*.tmp" -delete beats find | xargs rm
- Complex logic: Use a proper script instead of sh -c chains
- Already parallel tools: parallel (GNU Parallel) is more powerful for complex parallelism
xargs is for the sweet spot: batch operations where a loop is too slow and a full script is overkill.
Computing Arts is CLI craft for the modern practitioner. More at computingarts.com.