Log Aggregation Pipelines: From Scattered Files to Searchable Insights
Building log pipelines that scale - collection, processing, storage, and actually finding what you need at 3 AM.
February 24, 2026 · 10 min · 1999 words · Rob Washington
When you have one server, you SSH in and grep the logs. When you have fifty servers, that stops working. Log aggregation is how you make “what happened?” answerable at scale.
A logging sidecar reads from stdout/stderr or shared volumes.
```yaml
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: app
    image: myapp:1.0           # Logs to stdout
  - name: fluentd-sidecar
    image: fluent/fluentd:latest
    volumeMounts:
    - name: varlog
      mountPath: /var/log
```
In Kubernetes, the standard is logging to stdout and letting the node-level agent (Fluentd DaemonSet, Promtail, etc.) collect from the container runtime.
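As a sketch, a node-level Promtail agent tails the files the container runtime writes under `/var/log/containers` and ships them to Loki (the URL, port, and label names below are illustrative assumptions):

```yaml
# Minimal Promtail config for node-level collection (illustrative)
server:
  http_listen_port: 9080
positions:
  filename: /run/promtail/positions.yaml   # remembers read offsets across restarts
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: kubernetes-pods
    static_configs:
      - targets: [localhost]
        labels:
          job: containerlogs
          __path__: /var/log/containers/*.log   # runtime writes one file per container
```

Because the agent runs once per node rather than once per pod, this approach is cheaper than the sidecar pattern at scale.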
Index template for logs:

```json
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-policy"
    },
    "mappings": {
      "properties": {
        "@timestamp":  { "type": "date" },
        "level":       { "type": "keyword" },
        "service":     { "type": "keyword" },
        "message":     { "type": "text" },
        "trace_id":    { "type": "keyword" },
        "duration_ms": { "type": "integer" }
      }
    }
  }
}
```
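The template references a lifecycle policy named `logs-policy`. As a sketch of what such an ILM policy might look like (the phase ages and sizes here are assumptions, not values from the original):

```json
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": { "rollover": { "max_size": "50gb", "max_age": "1d" } }
      },
      "warm": {
        "min_age": "7d",
        "actions": { "shrink": { "number_of_shards": 1 } }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

Rollover keeps individual indices a manageable size, and the delete phase caps total retention without manual cleanup.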
ClickHouse is a column-oriented database, excellent for log analytics.
```sql
CREATE TABLE logs (
    timestamp   DateTime,
    level       LowCardinality(String),
    service     LowCardinality(String),
    message     String,
    trace_id    String,
    duration_ms UInt32
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (service, timestamp);

-- Fast aggregations
SELECT
    service,
    count() AS errors,
    avg(duration_ms) AS avg_duration
FROM logs
WHERE level = 'error' AND timestamp > now() - INTERVAL 1 HOUR
GROUP BY service;
```
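Retention can live in the table definition itself rather than in an external job; a sketch, assuming a 30-day window (the window length is an assumption):

```sql
-- Drop rows older than 30 days automatically; background merges enforce the TTL
ALTER TABLE logs MODIFY TTL timestamp + INTERVAL 30 DAY;
```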
- Pros: blazing-fast analytics, compression
- Cons: not designed for full-text search
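The full-text gap can be narrowed with a token-based bloom-filter skip index, which lets ClickHouse skip data blocks that cannot contain a given word (the index parameters and granularity below are illustrative):

```sql
-- Bloom filter over message tokens; narrows which granules a search must scan
ALTER TABLE logs
    ADD INDEX message_tokens message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;

-- hasToken() can use the index; a bare LIKE '%...%' scan cannot
SELECT count() FROM logs WHERE hasToken(message, 'timeout');
```

This is word-level filtering, not relevance-ranked search; for true full-text queries, Elasticsearch remains the better fit.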
Multi-tenant pipelines need isolation between tenants, either at the index level or via a tenant header:

```yaml
# Elasticsearch: separate indices per tenant
output.elasticsearch:
  index: "logs-%{[tenant]}-%{+yyyy.MM.dd}"

# Loki: tenant header
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: "${TENANT_ID}"
```
- Storage: Loki (simple) or Elasticsearch (powerful)
- Query: Grafana
Add complexity (Kafka buffer, Logstash processing) only when you need it.
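A starter stack along those lines can be stood up locally with Docker Compose; a minimal sketch, assuming default ports and recent image tags (both are assumptions):

```yaml
# docker-compose.yml — minimal Loki + Promtail + Grafana stack (illustrative)
services:
  loki:
    image: grafana/loki:2.9.0
    ports: ["3100:3100"]
  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - /var/log:/var/log:ro          # host logs, mounted read-only
  grafana:
    image: grafana/grafana:10.4.0
    ports: ["3000:3000"]
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true   # convenience for local use only
```

When this stops being enough, that is the point to introduce a Kafka buffer or heavier processing, not before.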
Log aggregation isn’t glamorous, but it’s what lets you answer “what happened?” when things go wrong. Invest in structured logging from day one, pick a stack that matches your scale, and remember: the best log pipeline is the one you can actually operate.
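"Structured logging from day one" can be as little as a JSON formatter on the standard library logger; a minimal Python sketch, where the `service` name and field set are assumptions chosen to match the index mapping shown earlier:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (field names are illustrative)."""
    def format(self, record):
        entry = {
            "@timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname.lower(),
            "service": "checkout",            # hypothetical service name
            "message": record.getMessage(),
        }
        return json.dumps(entry)

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order placed")
# emits one JSON line, e.g. {"@timestamp": "...", "level": "info", "service": "checkout", "message": "order placed"}
```

One JSON object per line is exactly what collectors like Fluentd or Promtail can parse without custom grok patterns.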