When you need to process video in real-time — whether it’s tracking objects, detecting anomalies, or analyzing motion — you face a fundamental choice: cloud or edge? Here’s why edge ML is often the better answer, and how to make it work.
Why Edge?
Latency matters. A round-trip to the cloud takes 50-200ms minimum. For real-time tracking, that’s an eternity. Edge processing can hit single-digit milliseconds.
Bandwidth is expensive. Raw 720p/60fps video is roughly 1.3 Gbps uncompressed, and even a decent H.264 stream runs several Mbps per camera. Processing locally and sending only results (boxes, counts, events) drops that to kilobytes.
Reliability. No network? Edge keeps working. Cloud connection dies? Edge keeps working.
Privacy. Some use cases (security cameras, medical devices) can’t send raw footage off-premises.
Hardware Options
Raspberry Pi 5 (~$80)
Surprisingly capable for basic CV tasks:
- 2.4 GHz quad-core ARM Cortex-A76
- Can run YOLO-lite models at 5-10 fps
- Limited by memory bandwidth for large models
Best for: Simple detection, basic tracking, prototypes.
NVIDIA Jetson Orin Nano (~$500)
The sweet spot for serious edge ML:
- 40 TOPS of AI performance
- Hardware-accelerated video encode/decode
- Runs full YOLO models at 30+ fps
Best for: Production deployments, multi-camera setups, complex pipelines.
Coral TPU (~$60 add-on)
USB accelerator for Pi or any Linux box:
- 4 TOPS, optimized for TensorFlow Lite
- Can boost Pi to 30+ fps on compatible models
- Requires model quantization (int8)
Best for: Upgrading existing hardware cheaply.
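Since the Edge TPU only runs full-integer models, the quantization step is unavoidable. A sketch of post-training int8 quantization with the TensorFlow Lite converter; `saved_model_dir` and `rep_images` are placeholders for your model and a few hundred typical input frames:

```python
# Sketch: full-integer post-training quantization for the Edge TPU.
# `saved_model_dir` and `rep_images` are placeholders; the representative
# dataset is what calibrates the int8 value ranges.
import tensorflow as tf

def representative_data():
    for image in rep_images:   # placeholder: typical preprocessed input frames
        yield [image]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())   # then compile with `edgetpu_compiler`
```

The resulting `.tflite` file still needs a pass through Google's `edgetpu_compiler` before the Coral will accelerate it.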
The Software Stack
Detection: YOLO Family
YOLOv8 and its variants remain the go-to for real-time detection:
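A minimal per-frame loop with the `ultralytics` package (a sketch: the nano weights file and camera index 0 are assumptions, and `pip install ultralytics opencv-python` is required):

```python
# Sketch of a real-time detection loop using the ultralytics YOLOv8 API.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")              # nano weights: smallest, fastest
cap = cv2.VideoCapture(0)               # assumed camera index

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)   # one Results object per image
    for box in results[0].boxes:            # xyxy coords, confidence, class id
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(model.names[int(box.cls)], float(box.conf))
    cv2.imshow("detections", results[0].plot())
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```

Swapping model size is a one-line change: load `yolov8s.pt` or `yolov8m.pt` instead.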
Model sizes matter:
- YOLOv8n (nano): 3.2M params, fastest
- YOLOv8s (small): 11.2M params, good balance
- YOLOv8m (medium): 25.9M params, more accurate
Tracking: ByteTrack / DeepSORT
Detection finds objects per-frame. Tracking maintains identity across frames:
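Ultralytics ships a ByteTrack integration, so tracking is one call on top of detection. A sketch; the input path is a placeholder, and `persist=True` is what carries track state across successive frames:

```python
# Sketch: per-frame tracking via ultralytics' built-in ByteTrack support.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
cap = cv2.VideoCapture("input.mp4")    # placeholder input

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model.track(frame, persist=True,
                          tracker="bytetrack.yaml", verbose=False)
    boxes = results[0].boxes
    if boxes.id is not None:           # ids appear once tracks are confirmed
        for tid, xyxy in zip(boxes.id.int().tolist(), boxes.xyxy.tolist()):
            print(f"track {tid}: {xyxy}")

cap.release()
```

The stable `tid` per object is what lets you count, measure dwell time, or draw trails.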
Optimization: TensorRT
For Jetson, TensorRT compilation can double performance:
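With ultralytics the export is a single call. A sketch; run it on the Jetson itself, because TensorRT engines are built for one specific GPU and won't transfer:

```python
# Sketch: one-time TensorRT export on the target device, then load the engine.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.export(format="engine", half=True)   # builds yolov8n.engine with FP16

trt_model = YOLO("yolov8n.engine")         # drop-in replacement for the .pt model
results = trt_model("frame.jpg")           # placeholder input image
```

The first export can take several minutes while TensorRT profiles kernels; it only happens once.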
Architecture Patterns
Pipeline Pattern
Separate capture, inference, and output into independent stages for better throughput.
Each stage runs independently. Capture doesn’t block on inference.
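A minimal sketch of the pattern with Python threads and a bounded queue. The stages here are dummies; real ones would wrap `cv2.VideoCapture` and a model call:

```python
# Sketch of the capture -> inference pipeline: one thread per stage, joined by
# a small bounded queue so a slow stage drops frames instead of blocking.
import queue
import threading

def run_pipeline(frames, infer, results):
    """frames: iterable of inputs; infer: per-frame function; results: output list."""
    q_in = queue.Queue(maxsize=2)      # tiny buffer: stale frames are worthless
    done = object()                    # sentinel marking end of stream

    def capture():
        for f in frames:
            try:
                q_in.put_nowait(f)     # drop the frame if inference is behind
            except queue.Full:
                pass
        q_in.put(done)                 # blocking put so the sentinel always arrives

    def inference():
        while True:
            f = q_in.get()
            if f is done:
                break
            results.append(infer(f))

    threads = [threading.Thread(target=capture), threading.Thread(target=inference)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

out = []
run_pipeline(range(5), lambda f: f * 2, out)
print(out)   # in-order subset of [0, 2, 4, 6, 8]; slow consumers lose frames
```

The `maxsize=2` is deliberate: in a live pipeline, buffering old frames only adds latency.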
Multi-Model Cascade
Run cheap detection first, expensive analysis only when needed:
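A sketch of the gating logic; `cheap_detect` and `full_analysis` are hypothetical stand-ins for a nano detector and a heavier second-stage model:

```python
# Sketch of a two-stage cascade: a cheap detector gates an expensive model,
# so most frames never pay for the second stage.
def cascade(frame, cheap_detect, full_analysis, conf_threshold=0.5):
    detections = cheap_detect(frame)
    hits = [d for d in detections if d["conf"] >= conf_threshold]
    if not hits:
        return []                      # the vast majority of frames stop here
    return [full_analysis(frame, d) for d in hits]

# Usage with dummy stages standing in for real models:
cheap = lambda f: [{"conf": 0.9, "label": "person"}] if f == "busy" else []
heavy = lambda f, d: {**d, "pose": "standing"}
print(cascade("empty", cheap, heavy))   # -> []
print(cascade("busy", cheap, heavy))
```

If the scene is empty 95% of the time, the expensive model effectively runs at 1/20th duty cycle.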
Frame Skipping
Not every frame needs processing:
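One possible shape, with a hypothetical `detect` callable; results from the last processed frame are reused in between:

```python
# Sketch: run detection on every Nth frame and reuse the last result on the
# rest. Tracking pipelines can interpolate identities on skipped frames.
def process_stream(frames, detect, every_n=3):
    last_result = None
    results = []
    for i, frame in enumerate(frames):
        if i % every_n == 0:
            last_result = detect(frame)   # expensive call, 1 frame in N
        results.append(last_result)       # cheap reuse on the others
    return results

# Usage with a dummy detector that counts how often it actually runs:
counter = {"calls": 0}
def fake_detect(frame):
    counter["calls"] += 1
    return f"boxes@{frame}"

out = process_stream(range(9), fake_detect, every_n=3)
print(counter["calls"])   # -> 3
```

`every_n=3` cuts inference cost by two-thirds while output still updates 10+ times a second at 30 fps capture.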
Common Pitfalls
Memory Leaks
OpenCV and deep learning frameworks make it easy to leak memory; unreleased captures and unbounded frame buffers are the usual culprits:
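Two habits cover most cases: bound every buffer that holds frames, and release resources in a `finally` block. A sketch, with a fake capture object standing in for `cv2.VideoCapture`:

```python
# Sketch: a bounded ring buffer plus guaranteed release. Old frames fall out
# of the deque automatically instead of accumulating forever.
from collections import deque

recent_frames = deque(maxlen=30)   # holds at most 30 frames, ever

def process(cap, handle_frame):
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            recent_frames.append(frame)   # never grows past maxlen
            handle_frame(frame)
    finally:
        cap.release()                     # runs even if handle_frame raises

class FakeCap:                            # stand-in for cv2.VideoCapture
    def __init__(self, n):
        self.n, self.released = n, False
    def read(self):
        self.n -= 1
        return self.n >= 0, f"frame{self.n}"
    def release(self):
        self.released = True

cap = FakeCap(100)
process(cap, lambda frame: None)
print(len(recent_frames), cap.released)   # -> 30 True
```

The same `deque(maxlen=...)` pattern works for any "keep the last N seconds" feature, like pre-event recording.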
Thermal Throttling
Edge devices get hot. Monitor and adapt:
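A sketch that maps the SoC temperature to a frame-skip factor; the sysfs path is typical for Pi and Jetson boards but not guaranteed, and the thresholds are illustrative:

```python
# Sketch: poll the SoC temperature and back off as the chip heats up.
# sysfs reports millidegrees Celsius as a plain integer string.
THERMAL_PATH = "/sys/class/thermal/thermal_zone0/temp"   # board-dependent

def read_temp_c(raw: str) -> float:
    return int(raw.strip()) / 1000.0      # e.g. "65432\n" -> 65.432

def choose_skip(temp_c: float) -> int:
    """Process every Nth frame, skipping more as temperature climbs."""
    if temp_c < 60:
        return 1       # full rate
    if temp_c < 75:
        return 2       # half rate
    return 4           # survival mode, near throttling

def current_skip() -> int:
    with open(THERMAL_PATH) as f:
        return choose_skip(read_temp_c(f.read()))

print(choose_skip(read_temp_c("55000\n")))   # -> 1
print(choose_skip(read_temp_c("80000\n")))   # -> 4
```

Degrading gracefully beats letting the kernel throttle the clocks for you, which tanks frame rate unpredictably.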
Color Space Confusion
OpenCV uses BGR. Almost everything else (PIL, matplotlib, most pretrained models) expects RGB. Always convert at the boundary:
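A sketch of the conversion in plain NumPy, equivalent to `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)`:

```python
# Sketch: OpenCV frames are BGR; reverse the channel axis before handing
# them to anything RGB-based.
import numpy as np

def bgr_to_rgb(frame: np.ndarray) -> np.ndarray:
    return frame[..., ::-1].copy()   # .copy() yields a contiguous array

bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255                    # channel 0 = blue in BGR
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0].tolist())            # -> [0, 0, 255]
```

The classic symptom of forgetting this is a model that still "works" but with quietly degraded accuracy, which is why it's worth converting in exactly one place.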
Real-World Performance
Tested on Jetson Orin Nano with 720p input:
| Model | FPS | Notes |
|---|---|---|
| YOLOv8n | 45 | Good for large objects |
| YOLOv8s | 32 | Better for small objects |
| YOLOv8n + ByteTrack | 38 | Smooth tracking |
| YOLOv8s + TensorRT | 48 | Best balance |
When to Go Cloud
Edge isn’t always the answer:
- Training — Always cloud. Edge is inference only.
- Rare events — If you process one image per hour, cloud is simpler.
- Complex reasoning — LLM-based analysis needs cloud power.
- Batch processing — No latency requirement? Cloud scales better.
Getting Started
- Start with Pi + USB camera (~$100 total). Prove your concept works.
- Hit limits? Add Coral TPU (~$60) for 3-5x speedup.
- Need more? Graduate to Jetson Orin Nano.
- Multi-camera? Jetson Orin NX or AGX.
The best edge ML system is the one that’s actually deployed. Start simple, optimize when you have real performance data.
Edge ML brings AI out of the data center and into the real world. The tools are mature, the hardware is affordable, and the use cases are endless. What will you build?