When you need to process video in real-time — whether it’s tracking objects, detecting anomalies, or analyzing motion — you face a fundamental choice: cloud or edge? Here’s why edge ML is often the better answer, and how to make it work.

Why Edge?

Latency matters. A round-trip to the cloud takes 50-200ms minimum. For real-time tracking, that’s an eternity. Edge processing can hit single-digit milliseconds.

Bandwidth is expensive. Raw video at 720p/60fps runs roughly 660 Mbps uncompressed; even an encoded stream eats 5-10 Mbps. Processing locally and sending only results drops that to kilobytes.
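The arithmetic behind that raw-bandwidth figure, assuming YUV 4:2:0 frames (1.5 bytes per pixel):

```python
# Back-of-envelope bandwidth for raw 720p/60fps video (YUV 4:2:0).
WIDTH, HEIGHT, FPS = 1280, 720, 60
BYTES_PER_PIXEL = 1.5          # 4:2:0 chroma subsampling

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
mbps = bytes_per_frame * FPS * 8 / 1e6

print(f"{mbps:.0f} Mbps")      # → 664 Mbps
# Versus sending only detection results: a few dozen boxes
# at ~20 bytes each is well under 1 KB per frame.
```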

Reliability. No network? Edge keeps working. Cloud connection dies? Edge keeps working.

Privacy. Some use cases (security cameras, medical devices) can’t send raw footage off-premises.

Hardware Options

Raspberry Pi 5 (~$80)

Surprisingly capable for basic CV tasks:

  • 2.4 GHz quad-core ARM Cortex-A76
  • Can run YOLO-lite models at 5-10 fps
  • Limited by memory bandwidth for large models

Best for: Simple detection, basic tracking, prototypes.

NVIDIA Jetson Orin Nano (~$500)

The sweet spot for serious edge ML:

  • 40 TOPS of AI performance
  • Hardware accelerated video encode/decode
  • Runs full YOLO models at 30+ fps

Best for: Production deployments, multi-camera setups, complex pipelines.

Coral TPU (~$60 add-on)

USB accelerator for Pi or any Linux box:

  • 4 TOPS, optimized for TensorFlow Lite
  • Can boost Pi to 30+ fps on compatible models
  • Requires model quantization (int8)

Best for: Upgrading existing hardware cheaply.
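The int8 requirement comes from the Edge TPU's integer-only arithmetic. As a quick illustration of what quantization does, here is the standard affine scheme; the scale and zero-point below are made-up values, not from a real model (converters pick them per tensor during calibration):

```python
# Affine int8 quantization: real_value ≈ scale * (q - zero_point).
# Scale and zero-point here are illustrative, not from a real model.

def quantize(x, scale, zero_point):
    """Map a float to int8, clamping to the representable range."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    """Recover an approximate float from the int8 value."""
    return scale * (q - zero_point)

scale, zero_point = 0.125, 0
x = 4.6                        # e.g. an activation value
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)

print(q, x_hat)                # → 37 4.625
```

The round-trip error is bounded by half the scale, which is why calibration on representative data matters: a poorly chosen scale wastes the int8 range and hurts accuracy.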

The Software Stack

Detection: YOLO Family

YOLOv8 and its variants remain the go-to for real-time detection:

import cv2
from ultralytics import YOLO

# Load model (nano version for edge)
model = YOLO('yolov8n.pt')

# Grab a frame from the camera
cap = cv2.VideoCapture(0)
ret, frame = cap.read()

# Run inference
results = model(frame)

for box in results[0].boxes:
    cls = int(box.cls[0])                    # class index
    conf = float(box.conf[0])                # confidence score
    x1, y1, x2, y2 = map(int, box.xyxy[0])   # pixel coordinates

Model sizes matter:

  • YOLOv8n (nano): 3.2M params, fastest
  • YOLOv8s (small): 11.2M params, good balance
  • YOLOv8m (medium): 25.9M params, more accurate

Tracking: ByteTrack / DeepSORT

Detection finds objects per-frame. Tracking maintains identity across frames:

import supervision as sv
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
tracker = sv.ByteTrack()

# After detection: convert ultralytics results for the tracker
results = model(frame)[0]
detections = sv.Detections.from_ultralytics(results)
tracked = tracker.update_with_detections(detections)

# tracker_id is a persistent ID maintained across frames
for box, track_id in zip(tracked.xyxy, tracked.tracker_id):
    ...

Optimization: TensorRT

For Jetson, TensorRT compilation can double performance:

# Export to TensorRT
model.export(format='engine', device='0')

# Load optimized model
model_trt = YOLO('yolov8n.engine')
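To verify the speedup on your own device, a simple throughput measurement works. This is a generic sketch: the dummy function below is a stand-in so the snippet runs anywhere; swap in model(frame) and model_trt(frame) to compare the two engines.

```python
import time

def fps(infer, frame, warmup=3, iters=20):
    """Measure average inference throughput in frames per second."""
    for _ in range(warmup):
        infer(frame)                  # warm-up: first runs are slower
    start = time.perf_counter()
    for _ in range(iters):
        infer(frame)
    return iters / (time.perf_counter() - start)

# Dummy stand-in for a model, so the snippet is self-contained.
dummy = lambda frame: sum(frame)
print(f"{fps(dummy, list(range(1000))):.0f} fps")
```

Always include warm-up iterations: the first few inferences pay one-time costs (CUDA context creation, engine deserialization) that would skew the average.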

Architecture Patterns

Pipeline Pattern

Separate concerns for better throughput:

Camera Capture Thread → Frame Queue → Inference Thread → Result Queue → Post-process Thread → Output Queue

Each stage runs independently. Capture doesn’t block on inference.
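A minimal sketch of this pattern with Python's standard library. The infer function is a placeholder for real model inference; the bounded frame queue is what keeps the pipeline live — when inference falls behind, old frames are dropped instead of blocking capture.

```python
import queue
import threading

def infer(frame):
    """Placeholder for real inference (e.g. a YOLO call)."""
    return {"frame": frame, "objects": []}

frame_q = queue.Queue(maxsize=2)   # small and bounded: drop, don't lag
result_q = queue.Queue()

def capture(frames):
    """Capture stage: push frames, drop when inference falls behind."""
    for frame in frames:
        try:
            frame_q.put_nowait(frame)
        except queue.Full:
            pass                     # skip the frame rather than block
    frame_q.put(None)                # sentinel: end of stream

def inference():
    """Inference stage: consumes frames independently of capture."""
    while (frame := frame_q.get()) is not None:
        result_q.put(infer(frame))
    result_q.put(None)

frames = list(range(10))             # stand-in for camera frames
t1 = threading.Thread(target=capture, args=(frames,))
t2 = threading.Thread(target=inference)
t1.start()
t2.start()
t1.join()
t2.join()

results = []
while (r := result_q.get()) is not None:
    results.append(r)
```

The same shape extends to a third post-processing thread reading from the result queue. On CPython the GIL is not a problem here because OpenCV capture and model inference both release it during their heavy native work.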

Multi-Model Cascade

Run cheap detection first, expensive analysis only when needed:

# Fast: "Is there anything here?"
motion = detect_motion(frame)  # ~1ms

if motion:
    # Medium: "What kind of object?"
    detections = quick_detect(frame)  # ~10ms
    
    if detections.has_target_class():
        # Slow: "Detailed analysis"
        analysis = deep_analyze(frame)  # ~50ms

Frame Skipping

Not every frame needs processing:

last_results = None
frame_count = 0
while True:
    frame = camera.read()
    frame_count += 1

    # Process every 3rd frame
    if frame_count % 3 == 0:
        last_results = model(frame)

    # But display every frame (smoother video)
    display(frame, last_results)

Common Pitfalls

Memory Leaks

OpenCV and deep learning frameworks love leaking memory:

import cv2
import numpy as np

# Bad: allocates a new array on every iteration
while True:
    frame = cv2.resize(camera.read(), (640, 480))

# Better: reuse a preallocated buffer
buffer = np.zeros((480, 640, 3), dtype=np.uint8)
while True:
    cv2.resize(camera.read(), (640, 480), dst=buffer)

Thermal Throttling

Edge devices get hot. Monitor and adapt:

import time

def get_cpu_temp():
    """Read the SoC temperature in degrees Celsius via sysfs."""
    with open('/sys/class/thermal/thermal_zone0/temp') as f:
        return int(f.read()) / 1000

if get_cpu_temp() > 80:
    # Reduce inference rate or switch to a lighter model
    time.sleep(0.1)

Color Space Confusion

OpenCV uses BGR. Everything else uses RGB. Always convert:

# OpenCV capture is BGR
frame_bgr = camera.read()

# Models expect RGB
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)

# PIL/display expects RGB
# Saving with cv2.imwrite() expects BGR

Real-World Performance

Tested on Jetson Orin Nano with 720p input:

Model                 FPS   Notes
YOLOv8n               45    Good for large objects
YOLOv8s               32    Better for small objects
YOLOv8n + ByteTrack   38    Smooth tracking
YOLOv8s + TensorRT    48    Best balance

When to Go Cloud

Edge isn’t always the answer:

  • Training — Always cloud. Edge is inference only.
  • Rare events — If you process one image per hour, cloud is simpler.
  • Complex reasoning — LLM-based analysis needs cloud power.
  • Batch processing — No latency requirement? Cloud scales better.

Getting Started

  1. Start with Pi + USB camera (~$100 total). Prove your concept works.
  2. Hit limits? Add Coral TPU (~$60) for 3-5x speedup.
  3. Need more? Graduate to Jetson Orin Nano.
  4. Multi-camera? Jetson Orin NX or AGX.

The best edge ML system is the one that’s actually deployed. Start simple, optimize when you have real performance data.


Edge ML brings AI out of the data center and into the real world. The tools are mature, the hardware is affordable, and the use cases are endless. What will you build?