Federated Learning (FL) trains a shared model across many devices or organisations, each of which keeps its data local. Google uses it to improve Gboard next-word prediction without uploading what users type. Hospitals use it to train diagnostic models across institutions without sharing patient records.
The FedAvg Algorithm
The core algorithm is simple. In each round:
- Server sends current global model weights to a subset of clients.
- Each client trains on its local data for E local epochs.
- Clients send their updated weights (or the deltas from the global model) back to the server.
- Server aggregates the updates, typically as an average weighted by each client's dataset size.
- Repeat until convergence.
Python (FedAvg Server)
```python
import numpy as np

def federated_average(client_updates: list, client_sizes: list) -> np.ndarray:
    """Weighted average of client model updates."""
    total = sum(client_sizes)
    # Accumulate in float so integer-typed inputs do not break the in-place add
    aggregated = np.zeros_like(client_updates[0], dtype=float)
    for update, size in zip(client_updates, client_sizes):
        aggregated += (size / total) * update
    return aggregated

# Server round
# initialise_model, select_clients and client.train are not defined here;
# they stand in for your own model setup, sampling and local training code.
global_weights = initialise_model()
for round_num in range(100):
    selected = select_clients(fraction=0.1)
    updates = [client.train(global_weights, epochs=5) for client in selected]
    sizes = [client.dataset_size for client in selected]
    global_weights = federated_average(updates, sizes)
    print(f"Round {round_num}: global model updated")
```
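The server loop above leaves the client side abstract. As a rough illustration only, here is a self-contained simulation in which each client fits a linear model with plain gradient descent. The function `local_train`, the synthetic data, and the learning rate are all assumptions made for this sketch; they are not part of any FL framework.

```python
import numpy as np

def local_train(global_weights, X, y, epochs=5, lr=0.1):
    """Hypothetical client step: full-batch gradient descent on a least-squares loss."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of 0.5 * mean squared error
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum((n / total) * w for w, n in zip(client_weights, client_sizes))

# Synthetic setup: three clients with different amounts of data (assumed values)
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 200, 100):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(50):
    local = [local_train(global_w, X, y) for X, y in clients]
    sizes = [len(y) for _, y in clients]
    global_w = federated_average(local, sizes)

print("learned weights:", global_w)
```

Only the weight vectors cross the client boundary in this loop; the arrays `X` and `y` are never passed to the aggregation step.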
Real-World Challenges
📊
Non-IID Data
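Client data is rarely independent and identically distributed (IID): each client sees its own skewed slice of the overall distribution, so local updates pull the global model in different directions. FedProx (which adds a proximal term to the local objective) and SCAFFOLD (which corrects client drift with control variates) handle heterogeneous data better than vanilla FedAvg.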
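To make the FedProx idea concrete, the sketch below adds the proximal penalty (mu / 2) * ||w - w_global||^2 to a least-squares local objective, which contributes mu * (w - w_global) to the gradient. The function name `fedprox_local_train`, the linear model, and the values of `mu` and `lr` are assumptions chosen for illustration.

```python
import numpy as np

def fedprox_local_train(global_weights, X, y, mu=0.1, epochs=5, lr=0.1):
    """Hypothetical FedProx-style client step for a least-squares model."""
    w = global_weights.copy()
    for _ in range(epochs):
        loss_grad = X.T @ (X @ w - y) / len(y)   # gradient of the local loss
        prox_grad = mu * (w - global_weights)    # pulls w back toward the global model
        w -= lr * (loss_grad + prox_grad)
    return w

# Compare how far a client drifts from the global model with and without the penalty
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0])
w0 = np.zeros(2)

drift_plain = np.linalg.norm(fedprox_local_train(w0, X, y, mu=0.0) - w0)
drift_prox = np.linalg.norm(fedprox_local_train(w0, X, y, mu=1.0) - w0)
print(f"drift without proximal term: {drift_plain:.3f}, with: {drift_prox:.3f}")
```

Setting `mu=0.0` recovers the plain local update, so the same function can be used to compare both behaviours.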
📡
Communication Cost
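Sending full model weights every round is expensive, especially over mobile networks. Gradient compression (for example, sparsification) and model quantisation can cut bandwidth by orders of magnitude; reductions of around 100x have been reported, though the achievable figure depends on the model and the accuracy you are willing to trade.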
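One common compression approach is top-k sparsification: a client transmits only the k largest-magnitude entries of its update, together with their positions. The helpers `top_k_sparsify` and `densify` below are hypothetical names for this sketch, and the example values are made up.

```python
import numpy as np

def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries; return their flat indices and values."""
    flat = update.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |values|
    return idx, flat[idx]

def densify(idx, values, shape):
    """Server side: rebuild a dense update, with zeros where nothing was sent."""
    dense = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    dense[idx] = values
    return dense.reshape(shape)

update = np.array([[0.01, -2.0, 0.03],
                   [1.5, -0.02, 0.0]], dtype=np.float32)
idx, vals = top_k_sparsify(update, k=2)
restored = densify(idx, vals, update.shape)
print(restored)
```

Here 2 of 6 values are sent. Practical schemes usually also accumulate the dropped entries locally and add them to the next round's update, which this sketch omits.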
🔐
Gradient Leakage
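Shared gradients can leak training data: gradient inversion attacks have reconstructed individual examples from them. Keeping data local is therefore not sufficient on its own. Combine FL with Differential Privacy (clipping and adding noise to updates) or Secure Aggregation (so the server only sees the sum of updates) for stronger privacy guarantees.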
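The differential privacy side typically works by clipping each client update to a maximum L2 norm and adding Gaussian noise to the sum before averaging. The sketch below shows only that mechanism. The names `clip_update` and `private_mean` and the parameter values are assumptions, and the code does not compute a privacy budget (epsilon), which a real deployment would need a privacy accountant for.

```python
import numpy as np

def clip_update(update, clip_norm):
    """Scale the update down so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def private_mean(updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip each update, sum, add Gaussian noise scaled to the clip norm, then average."""
    if rng is None:
        rng = np.random.default_rng()
    clipped = [clip_update(u, clip_norm) for u in updates]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(updates)

updates = [np.array([3.0, 4.0]), np.array([0.1, 0.2]), np.array([-6.0, 8.0])]
noisy_avg = private_mean(updates, clip_norm=1.0, noise_multiplier=0.5,
                         rng=np.random.default_rng(0))
print(noisy_avg)
```

Clipping bounds how much any single client can influence the aggregate, which is what lets the added noise mask an individual contribution.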
📴
Stragglers
Slow or offline clients block synchronous rounds. Asynchronous FL or client selection strategies mitigate this.
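One simple mitigation, offered here as an illustration rather than a specific published algorithm, is a round deadline: the server aggregates whatever arrived in time and skips the rest. The function `aggregate_with_deadline`, the tuple layout, and the timings below are all assumptions for this sketch.

```python
import numpy as np

def aggregate_with_deadline(results, deadline_s, min_clients=2):
    """Weighted average over clients that reported within the deadline.

    results: list of (weights, dataset_size, elapsed_seconds) tuples.
    Returns None if too few clients made it, so the caller can keep the old model.
    """
    on_time = [(w, n) for w, n, t in results if t <= deadline_s]
    if len(on_time) < min_clients:
        return None
    total = sum(n for _, n in on_time)
    return sum((n / total) * w for w, n in on_time)

results = [
    (np.array([1.0, 1.0]), 100, 2.0),
    (np.array([3.0, 3.0]), 300, 4.5),
    (np.array([9.0, 9.0]), 50, 30.0),   # straggler: misses a 10 s deadline
]
new_weights = aggregate_with_deadline(results, deadline_s=10.0)
print(new_weights)
```

Dropping late clients keeps rounds moving, but it can bias the model toward fast devices, so the deadline and `min_clients` threshold are a trade-off rather than a free win.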
Healthcare Use Case
NVIDIA FLARE enables hospitals to collaboratively train tumour segmentation models. Only model updates are exchanged; no patient data leaves the institution. The joint model is reported to outperform any single-institution model by 15–20%.