Federated Learning
A model trained across many devices, each keeping its data local. The server only sees the gradients — never the raw data. The fix when "send everything to the cloud" is illegal, expensive, or just impolite.
In one sentence: data stays local, gradients move. Each device trains on its own data; only the model updates leave.
The protocol — FedAvg: (1) server broadcasts current global weights to a sample of clients; (2) each client trains locally on its own data for a few epochs; (3) clients send the weight updates (not the data) back to the server; (4) server averages the updates, weighted by data size, into new global weights. Repeat for hundreds of rounds.
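The four steps above can be sketched in a few lines. This is a minimal simulation, not a production implementation: `local_train` stands in for "a few epochs of SGD on the client's own data", and all names and defaults here are illustrative.

```python
import numpy as np

def fedavg_round(global_w, client_datasets, local_train, frac=0.5, rng=None):
    """One FedAvg round: broadcast, local training, weighted averaging.

    local_train(w, data) -> new weights after a few local epochs.
    Illustrative sketch; not from any specific library.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # (1) sample a fraction of the clients for this round
    k = max(1, int(frac * len(client_datasets)))
    chosen = rng.choice(len(client_datasets), size=k, replace=False)
    # (2)-(3) each chosen client trains locally from the broadcast weights
    # and sends back only its new weights plus its local data size
    updates = [(local_train(global_w.copy(), client_datasets[i]),
                len(client_datasets[i])) for i in chosen]
    # (4) average the returned weights, weighted by local data size
    n = sum(size for _, size in updates)
    return sum(size / n * w for w, size in updates)
```

Run `fedavg_round` in a loop for the "hundreds of rounds" the protocol describes; the raw `client_datasets[i]` arrays never appear in the aggregation step, only the returned weights do.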
Variants add differential privacy (noise injected into each update so individual rows can't be reverse-engineered) and secure aggregation (the server can decrypt only the sum, not the individual contributions).
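The differential-privacy variant usually follows the clip-then-noise recipe: bound each client's update to a fixed L2 norm, then add Gaussian noise scaled to that bound before the update leaves the device. A minimal sketch, with illustrative parameter names:

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, sigma=0.5, rng=None):
    """Clip an update to a bounded L2 norm, then add Gaussian noise.

    clip_norm bounds any one client's influence; sigma controls the
    privacy-utility tradeoff. Illustrative values, not a tuned config.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    norm = np.linalg.norm(update)
    # scale down (never up) so the update's norm is at most clip_norm
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    # noise proportional to the clipping bound masks any single row
    return clipped + rng.normal(0.0, sigma * clip_norm, size=update.shape)
```

Clipping matters as much as the noise: without a bound on each update's norm, no finite noise level can hide one client's contribution.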
Federated learning made on-device personalization viable. Gboard's next-word prediction, Apple's QuickType, and Siri voice models are all federated — your typing data never leaves your phone, but your phone contributes to the global model.
It's also the regulatory escape hatch for healthcare, finance, and HR ML: GDPR, HIPAA, and similar laws often forbid pooling raw data across jurisdictions. Federated training keeps each hospital's patient records on its servers while still building a shared model.
- Hit Run rounds. Watch the broadcast phase: white packets fan out from the server to each client (the global weights). Then the upload phase: orange gradient packets travel back. The server averages (a single white packet emerges) and the loss curve drops.
- Toggle non-IID data. Each client's local data histogram now skews toward a different class (device 1 mostly class A, device 2 mostly class B). Convergence slows; the loss curve gets noisier — a real-world federated pain point.
- Turn up privacy noise (DP-σ). Notice the small jitter added to gradient packets before they leave each client; the global loss curve becomes noisier as σ rises. The privacy-utility tradeoff, made visible.
- Raise client dropout. Some clients gray out with an X each round — offline phones, dead laptops. FedAvg still converges; robustness is one of the algorithm's quiet strengths.
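The non-IID toggle above mirrors how research code usually simulates skewed clients: split each class's examples across clients with Dirichlet-distributed proportions, so a small concentration parameter gives each device a lopsided label histogram. A sketch under that assumption (function name and defaults are illustrative):

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha=0.5, rng=None):
    """Assign example indices to clients with class skew set by alpha.

    Small alpha -> each client dominated by a few classes (non-IID);
    large alpha -> near-uniform splits (IID-ish).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    labels = np.asarray(labels)
    parts = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # draw this class's share for each client, then cut accordingly
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for part, chunk in zip(parts, np.split(idx, cuts)):
            part.extend(chunk.tolist())
    return parts
```

With `alpha=0.1` the histograms look like the demo's skewed clients; with `alpha=100` they are nearly identical.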
- w_t · the global model weights at round t — a long vector. Same architecture across all devices.
- w_k^{t+1} · client k's local weights after this round of local training (starting from the broadcast w_t).
- n_k · client k's local data size (how many examples that device trained on).
- n · total data across all participating clients this round (= Σ n_k).
- The recipe: the server averages the clients' new weights, and clients with more data get more say. A client with 10× the data has 10× the influence.
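Written out with the symbols above, the FedAvg update is:

```latex
w_{t+1} \;=\; \sum_{k \in S_t} \frac{n_k}{n}\, w_k^{t+1},
\qquad n \;=\; \sum_{k \in S_t} n_k
```

where S_t is the set of clients sampled in round t — a data-size-weighted average of the locally trained weights.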
Google's keyboard learns from billions of phones without ever uploading your text. Each phone trains locally; encrypted gradient summaries are aggregated server-side. Federated learning at industrial scale.
iPhone's predictive text and Siri's wake-word personalization use a federated mix with differential privacy. Apple's marketing leans on the fact that the data never leaves your device.
Multi-hospital tumor classification models — each hospital keeps patient scans local; federated rounds aggregate the model. Used by NVIDIA Clara, the Brain Tumor Federated Learning challenge, and several FDA-cleared products.
Banks have patterns of fraud they can't share with competitors. Federated learning lets them collaboratively train fraud models without exposing customer transactions. Used by SWIFT and several large banking consortia.
- Communication-Efficient Learning of Deep Networks from Decentralized Data (paper), McMahan et al. (2017) · The FedAvg paper. Introduced federated learning to mainstream ML and remains the foundational reference.
- Advances and Open Problems in Federated Learning (survey), Kairouz et al. (2021) · A 100+ page survey from a Google + academia consortium. The reference for both practical and research aspects.
- TensorFlow Federated (library) · Google's open-source framework for federated learning experiments. Includes simulation tools and DP integration.
- Flower (library) · Framework-agnostic federated learning. PyTorch, JAX, and scikit-learn support; the modern alternative to TFF for research and production.