Generalisation, Averaging Schemes, Privacy
Essential Bibliography
Analysis of generalisation within the Teacher/Student setup
Original paper introducing the teacher/student framework for generalisation (and gradient flow) analysis:
Also the basis of more modern works on generalisation - mostly in the continual learning setting, e.g.:
Stochastic Weight Averaging
Original paper introducing Stochastic Weight Averaging as a generalisation-enhancing training scheme:
Improvement upon SWA with clever weight sampling and learning rate scheduling:
Re-purposing SWA as a surrogate-Bayesian uncertainty estimation tool:
Paper casting constant-LR SGD as a sampling scheme; interesting from the theoretical viewpoint:
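For intuition, a minimal NumPy sketch of the SWA idea, i.e. averaging the tail of constant-LR SGD iterates; the quadratic objective, noise model and all hyperparameters below are made up purely for illustration:

```python
import numpy as np

# Toy quadratic objective: f(w) = 0.5 * ||w - w_star||^2
rng = np.random.default_rng(0)
w_star = np.array([1.0, -2.0])

def noisy_grad(w):
    # Exact gradient plus Gaussian noise, mimicking minibatch SGD
    return (w - w_star) + 0.1 * rng.standard_normal(2)

w = np.zeros(2)
lr = 0.1                     # constant learning rate, as in SWA
swa_sum = np.zeros(2)
n_avg = 0

for step in range(1000):
    w -= lr * noisy_grad(w)
    if step >= 500:          # start averaging after a burn-in phase
        swa_sum += w
        n_avg += 1

w_swa = swa_sum / n_avg
print(np.linalg.norm(w - w_star), np.linalg.norm(w_swa - w_star))
```

On noisy quadratics the averaged iterate typically lands closer to the minimiser than any single late iterate, which is the intuition the papers above build on.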
Differentially-private Stochastic Gradient Descent
Original paper introducing the method:
The original paper introducing the link between algorithmic stability and generalisation (note: DP-SGD should be algorithmically stable!):
And a recent take on the problem, for adaptive optimisers:
Workshop paper investigating the interrelationship among SWA, DP-SGD and generalisation, with theory developed up to noise-free quadratic objectives:
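To make the method concrete, a self-contained NumPy sketch of the DP-SGD recipe (per-sample gradient clipping plus calibrated Gaussian noise) on a toy least-squares problem; model, data and hyperparameters are invented for illustration, and no actual privacy accounting is performed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noiseless linear model y = X @ w_true, squared loss
X = rng.standard_normal((64, 3))
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true

w = np.zeros(3)
lr, clip_norm, noise_mult = 0.1, 1.0, 0.5   # made-up hyperparameters

for _ in range(200):
    # Per-sample gradients of 0.5 * (x @ w - y)^2 w.r.t. w
    residuals = X @ w - y                       # shape (64,)
    per_sample_grads = residuals[:, None] * X   # shape (64, 3)

    # Clip each sample's gradient to L2 norm <= clip_norm
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads / np.maximum(1.0, norms / clip_norm)

    # Average, then add Gaussian noise scaled to the clipping bound
    noise = noise_mult * clip_norm * rng.standard_normal(3) / len(X)
    w -= lr * (clipped.mean(axis=0) + noise)

print(np.round(w, 2))
```

Clipping bounds each individual's influence on the update, and the noise masks what remains; together they are what makes the iterates (and hence the final weights) differentially private.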
Flat/sharp minima in loss landscapes
Papers supporting the flatness-generalisation hypothesis:
A paper casting doubt on the flatness-generalisation hypothesis:
A famous optimiser (SAM) directly integrating sharpness-awareness in the optimisation process, at the cost of a double backward pass:
And a paper aiming at deeply understanding how it works:
On the limits of wide minima optimisers:
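For reference, the SAM update itself is simple to state: ascend to the (linearised) worst-case point within an L2 ball of radius rho, then descend using the gradient taken there; the second gradient evaluation is the source of the double backward pass. A toy NumPy sketch on a quadratic, with made-up hyperparameters:

```python
import numpy as np

def loss_grad(w):
    # Gradient of the toy loss f(w) = 0.5 * ||w||^2
    return w.copy()

w = np.array([3.0, -4.0])
lr, rho = 0.1, 0.05          # rho = radius of the perturbation ball

for _ in range(100):
    g = loss_grad(w)
    # Step 1: ascend to the worst-case point within the L2 ball
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: descend with the gradient evaluated at the perturbed point
    w -= lr * loss_grad(w + eps)

print(np.linalg.norm(w))
```

In a deep-learning setting each of the two `loss_grad` calls is a full backward pass, which is exactly the extra cost mentioned above.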
Local gradient/weight averaging schemes (at optimisation-time)
The paper that started them all (Polyak-Ruppert averaging):
- Polyak, Juditsky; 1992
(or the technical report from Ruppert; 1988)
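The scheme itself is just an online running mean of the SGD iterates; a toy NumPy sketch with an invented noisy quadratic and step-size schedule:

```python
import numpy as np

rng = np.random.default_rng(1)
w_star = np.array([2.0, 1.0])

w = np.zeros(2)
w_bar = w.copy()             # Polyak-Ruppert average of the iterates

for t in range(1, 2001):
    g = (w - w_star) + 0.2 * rng.standard_normal(2)  # noisy gradient
    w -= (0.5 / t**0.5) * g                          # decaying step size
    w_bar += (w - w_bar) / t                         # online running mean

print(np.linalg.norm(w_bar - w_star))
```

The running-mean recursion needs no extra memory beyond one weight copy, which is why the same trick reappears in SWA and related schemes.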
The original Lookahead optimiser:
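Its core loop is easy to sketch: run k "fast" SGD steps starting from the current "slow" weights, then move the slow weights a fraction alpha toward the result. A toy NumPy illustration with made-up hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(2)
w_star = np.array([1.0, 3.0])

slow = np.zeros(2)
k, alpha, lr = 5, 0.5, 0.2   # invented Lookahead hyperparameters

for _ in range(200):          # outer (slow-weight) updates
    fast = slow.copy()
    for _ in range(k):        # k fast inner SGD steps
        g = (fast - w_star) + 0.1 * rng.standard_normal(2)
        fast -= lr * g
    # Slow weights interpolate toward the final fast weights
    slow += alpha * (fast - slow)

print(np.round(slow, 2))
```

The interpolation damps the noise of the inner optimiser, which is where the scheme's stability gains come from.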
The new Lookaround optimiser:
- Zhang*, Liu, Song, Zhu, Xu, Song; 2023
(* a different Zhang)
Implementations
The reference PyTorch implementation of DP-SGD can be found as part of the Opacus Project, which is citable as:
As far as the optimisers are concerned, their reference implementations tend, for whatever reason, to become unmaintained some time after publication (Lookahead, SAM) or to rely on an outdated software stack (Lookaround). To ease the situation, I have minimally bugfixed and re-published them.
They should be available as
from ebtorch.optim import Lookahead, Lookaround, SAM
after a simple
pip install ebtorch
The Lookahead one is fairly battle-tested; the Lookaround one is still very experimental; the SAM one closely matches the original and should be OK to use.