Generalisation, Averaging Schemes, Privacy

Essential Bibliography

Analysis of generalisation within the Teacher/Student setup

Original paper introducing the teacher/student framework for generalisation (and gradient flow) analysis:

Also the basis of more modern works on generalisation - mostly in the continual learning setting, e.g.:


Stochastic Weight Averaging

Original paper introducing Stochastic Weight Averaging as a generalisation-enhancing training scheme:

Improvement upon SWA with clever weight sampling and learning rate scheduling:

Re-purposing SWA as a surrogate-Bayesian uncertainty estimation tool:

Paper casting constant-LR SGD as a sampling procedure - interesting from the theoretical viewpoint:
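For intuition, the core of SWA is just a running average of the weights collected along the SGD trajectory (typically at the end of each learning-rate cycle). A minimal pure-Python sketch on toy weight vectors - all names hypothetical:

```python
def swa_update(swa_w, w, n):
    """Incremental SWA running average: given the average swa_w of the
    first n collected weight vectors, fold in the new vector w."""
    return [(sw * n + wi) / (n + 1) for sw, wi in zip(swa_w, w)]

# Toy trajectory of weight vectors visited at the end of each LR cycle.
trajectory = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]

swa_w = trajectory[0]
for n, w in enumerate(trajectory[1:], start=1):
    swa_w = swa_update(swa_w, w, n)

print(swa_w)  # → [2.0, 2.0], the plain mean of the collected iterates
```

At inference time, one evaluates the model at `swa_w` (after recomputing BatchNorm statistics, in the full method).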


Differentially-Private SGD (DP-SGD)

Original paper introducing the method:
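As a reminder of the mechanics, one DP-SGD aggregation step clips each per-example gradient to a fixed L2 norm, sums, and adds Gaussian noise before averaging. A minimal pure-Python sketch, with all names hypothetical:

```python
import math
import random

def dp_sgd_step(per_sample_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD aggregation step: clip each per-example gradient to
    L2 norm <= clip_norm, sum, add per-coordinate Gaussian noise with
    std noise_multiplier * clip_norm, then average over the batch."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for g in per_sample_grads:
        norm = math.sqrt(sum(gi * gi for gi in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            total[i] += g[i] * scale
    sigma = noise_multiplier * clip_norm
    noised = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / len(per_sample_grads) for x in noised]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.1, -0.2]]  # one gradient per example
g_priv = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
# With noise_multiplier=0 only clipping acts: the first gradient
# (norm 5) is rescaled to norm 1, the second passes through unchanged.
print(g_priv)  # → [0.35, 0.3] (up to float rounding)
```

The bounded per-example influence enforced by clipping is also what makes the link to algorithmic stability plausible.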

The original paper introducing the link between algorithmic stability and generalisation (note: DP-SGD should be algorithmically stable!):

And a recent take on the problem, for adaptive optimisers:

Workshop paper investigating the interrelationship among SWA, DP-SGD and generalisation, with theory up to noiseless quadratic objectives:


Flat/sharp minima in loss landscapes

Papers supporting the hypothesised equivalence between flatness and generalisation:

A paper casting doubt on the hypothesised flatness = generalisation equivalence:

A famous optimiser (SAM) directly integrating sharpness-awareness in the optimisation process, at the cost of a double backward pass:
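The double backward pass amounts to: perturb the weights by ρ·g/‖g‖ using the first gradient, then take the actual descent step using the gradient computed at the perturbed point. A toy 1-D sketch with closed-form gradients (hypothetical names, not the reference implementation):

```python
def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM step: ascend to the (approximate) worst-case point within
    an L2 ball of radius rho, then descend with the gradient found there."""
    g = grad_fn(w)                 # first 'backward pass'
    norm = abs(g) or 1.0           # 1-D norm; guard against division by zero
    w_adv = w + rho * g / norm     # ascent to the perturbed point
    g_adv = grad_fn(w_adv)         # second 'backward pass'
    return w - lr * g_adv          # descent using the SAM gradient

# Toy objective f(w) = w^2, with gradient 2w.
w = 1.0
for _ in range(50):
    w = sam_step(w, grad_fn=lambda w: 2.0 * w)
print(abs(w) < 0.1)  # → True: the iterate settles near the minimum at 0
```

Note that because of the fixed-radius perturbation, plain SAM does not converge exactly but hovers within an O(ρ)-sized neighbourhood of the minimiser.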

And a paper aiming to understand in depth how it works:

On the limits of wide minima optimisers:


Local gradient/weight averaging schemes (at optimisation-time)

The paper that started them all (Polyak-Ruppert averaging):
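Polyak-Ruppert averaging runs plain (noisy) SGD but reports the running mean of the visited iterates, which averages out the gradient-noise jitter around the optimum. A minimal sketch on a noisy 1-D quadratic - names and constants are illustrative only:

```python
import random

def averaged_sgd(grad_fn, w0, lr, steps, rng):
    """SGD with Polyak-Ruppert averaging: update w with noisy gradients,
    but also maintain the running mean w_bar of the iterates."""
    w, w_bar = w0, w0
    for t in range(1, steps + 1):
        w = w - lr * grad_fn(w, rng)
        w_bar = w_bar + (w - w_bar) / t   # incremental mean of iterates
    return w, w_bar

# Noisy 1-D quadratic: true gradient 2(w - 3), plus Gaussian noise.
rng = random.Random(0)
grad = lambda w, rng: 2.0 * (w - 3.0) + rng.gauss(0.0, 1.0)
w_last, w_avg = averaged_sgd(grad, w0=0.0, lr=0.1, steps=2000, rng=rng)
# The averaged iterate w_avg sits much closer to the optimum w* = 3
# than the last (noise-jittered) iterate typically does.
print(round(w_avg, 1))
```

This is the same averaging idea that SWA later applied to deep networks, with the sampling schedule as the main difference.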

The original Lookahead optimiser:
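Lookahead keeps two sets of weights: an inner optimiser updates the fast weights for k steps, then the slow weights move a fraction α towards them and the fast weights are reset. A minimal sketch with plain SGD as the inner optimiser (hypothetical names, not the published implementation):

```python
def lookahead_sgd(grad_fn, w0, lr=0.1, k=5, alpha=0.5, outer_steps=40):
    """Lookahead wrapped around plain SGD: every k fast steps,
    interpolate the slow weights towards the fast ones, then restart
    the fast weights from the updated slow weights."""
    slow = w0
    for _ in range(outer_steps):
        fast = slow
        for _ in range(k):                   # k inner (fast) SGD steps
            fast = fast - lr * grad_fn(fast)
        slow = slow + alpha * (fast - slow)  # slow-weight interpolation
    return slow

# Toy 1-D quadratic with minimum at w* = 2.
w = lookahead_sgd(lambda w: 2.0 * (w - 2.0), w0=10.0)
print(round(w, 3))  # → 2.0
```

The slow/fast decoupling is what gives Lookahead its variance-reduction flavour, and is conceptually close to a local weight-averaging scheme.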

The new Lookaround optimiser:


Implementations

The reference PyTorch implementation of DP-SGD can be found as part of the Opacus project, which is citable as:

As far as optimisers are concerned - for reasons unclear - it appears that they become unmaintained shortly after publication (Lookahead, SAM) or are based on an old software stack (Lookaround). To ease the situation, I have minimally bugfixed and re-published them.

They should be available as

from ebtorch.optim import Lookahead, Lookaround, SAM

after a simple

pip install ebtorch

The Lookahead one is fairly battle-tested; the Lookaround one is still very experimental; the SAM one closely matches the original and should be safe to use.