Weight Averaging


A post-training method that takes the best model weights saved during training and averages them into a single model. Averaging counteracts the tendency of the optimizer to oscillate between adjacent local minima in the later stages of training, yielding a final model that sits closer to the center of a flat region of the loss landscape.

This trick doesn’t affect the training procedure at all; its only cost is keeping a few additional checkpoints on disk, and it can yield a substantial boost in accuracy and stability.
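The averaging itself is just an element-wise mean of the saved parameters. Below is a minimal sketch in plain Python, assuming each checkpoint is a dict mapping parameter names to lists of floats (in practice these would be framework tensors, e.g. PyTorch state dicts, and you would average the top-k checkpoints by validation score):

```python
def average_checkpoints(state_dicts):
    """Element-wise average of parameters across several checkpoints.

    Each checkpoint is a dict: parameter name -> list of float values.
    All checkpoints are assumed to share the same names and shapes.
    """
    n = len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        length = len(state_dicts[0][name])
        averaged[name] = [
            sum(sd[name][i] for sd in state_dicts) / n
            for i in range(length)
        ]
    return averaged


# Toy example: two "checkpoints" with a weight vector and a bias.
checkpoints = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [2.0]},
]
print(average_checkpoints(checkpoints))  # {'w': [2.0, 3.0], 'b': [1.0]}
```

With real models, the same loop runs over the tensors of each saved state dict; the resulting averaged weights are loaded back into the model once, after training finishes.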
