Zero-Weight Decay on BatchNorm and Bias

Weight decay is a regularization term that prevents the model weights from ‘exploding’ by penalizing large values. Setting the weight decay to zero for BatchNorm parameters and biases is handled by default in many projects and frameworks, but it is still worth checking, since it is not the default behavior in PyTorch.
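As a quick illustration, here is a minimal sketch (using a torchvision ResNet-50 and placeholder hyperparameters, both chosen only for the example) of the common setup in which a single weight_decay value is applied uniformly to every parameter, BatchNorm and biases included:

import torch
import torchvision

model = torchvision.models.resnet50()

# Passing model.parameters() applies the same weight_decay to every
# parameter, including BatchNorm gamma/beta and all bias terms.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)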

Weight decay essentially pulls the weights towards 0. While this is beneficial for convolutional and linear layer weights, BatchNorm parameters serve a different purpose: they scale (the gamma parameter) and shift (the beta parameter) the normalized input of the layer. Forcing them towards zero distorts the output distribution and leads to inferior results. A similar reasoning applies to bias terms, which only shift activations and contribute little to overfitting, so decaying them brings no regularization benefit.
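One common way to exclude these parameters in PyTorch is to split them into two optimizer parameter groups. The sketch below (again using a torchvision ResNet-50 and placeholder hyperparameters) zeroes the weight decay for biases and for all 1-D parameters, which covers BatchNorm gamma and beta:

import torch
import torchvision

model = torchvision.models.resnet50()

decay, no_decay = [], []
for name, param in model.named_parameters():
    if not param.requires_grad:
        continue
    # 1-D parameters cover BatchNorm gamma/beta; biases are caught by name.
    if param.ndim == 1 or name.endswith(".bias"):
        no_decay.append(param)
    else:
        decay.append(param)

optimizer = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 1e-4},    # conv / linear weights
        {"params": no_decay, "weight_decay": 0.0},  # BatchNorm params and biases
    ],
    lr=0.1,
    momentum=0.9,
)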
