Zero-Weight Decay on BatchNorm and Bias

Weight decay is a regularization hyperparameter that keeps the model weights from ‘exploding’. Many projects and frameworks set the weight decay to zero for BatchNorm and bias parameters by default, but it is still worth checking, since this is not the default behavior in PyTorch.
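One way to run that check is to look at the optimizer's `param_groups`. The sketch below is a minimal example, assuming a small toy model that is not part of the original article; it lists every parameter that ends up with a non-zero weight decay when `model.parameters()` is passed straight to a PyTorch optimizer.

```python
import torch
from torch import nn

# Toy model for illustration only (an assumption, not from the article).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 6 * 6, 10),
)

# Passing model.parameters() directly creates a single parameter group,
# so the same weight_decay value applies to every parameter.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

# Collect the identity of every parameter that sits in a group with decay.
decayed_ids = {
    id(param)
    for group in optimizer.param_groups
    if group["weight_decay"] != 0
    for param in group["params"]
}

# Map those parameters back to their names for a readable report.
decayed_names = [
    name for name, param in model.named_parameters() if id(param) in decayed_ids
]
for name in decayed_names:
    print(name)
```

In this setup the BatchNorm parameters (`1.weight`, `1.bias`) and the convolution and linear biases all appear in the list, which is the situation the check is meant to catch.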

Weight decay essentially pulls the weights towards 0. While this is beneficial for convolutional and linear layer weights, BatchNorm parameters are meant to scale (the gamma parameter) and shift (the beta parameter) the normalized input of the layer. Forcing these values towards zero distorts the distribution of the layer's output and tends to give inferior results.
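In PyTorch, one common way to do this is to give the optimizer two parameter groups, one with weight decay and one without. The following is a minimal sketch under stated assumptions: the helper name `split_weight_decay_params` and the toy model are hypothetical, and biases are identified simply by the parameter name `bias`.

```python
import torch
from torch import nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.SyncBatchNorm)


def split_weight_decay_params(model: nn.Module, weight_decay: float):
    """Hypothetical helper: build param groups with zero decay on BatchNorm and bias."""
    decay, no_decay = [], []
    seen = set()  # guards against parameters shared between modules
    for module in model.modules():
        # recurse=False so each parameter is paired with its owning module.
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad or id(param) in seen:
                continue
            seen.add(id(param))
            if isinstance(module, BN_TYPES) or name == "bias":
                no_decay.append(param)
            else:
                decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Toy model for illustration only (an assumption, not from the article).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 6 * 6, 10),
)

param_groups = split_weight_decay_params(model, weight_decay=1e-4)
# Per-group weight_decay values override the optimizer-level default.
optimizer = torch.optim.SGD(param_groups, lr=0.1, momentum=0.9)
```

Here only the convolution and linear weights land in the decayed group; the BatchNorm gamma and beta and all biases go into the group with `weight_decay` set to 0.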


