Batch Accumulation


When you use a model ‘off the shelf,’ it generally comes with a suggested training recipe. The catch is that these models are usually trained on very powerful GPUs, so the recipe may not suit your target hardware. Reducing the batch size to fit your hardware's memory will likely require retuning other hyperparameters as well, such as the learning rate, and you won't always reproduce the original training results.

To overcome this issue, you can run several consecutive forward and backward passes, accumulate the gradients from each mini-batch, and update the model weights only once every few batches. This mechanism, known as batch accumulation (or gradient accumulation), simulates a larger effective batch size without exceeding your GPU's memory; see the code sketch below.

For example, you can load a pretrained ResNet-50 and its preprocessor from the Hugging Face Hub:
from transformers import AutoFeatureExtractor, AutoModelForImageClassification

# Preprocessor that resizes and normalizes input images for ResNet-50
extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")

# Pretrained ResNet-50 image classification model
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")
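
What follows is a minimal sketch of batch accumulation in PyTorch, continuing from the model loaded above. The names `train_loader` and `accumulation_steps` are illustrative placeholders rather than part of any library API; dividing each mini-batch loss by `accumulation_steps` keeps the accumulated gradient equivalent to that of one batch four times as large.

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accumulation_steps = 4  # update weights once every 4 mini-batches

model.train()
optimizer.zero_grad()

# train_loader is a hypothetical DataLoader yielding preprocessed
# image tensors and integer labels (e.g., prepared with the extractor above)
for step, (images, labels) in enumerate(train_loader):
    outputs = model(pixel_values=images, labels=labels)

    # Scale the loss so the accumulated gradient matches that of
    # one batch that is accumulation_steps times larger
    loss = outputs.loss / accumulation_steps
    loss.backward()  # gradients from each mini-batch add up in .grad

    # A backward pass runs every batch, but the optimizer steps
    # only once every accumulation_steps mini-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Because only the gradients (not extra activations) are kept between mini-batches, memory usage stays at the small-batch level while the optimizer sees updates as if a larger batch had been used.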