Batch Accumulation


When you use a model ‘off the shelf,’ it generally comes with a suggested training recipe. These recipes are usually developed on very powerful GPUs, so the recommended batch size may not fit in the memory of your target hardware. Reducing the batch size to accommodate your hardware will likely require retuning other hyperparameters, such as the learning rate, and even then you won’t always get the same training results.

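To overcome this issue, you can run the forward and backward passes on several consecutive small batches, accumulate the resulting gradients, and update the model’s weights only once every few batches. The effective batch size then equals the per-step batch size multiplied by the number of accumulation steps. This mechanism is known as batch accumulation (also called gradient accumulation).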
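
As a rough illustration of the idea, here is a minimal, dependency-free sketch on a toy one-parameter model (y_hat = w * x with a mean-squared-error loss). The names `mse_grad`, `sgd_with_accumulation`, `micro_batch_size`, and `accum_steps` are hypothetical and chosen only for this example. It assumes all micro-batches have the same size, in which case the accumulated update should match a single update computed on the full batch.

```python
# Toy model: y_hat = w * x, trained with a mean-squared-error loss.
# All names below are illustrative, not taken from any library.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def sgd_with_accumulation(w, xs, ys, micro_batch_size, accum_steps, lr):
    """One pass over the data, updating w once every `accum_steps` micro-batches.

    Assumes equally sized micro-batches; leftover micro-batches that do not
    fill a complete accumulation window are ignored in this sketch.
    """
    accumulated = 0.0
    steps_seen = 0
    for start in range(0, len(xs), micro_batch_size):
        mb_x = xs[start:start + micro_batch_size]
        mb_y = ys[start:start + micro_batch_size]
        # Divide by accum_steps so the accumulated value is an average,
        # not a sum, of the micro-batch gradients.
        accumulated += mse_grad(w, mb_x, mb_y) / accum_steps
        steps_seen += 1
        if steps_seen == accum_steps:
            w -= lr * accumulated  # single weight update for the whole window
            accumulated = 0.0
            steps_seen = 0
    return w


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]

# One update on the full batch of 8 samples.
w_full = 0.0 - 0.01 * mse_grad(0.0, xs, ys)

# Four micro-batches of 2 samples each, accumulated into one update.
w_accum = sgd_with_accumulation(0.0, xs, ys, micro_batch_size=2, accum_steps=4, lr=0.01)
```

In a framework such as PyTorch, the same pattern typically amounts to calling `loss.backward()` on each small batch (with the loss divided by the number of accumulation steps) and calling `optimizer.step()` followed by `optimizer.zero_grad()` only once per accumulation window. Note that layers that depend on per-batch statistics, such as batch normalization, still see only the small batch, so results may not match large-batch training exactly.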

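from transformers import AutoFeatureExtractor, AutoModelForImageClassification

# Load the preprocessing pipeline that matches the pretrained ResNet-50 checkpoint
extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")

# Load the pretrained ResNet-50 image classification model
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")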