Description
EfficientNet is a family of convolutional neural network (CNN) architectures pre-trained on ImageNet and evaluated for transfer learning on the CIFAR-10, CIFAR-100, Birdsnap, Stanford Cars, Flowers, FGVC Aircraft, Oxford-IIIT Pets, and Food-101 datasets.
Publishers
Mingxing Tan and Quoc V. Le, in the paper “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”
Submitted Version
May 28, 2019
Latest Version
September 11, 2020
Size
5.3M to 66M parameters
The authors developed a baseline network using a multi-objective neural architecture search that optimizes both accuracy and FLOPS. FLOPS are optimized rather than latency because no specific hardware device is targeted. The resulting network is efficient, hence the name EfficientNet. The researchers found that accuracy improves when network width, depth, or resolution is increased, but that the gain diminishes as the model grows. Better accuracy and efficiency can be achieved by balancing the network’s width, depth, and resolution when scaling a ConvNet.
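The balancing of width, depth, and resolution described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the constants 1.2, 1.1, and 1.15 are the grid-search values reported in the EfficientNet paper, while the function name `compound_scale` and the returned dictionary are assumptions made for this example. The published B1 to B7 configurations may not correspond exactly to integer values of the coefficient.

```python
# Compound scaling sketch. The three bases are the values reported in the
# EfficientNet paper; all names here are illustrative, not a library API.
ALPHA = 1.2   # depth base
BETA = 1.1    # width base
GAMMA = 1.15  # resolution base


def compound_scale(phi: float) -> dict:
    """Return depth, width, and resolution multipliers for coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # ConvNet FLOPs grow roughly with depth * width^2 * resolution^2.
    flops = depth * width ** 2 * resolution ** 2
    return {"depth": depth, "width": width,
            "resolution": resolution, "flops": flops}


if __name__ == "__main__":
    for phi in range(4):
        multipliers = compound_scale(phi)
        print(phi, {k: round(v, 2) for k, v in multipliers.items()})
```

Because the product of the depth base, the squared width base, and the squared resolution base is close to 2, each unit increase of the coefficient roughly doubles the FLOPs budget.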
EfficientNet models are trained on ImageNet using the RMSProp optimizer with decay 0.9 and momentum 0.9, batch norm momentum 0.99, weight decay 1e-5, and an initial learning rate of 0.256 that decays by 0.97 every 2.4 epochs. Training also uses SiLU (Swish-1) activation, AutoAugment, and stochastic depth with a survival probability of 0.8. The dropout ratio is 0.2 for EfficientNet-B0 and increases for the larger models. To report the final validation accuracy, the authors first set aside 25K randomly selected images from the training set as a minival set, perform early stopping on this minival set, and then evaluate the early-stopped checkpoint on the original validation set.
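As a rough illustration of the learning-rate schedule in the recipe above, the sketch below computes the rate at a given epoch. The three constants come from the text; the function name `learning_rate` and the `staircase` flag are assumptions for this example, and whether the decay is applied in discrete steps or continuously is also an assumption rather than something the text specifies.

```python
import math

INITIAL_LR = 0.256        # initial learning rate from the recipe
DECAY_FACTOR = 0.97       # multiplicative decay from the recipe
DECAY_EVERY_EPOCHS = 2.4  # decay interval from the recipe


def learning_rate(epoch: float, staircase: bool = True) -> float:
    """Exponentially decayed learning rate at a (possibly fractional) epoch."""
    steps = epoch / DECAY_EVERY_EPOCHS
    if staircase:
        # Assumed behaviour: decay only at whole multiples of the interval.
        steps = math.floor(steps)
    return INITIAL_LR * DECAY_FACTOR ** steps


if __name__ == "__main__":
    for epoch in (0, 2.4, 4.8, 48):
        print(epoch, learning_rate(epoch))
```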
The following table lists the sizes of the different EfficientNet models in terms of number of parameters and FLOPs.
Model | # of Parameters | # of FLOPs |
EfficientNet-B0 | 5.3M | 0.39B |
EfficientNet-B1 | 7.8M | 0.70B |
EfficientNet-B2 | 9.2M | 1.0B |
EfficientNet-B3 | 12M | 1.8B |
EfficientNet-B4 | 19M | 4.2B |
EfficientNet-B5 | 30M | 9.9B |
EfficientNet-B6 | 43M | 19B |
EfficientNet-B7 | 66M | 37B |
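To make the table easier to compare, the short sketch below computes how much each model grows relative to EfficientNet-B0. The numbers are copied from the table above; the dictionary `MODEL_SIZES` and the helper `growth_vs_b0` are hypothetical names used only for this example.

```python
# (parameters in millions, FLOPs in billions), copied from the table above.
MODEL_SIZES = {
    "EfficientNet-B0": (5.3, 0.39),
    "EfficientNet-B1": (7.8, 0.70),
    "EfficientNet-B2": (9.2, 1.0),
    "EfficientNet-B3": (12.0, 1.8),
    "EfficientNet-B4": (19.0, 4.2),
    "EfficientNet-B5": (30.0, 9.9),
    "EfficientNet-B6": (43.0, 19.0),
    "EfficientNet-B7": (66.0, 37.0),
}


def growth_vs_b0(name: str) -> tuple:
    """Return (parameter ratio, FLOPs ratio) of a model relative to B0."""
    params, flops = MODEL_SIZES[name]
    base_params, base_flops = MODEL_SIZES["EfficientNet-B0"]
    return params / base_params, flops / base_flops


if __name__ == "__main__":
    for model_name in MODEL_SIZES:
        param_ratio, flops_ratio = growth_vs_b0(model_name)
        print(f"{model_name}: {param_ratio:.1f}x params, {flops_ratio:.1f}x FLOPs")
```

For EfficientNet-B7 this works out to roughly 12 times the parameters and 95 times the FLOPs of B0, so compute grows considerably faster than parameter count across the family.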
Expected Input
The expected input of an EfficientNet model is a float tensor of pixels with values in the [0, 255] range.
Expected Output
The expected output of an EfficientNet model depends on the task. For image classification, the expected output is a probability distribution over the classes.
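A probability distribution over classes is commonly obtained by applying softmax to the raw scores (logits) of the final layer. The sketch below shows that step in plain Python. It assumes the model returns logits, which is common but not stated in this card; the functions `softmax` and `predict` and the class names are illustrative only.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, class_names):
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


if __name__ == "__main__":
    # Hypothetical logits for a three-class problem.
    print(predict([2.0, 1.0, 0.1], ["cat", "dog", "bird"]))
```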
Convolutional neural networks (ConvNets) are typically developed under a fixed resource budget and then scaled up for better accuracy when more resources become available. The most common method is to increase the ConvNet’s depth or width. Scaling by image resolution is a less common approach that is gaining traction. Prior work typically scales only one of the three dimensions: depth, width, or image size. While it is possible to scale two or three dimensions arbitrarily, doing so requires tedious manual tuning and often still yields sub-optimal accuracy and efficiency. The developers of EfficientNet set out to fix this.
EfficientNet was created by utilizing a neural architecture search that optimized for both accuracy and FLOPS.
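The trade-off between accuracy and FLOPS in the search can be sketched as a single reward value. The form below, accuracy multiplied by the FLOPS-to-target ratio raised to a small negative exponent, and the values -0.07 and 400M are taken from the EfficientNet paper; the function name `search_reward` and the sample inputs are assumptions for illustration, not the authors' implementation.

```python
def search_reward(accuracy: float, flops: float,
                  target_flops: float = 400e6, w: float = -0.07) -> float:
    """Reward that favours accurate models and penalises extra FLOPS.

    accuracy:     validation accuracy of the candidate, in [0, 1]
    flops:        FLOPS of the candidate network
    target_flops: FLOPS target for the search (value reported in the paper)
    w:            exponent controlling the accuracy/FLOPS trade-off
    """
    return accuracy * (flops / target_flops) ** w


if __name__ == "__main__":
    # Hypothetical candidates with the same accuracy and different cost.
    for candidate_flops in (200e6, 400e6, 800e6):
        print(candidate_flops, search_reward(0.77, candidate_flops))
```

A candidate that exactly meets the target keeps its accuracy as the reward, while a more expensive one with the same accuracy scores lower.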
Several real-world applications using EfficientNet have already reached hardware memory limits, so any additional gains in accuracy require improved efficiency.
EfficientNet was pretrained on the ImageNet dataset.
The ImageNet dataset contains 14,197,122 images, all labeled according to the WordNet hierarchy. Since 2010, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a standard benchmark in image classification and object recognition, has drawn its training data from this dataset.
Evaluation metrics assess the model’s effectiveness and how well it predicts the anticipated outcome. There are different evaluation metrics for different sets of machine learning algorithms. They help assess models’ performance, monitor machine learning systems in production, and control models to fit a given business need.
Since a model’s performance may look good in one measurement of an evaluation metric but bad in another, it is essential to employ a variety of evaluation metrics when assessing it.
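A small worked example shows why a single metric can mislead. In the sketch below, a classifier that always predicts the majority class on an imbalanced dataset scores well on accuracy while failing completely on recall. The data and the helper functions `accuracy` and `precision_recall` are made up for this illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical imbalanced dataset: 95 negatives, 5 positives.
Y_TRUE = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the negative class.
Y_PRED = [0] * 100

if __name__ == "__main__":
    print("accuracy:", accuracy(Y_TRUE, Y_PRED))
    print("precision, recall:", precision_recall(Y_TRUE, Y_PRED))
```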
Accuracy
Accuracy is a standard metric for evaluating a classification model. It measures the proportion of samples in the evaluation dataset that were classified correctly. Higher accuracy indicates better performance at assigning inputs to the correct classes.
The following table displays the top-1 and top-5 accuracy on ImageNet reported in the original paper, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks” for each pretrained EfficientNet model.
Model | Top-1 Accuracy (%) | Top-5 Accuracy (%) |
EfficientNet-B0 | 77.1 | 93.3 |
EfficientNet-B1 | 79.1 | 94.4 |
EfficientNet-B2 | 80.1 | 94.9 |
EfficientNet-B3 | 81.6 | 95.7 |
EfficientNet-B4 | 82.9 | 96.4 |
EfficientNet-B5 | 83.6 | 96.7 |
EfficientNet-B6 | 84.0 | 96.8 |
EfficientNet-B7 | 84.3 | 97.0 |
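Top-1 and top-5 accuracy in the table above follow the same rule: a prediction counts as correct if the true label is among the k highest-scoring classes. The sketch below is one plain-Python way to compute it; the function name `top_k_accuracy` and the toy scores are assumptions for this example.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k top-scoring classes.

    scores: list of per-class score lists, one per sample
    labels: list of true class indices, one per sample
    """
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Toy scores for three samples over three classes.
    toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    toy_labels = [1, 1, 0]
    print("top-1:", top_k_accuracy(toy_scores, toy_labels, 1))
    print("top-2:", top_k_accuracy(toy_scores, toy_labels, 2))
```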
The EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters.
Using the training recipe for EfficientNet-B0 available in SuperGradients, Deci’s open-source computer vision library, you can reach a higher top-1 accuracy score on ImageNet (77.62%).
When selecting an architecture, there are several things you should carefully consider:
Having clarity on these topics before you start training the model can save you a lot of time, effort, and money.
The graph compares the performance of EfficientNet models with other image classification models optimized for NVIDIA Jetson Xavier.
Performance metrics reported:
The evaluation was run on the 50,000-image ImageNet validation set with a batch size of 1 and FP16 quantization.
Below, see how to easily load and fine-tune a production-ready, pre-trained EfficientNetB0 model that incorporates best practices and validated hyperparameters for achieving best-in-class accuracy.
Define your dataset path and the location where your checkpoints should be saved, and you are ready to train from your terminal.
First, ensure that the data is stored in dataset_params.dataset_dir or add “dataset_params.data_dir=<PATH-TO-DATASET>” at the end of the command below. You can find instructions here.
Next, move to the project root (where you will find the ReadMe and src folder).
Finally, run the command:
python -m super_gradients.train_from_recipe --config-name=imagenet_efficientnet
Try a pre-trained EfficientNetB0 model on your machine. Import SuperGradients, initialize your Trainer, and load your desired EfficientNet model and pre-trained weights.
from super_gradients.training import models

# The pretrained_weights argument loads weights pre-trained on the named dataset
model = models.get("efficientnet_b0", pretrained_weights="imagenet")
Production-ready models means they are compatible with deployment tools such as TensorRT (NVIDIA) and OpenVINO (Intel) and can be easily taken into production.
To export to ONNX, use the following:
import torch
from super_gradients.training import models

# Load model with pretrained weights
model = models.get("efficientnet_b0", pretrained_weights="imagenet")

# Prepare model for conversion
# Input size is in the format [Batch x Channels x Height x Width]; 224 x 224 is the ImageNet resolution used by EfficientNet-B0
model.eval()
model.prep_model_for_conversion(input_size=[1, 3, 224, 224])

# Create dummy_input with the same shape
dummy_input = torch.randn([1, 3, 224, 224])

# Convert model to onnx
torch.onnx.export(model, dummy_input, "efficientnet_b0.onnx")
For more code examples, recipes, and advanced training techniques such as transfer learning, knowledge distillation, and more, refer to SuperGradients on GitHub.
from transformers import AutoFeatureExtractor, AutoModelForImageClassification
extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")