Description
ResNet is an image classification model pre-trained on the ImageNet-1k dataset at a resolution of 224×224.
Publishers
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, in the paper “Deep Residual Learning for Image Recognition”
Submitted Version
December 10, 2015
Latest Version
N/A
Size
1.7M to 60.2M parameters
The original ResNet model architecture had 152 layers. This was a significant increase from the previous SOTA models at the time, which typically used networks with fewer than 100 layers. The ResNet-152 model achieved record-breaking performance and set a new standard for image classification models.
The authors of the paper also trained shallower ResNet models with 34, 50 and 101 layers, named ResNet-34, ResNet-50 and ResNet-101, respectively. The idea behind training and comparing these different models was to understand how network depth affects the model’s performance. The authors found that the deeper networks (ResNet-152) achieved better accuracy than the shallower networks and that the performance continued to improve as the number of layers increased.
Expected Input
All pre-trained ResNet models expect input images normalized similarly, i.e., mini-batches of 3-channel RGB images of shape (N x 3 x H x W), where N is the batch size, and H and W are expected to be at least 224.
The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. The transformation should preferably happen during preprocessing.
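As an illustration, this preprocessing can be applied with torchvision transforms. The sketch below is not part of the model card; the resize-to-256-then-center-crop step is a common convention rather than a requirement stated here.

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),                       # common convention; only the 224×224 crop is required here
    transforms.CenterCrop(224),
    transforms.ToTensor(),                        # loads pixel values into the [0, 1] range
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
# batch = preprocess(pil_image).unsqueeze(0)      # shape (1, 3, 224, 224)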
Expected Output
The model outputs image scores for each of the 1000 classes of ImageNet.
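To turn these scores into class probabilities or predictions, a softmax is typically applied over the class dimension. A minimal sketch, assuming output holds the model's (N, 1000) scores:

import torch

probabilities = torch.softmax(output, dim=1)        # class scores -> probabilities
top5_prob, top5_idx = probabilities.topk(5, dim=1)  # 5 most likely ImageNet classes per image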
Microsoft Vision Model ResNet is a large pre-trained computer vision model created by the Multimedia Group at Microsoft Bing. ResNet is based on a residual learning framework and uses skip connections to help propagate gradients through the layers. It is trained on the ImageNet dataset and is used for image classification and as a go-to backbone for object detection and other tasks.
In terms of accuracy, ResNet outperformed the models released before it.
Some examples of real-world applications of ResNet50 include:
The models were originally trained on the ImageNet-1K dataset, which consists of 1000 classes.
The images in ImageNet-1K were obtained through crowdsourcing on Amazon’s Mechanical Turk platform, Flickr, and other search engines. They were labeled with either the presence or absence of 1000 object categories. The 1000 object categories contain both internal nodes and leaf nodes of ImageNet but do not overlap with each other.
The models were trained on the 1.28 million training images in ImageNet-1K and evaluated on the 50K validation images. The models were then tested on the 100K test images in the dataset.
Evaluation Metrics
Evaluation metrics are used to measure the quality of the model. When you build your model, it is crucial to measure how accurately it predicts your expected outcome. We have different evaluation metrics for different sets of machine learning algorithms. For evaluating classification models, we use classification metrics.
Evaluation metrics can help you assess your model’s performance, monitor your ML system in production, and adapt your model to your business needs. It is important to use multiple evaluation metrics, since a model may score well on one metric and poorly on another.
Accuracy
The simplest metric for model evaluation is accuracy. It is the ratio of the number of correct predictions to the total number of predictions made for a dataset.
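For reference, top-1 and top-5 accuracy can be computed directly from model scores. The helper below is a minimal sketch; the function and variable names are ours, not from any library.

import torch

def topk_accuracy(scores: torch.Tensor, labels: torch.Tensor, k: int = 1) -> float:
    # scores: (N, 1000) class scores, labels: (N,) ground-truth class indices
    topk = scores.topk(k, dim=1).indices                 # (N, k) highest-scoring classes
    correct = (topk == labels.unsqueeze(1)).any(dim=1)   # True where the label appears among the top k
    return correct.float().mean().item()

# top1 = topk_accuracy(scores, labels, k=1)
# top5 = topk_accuracy(scores, labels, k=5)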
The following table displays the top-1 and top-5 accuracy reported in the original paper “Deep Residual Learning for Image Recognition” for each ResNet model:
| Model | Top-1 Accuracy (%) | Top-5 Accuracy (%) |
| --- | --- | --- |
| ResNet-34 B | 78.16 | 94.29 |
| ResNet-34 C | 78.47 | 94.40 |
| ResNet-50 | 79.26 | 94.75 |
| ResNet-101 | 80.13 | 95.40 |
| ResNet-152 | 80.62 | 95.51 |
The well-known paper “ResNet Strikes Back: An Improved Training Procedure in timm” improved on these results. For instance, the top-1 accuracy reported for ResNet-50 was 80.4%.
Using the training recipe for ResNet-50 available in SuperGradients, Deci’s open-source computer vision library, you can reach a top-1 accuracy score of 81.9%.
When selecting an architecture, there are several factors you should carefully consider. Having clarity on them before you start training the model can save you a lot of time, effort, and money.
The graph compares the performance of ResNet models with other image classification models optimized for NVIDIA Jetson Xavier.
Performance metrics were reported on the 50,000-image ImageNet validation set, with batch size 1 and FP16 quantization.
You can use ResNet models for image classification. Below, see how to easily load and fine-tune a production-ready, pre-trained ResNet model that incorporates best practices and validated hyperparameters for achieving best-in-class accuracy.
For the sake of this example, we’ll use ResNet50, but with SuperGradients, Deci’s open source, all-in-one computer vision training library, you can also access other ResNet models, including ResNet101, ResNet34 and ResNet18.
Define your dataset path and where you want your checkpoints saved, and you can run everything from your terminal.
First, ensure that the data is stored in dataset_params.dataset_dir or add “dataset_params.data_dir=<PATH-TO-DATASET>” at the end of the command below. You can find instructions here.
Next, move to the project root (where you will find the README and the src folder).
Finally, run the command:
python -m super_gradients.train_from_kd_recipe --config-name=imagenet_resnet50_kd
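For example, if your dataset lives in a custom location, the override mentioned above can be appended directly to the command (the dataset path is a placeholder for you to fill in):

python -m super_gradients.train_from_kd_recipe --config-name=imagenet_resnet50_kd dataset_params.data_dir=<PATH-TO-DATASET>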
Try a pre-trained ResNet model on your machine. Import SuperGradients, initialize your Trainer, and load your desired ResNet model and pre-trained weights.
import super_gradients
from super_gradients.training import models

# The pretrained_weights argument loads weights pre-trained on the provided dataset
model = models.get("resnet50", pretrained_weights="imagenet")
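If you also want to fine-tune the model, you can initialize a Trainer as mentioned above before launching training. A minimal sketch, where the experiment name and checkpoint directory are placeholder values of our choosing:

from super_gradients import Trainer

trainer = Trainer(experiment_name="resnet50_imagenet_finetune", ckpt_root_dir="<PATH-TO-CHECKPOINTS>")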
Production-ready models are compatible with deployment tools such as TensorRT (NVIDIA) and OpenVINO (Intel) and can be easily taken into production.
To export to ONNX, use the following:
# Load model with pretrained weights
import torch
from super_gradients.training import models
from super_gradients.common.object_names import Models

model = models.get(Models.RESNET50, pretrained_weights="imagenet")

# Prepare model for conversion
# Input size is in the format [Batch x Channels x Height x Width]; 640 is the standard COCO dataset dimension
model.eval()
model.prep_model_for_conversion(input_size=[1, 3, 640, 640])

# Create a dummy input matching the conversion input size
dummy_input = torch.randn(1, 3, 640, 640)

# Convert model to ONNX
torch.onnx.export(model, dummy_input, "resnet50.onnx")
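As a quick sanity check of the exported file, you can load it with ONNX Runtime and run a forward pass. A minimal sketch, assuming the onnxruntime package is installed and using the same 640×640 input size as in the export above:

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("resnet50.onnx")
dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)         # same shape as the export input
outputs = session.run(None, {session.get_inputs()[0].name: dummy})
print(outputs[0].shape)                                             # class scores for the batch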
For more code examples, recipes, and advanced training techniques such as transfer learning, knowledge distillation, and more, refer to SuperGradients on GitHub.
License
Apache-2.0
We’d love your feedback on the information presented in this card. Please also share any unexpected results.
For a short meeting with the SuperGradients team, use this link and choose your preferred time.
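Alternatively, you can load the Microsoft ResNet-50 checkpoint through the Hugging Face transformers library: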
from transformers import AutoFeatureExtractor, AutoModelForImageClassification
extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")
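A minimal inference sketch with this pipeline follows; the image path is a placeholder, and in recent transformers versions AutoImageProcessor can be used in place of AutoFeatureExtractor.

import torch
from PIL import Image

image = Image.open("<PATH-TO-IMAGE>").convert("RGB")
inputs = extractor(images=image, return_tensors="pt")    # resizes and normalizes the image
with torch.no_grad():
    logits = model(**inputs).logits                       # (1, 1000) class scores
print(model.config.id2label[logits.argmax(-1).item()])    # predicted ImageNet label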