ResNet is an image classification model pre-trained on the ImageNet-1k dataset at a resolution of 224×224.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, in the paper “Deep Residual Learning for Image Recognition”
December 10, 2015
1.7M to 60.2M
The deepest ResNet evaluated on ImageNet in the original paper had 152 layers. This was a significant increase over earlier state-of-the-art (SOTA) networks such as VGG, which used 16 to 19 layers. The ResNet-152 model achieved record-breaking performance and set a new standard for image classification models.
The authors of the paper also trained shallower ResNet models with 34, 50 and 101 layers, named ResNet-34, ResNet-50 and ResNet-101, respectively. The idea behind training and comparing these different models was to understand how network depth affects the model’s performance. The authors found that, among these models, the deeper networks achieved better accuracy than the shallower ones, with ResNet-152 the most accurate, and that performance continued to improve as the number of layers increased.
All pre-trained ResNet models expect input images normalized in the same way, i.e., mini-batches of 3-channel RGB images of shape (N × 3 × H × W), where N is the batch size, and H and W are expected to be at least 224.
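The shape requirement above can be checked before a batch is passed to the model. The following is a minimal sketch; the function name `check_batch_shape` and its `min_size` parameter are hypothetical and not part of any library.

```python
def check_batch_shape(shape, min_size=224):
    """Return True if `shape` is a valid (N, 3, H, W) batch shape, else raise.

    `shape` is any 4-element sequence, for example the shape of a tensor.
    """
    if len(shape) != 4:
        raise ValueError(f"expected 4 dimensions (N, 3, H, W), got {len(shape)}")
    batch, channels, height, width = shape
    if batch < 1:
        raise ValueError("batch size N must be at least 1")
    if channels != 3:
        raise ValueError(f"expected 3 RGB channels, got {channels}")
    if height < min_size or width < min_size:
        raise ValueError(
            f"H and W must be at least {min_size}, got {height} x {width}"
        )
    return True
```

A batch of eight 224×224 RGB images, shape (8, 3, 224, 224), passes the check, while a single-channel batch such as (8, 1, 224, 224) raises a ValueError.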
The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. The transformation should preferably happen during preprocessing.
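The normalization described above can be sketched in plain Python to make the arithmetic explicit. This is an illustration only: it assumes 8-bit input pixels in height × width × channel order, and the helper names `normalize_pixel` and `normalize_image` are hypothetical. In practice, a library transform such as `torchvision.transforms.Normalize` would typically be used instead.

```python
# Per-channel statistics quoted in the text above (RGB order).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Scale one 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(s - m) / sd for s, m, sd in zip(scaled, MEAN, STD)]


def normalize_image(image_hwc):
    """Normalize an H x W x 3 nested list and return it in 3 x H x W layout."""
    height, width = len(image_hwc), len(image_hwc[0])
    chw = [[[0.0] * width for _ in range(height)] for _ in range(3)]
    for y in range(height):
        for x in range(width):
            for c, value in enumerate(normalize_pixel(image_hwc[y][x])):
                chw[c][y][x] = value
    return chw
```

Stacking N such 3 × H × W images along a leading axis gives the (N × 3 × H × W) mini-batch layout the models expect.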
The model outputs image scores for each of the 1000 classes of ImageNet.
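To turn the 1000 scores into a ranked prediction, a softmax followed by a top-k selection is a common choice. The sketch below assumes the scores are unnormalized logits, which the text does not state explicitly; the function names `softmax` and `top_k` are illustrative.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, k=5):
    """Return the k highest-probability (class_index, probability) pairs."""
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]
```

The returned class indices can then be mapped to ImageNet label names using whatever index-to-label table accompanies the checkpoint in use.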
Microsoft Vision Model ResNet is a large pre-trained computer vision model created by the Multimedia Group at Microsoft Bing. ResNet is based on a residual learning framework and uses skip connections to help propagate gradients through the layers. It is trained on the ImageNet dataset and is used for image classification and as a go-to backbone for object detection and other tasks.
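The skip connection mentioned above can be illustrated with a toy residual block that computes y = ReLU(F(x) + x). This is a simplified sketch, not the architecture from the paper: it assumes small dense layers in place of the convolutions and batch normalization used in real ResNets, and all names are hypothetical.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]


def linear(weights, vector):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU between."""
    hidden = relu(linear(w1, x))
    residual = linear(w2, hidden)
    # Skip connection: the input is added back to the learned residual.
    return relu([r + xi for r, xi in zip(residual, x)])
```

With all weights set to zero, the block passes a non-negative input through unchanged, which shows why the identity mapping is easy for a residual block to represent and why the addition gives gradients a direct path around the learned layers.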
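When it was released, ResNet achieved higher ImageNet classification accuracy than earlier models: an ensemble of residual networks reached a 3.57% top-5 error on the ImageNet test set and won first place in the ILSVRC 2015 classification task.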
Some examples of real-world applications of ResNet-50 include:
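from transformers import AutoFeatureExtractor, AutoModelForImageClassification

# Load the preprocessing pipeline that matches the checkpoint.
extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")
# Load the pre-trained ResNet-50 image classification model.
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")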