Description
ResNet is an image classification model pre-trained on the ImageNet-1k dataset at a resolution of 224×224.

Publishers
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, in the paper “Deep Residual Learning for Image Recognition”

Submitted Version
December 10, 2015

Latest Version

N/A

Size

1.7M to 60.2M parameters

Image Classification

Overview


Model Highlights

  • Task: Image classification
  • Model type: Convolutional Neural Network
  • Framework: PyTorch
  • Dataset: ImageNet-1k at a resolution of 224×224

Model Size and Parameters

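The deepest ImageNet model presented in the original ResNet paper had 152 layers. This was a significant increase over the previous state-of-the-art models, such as VGG (16 to 19 layers) and GoogLeNet (22 layers). Residual networks of this depth achieved record-breaking accuracy: an ensemble of them reached 3.57% top-5 error on the ImageNet test set and won first place in the ILSVRC 2015 classification task, setting a new standard for image classification models.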

The authors also trained shallower ResNet models with 34, 50 and 101 layers, named ResNet-34, ResNet-50 and ResNet-101, respectively. Training and comparing these models showed how network depth affects performance. On ImageNet, accuracy improved consistently as the number of layers increased, with ResNet-152 outperforming the shallower variants.

Expected Input

All pre-trained ResNet models expect input images normalized in the same way: mini-batches of 3-channel RGB images of shape (N × 3 × H × W), where N is the batch size and H and W are expected to be at least 224.

Pixel values must first be scaled to the range [0, 1] and then normalized per channel using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. This transformation should preferably happen during preprocessing.
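
The scaling and normalization steps above can be sketched with NumPy as follows. This is a minimal illustration, not the preprocessing shipped with any particular library: the function name `preprocess` and the random stand-in images are assumptions, and resizing or cropping to 224×224 is assumed to have happened already.

```python
import numpy as np

# Per-channel statistics quoted in the text (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(images_uint8: np.ndarray) -> np.ndarray:
    """Convert (N, H, W, 3) uint8 RGB images to a normalized (N, 3, H, W) batch."""
    if images_uint8.ndim != 4 or images_uint8.shape[-1] != 3:
        raise ValueError("expected an array of shape (N, H, W, 3)")
    # Scale pixel values from [0, 255] to [0, 1].
    scaled = images_uint8.astype(np.float32) / 255.0
    # Normalize each channel; the statistics broadcast over the last axis.
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # Move channels ahead of the spatial axes: (N, H, W, 3) -> (N, 3, H, W).
    return np.ascontiguousarray(normalized.transpose(0, 3, 1, 2))


# Hypothetical stand-in for two decoded 224x224 RGB images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(2, 224, 224, 3), dtype=np.uint8)
batch = preprocess(fake_images)
print(batch.shape, batch.dtype)  # (2, 3, 224, 224) float32
```

The channel-last to channel-first transpose is needed only when images are decoded in (H, W, 3) layout, which is what most image-loading libraries return.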

Expected Output

The model outputs one score (an unnormalized logit) for each of the 1,000 ImageNet classes.
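
To turn these scores into class probabilities, a softmax is typically applied and the highest-ranked classes are read off. The sketch below assumes the scores arrive as an (N × 1000) array; the helper name `top_k_predictions` and the example scores are hypothetical, and a real run would take the scores from the model instead.

```python
import numpy as np


def top_k_predictions(logits: np.ndarray, k: int = 5):
    """Return (class_indices, probabilities) of the k best classes per image."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    # Sort descending by probability and keep the first k class indices.
    top_idx = np.argsort(-probs, axis=1)[:, :k]
    top_probs = np.take_along_axis(probs, top_idx, axis=1)
    return top_idx, top_probs


# Hypothetical scores for one image: classes 207 and 208 stand out.
logits = np.zeros((1, 1000), dtype=np.float32)
logits[0, 207] = 5.0
logits[0, 208] = 3.0
indices, probs = top_k_predictions(logits, k=2)
print(indices[0].tolist())  # [207, 208]
```

Mapping the returned indices to human-readable labels requires the ImageNet-1k class list that accompanies the chosen checkpoint.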

History and Applications

Microsoft Vision Model ResNet-50 is a large pre-trained computer vision model created by the Multimedia Group at Microsoft Bing. The ResNet architecture it builds on is based on a residual learning framework: skip connections add a block's input to its output, which helps gradients propagate through many layers. ResNet models trained on the ImageNet dataset are used for image classification and as a go-to backbone for object detection and other tasks.
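
The skip connection mentioned above can be illustrated with a toy residual block. This is a simplified sketch, not the actual ResNet layer: real blocks use convolutions and batch normalization, whereas the hypothetical `residual_block` below uses two small linear layers to keep the idea visible.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU between."""
    residual = relu(x @ w1) @ w2  # F(x): the learned residual mapping
    return relu(residual + x)     # the skip connection adds the input back


x = np.array([[1.0, 2.0, 3.0, 4.0]])
zeros = np.zeros((4, 4))
# With all-zero weights F(x) = 0, so the block passes x through unchanged.
print(residual_block(x, zeros, zeros))  # [[1. 2. 3. 4.]]
```

Because the block only has to learn a correction to the identity, stacking many such blocks does not force the signal through a long chain of transformations, which is one intuition for why very deep ResNets remain trainable.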

ResNet achieved higher accuracy than the models released before it.

Some examples of real-world applications of ResNet-50 include:

  • Cancer diagnostic X-ray technology
  • Recognition of human emotions
  • Recognition of dog breeds
  • Recognition of plant diseases
  • Classification of types of skin cancer
				
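# Load the image preprocessor and the pre-trained ResNet-50 weights from the Hugging Face Hub.
from transformers import AutoImageProcessor, AutoModelForImageClassification

# AutoImageProcessor replaces the deprecated AutoFeatureExtractor in recent transformers releases.
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")

model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")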