Vision Transformer


A vision transformer (ViT) is a deep learning model that splits an input image into a sequence of fixed-size patches and processes that sequence with a transformer encoder, much as a language model processes a sequence of tokens. Vision transformers are often used for image recognition and other image processing tasks, including object detection, image segmentation, cluster analysis, and anomaly detection.
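The patch-splitting step can be illustrated with a small, dependency-free sketch. It assumes non-overlapping square patches and an image stored as nested lists of height x width x channels. The function name `image_to_patches` and the toy 4x4 image are hypothetical and chosen only for illustration. A real model would use an array library and would follow this step with a learned linear projection and position embeddings.

```python
def image_to_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches.

    Returns the patches in row-major order; each patch is a flat list of
    patch_size * patch_size * C values.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    patch.extend(pixel)  # append every channel value of this pixel
            patches.append(patch)
    return patches


# Toy 4x4 image with 3 channels; each pixel stores [row, col, 0] so patches are easy to inspect.
image = [[[r, c, 0] for c in range(4)] for r in range(4)]
patches = image_to_patches(image, patch_size=2)
print(len(patches))     # number of patches: (4 / 2) * (4 / 2) = 4
print(len(patches[0]))  # values per patch: 2 * 2 * 3 = 12
```

By the same arithmetic, a 224x224 image cut into 16x16 patches yields (224 / 16)^2 = 196 patches, which is the sequence length the transformer then operates on (before any extra tokens the model may add).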

Vision transformers offer a comparatively simple architecture that scales well with model and dataset size. Compared with convolutional neural networks, they can achieve better performance when trained on large datasets, though they tend to need more training data to do so.

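# Load a pretrained image classification checkpoint from the Hugging Face Hub.
# Note: "microsoft/resnet-50" is a ResNet-50 (convolutional) checkpoint, not a vision transformer.
from transformers import AutoFeatureExtractor, AutoModelForImageClassification

# Preprocessor that prepares images for the model
# (recent transformers releases recommend AutoImageProcessor for this role).
extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")

# Classification model with its pretrained weights.
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")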