Model Zoo

Start training with state-of-the-art computer vision models for tasks such as image classification, semantic segmentation, and object detection.

YOLO-NAS Pose offers a superior latency-accuracy balance compared to YOLOv8 Pose. Specifically, the medium-sized version, YOLO-NAS Pose M, outperforms the large YOLOv8 variant with a 38.85% reduction in latency on an Intel Xeon 4th gen CPU, all while achieving a 0.27 boost in AP@0.5 score.
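To make the latency figure concrete, here is a minimal sketch of how a relative latency reduction is computed and what it implies as a speedup factor. The helper names and the sample latencies are hypothetical and chosen only so that the arithmetic reproduces the 38.85% figure; they are not measurements from any benchmark.

```python
def latency_reduction_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Relative latency reduction of the candidate versus the baseline, in percent."""
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


def speedup_from_reduction(reduction_pct: float) -> float:
    """Speedup factor implied by a relative latency reduction."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Hypothetical latencies (milliseconds), for illustration only.
baseline_ms = 100.0
candidate_ms = 61.15

reduction = latency_reduction_pct(baseline_ms, candidate_ms)
speedup = speedup_from_reduction(reduction)
print(f"reduction: {reduction:.2f}%  speedup: {speedup:.2f}x")
```

Read this way, a 38.85% reduction in latency corresponds to the faster model finishing each inference in roughly 61% of the baseline time.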

DeciDiffusion 1.0 is an 820 million parameter text-to-image latent diffusion model trained on the LAION-v2 dataset and fine-tuned on the LAION-ART dataset.

DeciLM 6B is a 5.7 billion parameter decoder-only text generation model. It outpaces pretrained models in its class, with a throughput up to 15 times that of Llama 2 7B.

DeciCoder 1B is a 1 billion parameter decoder-only code completion model trained on the Python, Java, and JavaScript subsets of the StarCoder training dataset.

YOLO-NAS is an object detection foundation model pre-trained on prominent datasets such as COCO and Objects365, and evaluated on the COCO and RF100 datasets.


Dive into Google’s T5, a powerful Text-to-Text Transformer model. Understand its capabilities, applications, and how to use it efficiently.

Discover Easy-to-Use Training

Simplify deep learning development with SuperGradients, an open-source, production-ready library for training PyTorch-based computer vision models.

DEKR is a pose estimation model pretrained on the COCO 2017 and CrowdPose datasets. It was introduced on April 6, 2021, in the paper “Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression” by Zigang Geng, Ke Sun, Bin Xiao, Zhaoxiang Zhang, and Jingdong Wang.

YOLOX is an object detection model that was introduced on August 6, 2021, in the paper “YOLOX: Exceeding YOLO Series in 2021.”


Vision Transformer (ViT) is an approach to image classification that captures long-range dependencies between patches in an image.
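The "patches" mentioned above are fixed-size tiles cut from the input image; each tile is flattened into a vector and treated as one token, so attention can relate any pair of tiles regardless of distance. The following is a minimal pure-Python sketch of only the patch-splitting step, assuming a single-channel image stored as a list of rows. The function name `split_into_patches` is hypothetical, and real implementations operate on tensors and add a learned projection and position embeddings.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into non-overlapping square patches.

    Patches are returned in row-major order, each flattened row by row.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 image whose pixel values are 0..15, split into 2x2 patches.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
toy_patches = split_into_patches(toy_image, 2)
print(len(toy_patches), toy_patches[0])
```

With a 224×224 input and 16×16 patches, a common configuration, the same procedure yields a 14×14 grid, which is a sequence of 196 patch tokens.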

EfficientNet is a convolutional neural network (CNN) architecture pre-trained on the CIFAR-10, CIFAR-100, Birdsnap, Stanford Cars, Flowers, FGVC Aircraft, Oxford-IIIT Pets, and Food-101 datasets.

ResNet is an image classification model pre-trained on the ImageNet-1k dataset at a resolution of 224×224.

PP-LiteSeg is a lightweight real-time semantic segmentation model. Its modified encoder-decoder architecture incorporates three novel modules: Flexible and Lightweight Decoder (FLD), Unified Attention Fusion Module (UAFM), and Simple Pyramid Pooling Module (SPPM).

Discover Additional Helpful Resources

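from transformers import AutoFeatureExtractor, AutoModelForImageClassification

# Load the image preprocessing configuration that matches the checkpoint.
extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")

# Load the pretrained ResNet-50 image classification model.
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")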
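
Once the extractor and model are loaded, a typical next step is to run a prediction on an image. The sketch below is one possible way to do that, not an official recipe: `top_k_labels` and `classify_image` are hypothetical helper names, the image path is supplied by the caller, and the heavy dependencies (`torch`, `Pillow`, `transformers`) are imported lazily inside `classify_image` so that the pure-Python helper can be used on its own. It assumes the checkpoint exposes an `id2label` mapping in its configuration.

```python
import math


def top_k_labels(logits, id2label, k=5):
    """Convert raw class logits to the k most probable (label, probability) pairs."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [value / total for value in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


def classify_image(image_path, k=5):
    """Run ResNet-50 on one image file and return its top-k predictions.

    Assumes torch, Pillow, and transformers are installed and that the
    checkpoint can be downloaded from the Hugging Face Hub.
    """
    import torch
    from PIL import Image
    from transformers import AutoFeatureExtractor, AutoModelForImageClassification

    extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")
    model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = extractor(images=image, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits

    return top_k_labels(logits[0].tolist(), model.config.id2label, k)
```

Calling `classify_image` with the path to a local image file would return a list of label and probability pairs, most probable first.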