YOLO-NAS Pose is a pose estimation model trained on the COCO2017 dataset.
Deci AI Team
November 7, 2023
Emerging from Deci’s proprietary NAS (Neural Architecture Search) engine, AutoNAC, coupled with cutting-edge training methodologies, YOLO-NAS Pose offers a superior latency-accuracy balance compared to YOLOv8 Pose. Specifically, the medium-sized version, YOLO-NAS Pose M, outperforms the large YOLOv8 variant with a 38.85% reduction in latency on an Intel Xeon 4th gen CPU, all while achieving a 0.27 boost in [email protected] score.
In pose estimation, two primary methodologies have traditionally dominated: top-down methods and bottom-up methods. YOLO-NAS Pose follows neither. Instead, it executes two tasks simultaneously: detecting persons and estimating their poses in one swift pass. This unique capability sidesteps the two-stage process inherent to many top-down methods, making its operation akin to bottom-up approaches. Yet, unlike typical bottom-up models such as DEKR, YOLO-NAS Pose employs streamlined postprocessing, leveraging class NMS for predicted person boxes. The culmination of these features delivers a rapid model, perfectly primed for deployment with TensorRT.
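The NMS step over predicted person boxes can be illustrated with a standard IoU-based sketch (a minimal illustration, not Deci's exact implementation; the 0.5 threshold and corner-format box layout are assumptions):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep highest-scoring boxes, suppressing any box that overlaps
    an already-kept box by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

Because the model detects only a single class (person), this one NMS pass is the entire box postprocessing step, which is part of what keeps inference fast.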
YOLO-NAS Pose’s architecture is based on the YOLO-NAS architecture used for object detection. Both architectures share a similar backbone and neck design, but what sets YOLO-NAS Pose apart is its innovative head design crafted for a multi-task objective: simultaneous single-class object detection (specifically, detecting a person) and the pose estimation of that person. AutoNAC was employed to find the optimal head design, ensuring powerful representation while adhering to predefined runtime constraints.
YOLO-NAS Pose offers four distinct size variants, each tailored to different computational budgets and performance requirements:
| Model | Number of Parameters (millions) | [email protected] | Latency (ms): Intel Xeon 4th gen (OpenVINO) | Latency (ms): Jetson Xavier NX (TensorRT) | Latency (ms): NVIDIA T4 GPU (TensorRT) |
|---|---|---|---|---|---|
YOLO-NAS Pose takes an image or video as an input.
YOLO-NAS Pose outputs bounding boxes and confidence scores for detected persons, along with predicted (X, Y) coordinates for each keypoint of the skeleton and a per-keypoint confidence score (indicating whether the model is confident that the keypoint is visible).
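The per-keypoint confidence can be used to keep only the keypoints the model considers visible. A minimal sketch (the 0.5 threshold and the (x, y, confidence) tuple layout are illustrative assumptions, not the model's documented output format):

```python
def visible_keypoints(keypoints, conf_thresh=0.5):
    """Filter a person's predicted keypoints down to those whose
    confidence meets the visibility threshold.

    keypoints: iterable of (x, y, confidence) tuples, one per skeleton joint.
    Returns a list of (x, y) coordinates for the visible joints.
    """
    return [(x, y) for (x, y, c) in keypoints if c >= conf_thresh]
```

Downstream applications would typically apply such a filter per detected person before drawing the skeleton or feeding coordinates to further analysis.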
The field of pose estimation is integral to computer vision, serving a spectrum of crucial applications. From healthcare’s need to monitor patient movements and the intricate analysis of athlete performances in sports, to creating seamless human-computer interfaces and enhancing robotic systems – the demands are vast. Not to mention, sectors like entertainment and security where swift and accurate posture detection is paramount.
Earlier this year, Deci introduced YOLO-NAS, a groundbreaking object detection foundation model that gained widespread recognition. Building on YOLO-NAS, the team unveiled its pose estimation sibling: YOLO-NAS Pose.
Some real-world applications of YOLO-NAS Pose include patient movement monitoring in healthcare, athlete performance analysis in sports, human-computer interfaces, robotic systems, and posture detection in entertainment and security.
The snippet below sketches how a pretrained YOLO-NAS Pose model might be loaded and run through Deci's super-gradients library (the model identifier and weight name here are illustrative; consult the super-gradients documentation for exact values):

```python
from super_gradients.training import models

# Load the medium variant with COCO pose pretrained weights
model = models.get("yolo_nas_pose_m", pretrained_weights="coco_pose")

# Run inference on an image; returns person boxes, keypoints, and confidences
predictions = model.predict("image.jpg")
```