Description
YOLO-NAS-Sat is a small object detection model, pre-trained on COCO and fine-tuned on DOTA 2.0.
Publishers
Deci AI Team
Submitted Version
February 22, 2024
Latest Version
N/A
Size
N/A
YOLO-NAS-Sat is a small object detection model.
Building on the solid foundation of YOLO-NAS, renowned for its standard object detection, YOLO-NAS-Sat tackles the specific challenge of pinpointing small objects. While retaining the YOLO-NAS core, we’ve implemented key changes to sharpen its focus on small objects.
These architectural innovations ensure that YOLO-NAS-Sat is uniquely equipped to handle the intricacies of small object detection, offering an unparalleled accuracy-speed trade-off.
YOLO-NAS-Sat offers four distinct size variants, each tailored for different computational budgets and performance requirements:

| Model | Number of Parameters | mAP | Latency, AGX Orin (ms, excluding IO) | Latency, NX Orin (ms, excluding IO) |
| --- | --- | --- | --- | --- |
| YOLO-NAS-Sat-S | 15.2M | 56.4 | 4.48 | 15.99 |
| YOLO-NAS-Sat-M | 17.7M | 58.21 | 5.7 | 21.01 |
| YOLO-NAS-Sat-L | 39.8M | 62.14 | 10.08 | 38.40 |
| YOLO-NAS-Sat-X | 40.3M | 63.38 | 14.3 | 49.34 |
Expected Input
The expected input of the YOLO-NAS-Sat model is an RGB image of fixed size. The image is usually preprocessed by resizing it to the desired size and normalizing its pixel values to be between 0 and 1.
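A minimal sketch of that preprocessing, assuming a 640×640 input (the crop size used in training; the actual expected resolution depends on how the model was exported). Nearest-neighbor resizing keeps the sketch dependency-free; a real pipeline would typically use bilinear interpolation via OpenCV or similar:

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 640) -> np.ndarray:
    """Resize an HxWx3 uint8 RGB image to (size, size) and scale pixels to [0, 1].

    Returns a channels-first float32 array, as most inference runtimes expect.
    """
    h, w, _ = image.shape
    rows = np.arange(size) * h // size        # nearest-neighbor source rows
    cols = np.arange(size) * w // size        # nearest-neighbor source columns
    resized = image[rows][:, cols]            # (size, size, 3) uint8
    chw = resized.transpose(2, 0, 1)          # HWC -> CHW
    return chw.astype(np.float32) / 255.0     # normalize to [0, 1]

img = np.random.randint(0, 256, (512, 768, 3), dtype=np.uint8)
tensor = preprocess(img)
print(tensor.shape)
```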
Expected Output
The expected output of the YOLO-NAS-Sat model is bounding boxes and confidence scores for detected objects.
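As a rough sketch of working with that output (the `Detection` structure and the confidence threshold here are illustrative, not the library’s actual types):

```python
from typing import List, NamedTuple

class Detection(NamedTuple):
    """One detection: pixel-space box corners, confidence, class index.
    The field layout is illustrative, not the exact library structure."""
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float
    class_id: int

def filter_detections(detections: List[Detection], conf_threshold: float = 0.25) -> List[Detection]:
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d.confidence >= conf_threshold]

raw = [
    Detection(10, 20, 50, 60, 0.91, 4),   # e.g. a "small-vehicle" class in DOTA
    Detection(5, 5, 15, 15, 0.08, 2),     # low-confidence noise
]
print(filter_detections(raw))
```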
While YOLO-NAS-Sat excels in satellite imagery analysis, its specialized architecture is also well suited to a wide range of applications involving other types of imagery.
YOLO-NAS-Sat was trained from scratch on the COCO dataset, followed by fine-tuning on the DOTA 2.0 dataset. The DOTA 2.0 dataset is an extensive collection of aerial images designed for object detection and analysis, featuring diverse objects across multiple categories. For fine-tuning, the input scenes were segmented into 1024×1024 tiles, using a 512px step for comprehensive coverage. Additionally, each scene was scaled to 75% and 50% of its original size to enhance detection robustness across various scales.
During the training process, random 640×640 crops were extracted from these tiles to introduce variability and enhance model resilience. For the validation phase, the input scenes were divided into uniform 1024×1024 tiles.
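The tiling scheme described above can be sketched as coordinate generation (the function names here are illustrative; reading and cropping the actual imagery is omitted):

```python
def tile_origins(width: int, height: int, tile: int = 1024, step: int = 512):
    """Top-left corners of tile x tile windows covering a scene at the given
    stride; an extra row/column is shifted back so the edges stay covered."""
    xs = list(range(0, max(width - tile, 0) + 1, step))
    ys = list(range(0, max(height - tile, 0) + 1, step))
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y) for y in ys for x in xs]

def multiscale_sizes(width: int, height: int, scales=(1.0, 0.75, 0.5)):
    """Scene sizes at the scales used for fine-tuning (100%, 75%, 50%)."""
    return [(round(width * s), round(height * s)) for s in scales]

print(tile_origins(2048, 2048)[:4])   # first few tile corners of a 2048x2048 scene
print(multiscale_sizes(2048, 2048))
```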
Mean Average Precision (mAP) is a metric used to evaluate object detection models such as Fast R-CNN, YOLO, and Mask R-CNN. For each class, average precision (AP) is computed over recall values from 0 to 1; mAP is the mean of these per-class AP values, and a higher mAP indicates better accuracy.
mAP builds on several sub-metrics, including precision, recall, and Intersection over Union (IoU).
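As an illustration, a minimal all-points-interpolated AP for a single class (real evaluators such as pycocotools also perform IoU-based matching of detections to ground truth across confidence thresholds):

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve, with precision first made
    monotonically non-increasing (all-points interpolation).

    `recalls` must be sorted ascending; both lists come from sweeping the
    confidence threshold over ranked detections."""
    # sentinels so the curve spans recall 0..1
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # enforce non-increasing precision from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # sum rectangle areas where recall increases
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

# a perfect detector holds precision 1.0 at every recall level
print(average_precision([0.5, 1.0], [1.0, 1.0]))
```

mAP is then the mean of this AP across classes (and, for COCO-style metrics, across IoU thresholds as well).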
To benchmark YOLO-NAS-Sat against YOLOv8, we subjected YOLOv8 to the same fine-tuning process previously outlined for YOLO-NAS-Sat.
| Model | mAP |
| --- | --- |
| YOLOv8 N | 47.32 |
| YOLOv8 S | 51.61 |
| YOLOv8 M | 53.85 |
| YOLOv8 L | 55.15 |
| YOLO-NAS-Sat-S | 56.4 |
| YOLO-NAS-Sat-M | 58.21 |
| YOLO-NAS-Sat-L | 62.14 |
| YOLO-NAS-Sat-X | 63.38 |
While higher accuracy typically demands a trade-off in speed, the finely tuned YOLO-NAS-Sat models break this convention by achieving lower latency than their YOLOv8 counterparts: on NVIDIA AGX Orin at TRT FP16 precision, each YOLO-NAS-Sat variant outpaces the corresponding YOLOv8 variant.
As can be seen in the above graph, YOLO-NAS-Sat-S outpaces its YOLOv8 counterpart by 1.52 times, YOLO-NAS-Sat-M by 2.38 times, and YOLO-NAS-Sat-L by 2.02 times.
YOLO-NAS-Sat supports the same .predict() functionality as regular YOLO-NAS. However, the model weights are not open-sourced; to use the model, you need to obtain the weights from Deci’s platform.
The YOLO-NAS-Sat model is available under a commercial license. To learn more, request a free trial.
We’d love your feedback on the information presented in this card. Please also share any unexpected results.
For a short meeting with the SuperGradients team, use this link and choose your preferred time.
Deci is ISO 27001 certified.