How to Improve Model Efficiency with Hardware-Aware Neural Architecture Search


In this talk, Yonatan Geifman, Deci’s CEO and Co-founder, covers the evolution of NAS technology and recent advances that are making NAS viable for industry applications and commercial usage. He outlines the algorithmic optimization process with case studies and best practices for achieving best-in-class accuracy and latency results on a variety of hardware devices.

Access the recorded session for a comprehensive discussion of hardware-aware neural architecture search as a solution to inference inefficiency.

✅ Learn how NAS-powered AutoNAC generates and optimizes models according to specific use case requirements.

✅ Discover NAS-generated models that deliver state-of-the-art performance on specific hardware.

The Importance of Model Architecture for Inference Efficiency

Running efficient inference at scale means meeting several performance criteria at once: accuracy, model size, and latency and throughput on your target hardware and inference environment.
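Latency and throughput, in particular, should be measured empirically on the target hardware rather than estimated. A minimal benchmarking sketch, using a hypothetical stand-in function in place of a real model's forward pass:

```python
import time

def fake_forward(batch):
    # Hypothetical stand-in; replace with your model's actual inference call.
    return [x * 2 for x in batch]

def benchmark(fn, batch, warmup=10, iters=100):
    """Return (mean latency in ms per batch, throughput in samples/sec)."""
    for _ in range(warmup):
        fn(batch)                      # warm-up runs exclude one-time setup costs
    start = time.perf_counter()
    for _ in range(iters):
        fn(batch)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000
    throughput = iters * len(batch) / elapsed
    return latency_ms, throughput

latency_ms, throughput = benchmark(fake_forward, batch=list(range(32)))
print(f"latency: {latency_ms:.3f} ms/batch, throughput: {throughput:.0f} samples/s")
```

In a real setup you would also fix the batch size, pin the device clock settings, and report a percentile (e.g. p95) rather than only the mean.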

Meeting all of these targets at once is no mean feat. In fact, attempts to improve accuracy often produce larger, more complex models with higher latency.

Multiple factors beyond the model architecture affect inference efficiency, including the inference hardware, its drivers and graph compilers, and any model compression techniques (quantization and pruning) applied. However, the greatest potential for improving inference efficiency lies in the model architecture itself. Selecting an architecture that is optimal for your specific inference requirements is therefore key.
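Of the compression techniques mentioned, quantization is the most widely used: 32-bit floating-point weights are mapped to 8-bit integers, shrinking the model roughly 4x at some cost in precision. A toy sketch of post-training affine quantization for a single weight tensor (pure Python for illustration; production toolchains such as PyTorch or TensorRT apply this per layer with calibration):

```python
def quantize_int8(weights):
    """Affine-quantize a list of floats to int8 [-128, 127]; return (q, scale, zero_point)."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / 255 or 1.0          # guard against all-equal weights
    zero_point = round(-128 - w_min / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.5, -0.1, 0.0, 0.2, 0.9]
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)
```

The round-trip error per weight is bounded by roughly one quantization step (`scale`), which is why accuracy usually drops only slightly while memory and bandwidth costs fall sharply.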

Hardware-Aware Neural Architecture Search

Neural Architecture Search (NAS) holds the power to automate the cumbersome deep learning model development process, as well as quickly and efficiently generate deep neural networks that are designed to meet specific production constraints. Deci’s AutoNAC (Automated Neural Architecture Construction) technology does this by finding the best algorithm that takes into account all of the many parameters that are required to create powerful and efficient deep learning models for real-world applications.
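AutoNAC itself is proprietary, but the core idea of hardware-aware NAS can be illustrated with a toy random search: sample candidate architectures from a search space, estimate each candidate's accuracy and latency, and keep the most accurate candidate that fits the latency budget. The search space, proxy functions, and all numbers below are made up for illustration only:

```python
import random

random.seed(0)

# Hypothetical search space: network depth and width multipliers.
SEARCH_SPACE = {"depth": [2, 4, 8], "width": [0.5, 1.0, 2.0]}

def estimated_accuracy(arch):
    # Toy proxy: deeper/wider models score higher, with diminishing returns.
    return 1 - 1 / (1 + arch["depth"] * arch["width"])

def estimated_latency_ms(arch):
    # Toy proxy: latency grows with depth and width. In a real system this
    # number would come from benchmarking on the target hardware.
    return 1.5 * arch["depth"] * arch["width"]

def search(n_samples=50, latency_budget_ms=10.0):
    best = None
    for _ in range(n_samples):
        arch = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        if estimated_latency_ms(arch) > latency_budget_ms:
            continue  # hardware-aware constraint: reject too-slow candidates
        if best is None or estimated_accuracy(arch) > estimated_accuracy(best):
            best = arch
    return best

best = search()
print(best, estimated_accuracy(best), estimated_latency_ms(best))
```

Real NAS systems replace random sampling with far more efficient search strategies and replace the proxies with trained accuracy predictors and measured latency, but the constraint-driven selection loop is the same.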

AutoNAC has successfully generated optimized architectures for a wide range of computer vision tasks and hardware targets, including object detection, semantic segmentation, pose estimation, and image classification. It was used to generate:

✅ YOLO-NAS, the state-of-the-art object detection model on NVIDIA T4 GPUs

✅ YOLO-NAS Pose, the groundbreaking pose estimation model on NVIDIA Jetson Xavier and Intel Xeon CPUs

✅ DeciSegs, the semantic segmentation model that offers the optimal balance of speed and accuracy on NVIDIA Jetson Orin

✅ YOLO-NAS-Sat, the fastest and most accurate model for small object detection on NVIDIA Jetson Orin

To delve into this in-depth discussion, simply sign up above to gain access to the recorded session.

Seeking to improve deep learning model inference for your specific use case? Book a demo here.


For example, a pretrained image classification baseline such as ResNet-50 can be loaded with Hugging Face Transformers as a starting point for comparing architectures:

from transformers import AutoFeatureExtractor, AutoModelForImageClassification

# Load the preprocessing pipeline and pretrained ResNet-50 weights
extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")