As AI applications power a growing number of use cases and industries, they place new and demanding requirements on inference performance. Running inference successfully at scale means meeting several criteria at once, including accuracy, latency, throughput, and model size.
In this talk, we’ll cover the main approaches to computer vision model design, the common mistakes teams make, and how these affect inference performance. We’ll outline a new algorithmic optimization approach based on Neural Architecture Search (NAS) technology. You will gain insight into how NAS can be leveraged to build production-grade models, accelerate time to market, and reduce inference compute costs.