In deep learning, production-aware optimization takes production constraints and requirements, such as latency, model size, and accuracy, into account throughout the development process rather than only at deployment time. Designing the neural network for the target inference hardware and production environment increases the likelihood that the model can be deployed successfully.
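As a rough illustration of checking these constraints during development, the sketch below filters candidate models against latency, size, and accuracy limits and keeps the most accurate one that fits. It is a minimal sketch under stated assumptions: the `Candidate` and `Constraints` types, the function names, and every number are hypothetical and do not come from any particular tool or benchmark. In practice the latency and size figures would be measured on the target inference hardware.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Measured properties of one candidate network (hypothetical type)."""
    name: str
    latency_ms: float   # inference latency on the target hardware
    size_mb: float      # size of the serialized model
    accuracy: float     # validation accuracy in [0, 1]


@dataclass(frozen=True)
class Constraints:
    """Production limits the deployed model must satisfy (hypothetical type)."""
    max_latency_ms: float
    max_size_mb: float
    min_accuracy: float


def meets_constraints(c: Candidate, limits: Constraints) -> bool:
    """Return True if the candidate satisfies every production limit."""
    return (
        c.latency_ms <= limits.max_latency_ms
        and c.size_mb <= limits.max_size_mb
        and c.accuracy >= limits.min_accuracy
    )


def select_candidate(
    candidates: Sequence[Candidate], limits: Constraints
) -> Optional[Candidate]:
    """Pick the most accurate candidate that fits the limits, or None."""
    feasible = [c for c in candidates if meets_constraints(c, limits)]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)


# Illustrative numbers only; they are not measurements of real models.
candidates = [
    Candidate("large", latency_ms=42.0, size_mb=180.0, accuracy=0.81),
    Candidate("medium", latency_ms=14.0, size_mb=45.0, accuracy=0.78),
    Candidate("small", latency_ms=6.0, size_mb=12.0, accuracy=0.71),
]
limits = Constraints(max_latency_ms=20.0, max_size_mb=50.0, min_accuracy=0.75)

best = select_candidate(candidates, limits)
print(best.name if best else "no candidate meets the constraints")
```

With these made-up values, the most accurate candidate is rejected for exceeding the latency and size limits, and the smallest one for falling below the accuracy floor. This is the kind of trade-off that surfaces early when constraints are applied during design instead of after training.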