Optimize your generative models to cut inference cost and deliver a better user experience. Deci's inference acceleration tools let you optimize your models for cost-efficient inference without compromising accuracy.
Extremely large models and variable inference costs mean that generative AI applications carry a significantly high operational cost. As your inference scales, so does your cloud bill.
- Inference Acceleration on Average
- Model Size Reduction on Average
- Cloud Cost Reduction
A company offering a video generation application wasn’t achieving the desired throughput to support its offering. The application is powered by a GAN model for image generation.
The customer used Deci's compilation and quantization tools to optimize latency and achieved a 2.1x boost. As a result, the customer was able to process its videos on fewer machines and reduce cloud costs by 40%.
The boost in performance enabled the customer to deliver a better user experience while reducing cloud compute costs.
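Deci's compilation and quantization tooling is proprietary, but the underlying idea can be sketched with PyTorch's built-in dynamic quantization, which stores the weights of `Linear` layers as INT8 and quantizes activations on the fly. The toy model below is a hypothetical stand-in, not the customer's GAN:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the customer's model.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 64))
model.eval()

# Dynamic quantization: weights stored as INT8, activations quantized
# at runtime. Shrinks the model and can speed up CPU inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 256))
print(out.shape)  # torch.Size([1, 64])
```

In practice, accuracy should be re-validated after quantization, since lower-precision weights can shift model outputs.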
A company developing an AI platform for text summarization was not achieving satisfactory latency with the model powering its application. This led to a poor user experience as well as high cloud costs.
The team used Deci's compilation and quantization tools to optimize model performance, enabling the customer to significantly reduce cloud costs and improve the user experience. Solution delivered: automated compilation and quantization to FP16.
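The FP16 conversion mentioned above can be illustrated with plain PyTorch: `.half()` casts parameters from 32-bit to 16-bit floats, halving weight memory. The model here is a hypothetical stand-in, not the customer's summarization model:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the summarization model.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))

fp32_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

# Cast all parameters and buffers to FP16.
model_fp16 = model.half()

fp16_bytes = sum(p.numel() * p.element_size() for p in model_fp16.parameters())
print(fp32_bytes // fp16_bytes)  # 2: weight memory is halved
```

On GPUs with Tensor Cores, FP16 typically also increases throughput; as with any precision reduction, accuracy should be validated after the cast.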
Lior Hakim, Co-Founder & CTO
Hour One
from transformers import AutoFeatureExtractor, AutoModelForImageClassification

# Load the preprocessing pipeline and pretrained weights for ResNet-50
extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")