3.92X acceleration leads to 68% cloud cost savings & better UX for a generative text summarization app


AI platform company


Computer software

Use case

NLP (Text summarization on NVIDIA GPU)

The Challenge

A customer building an AI platform for text summarization could not reach acceptable inference latency for the model powering its application, which was deployed on an NVIDIA T4 GPU. The slow responses degraded the user experience and drove up cloud costs.

The Solution

The customer used Deci’s compilation and quantization tools to optimize the model, accelerating inference by 3.92X and cutting cloud costs by 68% while improving the user experience.
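
The case study does not describe how Deci’s quantization tools work internally. As a rough illustration of the general idea only, the sketch below applies plain symmetric int8 quantization to a short list of weights in pure Python. The weight values and the function names `quantize_int8` and `dequantize` are hypothetical and are not part of any Deci API.

```python
# Generic symmetric int8 quantization of a weight vector.
# Illustrative only: this is NOT Deci's tooling, and the weights are made up.

def quantize_int8(weights):
    """Map floats to int8 codes using a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


weights = [0.82, -1.27, 0.05, 0.40, -0.63]  # hypothetical values
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Worst-case rounding error is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each int8 code occupies one byte instead of the four bytes of a float32 value, which is where quantization's size savings come from. Whether accuracy holds up depends on the model and has to be measured.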

The Results

68%
Cloud Cost Reduction
3.92X
Latency Acceleration
0 %
Model Size Reduction

Use Deci’s Development Platform to:

Cut Cloud ML Spend with Faster Inference

Lower your cloud bill by maximizing the throughput of your models.
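
To make the link between throughput and cloud spend concrete, here is a minimal sketch of the arithmetic, assuming a fixed hourly instance price and a fully utilized GPU. The price and throughput figures are hypothetical placeholders, not the customer's numbers, and the helper names are made up for illustration.

```python
# Back-of-the-envelope serving cost as a function of throughput.
# All numbers below are hypothetical and chosen only for illustration.

def cost_per_million_requests(hourly_price_usd, requests_per_second):
    """Cost of serving one million requests on one fully utilized instance."""
    requests_per_hour = requests_per_second * 3600
    return hourly_price_usd / requests_per_hour * 1_000_000


def cost_reduction(baseline_rps, optimized_rps):
    """Fractional savings when the same instance serves more requests per second."""
    return 1 - baseline_rps / optimized_rps


HOURLY_PRICE_USD = 0.50  # hypothetical GPU instance price
BASELINE_RPS = 10.0      # hypothetical throughput before optimization
OPTIMIZED_RPS = 25.0     # hypothetical throughput after optimization

baseline_cost = cost_per_million_requests(HOURLY_PRICE_USD, BASELINE_RPS)
optimized_cost = cost_per_million_requests(HOURLY_PRICE_USD, OPTIMIZED_RPS)
savings = cost_reduction(BASELINE_RPS, OPTIMIZED_RPS)

print(f"baseline:  ${baseline_cost:.2f} per million requests")
print(f"optimized: ${optimized_cost:.2f} per million requests")
print(f"savings:   {savings:.0%}")
```

Real savings also depend on utilization, batching, and autoscaling behavior, so a latency speedup does not translate one-to-one into a lower bill.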

Migrate Workloads to More Affordable Hardware

Run your models on affordable and widely available GPUs by improving inference efficiency.

Improve User Experience with Better Inference Speed

Improve inference speed without compromising on accuracy.

Talk to Our Experts

Tell us about your use case, needs, goals, and the obstacles in your way. We’ll show you how you can use the Deci platform to overcome them.

