A customer developing an AI platform for text summarization was struggling to achieve satisfactory latency with the model powering their application. This led to a poor user experience as well as high cloud costs. The model was deployed on an NVIDIA T4 GPU.
The Solution
The customer used Deci’s compilation and quantization tools to optimize the model’s performance, significantly reducing cloud costs and improving the user experience.
The Results
Cloud Cost Reduction
Latency Acceleration
Model Size Reduction
Use Deci’s Development Platform to:
Enable Real-Time Inference at the Edge
Improve latency and throughput, and reduce model size by up to 5X, while maintaining the model’s accuracy.
Process More Video Streams on Fewer Devices
Maximize hardware utilization and cost-efficiently scale your solution at the edge.
Deploy Your Models on Any Edge Device
Eliminate cloud inference compute costs and avoid data privacy issues by running your models directly on edge devices.
Talk to Our Experts
Tell us about your use case, needs, goals, and the obstacles in your way. We’ll show you how you can use the Deci platform to overcome them.