3.92X acceleration leads to 68% cloud cost savings & better UX for a generative text summarization app

Customer: AI platform company

Industry: Computer software

Use case: NLP (text summarization on NVIDIA GPU)

The Challenge

A customer developing an AI platform for text summarization could not get satisfactory inference latency from the model powering its application, which was deployed on an NVIDIA T4 GPU. The slow responses degraded the user experience and drove up cloud costs.

The Solution

The customer used Deci’s compilation and quantization tools to optimize the model’s inference performance, which reduced cloud costs and improved the user experience.

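The Results

The optimized model ran 3.92x faster, which translated into 68% cloud cost savings and a better user experience for the text summarization app.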