CASE STUDY

2.1X Acceleration Leads to Cloud Cost Savings and Improved User Experience for BRIA’s Gen AI Platform


Customer

Gen AI company

Industry

Computer software

Use case

Visual generative AI

Introduction

The demand for high-quality visual content continues to grow across industries. Visual generative AI foundation models, such as latent or cascaded diffusion models and SAM (Segment Anything Model), can generate or customize realistic images, opening up possibilities for many new applications. These foundation models have the potential to streamline and automate tasks that would otherwise require significant time and resources to complete manually.

The Challenge

BRIA provides engineers, AI teams, and researchers with safe and legal visual generative AI capabilities. With access to trained models, source code, and comprehensive API suites, companies can enhance their products and services using BRIA’s Visual Generative AI Platform. BRIA’s solutions, trained on the world’s largest fully licensed, high-quality training set, eliminate legal risks for commercial enterprise use.

Cloud cost reduction is also a top priority for BRIA. Foundation models are larger, and their inference process is more complex, than classical AI models. To generate a new sample, a model performs several inference iterations. The combination of extremely large models and the iterative nature of inference results in significantly higher demand for compute power and higher overall inference costs. To run inference at high scale, BRIA aims to reduce its cloud expenditure by optimizing inference time and increasing GPU utilization. Moreover, faster models are not only more cost-effective; they also increase client satisfaction by reducing inference latency.
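To see why iterative inference drives up cost, consider a minimal conceptual sketch (not BRIA’s or Deci’s code): in a diffusion model, the large denoising network runs once per sampling step, so total compute scales linearly with the step count. Here the network is a trivial stand-in, and the call counter makes that scaling explicit.

```python
# Conceptual sketch: diffusion-style generation performs one full
# forward pass of a large denoising network per sampling step, so
# inference cost grows linearly with the number of steps.

calls = {"denoiser": 0}

def denoiser_stub(latent, step):
    # Stand-in for one forward pass of a large denoising network (e.g. a U-Net).
    calls["denoiser"] += 1
    return latent * 0.9  # pretend a fraction of the noise is removed

def generate(num_inference_steps=50):
    latent = 1.0  # stands in for the initial random latent tensor
    for step in range(num_inference_steps):
        latent = denoiser_stub(latent, step)
    return latent

generate(num_inference_steps=50)
print(calls["denoiser"])  # 50 large-model forward passes for one image
```

Cutting per-step latency (e.g. through compilation or quantization) therefore multiplies across every step of every generated sample.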

The Solution

Using Deci’s platform, BRIA’s team optimized the inference performance of its Stable Diffusion v1.5 and Segment Anything models and reduced its inference cloud costs by 50%.

BRIA’s team used Deci’s Infery library to easily apply hybrid compilation and selective quantization to its complex diffusers- and transformers-based architectures.

Infery automatically profiles the architecture’s sub-components and layers, then applies the optimal production-oriented framework and quantization level to each one, all while taking the characteristics of the inference hardware into account. With Infery, BRIA’s team was able to maximize the acceleration potential of its complex models while saving valuable time and effort.
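The per-component decision described above can be illustrated with a small, hypothetical sketch (the component names, timings, and accuracy figures below are invented for illustration and are not Infery’s API): profile each sub-component at several precision levels, then keep the fastest option whose accuracy drop stays within a budget.

```python
# Hypothetical illustration of selective quantization planning.
# For each pipeline component we have (latency_ms, accuracy_drop)
# measured at each candidate precision. All numbers are invented.

profiles = {
    "text_encoder": {"fp16": (12.0, 0.000), "int8": (7.0, 0.004)},
    "unet":         {"fp16": (95.0, 0.000), "int8": (55.0, 0.001)},
    "vae_decoder":  {"fp16": (20.0, 0.000), "int8": (14.0, 0.020)},
}

def select_precision(profiles, max_accuracy_drop=0.005):
    """Pick the fastest precision per component within the accuracy budget."""
    plan = {}
    for component, options in profiles.items():
        # Keep only precisions whose accuracy drop is acceptable.
        allowed = {prec: (lat, drop) for prec, (lat, drop) in options.items()
                   if drop <= max_accuracy_drop}
        # Of those, choose the lowest-latency precision.
        plan[component] = min(allowed, key=lambda prec: allowed[prec][0])
    return plan

print(select_precision(profiles))
# The vae_decoder's int8 drop (0.020) exceeds the budget, so it stays fp16,
# while the other components are quantized to int8.
```

The point of automating this per-component choice is that quantization sensitivity varies widely across sub-components, so an all-or-nothing precision setting leaves either speed or accuracy on the table.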

Infery’s optimization module was easily integrated into BRIA’s CI/CD pipeline. In production, BRIA uses Infery’s deployment module, which includes advanced inference capabilities and is integrated as a backend inference engine with NVIDIA Triton Inference Server.
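For context on what a Triton integration involves, a minimal model configuration for a custom Python backend might look like the sketch below (all names, shapes, and values are illustrative assumptions, not BRIA’s actual configuration):

```protobuf
# Hypothetical config.pbtxt for a text-to-image model behind a custom backend.
name: "stable_diffusion_optimized"   # illustrative model name
backend: "python"                    # custom backend wrapping the inference engine
max_batch_size: 4

input [
  {
    name: "PROMPT"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "IMAGE"
    data_type: TYPE_UINT8
    dims: [ 512, 512, 3 ]
  }
]

instance_group [
  { kind: KIND_GPU, count: 1 }
]
```

Serving the optimized engine through Triton this way lets BRIA keep standard capabilities such as request batching and multi-model serving without changing its client-facing API.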

The Results

Using Deci’s platform, BRIA accelerated inference by 2.1X and reduced its cost of serving by 50%, achieving lower latency for both its Stable Diffusion and Segment Anything models.

With Deci's Foundation Models and Developer Tools,
You Can:

Launch Your Gen AI
Apps Faster

Use enterprise-grade models. Lower risk, shorten dev time from months to days.

Scale Your Gen AI Inference Cost-Effectively

Save up to 80% on your inference costs by migrating workloads to affordable, widely available hardware.

Improve User Experience with Better Inference Speed

Ship better products and delight users with low latency performance.

How It Works:

Choose a Pre-trained Foundation Model

Select a highly efficient LLM or text-to-image model generated with Deci’s NAS-based AutoNAC engine.

Fine-tune For Your Data on Premise

Train or fine-tune on premise with Deci’s library or your library of choice.



Optimize & Run Self-Hosted Inference

Automatically apply advanced acceleration techniques to any model and run on any environment.

Talk to Our Experts

Build Better Models Faster with Deci’s Deep Learning Development Platform

Tell us about your use case, needs, goals, and the obstacles in your way. We’ll show you how you can use the Deci platform to overcome them.

Book a Demo
