Deploy and Serve Your Deep Learning Models in an Optimized Container-based Server

Deci’s Runtime Inference Container (RTiC) is a containerized deep-learning runtime engine that enables easy deployment of models as microservices and maximizes GPU/CPU utilization for top performance.

SIGN UP FOR FREE

The Challenge

In today’s versatile cloud environments, with so many types of hardware, frameworks, and models, DevOps engineers and data scientists constantly struggle to tune and deliver AI models in a microservice production environment. One of the main obstacles inhibiting the effective use of ML models is the challenge of serving a model for inference within its target cloud environment. While container technology has transformed the face of cloud-based IT operations, dedicated containerization of AI inference tasks has been left behind. Simply placing a model on a general-purpose server, or even on an inference-dedicated server, results in inefficient inference performance and unnecessary challenges for continuous optimization and tuning. As with general application containers, using containers for deep learning inference should allow faster deployment and portability of AI models, improved developer productivity, the agility to scale on demand, and more efficient utilization of compute resources.

Deci’s Runtime Inference Container (RTiC)

RTiC is a standard Docker container with its own file system plus dedicated inference server software and packages, all bundled together within the container. It maximizes utilization of the underlying hardware while enabling inference of multiple models on the same hardware, and it lets you leverage best-of-breed graph compilers such as TensorRT and OpenVINO. With RTiC, you can use standard container orchestration tools such as Kubernetes to deploy, manage, and scale microservices up or down.
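
To make the deployment model concrete, here is a minimal sketch of starting an RTiC-style container from Python using the Docker SDK, with all host GPUs exposed to it. The image name and port below are illustrative placeholders, not Deci’s actual distribution details.

    import docker

    # Placeholder image and port; substitute the RTiC image and port from Deci's documentation.
    RTIC_IMAGE = "registry.example.com/deci/rtic:latest"

    client = docker.from_env()

    # Run the inference container in the background, map the server port to the host,
    # and hand it all available GPUs (the SDK equivalent of `docker run --gpus all`).
    container = client.containers.run(
        RTIC_IMAGE,
        detach=True,
        name="rtic-server",
        ports={"8000/tcp": 8000},
        device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    )

    print(container.short_id, container.status)

The same container image can then be wrapped in a Kubernetes Deployment and scaled with the standard orchestration tooling mentioned above.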

Benefits:

  • Benefit from a highly efficient inference server that works on the most common cloud environments, including GCP, AWS, and Azure, orchestrated using standard tools including Kubernetes, EKS, AKS, and GKE
  • RTiC optimizes performance and resource utilization by leveraging best-of-breed graph compilers such as TensorRT and OpenVINO, integrated into the container for easy use
  • Standard API communication to/from the models, including functions for inference requests, loading models (hot swap), and measuring or monitoring model performance in production (see the client sketch below)
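
As a sketch only, the snippet below shows what a client calling such an API could look like over HTTP. The base URL, endpoint paths, and payload shapes are assumptions for illustration, not RTiC’s documented interface; consult Deci’s API reference for the real routes and schemas.

    import requests

    RTIC_URL = "http://localhost:8000"  # assumed address of a running RTiC container

    def load_model(name, model_path):
        # Hypothetical "hot swap" call: ask the server to load (or replace) a model.
        resp = requests.post(f"{RTIC_URL}/models/{name}/load", json={"path": model_path})
        resp.raise_for_status()

    def predict(name, inputs):
        # Hypothetical inference call: send a batch of inputs and return predictions.
        resp = requests.post(f"{RTIC_URL}/models/{name}/predict", json={"inputs": inputs})
        resp.raise_for_status()
        return resp.json()

    def get_metrics(name):
        # Hypothetical monitoring call: fetch latency/throughput statistics for a model.
        resp = requests.get(f"{RTIC_URL}/models/{name}/metrics")
        resp.raise_for_status()
        return resp.json()

    load_model("resnet50", "/models/resnet50.onnx")
    print(predict("resnet50", [[0.0, 0.1, 0.2]]))
    print(get_metrics("resnet50"))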

Seamless Cloud Deployment with RTiC™

A containerized deep learning runtime engine that easily turns any model into a blazing-fast server.

Applications

  • Medical AI Diagnoses
  • Video Analytics
  • Security Cameras
  • Manufacturing
  • Image Editing

Request a Demo >

Deployment Options

  • Data Center (CPU and GPU)
  • Cloud (CPU and GPU)
  • Edge Server (CPU and GPU)

“Using Deci’s platform we achieved a 2.6x increase in inference throughput of one of our heavy multiclass classification models running on V100 machines - without losing accuracy. Deci can cut 50% off the deep learning inference compute costs in our cloud deployments worldwide. We are very impressed by Deci's technology!”

Chaim Linhart, CTO and Co-Founder, Ibex Medical Analytics

Relevant Resources

Blog

Deci RTiC – The Case for Containerization of AI Inference

Read Blog Post

Blog

How Deci and Intel Hit 11.8x Inference Acceleration at MLPerf

Read Blog Post

Blog

The Correct Way to Measure Inference Time of Deep Neural Networks

Read Blog Post

Optimize Your Deep Learning Models for FREE