Measuring the inference time of a trained deep neural model on different hardware devices is a critical task when making deployment decisions. Should you deploy your inference on 8 Nvidia V100s, on 12 P100s, or perhaps on 64 CPU cores?
When it comes to inference timing, apples-to-apples comparisons among devices do not require rocket science. Nevertheless, the process is time-consuming and error-prone, and it requires expertise to perform correctly.
Fortunately, Deci AI has released a free service that does it for you. The Deci Inference Performance Simulator (DiPS) helps practitioners analyze their inference performance. DiPS measures model throughput, latency, cloud cost, model memory usage, and other important performance metrics. It provides a full analysis of how your model will behave and perform across various production environments, at no cost.
Why is measuring run-time performance painful?
In our article on how to measure deep learning performance, we provide practical guidelines for inference evaluation that include the following steps: (1) write a latency measurement script, (2) write a script to compute the optimal batch size for inference, (3) write a throughput measurement script, (4) launch several machines in the cloud to run all these scripts, and (5) summarize the obtained metrics and analyze the results.
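DiPS automates these steps. For readers curious what steps (2) and (3) involve when written by hand, here is a minimal Python sketch (standard library only) of a throughput measurement and a batch-size sweep. It is an illustration, not how DiPS works internally. The `infer(batch_size)` callable is a hypothetical stand-in for your model's forward pass, and treating `MemoryError` as "this batch does not fit on the device" is an assumption; real frameworks raise their own out-of-memory errors. Here "optimal" simply means the candidate batch size with the highest measured throughput.

```python
import time
from typing import Callable, Iterable, Optional


def measure_throughput(infer: Callable[[int], None], batch_size: int,
                       repetitions: int = 20, warmup: int = 3) -> float:
    """Return samples per second for one batch size.

    `infer(batch_size)` is a hypothetical callable that runs one forward
    pass on `batch_size` inputs and returns only when the work is done.
    """
    for _ in range(warmup):  # discard early runs (lazy init, caches)
        infer(batch_size)
    start = time.perf_counter()
    for _ in range(repetitions):
        infer(batch_size)
    elapsed = time.perf_counter() - start
    return repetitions * batch_size / elapsed


def best_batch_size(infer: Callable[[int], None],
                    candidates: Iterable[int]) -> Optional[int]:
    """Return the candidate with the highest measured throughput.

    Assumption: `infer` raises MemoryError when a batch does not fit.
    The sweep stops at the first failure, since larger batches would
    not fit either. Returns None if no candidate fits.
    """
    best, best_tp = None, 0.0
    for bs in sorted(candidates):
        try:
            tp = measure_throughput(infer, bs)
        except MemoryError:
            break
        if tp > best_tp:
            best, best_tp = bs, tp
    return best


# Hypothetical stand-in "model": a fixed per-call overhead plus a
# per-sample cost, with a pretend memory limit of 64 samples per batch.
def fake_infer(batch_size: int) -> None:
    if batch_size > 64:
        raise MemoryError("batch does not fit")
    time.sleep(0.002 + 0.00005 * batch_size)


print(best_batch_size(fake_infer, [1, 2, 4, 8, 16, 32, 64, 128]))
```

Sweeping powers of two and stopping at the first out-of-memory failure keeps the search short; a careful script would also repeat each measurement and check run-to-run variance before trusting the winner.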
Performing these steps is not only time-consuming but also highly error-prone. For example, issues may arise with timing on the CPU, with measuring data transfers to and from the acceleration device, with measuring preprocessing, and so on.
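To make these timing pitfalls concrete, the following sketch (again Python with the standard library only, and again an illustration rather than DiPS's own code) times individual inference calls with warm-up runs and an explicit synchronization hook. The `infer` and `synchronize` callables are hypothetical placeholders. The hook matters on accelerators that execute asynchronously: a host-side timer that stops as soon as a GPU kernel is launched measures only the launch overhead, not the computation.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(infer: Callable[[], None],
                    synchronize: Callable[[], None] = lambda: None,
                    repetitions: int = 100,
                    warmup: int = 10) -> Dict[str, float]:
    """Time single inference calls; return latency statistics in ms.

    `infer` (hypothetical) runs one forward pass. `synchronize` must block
    until all queued device work has finished; on an asynchronous GPU
    backend, stopping the timer without it measures only kernel launches.
    With PyTorch on CUDA, for instance, torch.cuda.synchronize fits here.
    The default no-op is only correct for synchronous (e.g. CPU) code.
    """
    for _ in range(warmup):  # absorb lazy initialization and cold caches
        infer()
    samples = []
    for _ in range(repetitions):
        synchronize()  # start timing from an idle device
        start = time.perf_counter()
        infer()
        synchronize()  # include the computation, not just its launch
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        # simple index-based approximation of the 95th percentile
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
    }


stats = measure_latency(lambda: time.sleep(0.002), repetitions=20, warmup=2)
print({name: round(value, 2) for name, value in stats.items()})
```

Whether `infer` includes host-to-device copies and preprocessing is up to the caller; what matters is making the same choice on every device you compare.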
The DiPS platform handles all of these details, and more, so that you can obtain accurate inference timing. At Deci AI, our business is accelerating inference, and we originally created DiPS for our own internal use. When we saw that even experienced practitioners struggle with the principles of inference timing, we realized that everyone would benefit if we released DiPS to the community. We firmly believe that helping others tackle this technical challenge will go a long way toward promoting a unified approach to timing.
The DiPS Report
The DiPS service takes a neural model as input and returns a comprehensive report on the model’s inference performance.
The model can be completely untrained, because DiPS is only concerned with timing and costs.
In the next section, we describe how to input your own model. But first, let’s see what makes this tool so attractive. Below you can see the Results Summary taken from a typical DiPS report.
The model that gave rise to this report is Yolo v3, implemented in ONNX. (DiPS also supports PyTorch and TensorFlow.) As you can see, the report covers five categories and includes a list of key insights. For example, one conclusion is that the Tesla V100 yields the highest throughput and lowest latency. Another, less obvious conclusion is that the T4 yields the lowest cost for running inference on 100K images. Other insights cover the capacity of the model on each hardware type (the optimal batch size), the tradeoff between cost and performance on each hardware type, memory usage, and much more.
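The cost insight rests on simple arithmetic: processing N images takes N divided by the sustained throughput, and the cost is that time multiplied by the instance's hourly price. The sketch below applies this formula. The throughputs and prices are made-up placeholders, not DiPS measurements or real cloud prices; they only show why the fastest device is not necessarily the cheapest per image.

```python
def inference_cost(num_images: int, throughput_ips: float,
                   price_per_hour: float) -> float:
    """Cost of processing `num_images` at a sustained throughput of
    `throughput_ips` images/second on an instance billed per hour."""
    hours = num_images / throughput_ips / 3600.0
    return hours * price_per_hour


# Hypothetical numbers for illustration only (not measured, not real prices).
configs = {
    "fast_gpu": {"throughput_ips": 1000.0, "price_per_hour": 3.00},
    "cheap_gpu": {"throughput_ips": 400.0, "price_per_hour": 0.50},
}
costs = {name: inference_cost(100_000, **c) for name, c in configs.items()}
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${cost:.4f} per 100K images")
# cheap_gpu: $0.0347 per 100K images
# fast_gpu: $0.0833 per 100K images
```

This simple model assumes the instance is fully utilized for the whole run and ignores billing granularity and startup time, which a real cost estimate would need to account for.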
Even experienced programmers might need several days of coding to produce this kind of study, with a similar report covering all these hardware devices. With DiPS, it takes a few minutes at most!
DiPS also offers a deeper look into each section of the report. For example, if you are interested in computation costs, you can look at the report page that details the costs for each hardware type. For the scenario above, the model cloud cost section of the report looks like this:
Using this information, you can optimize your cloud costs for the desired input batch sizes, and even compare the cost of several models on a specific hardware type.
How to Use DiPS
You can watch this two-minute tutorial video to understand the whole process, or follow this quick walk-through of how to use DiPS. The simulator is available on Deci’s website. After entering some initial details (Step 1), you will land on the following page (Step 2):
This page allows you to provide the minimal details needed for us to analyze your model. Fill in the following basic information:
- Model name – The name of the model you would like to analyze (any string is OK).
- Model framework – Choose one of the listed frameworks. The minimum requirements for each framework are shown in blue.
- Input dimension – The shape of a single input tensor for the network, without the batch dimension. For example, if you work on ImageNet with PyTorch, this will be (3,224,224).
- Inference hardware – The hardware you wish to test. You can choose up to 4 hardware types: Intel CPU, Nvidia V100, Nvidia T4, Nvidia K80.
- Choose how you want to give us access to the model:
  - Checkpoint link – Share the model via a public link. When you select the framework, you’ll find specific instructions in blue under the framework field.
  - Be contacted by Deci – A Deci expert will contact you to get the model.
  - Use an existing off-the-shelf model – Choose one of several off-the-shelf models (e.g., ResNet 18/50, EfficientNet, MobileNet, and Yolo).
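To see how the input dimension relates to batching, the sketch below prepends a batch size to the per-sample dimension and computes the size of one input batch. The helper names are hypothetical, and the 4-bytes-per-element figure assumes float32 inputs. The result is only a lower bound on memory, since it ignores weights and intermediate activations. Note also that (3,224,224) follows PyTorch's channels-first layout; Keras models default to channels-last, i.e. (224,224,3), so check which layout your framework expects.

```python
import math
from typing import Tuple


def batched_input_shape(input_dim: Tuple[int, ...],
                        batch_size: int) -> Tuple[int, ...]:
    """Prepend the batch dimension to a per-sample input dimension."""
    return (batch_size, *input_dim)


def input_batch_bytes(input_dim: Tuple[int, ...], batch_size: int,
                      bytes_per_element: int = 4) -> int:
    """Bytes taken by one input batch (4 bytes/element assumes float32).

    Lower bound only: weights and intermediate activations are ignored.
    """
    shape = batched_input_shape(input_dim, batch_size)
    return math.prod(shape) * bytes_per_element


print(batched_input_shape((3, 224, 224), 64))        # (64, 3, 224, 224)
print(input_batch_bytes((3, 224, 224), 64) / 2**20)  # 36.75 (MiB)
```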
As mentioned above, you don’t need to supply a trained model in order to use DiPS. An untrained model will give rise to the same inference timing (and cost) metrics.
Why you can relax when it comes to privacy
It’s natural for users to be concerned about sharing models, weights, or data. For this reason, we built DiPS as a fully secure and private application, in which all data and model weights remain confidential. You can also choose an off-the-shelf model from our model repository, in which case we analyze one of our own existing models and you don’t need to share yours at all. After analyzing your model, we immediately delete it from our servers; we never keep a copy. Moreover, DiPS uses a secure transfer protocol with the highest encryption standards available. At Deci, we are committed to ensuring that no one will use or distribute any of the input models. If you still have privacy concerns, you can upload an open-source model with the same characteristics, or a modified version of your own model.
Save time and prevent errors in measuring your model performance
DiPS is a new, free tool for measuring the inference performance of deep learning architectures on different hardware platforms. It provides a unified approach to evaluating your model’s metrics with a single click. DiPS is openly available to the deep learning community to help save time and prevent errors in latency and throughput measurements.
Deci is committed to keeping any models evaluated using DiPS completely secure and private. All that remains is for you to try DiPS from the following link and tell us what you think. Why not try it right now?