How to Convert a Model from PyTorch to TensorRT and Deploy in 10 Minutes


This post explains how to convert a model from PyTorch to TensorRT in just 10 minutes. It’s simple and you don’t need any prior knowledge.

Why Should You Convert from PyTorch to TensorRT?

TensorRT is NVIDIA’s SDK for high-performance deep learning inference on its GPUs. It is built on CUDA, NVIDIA’s parallel computing platform and programming model. When applied, it can deliver around 4 to 5 times faster inference than the baseline model.

In this tutorial, converting a model from PyTorch to TensorRT involves the following general steps:

1. Build a PyTorch model in one of two ways:

  • Train a model in PyTorch
  • Get a pre-trained model from the PyTorch Model Zoo, another model repository, or directly from Deci’s SuperGradients, an open-source PyTorch-based deep learning training library.

2. Convert the PyTorch model to ONNX.

3. Convert from ONNX to TensorRT.

Steps 1 and 2 are general and can be accomplished with relative ease. When we get to Step 3, we’ll show you how to get through it easily using the Deci platform. 

Conversion the Fast Way Using the Deci Platform 

Deci developed an end-to-end platform that enables AI developers to build, optimize, and deploy blazing-fast deep learning models on any hardware. The Deci platform offers faster performance, better accuracy, shorter development times, powerful optimization features, a visual dashboard for benchmarking and comparing models, and easy deployment.

Using the Deci Platform for Fast Conversion to TensorRT

We’ll start by converting our PyTorch model to an ONNX model. This can be done in minutes using less than 10 lines of code.

Once you have the ONNX model ready, the next step is to save it under a descriptive file name, for example “resnet50_dynamic.onnx”.

Now it’s time to upload the model to the Deci platform.

Sign in to the platform, or sign up if you haven’t yet done that. Once you log in, go to the lab section and click “New Model”.

Deci Platform - New Model

In the form displayed, fill in the model name, description, type of task (e.g., in our case it is a classification task), hardware on which the model is to be optimized, inference batch_size, framework (ONNX), and input dimension for the model. Finally, give the path to the model and click “Done” to upload the model.

Deci Platform - Connect new model

The model is now uploaded onto the platform.

Uploaded model to the Deci platform

Once the model is uploaded, you can optimize it by selecting the model from the list and clicking “Optimize”. You should see a pop-up like the one shown here.

Optimize Model - Deci Platform
Start Optimization - Deci Platform

Make sure the correct model name is selected from the dropdown, choose the target hardware and batch_size, and click “Next”.

We’ll set the quantization level to 16-bit (FP16) and click “Start Optimization”.

A progress bar indicates that it should take just a few minutes to optimize for the target hardware.

Conversion in progress - PyTorch to TensorRT

A new model appears in the list with a TRT8 tag, indicating that it is optimized for the latest TensorRT version – 8.

Conversion completed - PyTorch to TensorRT

One excellent feature of the Deci platform is the option to compare both models using different metrics, such as latency, throughput, memory consumption, or model size.

The Deci platform also makes it easy to compare performance to the original baseline model. We can compare multiple versions of the same model using any of the available metrics.

Comparing model versions with the Deci Platform
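Outside the platform, the same latency and throughput metrics can be reproduced with a small timing helper. This is a generic sketch of our own (the `benchmark` helper and the dummy workload are illustrative, not part of the Deci platform):

```python
import time
import statistics

def benchmark(predict_fn, n_warmup=10, n_runs=100):
    """Measure mean/p95 latency (ms) and throughput (inferences/sec) of a callable."""
    for _ in range(n_warmup):          # warm-up runs are excluded from timing
        predict_fn()
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict_fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(latencies)
    return {
        "mean_latency_ms": mean_ms,
        "p95_latency_ms": sorted(latencies)[int(0.95 * n_runs) - 1],
        "throughput_per_sec": 1000.0 / mean_ms,
    }

# Dummy workload standing in for a model's predict call:
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
```

Running the helper once on the baseline model and once on the optimized one gives a like-for-like comparison on your own hardware.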

The table below summarizes the optimization results, showing that the optimized TensorRT model outperforms the baseline on every metric measured.

Optimization results - PyTorch to TensorRT

Now that the conversion and optimization are complete, you can easily deploy the model by leveraging additional capabilities available on the Deci platform.

To deploy the model, simply click “Deploy” at the top right corner.

Deploy the model after conversion from PyTorch to TensorRT

There are two deployment options:

  1. Infery: Infery is Deci’s proprietary deep-learning runtime inference engine, which can turn a model into an efficient runtime server and enables you to run a model from a Python package.
  2. RTiC: Runtime Inference Container (RTiC) is Deci’s proprietary containerized deep-learning runtime inference engine, which turns a model into an efficient runtime server. It enables efficient inference and seamless deployment, at scale, on any hardware.

In this blog we will explore the Infery inference engine to test our model.

Why Use Infery

  1. Infery is framework-agnostic. It provides one interface for all deep learning frameworks. Once you implement the inference logic, you are free to change the model’s backend framework without any development effort and with no changes to your code.
  2. Broad support matrix. Infery supports all major deep learning frameworks.
  3. Installing TensorRT is hard. The most frustrating part of converting a model to TensorRT is installing its many dependencies and untangling the broken environments that result; reaching a correct installation can take a day or two of headaches. Infery makes this process as easy as possible, in some cases installing these drivers for you in a cross-platform way, and reduces the installation and environment-setup burden.

After selecting the Infery inference engine, we will see a pop-up like this.

Infery engine instructions

Here you will find instructions on how to download the model and how to install the Infery library on the destination machine.

Here is a reference for all the prerequisites for installation of the Infery library.

After meeting all the prerequisites, you can install Infery by following the instructions, then load the model and test it.

You can test it in any Python console. Just feed your model instance a NumPy array and take a look at the outputs. The outputs will be represented as a list of np.ndarray objects.
You can choose to receive the outputs as a list of torch.cuda.Tensor objects by specifying output_device='gpu'. This keeps the data on the GPU without copying it to the CPU unnecessarily.
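Infery is proprietary and its API may differ between versions, so treat the following as pseudocode: a sketch of the load/predict/benchmark flow described above, in which the file path, framework_type value, and parameter names are our assumptions for illustration.

```python
# Sketch only: assumes Infery's load/predict pattern; verify names
# against the instructions shown in the platform pop-up.
import infery
import numpy as np

# Load the optimized TensorRT model downloaded from the platform.
model = infery.load(model_path="resnet50_dynamic.engine",
                    framework_type="trt",
                    inference_hardware="gpu")

# Feed a NumPy array; outputs come back as a list of np.ndarray objects.
inputs = np.random.random((1, 3, 224, 224)).astype("float32")
outputs = model.predict(inputs)

# Keep the outputs on the GPU as torch.cuda.Tensor objects instead:
gpu_outputs = model.predict(inputs, output_device="gpu")

# Benchmark to check that latency/throughput match the platform's numbers.
model.benchmark(batch_size=1)
```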

Now you can benchmark the model using the benchmark function of Infery to see if all the metrics are as expected.

In this example, you can see that all the metrics are as expected from the Deci platform.


This article illustrates how you can speed up the process of converting a model from PyTorch to TensorRT, skip the usual installation hassle, and deploy the result with a few simple lines of code using the Deci platform and the Infery inference engine. The outcome is a faster, optimized model with minimal engineering effort.

