How to Scale Up Real-Time AI Video Analytics on the Edge

The advent of AI in general, and deep learning in particular, is enabling new ways of using and extracting insights from live video streams and recorded footage. Deep learning enables sophisticated, scalable analysis of high-resolution video to recognize scenes, activities, and even changes in sentiment.

AI-enabled video analytics applications are being deployed today across many business verticals, including smart cities, security cameras, healthcare, smart retail, and sports tech, among others. At the core of these AI video analytics applications are computer vision models performing tasks such as object detection, image classification, semantic segmentation, and face detection.

Common Challenges in Deploying & Scaling AI-Enabled Video Analytics Solutions

While AI-based video analytics solutions are implemented in a wide variety of use cases, there are many shared challenges faced by developers across industries. 

Developers need to ensure that the computer vision models that power video analytics applications are accurate, cost-efficient to run, and able to deliver real-time insights. In addition, many such applications must be deployed on small edge devices with limited computing power.

Let’s take a closer look at some of these main challenges and how leading AI teams are addressing them.

1. Achieving Real-time Inference Performance

Every video analytics application typically has its own performance requirements that must be met in order to enable the specific use case and deliver a good user experience. These translate into specific accuracy, latency and throughput targets the model should deliver.

In addition, AI developers should also take into account hardware-related constraints which can impact the size of the model and its memory footprint.

Developers often hit a barrier when trying to find the best trade-off between a model's accuracy, inference speed, and size. Finding the right balance between these parameters on your inference hardware is a well-known challenge.

Oftentimes, teams attempt to tackle difficult AI-enabled video analytics challenges by increasing the complexity and size of their deep neural networks. In general, it is easier to achieve better accuracy with larger, overparameterized models. However, when considering accuracy versus inference time, the more accurate architectures tend to incur higher inference latency, which stands in contrast to the application's requirements for real-time latency and throughput.


Figure 1. ImageNet Top-1 accuracy vs. throughput (images per second) of various architectures.

Source: Bianco et al. Benchmark Analysis of Representative Deep Neural Network Architectures, IEEE. Reprinted with Permission.

This is true for any model, but in the context of edge devices, achieving this accuracy-latency trade-off is crucial and can make or break the application.
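One practical way to navigate this trade-off is to measure latency and throughput for each candidate model directly on the target hardware. The sketch below is a minimal, framework-agnostic illustration; `dummy_model` is a hypothetical stand-in for a real inference call:

```python
import time

def benchmark(run_model, frame, warmup=10, iters=100):
    """Estimate per-frame latency (ms) and throughput (FPS) for one inference call."""
    for _ in range(warmup):       # warm-up passes are excluded from timing
        run_model(frame)
    start = time.perf_counter()
    for _ in range(iters):
        run_model(frame)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000
    throughput_fps = iters / elapsed
    return latency_ms, throughput_fps

# Hypothetical stand-in for a model: a fixed amount of work per "frame".
def dummy_model(frame):
    return sum(frame)

latency, fps = benchmark(dummy_model, list(range(1000)))
```

Running the same harness for each candidate architecture on the actual target device, rather than a development workstation, is what makes the resulting accuracy-latency comparison trustworthy.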

2. Deploying your Models on Edge Devices

Video analytics can be run on the cloud or on edge devices, but developers are increasingly moving towards edge deployments. According to Gartner, by 2025, 50% of all inference will take place at the edge. The edge is preferable to the cloud for a variety of reasons, including exorbitant cloud costs, privacy concerns, and regulatory restrictions on sending data to the cloud.

Specifically for video analytics, there is also the issue of limited network bandwidth. A camera's uplink bandwidth is limited, which in turn limits its ability to send high-quality video to the cloud, yet video quality is essential for the analysis.
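A quick back-of-the-envelope calculation shows how fast the uplink budget runs out. The bitrate figure below is an illustrative assumption (roughly typical for a 1080p H.264 stream), not a number from this article:

```python
def uplink_mbps(streams, bitrate_mbps):
    """Total uplink bandwidth (Mbps) needed to push all video streams to the cloud."""
    return streams * bitrate_mbps

# Eight cameras at ~4 Mbps each already need ~32 Mbps of sustained uplink,
# which is more than many camera installations can reliably provide.
needed = uplink_mbps(streams=8, bitrate_mbps=4)
```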

Moving your computing to edge devices enables you to process deep learning algorithms locally, either on the end device itself or on a server near it, and avoid the challenges mentioned above.

Having said that, there are also challenges associated with running video analytics inference on edge devices. Edge AI offers a lot of potential, but edge devices have limited computing power and memory that may not be sufficient to support the performance you need; alternatively, your model may simply be too large for the target device. Due to these computing constraints, most edge hardware today can only perform on-device inference with smaller models.

This, in turn, means constraints on the type of object detection model used. Beyond being highly efficient and having a small memory footprint, the architecture chosen for edge devices has to be thrifty when it comes to power consumption. This, along with the heat dissipation in some devices, can also limit the choice of model.

Here’s the thing: on a device with reduced memory and compute power, a network may simply be impractical for applications that require real-time processing.
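When checking whether a model can fit a given edge device, a rough first estimate is the memory its weights occupy. A minimal sketch, assuming standard 4-byte FP32 and 2-byte FP16 weight formats:

```python
def weight_memory_mb(num_params, bytes_per_param=4):
    """Approximate weight memory in MB (FP32 = 4 bytes, FP16 = 2, INT8 = 1)."""
    return num_params * bytes_per_param / (1024 ** 2)

# A hypothetical 25M-parameter detector: ~95 MB in FP32, ~48 MB in FP16,
# before counting activations, I/O buffers, and the runtime itself.
fp32_mb = weight_memory_mb(25_000_000)
fp16_mb = weight_memory_mb(25_000_000, bytes_per_param=2)
```

Activation memory and framework overhead often dominate on small devices, so this estimate is a lower bound, not a deployment guarantee.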

3. Cost-efficient Scaling at the Edge

Beyond getting models to run successfully in production, another issue is scalability.

Let’s say that you have succeeded in deploying your video analytics application at the edge. If you want to scale your application there are new challenges that you might encounter, especially if you are looking to do so in a cost-effective manner.

Many developers are looking to scale up while keeping hardware costs down. This can be done by processing multiple streams of high-resolution video on the same device. This is far more cost-efficient than having to process one or two streams per device. However, it adds an additional barrier that must be overcome to reach a commercially viable solution at scale.

You need to be able to achieve peak performance while running multiple streams on one device, something that not every hardware platform can deliver. Developers need to find a way to maximize the utilization of their existing infrastructure and increase the number of streams processed in real time on a single device.
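One common pattern for raising per-device stream counts is to batch one frame from each stream into a single inference call, amortizing per-invocation overhead across streams. A minimal round-robin sketch; `infer_batch` is a hypothetical stand-in for any batched model call:

```python
def process_streams(streams, infer_batch):
    """Interleave frames from several streams into batched inference calls.

    `streams` is a list of frame iterables; `infer_batch` runs the model on a
    list of frames and returns one result per frame, in order.
    """
    iterators = [iter(s) for s in streams]
    while True:
        # Take at most one frame per stream per round; skip exhausted streams.
        batch = [f for it in iterators if (f := next(it, None)) is not None]
        if not batch:
            break
        yield infer_batch(batch)  # one model call serves every live stream

# Toy example: three short "streams"; "inference" just doubles each frame.
results = list(process_streams([[1, 2], [3, 4], [5]],
                               lambda batch: [x * 2 for x in batch]))
```

Real deployments add per-stream bookkeeping so each result is routed back to the stream it came from, but the batching principle is the same.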

A Hardware-Aware Solution to Scaling AI-Enabled Video Analytics 

Now that we’ve reviewed some of the root causes of these underlying challenges, let’s look at how to get ahead of them. Ultimately, the best way to tackle all of the challenges and to ensure that your model achieves SOTA results on your intended hardware is to identify the best neural network architecture for your use case, performance targets and hardware. One possible solution is the use of Neural Architecture Search (NAS). 

NAS is a technique used by leading AI teams to automatically search for and discover the best neural networks for a given problem and set of constraints. Hardware-aware NAS incorporates the target hardware and performance parameters directly into the model selection process. NAS automates the design of DNNs, often yielding higher performance than manually designed architectures, and it is much faster than the traditional manual process. The general idea behind NAS is to select the optimal architecture from a space of allowable architectures.
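As a toy illustration of that idea, the sketch below exhaustively searches a tiny two-dimensional architecture space under a latency budget. The accuracy and latency formulas are made up for illustration; real NAS systems use trained predictors or on-device measurements, and this is not Deci's actual algorithm:

```python
import itertools

# Toy search space: depth and width options for a hypothetical backbone.
SPACE = {"depth": [2, 4, 8], "width": [32, 64, 128]}

def score(arch):
    """Made-up proxy: deeper/wider models score as more accurate but slower."""
    accuracy = 0.5 + 0.02 * arch["depth"] + 0.001 * arch["width"]
    latency_ms = arch["depth"] * arch["width"] / 100
    return accuracy, latency_ms

def search(latency_budget_ms):
    """Return the highest-scoring architecture that meets the latency budget."""
    best, best_acc = None, -1.0
    for depth, width in itertools.product(SPACE["depth"], SPACE["width"]):
        arch = {"depth": depth, "width": width}
        acc, lat = score(arch)
        # Hardware-aware step: reject any candidate that misses the budget.
        if lat <= latency_budget_ms and acc > best_acc:
            best, best_acc = arch, acc
    return best

best = search(latency_budget_ms=5.0)
```

Even this toy version shows the key property: the winning architecture depends on the latency budget, which is exactly why the search must be aware of the target hardware.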

Deci’s proprietary NAS engine, called AutoNAC, is a hardware-aware technology that automatically generates best-in-class architectures tailored to your inference hardware and performance targets. AutoNAC performs a multi-constraint search to find the architecture that delivers the highest accuracy for a given dataset, speed (latency/throughput), model size, and inference hardware target.


A visual representation of the AutoNAC engine and how it works. Source: Deci.

Architectures built with the AutoNAC engine achieve 3-5X better latency and throughput while maintaining, and in many cases improving, the original accuracy.

AI-Enabled Video Analytics Case Studies

Deci’s customers are using the AutoNAC engine to scale up AI-enabled video analytics solutions across various verticals. Here are some of their stories:

Security Cameras Application

A customer needed to process high-resolution images for an object detection and tracking task on an NVIDIA Jetson Xavier NX device. For the system to become operational, the customer needed to run the device in 10-watt mode and achieve 10 FPS.

Using the AutoNAC engine, the customer built a customized architecture that increased throughput by 3.1x and delivered smooth object tracking. This allowed the customer to launch a new security application.

Figure: Benchmark results for the security cameras application.

Pedestrian Detection Application

In another case, a customer needed to run their object detection-based application in real time on an NVIDIA Jetson device. They wanted to connect multiple camera streams in order to scale their pedestrian detection application in a cost-efficient manner. With Deci’s AutoNAC engine, the customer built an architecture that delivered a 1.5X increase in throughput, reaching 96 frames per second.

Figure: Benchmark results for the pedestrian detection application.

Scaling AI-Enabled Video Analytics on CPU

Another client’s ultimate goal was to maximize the efficiency of their existing infrastructure and double the number of video streams on the same hardware. However, their YOLOv5 model was not achieving the desired throughput (140 frames per second at an inference batch size of 1) on a server with a 20-core Cascade Lake CPU. The client needed to achieve at least 280 frames per second on the same server for the solution to be viable.

By using Deci’s AutoNAC engine, the customer built an architecture that delivered a 2.4x acceleration with a throughput of 340 FPS, all while maintaining the same level of accuracy as the original model. This allowed the customer to double the number of video streams, increasing their scalability and profitability. 

By empowering teams to develop highly efficient models tailored for edge applications, Deci is helping companies to shorten the development lifecycle, deliver superior products to market, and scale their applications.

To find out how Deci can help you deploy and scale your AI-enabled video analytics solution, book a demo to see our AutoNAC engine in action.

