How to Scale Up Real-Time AI Video Analytics on the Edge

Editor’s note: ​​This post was originally published in July 2022 and has been updated for accuracy and comprehensiveness.

The advent of AI in general, and deep learning in particular, is enabling new ways of using and extracting insights from live video streams and recorded footage. Deep learning enables sophisticated, scalable analysis of high-resolution video to recognize scenes, activities, and even changes in sentiment.

AI-enabled video analytics applications are being deployed today across many different business verticals, including smart cities, security, healthcare, smart retail, and sports tech. At the core of these AI video analytics applications are computer vision models performing tasks such as object detection, image classification, semantic segmentation, and face detection.

Common Challenges in Deploying & Scaling AI-Enabled Video Analytics Solutions

While AI-based video analytics solutions are implemented in a wide variety of use cases, there are many shared challenges faced by developers across industries. 

Developers need to ensure that the computer vision models that power video analytics applications are accurate, cost-efficient to run, and able to deliver real-time insights. In addition, many such applications must run on small edge devices with limited computing power.

Let’s take a closer look at some of these main challenges and how leading AI teams are addressing them.

1. Achieving Real-time Inference Performance in AI Video Analytics

Every video analytics application typically has its own performance requirements that must be met in order to enable the specific use case and deliver a good user experience. These translate into specific accuracy, latency, and throughput targets that the model must deliver.

AI developers face additional challenges with hardware constraints, which can impact the model’s size and memory utilization. These constraints necessitate careful consideration of the model’s architecture to ensure it fits within the available computational resources while still meeting performance goals.

Finding the right balance between these parameters on your inference hardware is a well-known challenge, and developers often hit a barrier when trying to find the best trade-off between the model’s accuracy, inference speed, and model size.

Oftentimes, teams attempt to tackle difficult AI video analytics challenges by increasing the complexity and size of their deep neural networks. In general, it is easier to achieve better accuracy with larger, overparameterized models. The pursuit of higher accuracy, however, often increases inference latency, contradicting the critical demand for real-time performance and high throughput in many video analytics applications. This is true for any deployment, but on edge devices the accuracy-latency trade-off is crucial and can make or break the application.


Figure 1. ImageNet Top-1 accuracy vs. throughput (images per second) of various architectures.

Source: Bianco et al. Benchmark Analysis of Representative Deep Neural Network Architectures, IEEE. Reprinted with Permission.
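The latency and throughput side of this trade-off can be quantified with a simple timing harness. The sketch below uses a stand-in `infer` function (a hypothetical placeholder for a real model's forward pass) and standard-library timing only; the numbers it reports are illustrative, not benchmarks:

```python
import time

def infer(frame):
    # Stand-in for a real model's forward pass; in practice you would
    # call your network here (e.g. a compiled inference engine).
    time.sleep(0.002)  # simulate ~2 ms of compute per frame
    return frame

def benchmark(fn, frames, warmup=10, iters=100):
    """Measure mean latency (ms) and throughput (frames/sec) of `fn`."""
    for _ in range(warmup):              # warm up caches / clocks first
        fn(frames[0])
    start = time.perf_counter()
    for i in range(iters):
        fn(frames[i % len(frames)])
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000  # average per-frame latency
    throughput = iters / elapsed         # frames processed per second
    return latency_ms, throughput

frames = [object()] * 8                  # dummy "frames" for the sketch
latency_ms, fps = benchmark(infer, frames)
print(f"latency: {latency_ms:.2f} ms, throughput: {fps:.1f} FPS")
```

Running the same harness on the actual target device, rather than a development workstation, is what makes the measurement meaningful for edge deployments.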

2. Deploying your Models on Edge Devices

Video analytics can run in the cloud or on edge devices, but developers are increasingly moving toward edge deployments. In 2022, Gartner predicted that by 2025, 50% of all inference would take place at the edge. With this prediction materializing, developers are leveraging the edge to address challenges such as high operational costs, data privacy concerns, and regulatory restrictions on sending data to the cloud.

The demand for edge computing video analytics is further intensified by the limitations of network bandwidth. High-quality video transmission to the cloud is often restricted by the uplink capacity of cameras, making edge processing not just preferable but necessary for preserving the video quality essential for accurate analysis.
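A back-of-the-envelope calculation makes the bandwidth constraint concrete. The bitrates and uplink capacity below are assumed for illustration only:

```python
def required_uplink_mbps(num_streams, mbps_per_stream):
    """Aggregate bandwidth needed to ship all camera streams to the cloud."""
    return num_streams * mbps_per_stream

# Assumed figures: a 1080p H.264 stream at roughly 4 Mbps,
# and a hypothetical site uplink of 20 Mbps.
streams = 16
per_stream_mbps = 4.0
uplink_capacity_mbps = 20.0

needed = required_uplink_mbps(streams, per_stream_mbps)
print(f"needed: {needed} Mbps, available: {uplink_capacity_mbps} Mbps")
if needed > uplink_capacity_mbps:
    print("cloud offload infeasible at full quality -> process at the edge")
```

Under these assumptions, shipping all 16 streams would need more than three times the available uplink, which is exactly the situation that pushes processing to the edge.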

The transition to edge computing allows for the local processing of deep learning algorithms, circumventing bandwidth and privacy issues. However, edge computing has its own set of challenges, chiefly related to the computational and memory limitations of edge devices. 

Computational Constraints

Edge devices, designed for minimal power consumption and compactness, inherently possess restricted computational power. This limitation is particularly acute in the context of deep learning models for video analytics, which require substantial computational resources for tasks like object detection, classification, and tracking in real time.

  • Limited Processing Power: Unlike cloud servers equipped with GPUs capable of teraflops of processing power, edge devices often rely on less powerful processors. This gap results in slower processing times for complex calculations, directly impacting the ability to perform real-time analytics.
  • Thermal Management: High computational loads can generate significant heat. In constrained edge devices, managing this heat without adequate cooling mechanisms can lead to thermal throttling, where the device intentionally reduces its performance to lower its temperature, further affecting processing capabilities.

Memory Constraints

Deep learning models, by their nature, are data and parameter-intensive, requiring significant amounts of memory for both the models themselves and the data being processed. Edge devices, however, are limited in both RAM and storage capacity.

  • Limited RAM: The available RAM on edge devices can be a bottleneck for running large deep learning models, as it limits the amount of data that can be processed at once.
  • Storage Capacity: Similarly, the finite storage capacity restricts the size of the models that can be stored locally, necessitating either model simplification or remote fetching of model parameters, which can introduce latency.
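A quick way to see why RAM and storage become bottlenecks is to estimate a model's weight footprint from its parameter count and numeric precision. The figures below are rough (activations and framework overhead add more on top), and the parameter count is just an example at roughly ResNet-50 scale:

```python
def model_size_mb(num_params, bytes_per_param):
    """Approximate weight memory in MB (ignores activations and overhead)."""
    return num_params * bytes_per_param / (1024 ** 2)

params = 25_600_000                 # ~25.6M parameters, for illustration
fp32 = model_size_mb(params, 4)     # 32-bit floats
fp16 = model_size_mb(params, 2)     # half precision
int8 = model_size_mb(params, 1)     # 8-bit quantized

print(f"FP32: {fp32:.1f} MB, FP16: {fp16:.1f} MB, INT8: {int8:.1f} MB")
```

The arithmetic also shows why lower-precision formats such as FP16 and INT8 are so common at the edge: halving the bytes per parameter halves the weight footprint.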

The critical challenge, then, lies in reconciling the computational and memory constraints of edge devices with the demands of real-time processing applications. While edge computing offers the promise of localized, efficient, and privacy-compliant analytics, the current limitations of edge hardware can render advanced neural networks impractical for real-time deployment. 

3. Cost-efficient Scaling at the Edge

Beyond successfully launching models into production, scalability emerges as another critical hurdle. Imagine you’ve managed to deploy your video analytics application at the edge. Scaling this application introduces a new set of challenges, particularly if the goal is to expand in a cost-effective manner.

Many developers aim to scale up while minimizing hardware expenses. A strategy to achieve this involves processing multiple high-resolution video streams on a single device, a method far more economical than dedicating a device to one or two streams. However, this approach introduces an additional obstacle that must be navigated to realize a scalable, commercially viable solution.

Achieving optimal performance when running multiple streams on a single device is essential, yet not feasible with every piece of hardware. Developers are tasked with finding ways to maximize the utilization of their existing infrastructure, enhancing the capacity to process multiple data streams in real time on a single device.
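A simple capacity sketch shows how model throughput caps the number of real-time streams a single device can serve. All figures here are assumed for illustration:

```python
def max_streams(device_fps, fps_per_stream):
    """How many real-time streams a device's model throughput supports."""
    return int(device_fps // fps_per_stream)

# Assumed figures: the model sustains 240 FPS on the device, and each
# camera needs 15 analyzed frames per second for the use case.
device_fps = 240
fps_per_stream = 15
print(f"streams supported: {max_streams(device_fps, fps_per_stream)}")

# A throughput gain translates directly into more streams per device:
print(f"after 1.5x speedup: {max_streams(device_fps * 1.5, fps_per_stream)}")
```

This is why throughput improvements matter commercially: under these assumptions, a 1.5x faster model takes the same device from 16 streams to 24.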

A Hardware-Aware Solution to Scaling AI Video Analytics 

Now that we’ve reviewed some of the root causes of these underlying challenges, let’s look at how to get ahead of them. Ultimately, the best way to tackle all of the challenges and to ensure that your model achieves SOTA results on your intended hardware is to identify the best neural network architecture for your use case, performance targets and hardware. One possible solution is the use of Neural Architecture Search (NAS). 

NAS is a technique used by leading AI teams to automatically search for and discover the best neural networks for a given problem and set of constraints. It incorporates a hardware-aware model selection process that takes all of these parameters into account. NAS automates the design of DNNs, delivering higher performance and lower losses than manually designed architectures, and it is much faster than the traditional manual process. The general idea behind NAS is to select the optimal architecture from a space of allowable architectures.
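The core loop of NAS can be illustrated with a deliberately toy example: enumerate a tiny search space of architecture choices, score each candidate against an accuracy proxy and a latency constraint, and keep the best. The evaluator below is mocked (real NAS trains or estimates each candidate on real hardware), so all numbers are purely illustrative:

```python
import itertools

# A tiny hypothetical search space: depth, width, and input resolution.
SEARCH_SPACE = {
    "depth": [2, 4, 8],
    "width": [32, 64, 128],
    "resolution": [160, 224],
}

def evaluate(arch):
    """Mock evaluator: bigger models are 'more accurate' but slower here.
    A real NAS would train or estimate each candidate instead."""
    size = arch["depth"] * arch["width"] * arch["resolution"]
    accuracy = min(0.95, 0.5 + size / 600_000)   # saturating accuracy proxy
    latency_ms = size / 20_000                   # latency proxy
    return accuracy, latency_ms

def search(latency_budget_ms):
    """Pick the highest-accuracy candidate that fits the latency budget."""
    best = None
    for values in itertools.product(*SEARCH_SPACE.values()):
        arch = dict(zip(SEARCH_SPACE.keys(), values))
        acc, lat = evaluate(arch)
        if lat <= latency_budget_ms and (best is None or acc > best[1]):
            best = (arch, acc, lat)
    return best

arch, acc, lat = search(latency_budget_ms=10.0)
print(f"best: {arch} acc={acc:.3f} latency={lat:.2f} ms")
```

Real search spaces are astronomically larger, which is why production NAS systems rely on learned predictors and efficient search strategies rather than the exhaustive enumeration shown here.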

Deci’s proprietary NAS engine, called AutoNAC, is a hardware-aware technology that automatically generates best-in-class architectures tailored for your inference hardware and performance targets. AutoNAC performs a multi-constraint search to find the architecture that delivers the highest accuracy for any given dataset, speed (latency/throughput), model size, and inference hardware targets.


Figure 2. A visual representation of the AutoNAC engine and how it works. Source: Deci.

Architectures built with the AutoNAC engine achieve 3-5x better latency and throughput while maintaining, and in many cases improving, the original accuracy.

AI Video Analytics Case Studies

Deci’s customers are using the AutoNAC engine to scale up AI-enabled video analytics solutions across various verticals. Here are some of their stories:

Cutting Costs and Enhancing UX with CPU-Optimized AI Video Analytics

Irisity, a leading AI video analytics software provider, aimed to boost the performance of their object detection model. Their objectives were to increase throughput while maintaining accuracy, benefiting their customers by:

  • Reducing operational costs: Efficient scaling on existing CPU infrastructure lowers expenses.
  • Improving user experience: Real-time insights and alerts elevate service quality.

Using the AutoNAC engine, Irisity’s team developed a new model, which significantly outperformed YOLOv7 tiny in throughput without compromising accuracy. After optimizing the new model with Deci’s Infery SDK, Irisity was able to achieve a 6.5x increase in throughput over the original YOLOv7 model.

As a result, their security application’s compatibility with various CPUs was expanded, making their product more scalable.

Using Deci’s platform reduced Irisity’s development time and minimized risks while ensuring data privacy.



Enhancing Real-Time Livestock Health Monitoring

An AI software company specializing in livestock management sought to incorporate accurate real-time pose estimation of animals to detect health anomalies early. The company was employing the DEKR model, which was not compilable to TensorRT FP16 and delivered low throughput on their target hardware, the NVIDIA Jetson AGX Orin 32GB. This limited their application’s real-time analytics capabilities.

Utilizing Deci’s AutoNAC for model development, the company was able to create a custom architecture for pose estimation that not only outperformed DEKR on the Jetson AGX Orin but was also more accurate. After compilation to TensorRT FP16, the new model delivered an 8.2x throughput increase alongside a 2.2% improvement in accuracy (AP score).


Maximizing Hardware Utilization for Security Applications 

A Fortune 500 company in the security sector sought to increase the number of video streams connected to its existing hardware. Their goal was to address the throughput limitations of their initial model, YOLOX small, which delivered a throughput of only 60 frames per second, limiting the number of connectable video streams.

Using Deci’s AutoNAC, the company developed a custom model that delivered a 50% boost in throughput and enhanced accuracy by 0.8% compared to YOLOX small. The improvement in throughput allowed for a 40% increase in the number of camera streams connected to a single Jetson Xavier device, significantly enhancing the application’s coverage without increasing costs. 

To find out how Deci can help you deploy and scale your AI video analytics solution, we invite you to talk to one of our experts.
