03 Oct 2018

Artificial Neural Networks (ANN) on Snapdragon™-based Edge Devices

Introduction

Using Artificial Neural Networks (ANNs) to give machines capabilities similar to those of the human brain is a popular topic across many fields. Besides finding the best neural network (NN) and training database, another challenge is implementing the network on embedded devices while optimizing performance and power efficiency. Cloud computing is not always an option, especially when the device has no connectivity. In that case, we need a platform that can do signal pre-processing and execute the NN in real time with the lowest possible power consumption, especially when the device runs on a battery.

In previous blogs, which can be found on www.intrinsyc.com, we went through a few examples of how to use Qualcomm® Snapdragon™ Platforms for NN use cases. We saw that by using different tools (like Python scripts), we can train a network with databases (in Caffe and TensorFlow formats) and then use the Qualcomm® Snapdragon™ Neural Processing Engine (NPE) software development kit (SDK) to convert that network for Snapdragon platforms. In this blog, we mainly focus on using Matlab and the ONNX format.

Overview

Qualcomm® Snapdragon™ Platforms and the Qualcomm® Snapdragon™ Neural Processing Engine (NPE) software development kit (SDK) are an outstanding choice for creating a customized neural network on low-power, small-footprint devices. The Snapdragon NPE was created to give developers the tools to easily migrate intelligence from the cloud to edge devices.

The Snapdragon NPE provides developers with software tools to accelerate deep neural network workloads on mobile and other edge Internet of Things (IoT) devices powered by Snapdragon processors. Developers can choose the optimal Snapdragon core for the desired user experience – Qualcomm® Kryo™ CPU, Qualcomm® Adreno™ GPU or Qualcomm® Hexagon™ DSP.

In this article, we explore developing and implementing an NN on Snapdragon platforms using Matlab tools, focusing mainly on the ONNX format. We also investigate how Snapdragon platforms can help us reduce power and processing time by using the optimal Snapdragon core and the tools provided by the SNPE SDK.

Design and Develop Simple DNN

We start by going through the steps of designing and training a Deep Neural Network (DNN) using Matlab. We then port that design to Snapdragon and look for the best subsystem on Snapdragon to do the job.

Handwritten Digit Recognition System

Let’s start with a handwritten digit recognition system using a DNN. One of the major differences between this network and the Audio Digit Recognition System is that this system doesn’t do any pre-processing on the input signal. Snapdragon platforms, with their heterogeneous computing architecture, have powerful engines for audio and image processing using Digital Signal Processors (DSPs) and a Graphics Processing Unit (GPU).

For developing and training this network we use Matlab. The network is a three-layer convolution-based network (two convolution layers followed by one fully connected layer). We also use the handwritten digits database that comes with Matlab (it uses the same 28×28 grayscale image format as the MNIST database; for the source of that database, please check the Matlab documentation).

So, let’s check the script

  • Here we select database

[XTrain,YTrain] = digitTrain4DArrayData;

[XValidation,YValidation] = digitTest4DArrayData;

  • Now setting the layers

layers = [
    imageInputLayer([28 28 1],'Name','input','Normalization','none')
    convolution2dLayer(5,16,'Padding','same','Name','conv_1')
    batchNormalizationLayer('Name','BN_1')
    reluLayer('Name','relu_1')
    convolution2dLayer(3,32,'Padding','same','Name','conv_2')
    batchNormalizationLayer('Name','BN_2')
    reluLayer('Name','relu_2')
    fullyConnectedLayer(10,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classOutput')];

  • Then set the training options

options = trainingOptions('sgdm',...
    'MaxEpochs',6,...
    'Shuffle','every-epoch',...
    'ValidationData',{XValidation,YValidation},...
    'ValidationFrequency',20,...
    'Verbose',false,...
    'Plots','training-progress');

  • And train it (for details on the training process, please check the Matlab documentation)

Fig 4 - Training result

Fig 6 – Testing result
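To make the size of the layer stack above concrete, here is a minimal Python sketch that walks the same layers and counts output shapes and learnable parameters. The helper names are hypothetical and the counting rules are the usual ones (weights plus one bias per filter or output, and a scale and offset per channel for batch normalization); it does not call Matlab or SNPE.

```python
# Hypothetical helpers; layer sizes are taken from the Matlab layers array above.

def conv_same(shape, k, c_out):
    """k x k convolution, stride 1, 'same' padding: height and width are kept."""
    h, w, c_in = shape
    return (h, w, c_out), k * k * c_in * c_out + c_out  # weights + biases

def batch_norm(shape):
    """Batch normalization keeps the shape; one scale and one offset per channel."""
    return shape, 2 * shape[2]

def fully_connected(shape, n_out):
    """Flattens the input; one weight per input per output, plus one bias per output."""
    n_in = shape[0] * shape[1] * shape[2]
    return (1, 1, n_out), n_in * n_out + n_out

def summarize():
    shape = (28, 28, 1)  # imageInputLayer([28 28 1])
    rows = []
    for name, layer in [
        ("conv_1", lambda s: conv_same(s, 5, 16)),
        ("BN_1", batch_norm),
        ("conv_2", lambda s: conv_same(s, 3, 32)),
        ("BN_2", batch_norm),
        ("fc", lambda s: fully_connected(s, 10)),
    ]:
        shape, params = layer(shape)
        rows.append((name, shape, params))
    return rows  # ReLU and softmax layers have no learnables and are omitted

rows = summarize()
for name, shape, params in rows:
    print(f"{name:7s} out={shape} learnables={params}")
print("total learnables:", sum(p for _, _, p in rows))
```

Because there is no pooling layer, the fully connected layer sees all 28×28×32 activations and holds 250,890 of the 256,042 learnable parameters, which is worth keeping in mind when comparing runtimes on the target.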

Now, after converting the network to ONNX format, we move to the next step, which is using the SNPE tools.

First, we need to convert the ONNX format to DLC.

snpe-onnx-to-dlc -m handwritten-onnx --debug

This will create a DLC-format network that can be used with SNPE.

Then, using this command, you can verify that the network structure matches what we created in Matlab

Fig 7 - Topology Comparison (left side SNPE DLC, right side Matlab)

Now we use the same testing images and verify the network on the Snapdragon target. Here is the results summary for ARM, cDSP and GPU, obtained using these steps:

  • Pull the results of snpe-net-run on the platform for the different cores (--use_dsp and --use_gpu)
  • Run snpe-diagview on the host machine against the pulled results
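Before snpe-net-run can be used in the steps above, the test images have to be on the target in a form it can read. The following is a minimal sketch, assuming the usual SNPE convention of one raw float32 file per input plus a text file listing those files, and one raw float32 file per output. The function names, file names and directory layout are hypothetical, and the pixel ordering must match what the converted network expects (Matlab arrays are column-major, so a transpose may be needed when exporting images).

```python
import os
from array import array

def write_raw_inputs(images, out_dir, list_name="input_list.txt"):
    """Write each image (a flat sequence of 28*28 floats) as a raw float32 file
    and create a list file with one path per line. Returns the list file path."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, pixels in enumerate(images):
        path = os.path.join(out_dir, f"digit_{i:04d}.raw")  # hypothetical naming
        with open(path, "wb") as f:
            array("f", pixels).tofile(f)  # 4-byte floats, native byte order
        paths.append(path)
    list_path = os.path.join(out_dir, list_name)
    with open(list_path, "w") as f:
        f.write("\n".join(paths) + "\n")
    return list_path

def read_prediction(raw_path, num_classes=10):
    """Read the class scores from a pulled raw output file and return the
    index of the highest score, i.e. the predicted digit."""
    scores = array("f")
    with open(raw_path, "rb") as f:
        scores.fromfile(f, num_classes)
    return max(range(num_classes), key=lambda k: scores[k])
```

The paths written into the list file must be valid on the device where snpe-net-run executes, so in practice they are usually written relative to the directory the files are pushed to. The predicted digit from each pulled output can then be compared with the label of the corresponding test image, once per core.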

Comparing the results shows that the DSP and GPU are close, but on these platforms:

  • The cDSP carries no other load, unlike the GPU (especially if a graphical application is running). It is effectively dedicated to this type of processing.

Using Subsystems for Signal Pre-processing

So far, the DNN that we have implemented doesn’t need any pre-processing on the input signal (like feature extraction from input images). However, this is not the case for all implementations.

For those situations, and to achieve lower power consumption, we can use different subsystems on Snapdragon – aDSP, mDSP, cDSP, GPU, DSP/HVX, ARM/NEON. Let’s look at the xDSP and at examples of how we can use those processors for feature extraction.

Hexagon xDSP on Snapdragon

The Hexagon DSP is a multi-threaded DSP with L1/L2 caches and a memory management unit, and on most Snapdragon SoCs it has the same access to a few resources as the other cores have. This unique structure, together with the QuRT OS, makes it a flexible DSP platform for building applications for different use cases.

DSP Hardware Architecture

 

Fig 1 – DSP Hardware Architecture

Image Processing

For real-time image processing, you can inject a customized HVX module into the ISP pipeline. The location of this module in the pipeline depends on the Snapdragon series. On some platforms, you can place it after the camera sensor interface module

Fig 2. Post processing

On others, you can inject the HVX module at a different location in the camera pipeline (red dots)

Fig 3. HVX tap points

Or it can be used for memory-to-memory transfers after the ISP. A few examples are available in Hexagon SDK 3.3.

As an example, Sobel processing of a noisy 640×480 image using HVX can take around 10K PCycles.

Fig 4 - Sobel processing for noisy image
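For readers unfamiliar with the operation, the sketch below is a plain scalar Python reference of a 3×3 Sobel edge filter. It is not the HVX implementation from the Hexagon SDK, which processes many pixels per vector instruction; it only shows what is computed per pixel. The function name, the use of |Gx| + |Gy| as the magnitude, and the clamp to 255 are assumptions made for illustration.

```python
def sobel_magnitude(img):
    """img is a list of rows of integer pixel values.
    Returns an image of the same size holding |Gx| + |Gy| (clamped to 255)
    for interior pixels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column, weights 1-2-1.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]) \
               - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row, weights 1-2-1.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]) \
               - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1])
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out

# A tiny image with a vertical edge between columns 1 and 2.
image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_magnitude(image)
print(edges[2])  # the edge shows up in columns 1 and 2 of each interior row
```

Every output pixel depends only on its 3×3 neighborhood, which is why this kind of filter maps well onto a wide vector unit such as HVX.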

Audio Processing

For audio pre-processing, the aDSP and its Elite framework are suitable for real-time feature extraction. In the DNN for the audio digit recognition system, the network input is a set of Mel-frequency cepstral coefficients (MFCCs). Using one-second audio files and 14 coefficients, the input layer is 14×98. The database was collected from https://aiyprojects.withgoogle.com/open_speech_recording, using 1500 audio files for each digit (0-9). Here is an example of the MFCCs for digit one.

 

Fig 5 - MFCC for digit one
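The 14×98 input size follows from how the one-second clip is cut into analysis frames. The article does not state the sample rate or framing, so the values below (16 kHz audio, 25 ms windows, 10 ms hop) are assumptions: they are a common MFCC configuration that happens to reproduce the 98 frames. The function name is hypothetical.

```python
def num_frames(num_samples, win_len, hop_len):
    """Number of full analysis windows that fit in the signal (no padding)."""
    if num_samples < win_len:
        return 0
    return 1 + (num_samples - win_len) // hop_len

SAMPLE_RATE = 16000                   # assumed: 16 kHz recordings
WIN_LEN = SAMPLE_RATE * 25 // 1000    # assumed: 25 ms window -> 400 samples
HOP_LEN = SAMPLE_RATE * 10 // 1000    # assumed: 10 ms hop    -> 160 samples
NUM_COEFFS = 14                       # from the article

frames = num_frames(SAMPLE_RATE * 1, WIN_LEN, HOP_LEN)  # one-second clip
print("input layer shape:", (NUM_COEFFS, frames))
```

Under these assumptions each clip yields 98 frames of 14 coefficients, so the feature extractor on the aDSP has to produce one 14-value vector every 10 ms to keep up with real-time audio.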

The network is configured as

 

Fig 6 - DNN for digit

The DNN will try to learn and classify these MFCC images for the different digits. The feature extraction is done on the aDSP as a customized module in the audio path topology of the Elite framework.

Fig 7 – Comparing result just for DNN processing

Sensor Processing

Snapdragon platforms contain a sensor hub, the Snapdragon Sensor Core, that helps to integrate and process data from different sensors. This technology can off-load these tasks from the central processor, reducing battery consumption while also improving performance. The pre-processing of sensor data for any DNN targeted at sensor behavior recognition can be off-loaded to the DSP and done in real time.

In all of the above cases, instead of using the DSP assigned to the input, you can offload processing from the ARM to any other subsystem (like the mDSP) using FastRPC, but this technique has its own processing overhead.

Summary

Qualcomm® Snapdragon™ Platforms and the Qualcomm® Snapdragon™ Neural Processing Engine (NPE) software development kit (SDK) provide powerful platforms and tools to create a customized artificial neural network on low-power and small-footprint edge devices.
