27 Jul 2018
Deploying CV/DL/AI on Intrinsyc’s Open-Q™ 820 Development Kit
Artificial Intelligence is one of the breakthrough technologies in computer science, and image recognition, voice recognition, and face detection remain some of its most challenging tasks. Most complex mobile Deep Learning (DL) and Computer Vision (CV) tasks are currently performed in the cloud: smart mobile devices send data to the cloud, where it is processed, and the results are returned to the device. However, the ability to perform machine learning tasks locally on the device, rather than remotely in the cloud, is becoming increasingly important for many reasons, including latency and security. To help developers provide better machine learning-based enhancements, Qualcomm has introduced its Qualcomm Artificial Intelligence (AI) Platform, which comprises several hardware and software components that accelerate on-device AI-enabled user experiences on select Qualcomm Snapdragon mobile platforms. Intrinsyc Technologies designs and supplies a number of Qualcomm Snapdragon-based platforms that can leverage these AI capabilities, such as Intrinsyc’s Open-Q™ 820 SOM and Development Kit.
Maximizing intelligence on client devices at the edge of the network ensures that AI-powered user experiences can be delivered with better overall performance, with or without a network connection. The key benefits of on-device AI include real-time responsiveness, improved privacy, and enhanced reliability. This flexible computing approach gives developers and OEMs the ability to optimize AI user experiences on intelligent edge devices such as smart IoT products.
Deep Learning (DL) consists of two distinct stages: training and inference. In the training stage, the learning algorithm is fed many examples (e.g. photos or voice samples) along with their corresponding classifications. Once trained, the neural network is used to classify new data. For example, the DL system might be trained with thousands of photos of dogs. In the inference stage, a new, previously unseen picture of a dog can be shown to the system, which, based on its training, will be able to recognize that the image contains a dog.
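The split between the two stages can be illustrated with a minimal sketch. It is not a neural network: a nearest-centroid classifier stands in for the trained model, and the two-number feature vectors and the "dog"/"cat" labels are made up for illustration. What it shows is only the shape of the process, where training consumes labelled examples once and inference then classifies unseen inputs.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Training stage: average the feature vectors seen for each label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    # The "model" is one centroid (mean vector) per label.
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def infer(model, features):
    """Inference stage: return the label whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical labelled training data (features are illustrative only).
labelled = [
    ([0.9, 0.8], "dog"),
    ([0.8, 0.9], "dog"),
    ([0.1, 0.2], "cat"),
    ([0.2, 0.1], "cat"),
]
model = train(labelled)

# A previously unseen sample is classified without any further training.
prediction = infer(model, [0.85, 0.75])
print(prediction)
```

On a device, only the equivalent of `infer` has to run; the expensive `train` step can happen ahead of time on other hardware.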
The inference stage works on almost any type of processing unit including CPUs, GPUs, DSPs and other Machine Learning Processors. The key difference between these processing units is efficiency, including how fast they can perform the inference and how much power they consume to do it.
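The two efficiency measures mentioned here, speed and power, combine into energy per inference: power multiplied by the time one inference takes. The sketch below shows that arithmetic. The unit names and all numbers are placeholders chosen for easy mental arithmetic, not measurements of any Snapdragon core.

```python
def energy_per_inference_mj(power_watts, latency_ms):
    """Energy for one inference in millijoules (watts x milliseconds)."""
    return power_watts * latency_ms


def inferences_per_second(latency_ms):
    """Throughput when inferences run back to back."""
    return 1000.0 / latency_ms


# Placeholder figures for two unnamed processing units, NOT measurements.
profiles = {
    "unit_a": {"power_watts": 2.0, "latency_ms": 50.0},
    "unit_b": {"power_watts": 0.5, "latency_ms": 20.0},
}

for name, profile in profiles.items():
    energy = energy_per_inference_mj(**profile)
    rate = inferences_per_second(profile["latency_ms"])
    print(f"{name}: {energy:.1f} mJ per inference, {rate:.1f} inferences/s")
```

With these made-up inputs, the second unit is both faster and cheaper per inference, which is the kind of difference that motivates moving inference off a general-purpose core.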
Figure 1: Qualcomm AIE architecture
The Qualcomm AI Engine includes three key software components:
- Snapdragon Neural Processing Engine (NPE) software framework is designed to make it easy for developers to choose the optimal Snapdragon core for the desired user experience (Hexagon Vector Processor DSP, Adreno GPU, or Kryo CPU) and accelerate their AI user experiences on device. The Snapdragon NPE supports the TensorFlow, Caffe and Caffe2 frameworks, in addition to the Open Neural Network Exchange (ONNX) interchange format, offering developers greater flexibility and choice on multiple Snapdragon platforms and operating systems.
- Support for the Android Neural Networks API, first released in Google’s Android Oreo, gives developers access to Snapdragon platforms directly through the Android operating system.
- Hexagon Neural Network (NN) library allows developers to run AI algorithms directly on the Hexagon Vector Processor. It provides optimized implementations of the fundamental machine learning blocks and significantly accelerates AI operations such as convolution, pooling, and activations.
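For readers unfamiliar with the building blocks named in the last item, the sketch below spells out the arithmetic of a convolution, an activation (ReLU), and max pooling on a small one-dimensional signal. It is a plain-Python reference for the math only and says nothing about how the Hexagon NN library implements or exposes these operations. The signal and kernel values are arbitrary.

```python
def conv1d(signal, kernel):
    """'Valid' sliding dot product, the form most DL frameworks call convolution."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def relu(values):
    """Activation: negative responses are clipped to zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Pooling: keep the largest value in each non-overlapping window."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
kernel = [1.0, -1.0]  # responds to drops between neighbouring samples

features = conv1d(signal, kernel)   # [-1.0, 2.0, 1.0, -4.0, 2.0]
activated = relu(features)          # [0.0, 2.0, 1.0, 0.0, 2.0]
pooled = max_pool1d(activated, 2)   # [2.0, 1.0]
print(pooled)
```

Real networks apply the same three operations to multi-channel 2D images, many thousands of times per inference, which is why a vector processor tuned for them makes a difference.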
The Qualcomm NPE for artificial intelligence (AI) is designed to help developers run one or more neural network models trained in Caffe/Caffe2, ONNX, or TensorFlow on Snapdragon mobile platforms, whether on the CPU, GPU or DSP.
To make the AI developer’s life easier, the Qualcomm Neural Processing SDK does not define yet another library of network layers; instead, it gives developers the freedom to design and train their networks using familiar frameworks, with Caffe/Caffe2, ONNX, and TensorFlow being supported at launch. Figure 2-1 shows the development workflow.
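One common way an application deals with a choice of compute cores is a preference list with fallback: try the most efficient core first and fall back to the next one if it is not available on the device. The sketch below illustrates that idea only. `select_runtime` and the runtime names are hypothetical and are not the SDK's API; consult the SDK documentation for the actual calls.

```python
def select_runtime(preference, available):
    """Return the first preferred runtime that the device actually offers."""
    for runtime in preference:
        if runtime in available:
            return runtime
    raise RuntimeError("none of the preferred runtimes is available")


# Hypothetical device report: this target exposes no DSP runtime.
available_runtimes = {"GPU", "CPU"}

# Prefer the DSP, then the GPU, and use the CPU as the last resort.
chosen = select_runtime(["DSP", "GPU", "CPU"], available_runtimes)
print(chosen)
```

Keeping the CPU at the end of the list means the application still works on devices where no accelerator runtime is present.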
Figure 2-1: Development workflow of using Qualcomm AIE
After designing and training, the model file needs to be converted into a “.dlc” (Deep Learning Container) file to be used by the Snapdragon NPE (SNPE) runtime. The conversion tool will output conversion statistics, including information about unsupported or non-accelerated layers, which the developer can use to adjust the design of the initial model. See Figure 2-2 for DL Framework and SNPE.
Figure 2-2 Deep Learning Framework and SNPE
After training is complete, the trained model is converted into a DLC file that can be loaded into the SNPE runtime. This DLC file can then be used to perform forward inference passes using one of the Snapdragon accelerated compute cores. The basic SNPE workflow consists of only a few steps:
- Convert the network model to a DLC file that can be loaded by SNPE.
- Optionally quantize the DLC file for running on the Hexagon DSP.
- Prepare input data for the model.
- Load and execute the model using SNPE runtime.
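Two of the steps above, quantizing and preparing input data, can be sketched in plain Python. Both functions are illustrations under stated assumptions. The quantizer is a generic 8-bit min/max scheme and is not claimed to be the exact algorithm used by the SDK's quantization tool. The input writer assumes the model consumes headerless float32 data, and the mean and divisor of 128 are hypothetical normalization constants that depend on how the network was trained.

```python
import os
import tempfile
from array import array


def quantize_8bit(values):
    """Map floats onto integer codes 0..255 using the data's min/max range."""
    low, high = min(values), max(values)
    step = (high - low) / 255.0
    if step == 0.0:
        step = 1.0  # constant input: every value maps to code 0
    codes = [min(255, max(0, round((v - low) / step))) for v in values]
    return codes, low, step


def dequantize_8bit(codes, low, step):
    """Recover approximate floats from the 8-bit codes."""
    return [low + c * step for c in codes]


def write_raw_float32(pixels, path, mean=128.0, divisor=128.0):
    """Normalize pixel values and store them as headerless float32 data."""
    normalized = array("f", ((p - mean) / divisor for p in pixels))
    with open(path, "wb") as handle:
        normalized.tofile(handle)
    return len(normalized)


# Quantize a few made-up weights and measure what the round trip loses.
weights = [-1.0, -0.2, 0.3, 1.0]
codes, low, step = quantize_8bit(weights)
restored = dequantize_8bit(codes, low, step)
worst_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, worst_error)

# Write three made-up pixel values to a temporary raw input file.
pixels = [0, 128, 255]
raw_path = os.path.join(tempfile.mkdtemp(), "input.raw")
written = write_raw_float32(pixels, raw_path)
print(written, os.path.getsize(raw_path))
```

The round-trip error is bounded by half a quantization step, which is the accuracy traded for running the model in 8-bit arithmetic.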
Figure 2-3 shows the complete SNPE workflow as it applies to applications in various industries.
Figure 2-3 SNPE Workflow in Various Applications
Intrinsyc’s Open-Q™ 820 µSOM includes the Snapdragon 820 SoC, which contains Qualcomm’s Adreno 530 GPU, multiple Kryo CPUs, and the Hexagon 680 DSP, and supports the Qualcomm AI Engine and its software, including the Qualcomm Snapdragon Neural Processing Engine (NPE) SDK.
Figure 2-4 Open-Q™ 820 Deep Neural Network Performance
With these capabilities, the SOM can support on-device AI, along with analysis, optimization, and debugging capabilities that help developers and OEMs port trained networks to the platform. See Figure 3 for a photograph of the Open-Q™ 820 µSOM development kit, on which we demonstrated the NPE in a previous blog post. The AI Engine is compatible with the TensorFlow, Caffe and Caffe2 frameworks, the Open Neural Network Exchange interchange format, the Android Neural Networks API, and the Qualcomm Hexagon Neural Network library. App developers can use these advantages to provide “AI-powered user experiences” with or without a network connection.
Figure 3 Open-Q™ 820 µSOM Development Kit
Google released TensorFlow (TF), which makes it easy to develop and deploy TF models on mobile and embedded devices. The TensorFlow runtime is a cross-platform library, and its architecture is compatible with Qualcomm’s AI Engine. Google currently uses TF in some of its own Android apps, such as Google Photos and Google Cloud Speech. TF was designed to run on the different processing units inside a processor, and the Snapdragon 820 is more than capable in this respect with its CPU, GPU, and DSP. As a result, apps can offload this work instead of tasking the CPU.
Google has created some example TF apps that can recognize real-world objects when placed in front of a smartphone camera. In this case, the app that used the TF framework and ran on the Hexagon DSP was able to recognize more objects at a faster rate than the same app that used the CPU to do the same tasks.
By taking an existing, pre-trained neural network in TF or Caffe/Caffe2 format and using the tools included with the Snapdragon Neural Processing Engine, it is possible to get feed-forward inference running on the GPU or DSP on Snapdragon chipsets. Here is an example of that TF app running on a Snapdragon 820, demonstrating how the use of the DSP enhances object detection. See Figures 4 and 5, which show accurate object detection.
Figure 4 TF Classification using the TFLiteCameraDemo Application
Figure 5 Further examples of TFLiteCameraDemo application classifications
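A classification demo of this kind typically displays the few most likely labels with a confidence value next to each. The sketch below shows one common way such a display is produced from a model's output, assuming the model emits one raw score per label; some models output probabilities directly, in which case the softmax step is skipped. The labels and scores are made up and are not taken from the figures.

```python
from math import exp


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the peak avoids overflow in exp()
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels, scores, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probabilities = softmax(scores)
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical labels and raw scores for a single camera frame.
labels = ["keyboard", "mouse", "coffee mug", "monitor"]
scores = [4.1, 2.3, 0.5, 1.7]

results = top_k(labels, scores)
for label, probability in results:
    print(f"{label}: {probability:.2f}")
```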
Figure 6 shows another demonstration, the “Object Detection Machine Learning TF demonstration” application. In this application, a DL model is trained to detect multiple different objects in real time.
Figure 6 Object Detection Machine Learning TF demo
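An object detector returns many candidate boxes per frame, so applications usually keep only confident, non-duplicate ones before drawing them. The sketch below shows a generic version of that post-processing: a score threshold followed by greedy suppression of overlapping boxes. It is class-agnostic for brevity, the thresholds and detections are hypothetical, and it is not taken from the demo application; some detection models already perform this step inside the network.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def filter_detections(detections, min_score=0.5, max_overlap=0.5):
    """Keep confident boxes, dropping any that overlap a better-scoring one."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        if det["score"] < min_score:
            break  # everything after this scores even lower
        if all(iou(det["box"], k["box"]) <= max_overlap for k in kept):
            kept.append(det)
    return kept


# Hypothetical raw detections for one frame.
detections = [
    {"label": "person", "score": 0.9, "box": (0, 0, 10, 10)},
    {"label": "person", "score": 0.8, "box": (1, 1, 11, 11)},   # near-duplicate
    {"label": "cup", "score": 0.7, "box": (20, 20, 30, 30)},
    {"label": "chair", "score": 0.3, "box": (50, 50, 60, 60)},  # low confidence
]

kept = filter_detections(detections)
print([d["label"] for d in kept])
```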
As the results above show, our Open-Q™ 820 µSOM provides a strong foundation for developers building embedded CV/DL/AI IoT devices. The combined CV and DL processing benefits from the heterogeneous computing architecture of the Snapdragon SoC. In particular, the Hexagon Vector Processor DSP makes it possible to run deep learning at the edge on the DSP, which allows highly optimized utilization of the SoC for improved performance.
Watch the following video demonstrating in real time the “Object Detection Machine Learning TF demonstration” application using the Open-Q™ 820 µSOM Development Kit.
The demonstration above shows our Open-Q™ 820 µSOM running object detection accurately and in real time using DL/CV/AI technologies, without requiring any DSP- or GPU-specific knowledge or programming. These features open up compelling on-device AI experiences in diverse areas such as computer vision, audio, security, and gaming.
References:
- Shardul Brahmbhatt, “Using High Performance Vision Processing for IoT,” [Online]. Available: https://developer.qualcomm.com/blog/using-high-performance-vision-processing-iot
- Qualcomm Technologies, Inc., “Qualcomm Artificial Intelligence Engine Powers AI Capabilities of Snapdragon Mobile Platform,” [Online]. Available: https://www.qualcomm.com/news/releases/2018/02/21/qualcomm-artificial-intelligence-engine-powers-ai-capabilities-snapdragon
- Qualcomm Technologies, Inc., “Meet the high-performance engine that makes AI even smarter,” [Online]. Available: https://www.qualcomm.com/snapdragon/artificial-intelligence#meet-the-high-performance-engine-that-makes-ai-even-smarter
- Qualcomm Technologies, Inc., “Qualcomm Neural Processing SDK for AI,” [Online]. Available: https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk
- Qualcomm Technologies, Inc., “Snapdragon Neural Processing Engine SDK,” [Online]. Available: https://developer.qualcomm.com/sites/default/files/docs/snpe/
- Qualcomm Technologies, Inc., “Snapdragon 820 Mobile Platform,” [Online]. Available: https://www.qualcomm.com/products/snapdragon/processors/820
- Robin Reni, “Realtime Object and Face Detection in Android using Tensorflow Object Detection API,” [Online]. Available: https://www.skcript.com/svr/realtime-object-and-face-detection-in-android-using-tensorflow-object-detection-api/
- “Android Demo App,” [Online]. Available: https://www.tensorflow.org/mobile/tflite/demo_android
- “TensorFlow Architecture,” [Online]. Available: https://www.tensorflow.org/extend/architecture
Kevin Wang, Software Engineer