BeagleBoard/GSoC/2021 Proposal/TensorFlow Lite Compatibility with BeagleBone AI

=TensorFlow Lite Compatibility and Use with BeagleBone AI=

About Student: Leah Pillsbury
Mentors: Robert Nelson, Stephen Arnold, Jason Kridner, Deepak Khatri
Code: TensorFlow Lite Support, BBAI Firmware, TensorFlow Lite Examples for TIDL
Wiki: Proposal/TensorFlow Lite Compatibility with BeagleBone AI
GSoC: N/A

=Status=
I am working on this project during the summer of 2021.

About you
IRC: lpillsbury
Github: lpillsbury
School: Pasadena City College (Boston University prior to PCC)
Country: United States
Primary language: English
Other languages: Spanish, Swahili, Hebrew, some Hindi/Urdu, Telugu, Bengali
Typical work hours: 9AM-5:30PM US Pacific, though I may be on Eastern time part of the summer
Previous GSoC participation: First time participating in GSoC. I am interested in writing lots of code, learning more about embedded software, and participating in the community. I use open source tools regularly, and it is exciting to make something useful that other people would use too.

About your project
Project name: TensorFlow Lite Compatibility and Use with BeagleBone AI

Description
Designed to be fast and lightweight for microcontrollers and phones, TensorFlow Lite is supported on Raspberry Pi, Arduino, ESP32, multiple Adafruit boards, SparkFun, STM32, Android, and iOS. No BeagleBoard is on this list, and it should be. I first realized this was a problem when I decided I wanted to learn more about AI on the edge: a quick search showed that TensorFlow Lite is one of the industry standards, with a wide variety of other boards that it plays well with.

The BeagleBone AI is built with a Texas Instruments Sitara AM5729 processor that contains 2x dual Arm® Cortex®-M4 co-processors (Arm M4 processors are the processors of choice with TensorFlow Lite), as well as DSP, Vision Accelerator Pack, and GPU. Texas Instruments documentation states that "TensorFlow Lite runs on Arm for Sitara devices (AM3/AM4/AM5/AM6). For AM5729 and AM5749 devices, Tensorflow Lite heterogeneous execution is supported by utilizing TIDL compute offload with EVEs and DSPs." Processor SDK Linux

Although the BeagleBone AI has impressive specs and capabilities to tackle AI tasks, the integration with commercial machine learning frameworks stands to be improved. While a developer with Linux familiarity and significant patience can get TensorFlow Lite set up on BeagleBone AI, what about the general public? Many scientists and engineers who want to use AI features for checking up on a factory floor, evaluating environmental patterns, or monitoring assets know some programming and machine learning, but may not be embedded Linux experts.

After reading multiple accounts of frustrating installation attempts, I began this proposal thinking that I would need to write a sophisticated patch to port TensorFlow Lite to the BeagleBone AI. My additional research now suggests that this won't be necessary. I now believe it is more likely that I'll need a combination of a wrapper around parts of the current Processor SDK Linux and clear directions and examples for cross-compilation on Arm. Even if installing and running TensorFlow Lite on BeagleBone AI turns out to be more straightforward than expected, improving TensorFlow Lite compatibility is still a worthwhile project. The more universal the integration, the more users will look to the BeagleBone AI as a first choice for AI applications.

This project will include a combination of C coding, Linux kernel work, and some example cases with TensorFlow Lite in Python and C++. Before coding officially begins in June, I'd like to experiment with TensorFlow Lite on Raspberry Pi and Arduino so that I understand the level of complexity in the instructions, installation, and examples on the TensorFlow Lite website. Then, when the summer starts, I'd begin by getting TensorFlow Lite working on a BBAI, document my hacking process, and then create a smooth, stable way to make it work more out of the box. After trying the preloaded examples in the developer SDK, I'd train and deploy some new models that are specifically useful in automation settings. I would like to create a polished product to highlight both on the BeagleBoard website and on the TensorFlow Lite Examples page, by making a pull request to the owners of the GitHub repo.

I have several ideas for models and use cases, such as sound anomaly detection in manufacturing, object recognition, and recognizing deviations from environmental norms. Given that there are already examples of using TensorFlow Lite on Raspberry Pi with picamera, my natural starting point would be to build equivalent BeagleBone examples, most likely doing image capture through OpenCV (OpenCV with BeagleBone Black).

Getting TensorFlow Lite Working on BBAI
There is an outstanding question as to how hard it will be just to get TensorFlow Lite working on the BeagleBone AI board. I know that:
 * There are bits out there, but rcn-ee hasn't integrated them.
 * Things that sound easy on an embedded device rarely are.
 * Many people have struggled to get TensorFlow Lite working on BBAI. Some of them had to switch platforms when they couldn't get it working (see the advice to one user in Benefit).
 * Communication with @jkridner suggests that the issue still needs to be dealt with:

If you're talking about the native building (building BeagleBone binary on BeagleBone), you need to make sure you have enough RAM installed. As I said, 1G ram + swap could work but it would be slow.
 * Communication with Terry (Woncheol) Heo at TensorFlow Lite suggests that getting TensorFlow Lite working on BeagleBone AI should be relatively simple, if it is first cross compiled on another machine. He said:

That's why I recommend using cross compilation. I don't think we have a known issue for ARM cross compilation now.

FYI, CMake support was added recently. For ARM cross compilation with CMake. you may want to check the following page. https://www.tensorflow.org/lite/guide/build_cmake_arm I think ARMv7 NEON (armhf) binary will work nicely with BeagleBone AI. But please let me know if you have any issues with it.

Our TFLite implementation of custom-op is a basic one, and frankly limited tested
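To make the cross-compilation route concrete, the CMake guide linked above outlines roughly the flow sketched below. This is an illustrative sketch only: the toolchain version, install paths, and board hostname are assumptions drawn from the guide, not verified BBAI instructions.

```shell
# Fetch TensorFlow sources (the guide also shows downloading an armhf
# cross toolchain; the version and path here are examples, not requirements).
git clone https://github.com/tensorflow/tensorflow.git tensorflow_src

# Flags for the ARMv7-A core with NEON and hard-float ABI (armhf)
ARMCC_FLAGS="-march=armv7-a -mfpu=neon-vfpv4 -funsafe-math-optimizations"
ARMCC_PREFIX=${HOME}/toolchains/gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf/bin/arm-linux-gnueabihf-

mkdir tflite_build && cd tflite_build
cmake -DCMAKE_C_COMPILER=${ARMCC_PREFIX}gcc \
      -DCMAKE_CXX_COMPILER=${ARMCC_PREFIX}g++ \
      -DCMAKE_C_FLAGS="${ARMCC_FLAGS}" \
      -DCMAKE_CXX_FLAGS="${ARMCC_FLAGS}" \
      -DCMAKE_SYSTEM_NAME=Linux \
      -DCMAKE_SYSTEM_PROCESSOR=armv7 \
      ../tensorflow_src/tensorflow/lite

# Build the benchmark tool as a smoke test, then copy it to the board
# (hostname is a placeholder)
cmake --build . -j -t benchmark_model
scp tools/benchmark/benchmark_model debian@beaglebone.local:~/
```

The `benchmark_model` tool is a convenient first target because it can report per-run latency for any `.tflite` file without writing model-specific host code.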
 * Texas Instruments documentation says that TensorFlow Lite works on the BeagleBone AI processor, but in one forum thread it still took a user many iterations with a TI Mastermind (forum expert), Paula Carrillo, before it worked.

Given this information, I expect that it will take some time to track down what has been tried before and try a few different options to develop a recommended procedure for using TensorFlow Lite on BeagleBone AI. I also don't anticipate it being impossible, given the support for the TI Sitara processor. If it is so easy that I have extra time, I have many additional ideas to fill the project period as listed in the section Potential Additional Deliverables.

Simplifying the TensorFlow Lite Setup for Others

 * 1) On the most fundamental level, I plan to document how I cross-compile and set up TensorFlow Lite for the rest of the community.
 * 2) I have a Linux computer on which to cross-compile. Some engineers who use BBAI will also have access to Linux and know about cross-compiling, but many education and hobby users, and even some engineers, may not use Linux on other machines. Therefore, it would be useful to provide a way for these users to cross-compile a binary for the BBAI as well. I have two thoughts on this:
 ** a) A Bash script run on a remote Linux server that cross-compiles for the user's target. The user would specify their target and the version of the program they want to cross-compile, then download a binary ready to run on that target. This is only a good idea if such a server already exists; it is unlikely to be worth creating one just for this project.
 ** b) A remote IDE that has access to gdb (others have demonstrated this approach).
 * 3) My initial idea was to create a patch so that TensorFlow Lite can be installed natively on the BBAI. After getting a cross-compilation solution to work, I will explore this option with @RobertCNelson.

AI Examples Using TensorFlow Lite
Vision
 * Object Detection (some object detection datasets)
 ** Useful for robotics, self-driving vehicles, and even machinery.
 * Anomaly Detection (I'm very interested in tasks like this)
 ** Detecting small anomalies is necessary for manufacturing applications like quality control for solder joints.

Sound
 * Pattern Identification (example sample dataset)
 ** Useful in manufacturing and lab settings, where machine health can be monitored by changes in the sound quality unique to each machine.
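As a rough pre-project illustration of the sound-monitoring idea (not a deliverable yet), even a simple statistical baseline can flag a window whose loudness deviates from a machine's norm; a trained TensorFlow Lite model would replace this scoring. The window size, sample rate, and simulated "fault" below are arbitrary choices for the sketch.

```python
import math

def rms(window):
    """Root-mean-square loudness of one window of audio samples."""
    return math.sqrt(sum(s * s for s in window) / len(window))

def anomaly_scores(samples, window_size=256):
    """Score each window by how far its RMS deviates from the mean RMS
    of all windows, in standard deviations (z-score)."""
    windows = [samples[i:i + window_size]
               for i in range(0, len(samples) - window_size + 1, window_size)]
    levels = [rms(w) for w in windows]
    mean = sum(levels) / len(levels)
    std = math.sqrt(sum((x - mean) ** 2 for x in levels) / len(levels)) or 1.0
    return [(x - mean) / std for x in levels]

# Steady 50 Hz hum at 8 kHz sampling, with one loud burst (simulated fault)
hum = [math.sin(2 * math.pi * 50 * t / 8000) for t in range(8000)]
for t in range(4096, 4352):
    hum[t] *= 5.0

scores = anomaly_scores(hum)
print(max(range(len(scores)), key=lambda i: scores[i]))  # → 16, the burst window
```

A real pipeline would compute features like this (or spectrograms) on audio frames from a microphone and feed them to a model, but the thresholding idea is the same.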

Potential Additional Deliverables
One of the hardest parts of executing an engineering project for me is knowing in advance how long something will take. Sometimes seemingly trivial tasks become complicated, and things I thought were going to be hard prove to have a simple solution. I have proposed what I believe to be an achievable amount of work for a summer project given that there are many tasks that are still undefined, and new skills I will learn along the way. Should using TensorFlow Lite be more straightforward than I expect, there are many potential additional directions for this project. These are:
 * More TensorFlow Lite examples for users to download and use.
 * TensorFlow Lite integration with other boards such as the BeagleBoard-X15
 * Similar integration with additional machine learning tools. Top of my list are:
 * PyTorch (because it is widely used and I have expertise with it)
 * mlpack (it's in C++, and it's also a GSOC participating organization looking for more examples on embedded devices)

Important Outstanding Questions
There are several processors in the AM5729 and I don’t know how the chip delegates work between them or if this is something that needs to be explicitly specified in the build.
 * How will the Linux kernel communicate with the Arm Cortex-M4 processors on the BBAI?



TensorFlow Lite documentation says: TensorFlow Lite for Microcontrollers is written in C++ 11 and requires a 32-bit platform. It has been tested extensively with many processors based on the Arm Cortex-M Series architecture, and has been ported to other architectures including ESP32. The framework is available as an Arduino library. It can also generate projects for development environments such as Mbed. It is open source and can be included in any C++ 11 project.

Info from Texas Instruments on Deep Learning ("3.15.1.4.4.1. Firmware"): OpenCL firmware includes a pre-canned DSP TIDL Lib (with hard-coded kernels) and an EVE TIDL Lib following the Custom Accelerator model. OpenCL firmware is downloaded to the DSP and M4/EVE immediately after Linux boot:

 dra7-ipu1-fw.xem4 -> /lib/firmware/dra7-ipu1-fw.xem4.opencl-monitor
 dra7-dsp1-fw.xe66 -> /lib/firmware/dra7-dsp1-fw.xe66.opencl-monitor
 dra7-dsp2-fw.xe66 -> /lib/firmware/dra7-dsp2-fw.xe66.opencl-monitor


 * What is the best way to make TensorFlow Lite accessible to BBAI users?
 * This goes back to the question of cross compilation vs SDK Linux vs other potentially more user friendly options.
 * The other options I suggest are only worthwhile if they are easily maintainable and/or actually easier.


 * How fast does TensorFlow Lite perform on BBAI?
 * Another GSOC project mentions the latency of running YOLO on BeagleBone Black
 * Others have also mentioned that BeagleBone Black compares poorly to its peers for AI at 97.03 ms per 1000 runs
 * Given the BBAI architecture, TensorFlow Lite and other machine learning tools should run much faster on BBAI. I don't want to make this into a benchmarking project, but some statistics about speed on BBAI would be quite useful to users (especially if it's much faster than BeagleBone Black)
 * I've heard that Python is a lot slower than C++, and TensorFlow Lite supports both. I'd love to run the same model in each language and see whether there is a big difference in inference time on BBAI.
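A minimal timing harness like the one below could make that comparison fair. Here `fake_invoke` is a stand-in for a real call such as a TensorFlow Lite interpreter invocation on the board; the warmup and run counts are arbitrary illustrative choices.

```python
import time

def time_inference(run_once, warmup=3, runs=50):
    """Average wall-clock latency of run_once() over `runs` calls,
    after warmup calls to absorb one-time costs (caches, allocation)."""
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    return (time.perf_counter() - start) / runs * 1000.0  # ms per run

# Stand-in workload; on the BBAI this would be something like
# interpreter.invoke() from the TensorFlow Lite Python API.
def fake_invoke():
    sum(i * i for i in range(10_000))

latency_ms = time_inference(fake_invoke)
print(f"{latency_ms:.2f} ms per inference")
```

Warming up before measuring matters because the first invocation of a TensorFlow Lite interpreter typically pays one-time setup costs that would skew a naive average.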


 * Can models be trained on the BBAI?
 * Documentation and examples I have seen have been for pretrained models. It is worth attempting to train on the BBAI and documenting the results. I imagine it will be slower, but if it is effective, it may still be useful.

Hardware Required

 * 1) BeagleBone AI
 * 2) Power Cable
 * 3) External Fan (many blogs say that because the BBAI runs hot, this is necessary; one example: A simple start)
 * 4) Serial FTDI Cable
 * 5) External Microphone (such as BeagleMic)
 * 6) Camera (I have a Logitech USB webcam I could use, but for showing off what BBAI can do with images I'd probably choose the HD Camera Cape)
 * 7) (if time or the scope of the project allows: BeagleBoard-X15)

Timeline
Prework: Get TensorFlow Lite examples working on the Arduino and Raspberry Pi that I already have. Learn more about the Linux kernel and how programs get installed natively vs run from cross compiled binaries. Read more on the internet about what AI packages people have used successfully and unsuccessfully on BBAI. This knowledge will be useful in comparing TensorFlow Lite performance with that of other tools. Also, if there are other "must have" packages that the BBAI does not support, I will add them to the scope of my project if time allows.

Experience and approach

 * I have experience programming in python and C/C++, both of which I'll need for this project.
 * I have used Arduinos, Fubarino Mini, and Raspberry Pi at work (at a swarm robotics company), and for fun/school projects. At the swarm robotics company, we also made a vehicle communications device that used an STM32 microcontroller.
 * I've switched to using Linux almost exclusively so that I can work with embedded devices more readily.
 * I took machine learning courses at school and am familiar with many of the main algorithms. So far, I've mainly used MATLAB and PyTorch. While I am less familiar with TensorFlow, I expect to pick it up quickly given my understanding of the algorithms.
 * I also have experience collaborating with teams around the world and I enjoy learning from everyone.
 * I've successfully designed and implemented software projects.

Contingency
All of my managers have described me as tenacious. Since becoming an engineer, I've successfully completed many projects in which I had to figure out how to solve all of the sub-tasks independently while only having an initial understanding of the general arc of the problem. Generally, I try to understand the overall picture and then see my way through the piece I'm working on and a few steps ahead. At one internship, I made a Python application to do complex modeling of water recapture from cooling towers; when I started, I didn't know all of the Python features I ended up using, how to use them, or some of the mathematical concepts in the model. I am also good at asking my peers for help in addition to consulting outside resources. In preparing this application, I contacted people in the BeagleBoard IRC and GSoC chat room as well as on the TensorFlow Lite Google Group.

Benefit
BeagleBoard options are strong contenders with Raspberry Pi and Arduino as easy-to-use tools for IoT and industrial projects. While BeagleBones are geared more towards engineers and have some special features that make them quite different from the others (PRUs, the ability to do low-level and high-level control simultaneously, power usage, etc.), they often get compared with those platforms. Given that Raspberry Pi and Arduino have TensorFlow Lite compatibility, I think it is important for BeagleBones to have the same. Also, there are some algorithms that are easier to access with TensorFlow Lite than in other settings.

For example, I tried tackling YOLO deployment on BBAI but it is currently impossible because of TIDL library restrictions. TFLite is the way to go with DL on BBAI and it uses TIDL underneath
 * -Jakub Duchniewicz

In large laboratories and plants, experts often rely on the “sound” of machines to know quickly that a system is working properly. And most often small changes in pulsing tones or anomalies in the sound environment keys us into identifying problems. Much of this can be automated with machine learning with environmental monitoring sensors. I'd like to train a system to recognize when my helium reliquifiers or pulse tube coolers aren't functioning properly.
 * -Ritoban Basu Thakur, Caltech physicist (See Section 9.2 on audio monitoring for cryogenic equipment)

and several frustrated users have posted on the internet:

I've been attempting to build the TF Lite library on my BeagleBone AI (32-bit ARMv7 MCU) for several days now, to no avail.
 * -Matt posted to TFLite Google-Groups

I wish to run my network on the AM5729 found on the BeagleBone AI board. I understand that the ti-processor-sdk-linux-am57xx-06.03.00.106 does not yet support the BeagleBone AI tool. However, the Debian image distributed for the BeagleBone AI has all the TIDL libraries packaged with it.

Therefore the only thing I need to be able to do is convert the tensorflow lite network into the TIDL format using the TIDL import tool. I assume I can do this using the following flow: 1. download the ti-processor-sdk-linux-am57xx-06.03.00.106-linux-x86-Install.bin from TI 2. run the installer on a linux (Ubuntu) host 3. navigate to the tidl_model_import.out tool 4. run the tidl_model_import.out tool on an appropriate configuration file (below)
 * -Alex Beasley posted on TI forum

The error is because TIDL does not support importing tensorflow lite model. Please try with Caffe/tensoflow/onnx models, for more details refer to TIDL datasheet/user-guide.
 * -Praveen responded on TI forum

In addition to the requested need for TensorFlow Lite, this is something that BeagleBoard has already promised: We've got a path to deliver the best AI training platform available, but, the actual materials are deeply lacking today. The https://github.com/beagleboard/cloud9-examples repository has a starting point for using the TIDL library, but that library is based on C/C++ and requires separate information on training and converting the model from Tensorflow/Caffe/etc. The TI Tensorflow Lite support is coming around the first quarter of 2020.

So, this is a work in progress. We'll be integrating a ton of examples (and associated software tools) to work with stuff like Tensorflow and Coral Accelerator in the short term. The expectation is to both crowd source this around the cloud9-examples repository and grow this over time. The hope is the Cloud9 development environment and mjpg-streamer presentation layer is enough inspiration and platform to get this moving quickly.

Today, conversion from Tensorflow or Caffe models to TIDL such that you can run them accelerated is covered at http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/Foundational_Components_TIDL.html. You can also choose to run Tensorflow or Caffe natively on ARM.
 * -BBAI FAQ

Misc
I've completed the other requirements listed on the wiki. Link to cross compilation pull request

Suggestions
I like the collaborative setup of the chatroom and the IRC. It was a good way to learn more about the platform and my peers.