DhruvaGole/ProjectReport

=Proposal-Bela support for BBAI=
Youtube Video
About
Student: Dhruva Gole
Mentors: Giulio Moro, Stephen Arnold and Robert Manzke
Code: My Fork of Bela and Official Bela Code Repository
Wiki: https://forum.beagleboard.org/t/bela-support-for-bbai-later-ti-chips/29257/7
GSoC: GSoC entry

=Status=
This project proposal has been accepted as part of GSoC 2021.

=Aim=
This project proposes to restructure and improve the existing Bela software so that it is compatible with, and easier to port to, newer Texas Instruments Sitara processors (like the AM5729 in the BeagleBone AI).

About Student
College ID: 181030017
Github: https://github.com/DhruvaG2000
School: Veermata Jijabai Technological Inst.
Country: India
Primary languages: English, Marathi, Hindi
Typical work hours: 10AM - 7PM Indian Standard Time

About the project
Project name: Bela support for the BeagleBone AI
My Blog: My blog related to this project and my findings.
Logs: I maintain weekly progress updates here: https://dhruvag2000.github.io/Blog-GSoC21/logs/

Description
 * What is Bela?
As given on the official website, Bela is a hardware and software system for creating beautiful interaction with sensors and sound. Until now, Bela has consisted of a Bela cape on top of a BeagleBone Black computer. Bela has many analog and digital inputs and outputs for hooking up sensors and controlling other devices and, most importantly, stereo audio I/O that lets you interact with the world of sound. Both Bela systems use the same Bela software. It uses a customized Debian distribution which, most notably, runs a Xenomai kernel instead of a stock kernel. Xenomai is a co-kernel for Linux that makes it possible to achieve hard real-time performance on Linux machines (http://xenomai.org/). Bela thus takes advantage of the features of the BeagleBone computers and can achieve extremely fast audio and sensor processing times. Although the proposal title mentions support for the BeagleBone AI, I will try to develop a standardized setup that allows an easy jump across all TI chips.

 * Applications of Bela:
Bela is ideal for creating anything interactive that uses sensors and sound. So far, Bela has been used to create:
 * 1) musical instruments and audio effects
 * 2) kinetic sculptures
 * 3) wearable devices
 * 4) interactive sound installations
and many more applications that are listed here

 * The BeagleBone Black
BeagleBone Black is a low-cost, community-supported development platform for developers and hobbyists. Boot Linux in under 10 seconds and get started on development in less than 5 minutes with just a single USB cable. To know more visit https://beagleboard.org/black
 * The BeagleBone AI
This is the board for which I was able to introduce Bela compatibility. Built on the proven BeagleBoard.org® open source Linux approach, BeagleBone® AI fills the gap between small SBCs and more powerful industrial computers. Based on the Texas Instruments AM5729, developers have access to the powerful SoC with the ease of BeagleBone® Black header and mechanical compatibility. BeagleBone® AI makes it easy to explore how artificial intelligence (AI) can be used in everyday life via the TI C66x digital-signal-processor (DSP) cores and embedded-vision-engine (EVE) cores supported through an optimized TIDL machine learning OpenCL API with pre-installed tools. Focused on everyday automation in industrial, commercial and home applications. To know more visit https://beagleboard.org/ai

 * Why add support for BBAI/newer TI chips?
The BeagleBone Black was launched in 2013, over 7 years ago, and newer and better TI Sitara processors have been released since. It would be better to have a more standardized setup that allows an easier jump across TI chips. Newer boards with different and more efficient chips, like the AM5X and the TI C66x digital-signal-processor (DSP) cores in the BBAI, are coming up and will need to be compatible with the Bela software and hardware.
 * Programming languages and tools to be used:
C, C++, PRU, dtb, GNU Make, ARM Assembly

Implementation Details
The Bela cape is normally used in combination with the TI AM3358 processor present on the BeagleBone Black. The hardware was partially working on the BBAI using only ALSA (Advanced Linux Sound Architecture) and the SPI driver. However, the Bela real-time code on ARM and PRU was not yet running on the BBAI.

Brief Summary:


 * 1) Created a device tree overlay using the Cape Compatibility layer to port BB-BONE-AUDI, which worked but had a few frequency issues on the BBAI. The overlay I wrote has been accepted by BeagleBone maintainer Robert Nelson, and you can find it here: https://github.com/beagleboard/BeagleBoard-DeviceTrees/pull/36
 * 2) Created a device tree overlay for the Bela cape to work on the BBAI using the Cape Compatibility layer. It has been tested and is available on GitHub: https://github.com/DhruvaG2000/BeagleBoard-DeviceTrees/blob/v4.19.x-ti-overlays/src/arm/overlays/BBAI-BELA-00A1.dts
 * 3) Adapted the Bela PRU and ARM code and workflow to drive the PRU through the Remote Processor Framework instead of the nearly obsolete UIO PRUSS.
 * 4) Updated the Bela code to use the McASP, GPIO and McSPI on the AM5729 SoC of the BBAI.
 * 5) Installed a Xenomai-patched kernel and ran the full Bela stack.
 * 6) Ported a debugger for the PRU called PRUDebug.

This project involved dealing with pinmuxing (using overlays), PRU assembly, and C and C++ for Linux user space applications. I also studied the Technical Reference Manuals for the Sitara family of SoCs (the AM5729 and the AM335x).

 * What is RProc?
The remoteproc framework allows different platforms/architectures to control (power on, load firmware, power off) remote processors while abstracting the hardware differences, so the entire driver doesn't need to be duplicated. In addition, the framework adds rpmsg virtio devices for remote processors that support this kind of communication. This way, platform-specific remoteproc drivers only need to provide a few low-level handlers. Reference: https://www.kernel.org/doc/Documentation/remoteproc.txt
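From user space, the kernel exposes each remote processor through sysfs files, so the "load firmware, power on, power off" cycle described above can be sketched in a few lines of C++. This is a minimal illustration and not the Bela implementation: the class name RemoteProc, its methods, and the remoteprocN index are assumptions, and the PRU cores may appear under different indices depending on the board and kernel.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper that drives one remote processor through the
// remoteproc sysfs files "firmware" and "state".
class RemoteProc {
public:
    // index selects remoteprocN; root can be overridden (e.g. for testing).
    explicit RemoteProc(unsigned int index,
                        std::string root = "/sys/class/remoteproc")
        : dir_(root + "/remoteproc" + std::to_string(index)) {}

    // Select the firmware image by file name (looked up by the kernel
    // in its firmware search path, typically /lib/firmware).
    bool setFirmware(const std::string& name) const {
        return writeAttr("firmware", name);
    }
    bool start() const { return writeAttr("state", "start"); }
    bool stop() const { return writeAttr("state", "stop"); }

    // Full path of one sysfs attribute of this remote processor.
    std::string attrPath(const std::string& attr) const {
        assert(!attr.empty());
        return dir_ + "/" + attr;
    }

private:
    // Write a value to an attribute; returns false if the file cannot be
    // opened or the write is rejected.
    bool writeAttr(const std::string& attr, const std::string& value) const {
        std::ofstream f(attrPath(attr));
        if (!f) return false;
        f << value;
        f.flush();
        return f.good();
    }

    std::string dir_;
};
```

A caller would typically call stop(), then setFirmware() with the name of an image, then start(), checking each return value, since the writes fail when the process lacks permission or the core is already in the requested state.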

 * What is a Device Tree Overlay?
Sometimes it is not convenient to describe an entire system with a single FDT (Flattened Device Tree). For example, processor modules that are plugged into one or more modules (a la the BeagleBone), or systems with an FPGA peripheral that is programmed after the system is booted. For these cases it is proposed to implement an overlay feature so that the initial device tree data can be modified by userspace at runtime by loading additional overlay FDTs that amend the original data.
How is an overlay compiled? dtc (Device Tree Compiler) converts between the human-editable device tree source "dts" format and the compact device tree blob "dtb" representation usable by the kernel, or assembler source. Compiling an overlay generates a .dtbo file, which is then used in the next stage.

How does one load a DT Overlay? It's simple: edit the file /boot/uEnv.txt so that it contains the following lines:

 enable_uboot_overlays=1
 uboot_overlay_addr4=/lib/firmware/BBAI-AUDI-02-00A0.dtbo

On the next boot, this new overlay should be loaded automatically.


 * Syntax Analysis

The places within the Bela core code that required intervention are:


 * 1) In the Makefile, updated the workflow to build the PRU code for remoteproc. Also implemented auto-detection of the processor the code is being compiled on, which is passed to the code as a compile-time flag.
 * 2) Created the PruManager code, which combines the RProc and UIO PRUSS (using the libprussdrv API) implementations under one roof.
 * 3) In pru/pru_rtaudio.p, replaced the hard-coded McASP, SPI and GPIO constants with board-dependent ones.

All these changes were made so that the same code base can run on all supported boards (e.g. BBAI, BBB) with compile-time checks. In short, I established a common code base for all boards by setting constants such as pin numbers per board in header files (like bela_hw_settings.h), so the rest of the code base became much easier to use without much hard-coding.

 * PRU:
A programmable real-time unit (PRU) is a fast (200-MHz, 32-bit) processor with single-cycle I/O access to a number of the pins and full access to the internal memory and peripherals on the Sitara processors on BeagleBones.


 * 1) The current Bela core code uses pasm to build the PRU assembly pru/pru_rtaudio.p and uses libprussdrv, which binds to the uio_pruss kernel driver to load the firmware to the PRU and handle access to the PRU RAM. Both pasm and libprussdrv are now deprecated, replaced by the clpru toolchain and remoteproc driver respectively.
 * 2) The PRU firmware contains hard-coded values for the addresses of the McASP, McSPI and GPIO peripherals. These addresses differ on the BBAI, so the constants were made conditional at compile time using #ifdef guards.
 * 3) As the Bela PRU firmware is written in assembly for pasm, instead of rewriting or updating it in such a way that it will stop working on the current Bela images, we have used the workflow detailed below.
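The compile-time selection of peripheral addresses mentioned in point 2 can be illustrated with a small C++ sketch. The flag name BOARD_AM572X and the constant names are hypothetical and are not taken from the Bela sources; the AM335x base address is the MCASP0_BASE value used in pru/pru_rtaudio.p, while the AM572x value is deliberately left as a placeholder to be filled in from the TRM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag: the build system is assumed to define BOARD_AM572X
// when it detects a BBAI. Without it, the AM335x (BBB) values are used.
#ifdef BOARD_AM572X
// Placeholder only: the real McASP base must be taken from the AM572x TRM.
constexpr std::uint32_t kMcaspBase = 0x0;
#else
// McASP0 base address on the AM335x.
constexpr std::uint32_t kMcaspBase = 0x48038000;
#endif

// Absolute address of a McASP register given its offset from the base.
// Only the base depends on the board; callers pass the same offsets.
constexpr std::uint32_t mcaspRegister(std::uint32_t offset) {
    return kMcaspBase + offset;
}
```

With this layout the rest of the code refers to mcaspRegister(offset) and never to a literal address, so adding a board means adding one more branch to the conditional block.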

 * Workflow for building the existing pasm PRU code with clpru:
The workflow below works and has been tested on v4.19 BBAI and v4.14 BBB (+ Bela cape).
 * 1) Built the .p file as is with pasm, using the -V2 -L -c -b flags. This generates a .bin file that contains the assembled program.
 * 2) Used the disassembler that Giulio Moro put together by hacking the one inside prudebug (find it here). Note: a disassembler is a computer program that translates machine language into assembly language.
 * 3) Processed the .bin file through the disassembler and made the output ready to be included inside an __asm__ directive (i.e. added quotes and prepended a space at the beginning of each line):
     prudis $< | sed 's/^\(.*\)$$/" \1\\n"/' > $(RPROC_INCLUDED_ASSEMBLY)
 * 4) Placed the following in a .c file:
     void main()
     {
         __asm__ __volatile__
         (
     #include "included_assembly.h"
         );
     }
 * 5) Built this .c file with the regular clpru toolchain:
     clpru -fe $(RPROC_TMP_FILE).o $(RPROC_TEMPLATE) -v3 --endian=little --include_path=$(RPROC_BUILD_DIR) --include_path=$(RPROC_INCLUDE) --include_path=/usr/lib/ti/pru-software-support-package/include
 * 6) The rest of the build procedure is implemented in the Makefile.
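To make the quoting step of this workflow concrete, the sketch below reproduces in C++ what the sed expression does to each line of disassembly: it prepends a space, appends a literal \n, and wraps the result in double quotes so that the lines concatenate into one string inside the __asm__ directive. The function names are hypothetical, and the sketch assumes the disassembler output never contains double quotes or backslashes.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Turn one line of disassembly into a C string-literal fragment:
//   ADD r1, r1, 1   becomes   " ADD r1, r1, 1\n"   (quotes included)
std::string quoteAsmLine(const std::string& line) {
    // Assumption: no quotes in the input, so no escaping is needed.
    assert(line.find('"') == std::string::npos);
    return "\" " + line + "\\n\"";
}

// Apply the same transformation to a whole listing, one line at a time,
// producing text suitable for a header such as included_assembly.h.
std::string quoteAsmListing(const std::string& listing) {
    std::istringstream in(listing);
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        out += quoteAsmLine(line) + "\n";
    }
    return out;
}
```

Because adjacent string literals are concatenated by the C compiler, the generated header can be included directly between the parentheses of the __asm__ statement.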

I also changed these addresses in 'pru/pru_rtaudio.p':

 ...
 #define MCASP_SRCTL6
 #define MCASP0_BASE 0x48038000
 #define MCASP_XBUF10			0x228
 #define MCASP_RBUF10			0x2A8
 ...

and a few others, such as the GPIO and SPI ones.

Additionally, I compared the McASP, McSPI and GPIO sections of the AM5729 and AM3358 manuals to verify that all the registers of the McASP and McSPI peripherals kept the same meaning and offsets between the two chips.
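As a small worked check of the offsets involved, the McASP transmit and receive buffer registers are consecutive 32-bit words, so the offset for serializer n can be computed rather than looked up. The sketch below assumes the XBUF and RBUF blocks start at 0x200 and 0x280 respectively, as in the McASP register map; the helper names are hypothetical, and the values should be confirmed against the TRM of the chip in use.

```cpp
#include <cassert>
#include <cstdint>

// Assumed start of the transmit (XBUF) and receive (RBUF) buffer blocks,
// relative to the McASP base address.
constexpr std::uint32_t kXbufBase = 0x200;
constexpr std::uint32_t kRbufBase = 0x280;

// Each serializer owns one 32-bit register, so the stride is 4 bytes.
constexpr std::uint32_t xbufOffset(std::uint32_t serializer) {
    return kXbufBase + 4u * serializer;
}
constexpr std::uint32_t rbufOffset(std::uint32_t serializer) {
    return kRbufBase + 4u * serializer;
}

// Agrees with the MCASP_XBUF10 and MCASP_RBUF10 constants in the text.
static_assert(xbufOffset(10) == 0x228, "XBUF10 offset");
static_assert(rbufOffset(10) == 0x2A8, "RBUF10 offset");
```

If the offsets are identical on both chips, as the comparison of the manuals indicated, only the base address needs to be board-dependent.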

 * PINMUXING
The following pin diagram (from this mathworks forum) helped greatly in comparing the pins on the BeagleBone Black with those on the BeagleBone AI.



I have created a basic AUDIO overlay (named BBAI-AUDI-02-00A0.dts) that has been merged into the beagleboard / BeagleBoard-DeviceTrees repository. I have also created and tested the BBAI-BELA-00A1.dts overlay for the BELA cape.

 * PRU->ARM INTERRUPTS
Bela can work with a PRU->ARM interrupt, which is the default these days but requires an RTDM driver, adding another layer of complication. As an intermediate step, I managed to run Bela without the PRU->ARM interrupt by adding BELA_USE_DEFINE=BELA_USE_POLL to my make command line.
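The general idea of polling instead of waiting for an interrupt can be sketched as follows. This is an illustration of the technique only and not the Bela code path: the function name, the notion of a counter that the PRU increments in shared memory, and the sleep interval are all assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Poll a counter that the PRU is assumed to update in shared memory each
// time it finishes a buffer. Returns true as soon as the counter differs
// from lastSeen, or false after maxPolls checks without a change.
bool waitForNewBuffer(const volatile std::uint32_t* counter,
                      std::uint32_t lastSeen,
                      unsigned int maxPolls,
                      std::chrono::microseconds interval) {
    assert(counter != nullptr);
    for (unsigned int i = 0; i < maxPolls; ++i) {
        if (*counter != lastSeen) {
            return true;  // the PRU has produced new data
        }
        // Yield the CPU briefly instead of spinning at full speed.
        std::this_thread::sleep_for(interval);
    }
    return false;  // timed out
}
```

The trade-off is that polling needs no kernel driver but costs CPU time and adds up to one sleep interval of latency, whereas an interrupt wakes the audio thread only when there is work to do. A real-time audio thread would presumably replace std::this_thread::sleep_for with a real-time-safe sleep.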

 * Transitioning from libprussdrv to rproc
I initially believed that I needed to change the initialization code in PRU.cpp, which relied on libprussdrv, and move it to rproc. I was not sure whether rproc provides a way to access the PRU's RAM the way prussdrv_map_prumem did, which essentially gives access to a previously mmap'ed area of memory. The latest Bela code has a Mmap class that makes this somewhat simpler [http://docs.bela.io/classMmap.html ref. here]. I had to find the correct addresses in the AM5729 TRM to get memory access working through class Mmap. As for the rproc implementation, I wrote PruManager.cpp and PruManager.h from scratch, taking advantage of the object-oriented features of C++. The program is structured as follows:
 * 1) There is an abstract base class, PruManager.
 * 2) PruManagerRprocMmap and PruManagerUio are child classes of the above.
 * 3) As its name suggests, PruManagerRprocMmap uses RProc + Mmap to control the PRU and access its memory.
 * 4) The class PruManagerUio uses libprussdrv, preserving the approach that was used earlier. This class essentially provides backward compatibility.
Both classes, being derived from the same abstract base class, share function names such as start, stop, getOwnMemory, and getSharedMemory.
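The class structure described above can be sketched as an abstract interface plus interchangeable implementations. The method names start, stop, getOwnMemory and getSharedMemory come from the text, but the signatures shown here are assumptions, and PruManagerFake is a hypothetical stand-in for PruManagerRprocMmap or PruManagerUio so that the example runs without hardware.

```cpp
#include <cassert>
#include <string>

// Abstract interface: callers depend only on this class.
class PruManager {
public:
    virtual ~PruManager() = default;
    virtual int start(const std::string& firmware) = 0;  // 0 on success
    virtual void stop() = 0;
    virtual void* getOwnMemory() = 0;     // the PRU's own data RAM
    virtual void* getSharedMemory() = 0;  // RAM shared with the ARM side
};

// Hypothetical in-memory implementation used only for illustration.
class PruManagerFake : public PruManager {
public:
    int start(const std::string&) override { running_ = true; return 0; }
    void stop() override { running_ = false; }
    void* getOwnMemory() override { return own_; }
    void* getSharedMemory() override { return shared_; }
    bool running() const { return running_; }

private:
    bool running_ = false;
    unsigned char own_[16] = {};
    unsigned char shared_[16] = {};
};

// Calling code is written once against the base class, whichever
// backend (remoteproc or UIO) was selected when the object was created.
int runFirmware(PruManager& pru, const std::string& firmware) {
    int ret = pru.start(firmware);
    if (ret != 0) {
        return ret;
    }
    assert(pru.getSharedMemory() != nullptr);
    pru.stop();
    return 0;
}
```

The benefit of this design is that the choice between the two backends is confined to the place where the object is constructed, which is what allows the old libprussdrv path to stay available for backward compatibility.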

 * XENOMAI kernel
Xenomai is a Free Software project in which engineers from a wide background collaborate to build a robust and resource-efficient real-time core for Linux© following the dual kernel approach, for applications with stringent latency requirements.

A Xenomai kernel (v4.19.94-ti-xenomai-r64) has been built and tested for the BBAI. I installed the Xenomai kernel through the default procedure for updating the kernel and libraries, which I have documented here. I have also successfully built the entire Bela core code, and ran and tested a few examples on the BBAI.

 * Hardware required:
The hardware listed below was necessary for testing whether my code works correctly on the hardware.
 * 1) BeagleBone AI.
 * 2) Bela cape: The original Bela board.
 * 3) LA104 Logic Analyzer
 * 4) LEDs, jumper wires, Multimeter, etc.
 * 5) A Fan Cape

Experience
The following experience helped me get started systematically with this project:


 * 1) I have used the C++, C and Python programming languages over the past 3 years in a variety of embedded systems projects using the ESP32, Arduino UNO and ESP8266, and am also well versed in FreeRTOS.
 * 2) I have an aptitude for writing good reports and blogs, and have written a small blog on how to use a debugger.
 * 3) I recently did a project using the ESP32, in which I used the DHT11 sensor to display humidity and temperature on a local HTML server. Other than that, I have worked on developing hardware and writing documentation for a 3 DOF arm based on a custom ESP32 board.
 * 4) I also interned at an embedded device startup, where I interfaced the ADS1115 ADC with the ESP32 and used it to read battery voltage, used UART for ESP32 and SIMCOM SIM 7600IE communication to gain LTE support, and published local sensor data to the cloud via LTE.
 * 5) I actively contribute to open source (most recently, I contributed to the ADS1115 library for ESP32 on the unclerus repo, which can be seen here).
 * 6) Currently I am working on designing a development board for the Raspberry Pico (RP2040) using KiCAD.
 * 7) I also do a lot of mini projects throughout the year; you can find several more of my projects on my GitHub page.

Contingency
If I get stuck on my project and my mentor isn't around, I will use the resources that are available to me. Some of those resources are listed below.


 * 1) Derek Molloy's BeagleBone guide provides all the information needed for getting up and running with my BeagleBone.
 * 2) remoteproc
 * 3) BBAI vs BBB pin headers Google Sheet
 * 4) beaglebone pru gpio example
 * 5) official BELA website
 * 6) Ask on the BeagleBoard.org forum

Benefit
If successfully completed, this project will add support for the Bela cape + Xenomai + PRU on the BeagleBone AI, and the code will also be easier to port to other Texas Instruments systems-on-chip.

'' By going through the steps needed to have the Bela environment running on BBAI, we will go through refactoring and rationalization, using mainline drivers and APIs where possible. This will make Bela easier to maintain and to port to new platforms, benefiting the project's longevity and allowing it to expand its user base.'' -Giulio Moro

''Just ordered a Bela cape. The platform seems really cool and exactly what I was looking for :) Just thought I would add that, I would also be very interested in having a bit more processing power under the hood and the ai would definitely be enough for my purposes.'' - A User on Bela Forum

Misc
Completed all the requirements listed on the ideas page. The code for the cross-compilation task can be found here submitted through pull request #149.