
=BeagleBone PRU DMA=

Most existing PRU applications use (and effectively waste) one PRU core for data transfer. The goal of this project is to enable use of the EDMA controller for copying data to and from main memory (DDR), which would allow applications to use both cores for computation.

Student: Maciej Sobkowski
Mentors: Kumar Abhishek (Abhishek_), Zubeen Tolani (ZeekHuge)
Code: https://github.com/maciejjo/beaglebone-pru-dma
Wiki: http://elinux.org/BeagleBoard/GSoC/BeagleBone_PRU_DMA
GSoC: https://summerofcode.withgoogle.com/projects/5021339281784832

Status
This project has been accepted for GSoC 2017.

About you
IRC: maciejjo
Github: maciejjo
School: Poznań University of Technology
Country: Poland
Primary language: English
Typical work hours: 8AM-5PM CEST/UTC+2

About your project
Project name: BeagleBone PRU DMA transfer support

Description
BeagleBone's SoC, the TI Sitara AM3358, contains (besides the main ARM Cortex-A8 core) two additional processing units called PRUs (Programmable Realtime Units). The main purpose of these cores is to run code subject to real-time constraints. This approach lets the user run a non-realtime operating system on the main ARM core and offload the parts of an application that require realtime capabilities to the PRU cores.

Current PRU applications that require data exchange with Linux (i.e. the Cortex-A8) dedicate one PRU core to this task, leaving only one for the actual application. This project aims to use the AM3358's built-in DMA controller to handle data transfers between PRU memory and system DRAM, effectively freeing the second core for other purposes.

PRU
The PRUs and related peripherals form the PRU-ICSS, the Industrial Communication Subsystem (also called PRUSSv2 because it replaced an older PRU subsystem revision called PRUSS). It consists of two 32-bit RISC cores (PRUs), shared data and instruction memories, internal peripherals and an interrupt controller. Details of the PRU-ICSS can be found in the block diagram.



Features of PRU-ICSS:


 * 2 PRU cores, each featuring:
   * 200MHz clock
   * 8KB program memory, parity detection
   * 8KB data memory, parity detection
   * OCP Master port
   * GPIO
 * Scratchpad shared memory (SPAD)
 * 12KB shared RAM
 * Interrupt controller (INTC)
 * UART

In the Linux kernel there are two mutually exclusive ways to use the PRU. One is the older uio_pruss driver, based on the UIO framework. The other is a newer set of drivers (pru_rproc and rpmsg_pru) built on the remoteproc framework (for controlling the processor) and the rpmsg framework (for exchanging messages between Linux and the processor).

DMA - EDMA controller
EDMA is a general-purpose DMA controller integrated in the AM335x SoC. It supports up to 64 DMA channels. It consists of two kinds of blocks: one channel controller (TPCC) and three transfer controllers (TPTC). The channel controller is the programmer's interface to EDMA. It prioritizes transfer requests and events from peripherals, and schedules transfer requests to the transfer controllers. EDMA is supported in the mainline Linux kernel by a standard dmaengine framework driver.

DMAengine API
Linux provides a uniform API for device drivers that want to make use of DMA transfers. It is called dmaengine, and the EDMA controller driver implements this interface.

Main Goals
The first concept for the project was to handle every part of the DMA transfer in a kernel module (as depicted in the included diagram), but as suggested in discussion on the #beagle IRC channel, this approach would not be efficient. It was suggested that DMA transfers should be scheduled by the PRU and that Linux should only perform DMA channel setup. The reasoning is that pacing of transfers should be done by the PRU, so that transfers are not done faster than the PRU can handle.
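The division of work described above can be modelled in plain userspace C. This is only a toy sketch under stated assumptions, not the project's driver or firmware: `struct dma_model`, `linux_setup_channel`, `pru_trigger_transfer` and `linux_handle_completion` are hypothetical names, the EDMA copy is replaced by `memcpy`, and a single flag stands in for the completion interrupt. It shows the Linux side configuring the channel once, and the PRU side deciding when each transfer starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8u /* arbitrary block size chosen for the model */

/* Hypothetical model state; none of these names come from the project. */
struct dma_model {
    uint8_t pru_buf[BLOCK_SIZE]; /* stands in for PRU data RAM */
    uint8_t ddr_buf[BLOCK_SIZE]; /* stands in for the DDR buffer owned by Linux */
    bool channel_ready;          /* set once by the "Linux" side */
    bool ddr_full;               /* a block was copied and not yet consumed */
    unsigned int completed;      /* number of finished transfers */
};

/* "Linux" side: one-time channel setup, no per-transfer work. */
static void linux_setup_channel(struct dma_model *m)
{
    memset(m, 0, sizeof(*m));
    m->channel_ready = true;
}

/*
 * "PRU" side: start a transfer when the PRU is ready.
 * Returns false if the channel is not set up or the previous
 * block has not been consumed yet, so no data is overwritten.
 */
static bool pru_trigger_transfer(struct dma_model *m)
{
    if (!m->channel_ready || m->ddr_full)
        return false;
    memcpy(m->ddr_buf, m->pru_buf, BLOCK_SIZE); /* stands in for the EDMA copy */
    m->ddr_full = true;
    m->completed++;
    return true;
}

/* "Linux" side: completion handler consumes the block and frees the buffer. */
static uint8_t linux_handle_completion(struct dma_model *m)
{
    assert(m->ddr_full);
    m->ddr_full = false;
    return m->ddr_buf[0];
}
```

In this model a second call to `pru_trigger_transfer` fails until `linux_handle_completion` has run, which is one possible way to keep either side from outrunning the other.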

List of goals specified for this project:


 * 1) Create Linux driver capable of configuring DMA transfers
 * 2) Prepare PRU application that will schedule DMA transfers by writing to EDMA trigger register
 * 3) Set up interrupt from PRU to A8 and IRQ handler in driver code
 * 4) Set up DMA channel configuration in Linux
 * 5) Set up handling transfers on Linux side in IRQ handler
 * 6) Add support for Device Tree configuration of DMA channels for the driver
 * 7) Provide example PRU applications making use of created DMA capabilities
 * 8) Measure speed of provided interface
 * 9) Create documentation for both PRU and kernel parts of the project
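Goal 2 relies on the PRU starting a transfer by writing to an EDMA trigger register. The sketch below shows how a channel number (0 to 63) could be mapped to a register offset and bit mask. It assumes the channel controller's Event Set Register (ESR) at offset 0x1010 for channels 0 to 31 and ESRH at 0x1014 for channels 32 to 63. These offsets are recalled from the AM335x TRM register map and should be verified there. The helper names and `struct edma_trigger` are hypothetical, and the channel controller base address is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; check them against the AM335x TRM before use. */
#define EDMA_NUM_CHANNELS 64u
#define EDMA_ESR_OFFSET   0x1010u /* Event Set Register, channels 0..31 */
#define EDMA_ESRH_OFFSET  0x1014u /* Event Set Register High, channels 32..63 */

/* Hypothetical helper type: where to write and what to write. */
struct edma_trigger {
    uint32_t reg_offset; /* byte offset from the channel controller base */
    uint32_t bit_mask;   /* value to write; one bit per channel */
};

/* Map a DMA channel number to its trigger register and bit. */
static struct edma_trigger edma_trigger_for_channel(unsigned int channel)
{
    struct edma_trigger t;

    assert(channel < EDMA_NUM_CHANNELS);
    t.reg_offset = (channel < 32u) ? EDMA_ESR_OFFSET : EDMA_ESRH_OFFSET;
    t.bit_mask = UINT32_C(1) << (channel % 32u);
    return t;
}

/*
 * Trigger a channel by writing its bit to the event set register.
 * tpcc_base must point at the mapped channel controller registers.
 */
static void edma_trigger_channel(volatile uint32_t *tpcc_base,
                                 unsigned int channel)
{
    struct edma_trigger t = edma_trigger_for_channel(channel);

    tpcc_base[t.reg_offset / sizeof(uint32_t)] = t.bit_mask;
}
```

For example, channel 12 maps to offset 0x1010 with mask 0x1000, and channel 40 maps to offset 0x1014 with mask 0x100. Writing only the mask, without a read-modify-write, assumes that zero bits written to the register leave other channels untouched.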

PRU/A8/EDMA Communication Diagram
Diagram of proposed communication between HW and SW components used in the project (first concept):



Second concept (where PRU is responsible for triggering transfers):



Timeline
Before first week
 * create project repository on GitHub
 * prepare BeagleBone for development
 * collect required documentation
 * prepare development environment (tools for PRU, Linux development and debugging)

2017-06-06: Milestone #1
2017-06-13: Milestone #2
2017-06-20: Milestone #3
2017-06-27: Milestone #4
2017-07-04: Milestone #5
2017-07-11: Milestone #6
2017-07-18: Milestone #7
2017-07-25: Milestone #8
2017-08-01: Milestone #9
2017-08-08: Milestone #10
2017-08-15: Milestone #11

Tasks planned across these milestones:
 * create pru_dma Linux driver that will handle DMA transfers on Linux side
 * test setting up DMA channels in the driver
 * set up and test interrupt communication between A8 and PRU
 * start working on DMA transfers on PRU side
 * set up handling of interrupt from A8 in kernel module
 * ready bi-directional DMA transfers scheduled from PRU
 * generate interrupt on A8 on finished DMA transfer
 * work on PRU/Linux communication (synchronization of transfers to avoid data loss on either side)
 * finalize work on PRU DMA transfers code
 * For first evaluation period: Working PoC, bidirectional transfers scheduled from PRU and communication with Linux driver.
 * test and debug DMA transfers (both directions)
 * start measuring performance of the transfers
 * clean-up of kernel driver code so it conforms to standards
 * start working on example applications of the DMA interface
 * continue work on examples
 * Bugfixes
 * Finalize work on example applications
 * For second evaluation period: Finished DMA transfers API, example applications ready
 * Start work on project documentation
 * Further work on documentation
 * Clean-up for final release of the project
 * Collect feedback from community
 * Leave last week as buffer for delays in other tasks

Stretch goals
 * Add support for old PRU driver (uio-pruss)
 * Develop userspace interface for the driver so data can be fed and received by userspace application

Experience and approach
I have previous experience with BeagleBone and other embedded HW. I know Linux kernel internals and have written several modules for various purposes (e.g. SPI and I2C slave devices, SPI master driver). I am able to quickly grasp new concepts.

Contingency
In case of problems I will consult the AM335x TRM, the PRU-ICSS reference guide, and the Linux documentation. I will check for solutions and examples in existing kernel code. If I need help, I will try to find it on the BeagleBone IRC channel and mailing list. I am able to describe my problems in detail and ask specific questions.

Benefit
This project will bring a big benefit to the BeagleBoard community (and possibly to users of other hardware featuring a PRU), as it will enable PRU users to design applications using both PRU cores in cases where one was effectively wasted on copying data back and forth to memory. In some cases this could mean double the speed or double the I/O pins for those applications.

Quote from IRC explaining the project:

Feb 20 22:19:36 all PRU code I am aware of transfer things between the PRU and the main processor by the action of either the A8 or the PRU
Feb 20 22:19:50 the onboard DMA controller (EDMA) can do the work.
Feb 20 22:20:04 the goal of that project is to figure out how to get the EDMA to do it and write sample code to show it doing it
Feb 20 22:20:30 i.e. the beaglelogic code xfers things by using one of the PRUs to xfer data. this burns that PRU (there are only 2 per PRUSS)
Feb 20 22:20:55 moving it to the EDMA can free it up. Conversely, some setups use the ARM side to suck data from the PRUSS
Feb 20 22:21:04 it burns cpu cycles. using EDMA can simplify it.
Feb 20 22:21:15 so that's the project in a nuthshell