BeagleBoard/GSoC/2023 Proposal/OpenGLES acceleration for DL Minh Le

=About Student=
Student: Minh Le
Mentors: Shreyas Atre
Wiki: Beagle AI-64
GSoC: TBD

=Status=
This project is currently just a proposal.

=Proposal=
 * Completed all the requirements listed on the ideas page.
 * The pull request for the cross-compilation task: task

About you
IRC: @whoknow123:matrix.org
Github: yourcomrade
School: Saxion University of Applied Sciences
Country: Netherlands
Primary languages: English, Vietnamese
Typical work hours: 8am-12pm and 5pm-12am Central European Summer Time (UTC+02:00)
Previous GSoC participation: This is my first time participating in GSoC.

About your project
Project name: OpenGLES acceleration for DL

Overview
The goal of the project is to accelerate several types of neural network layers by using the GPU on the BeagleBoard X15/AI-64. Currently, there are three ways to program the GPU: OpenGL ES, OpenCL, or Vulkan. I chose OpenGL ES because it is supported by more GPUs than OpenCL and Vulkan. OpenGL ES has a feature called compute shaders, which allows programmers to use the GPU for general-purpose computing beyond typical GPU work such as rendering and drawing. However, compute shaders are only supported in OpenGL ES 3.1 and later. This creates a barrier for the older BeagleBoard AI, as well as for other BeagleBoards that want to use the GPU to accelerate neural networks. Therefore, I chose to use fragment shaders, which are available from OpenGL ES 2.0 onward.

The framework I chose for implementing OpenGL ES as a backend is the Darknet framework, as it is the simplest neural network framework I know.

Mission
As discussed with my mentor, the goal of this project is to port the tiny-yolov3 model running on the Darknet framework to the GPU. The model uses the GPU for inference: it loads pre-trained weights and runs on the GPU to make predictions. According to the tiny-yolov3 configuration, its architecture uses convolutional, maxpool, up-sampling, and activation layers. Therefore, I will focus on speeding up those four layer types. After finishing them, I will integrate them into the Darknet framework.

Expected result
I expect the tiny-yolov3 model to run at least 20% faster on the GPU than on the CPU. Ideally, the GPU run time will be half of the CPU run time.

Tasks
There are two tasks I have to do:
 * 1) Create a library that uses OpenGL ES to accelerate layers
 * 2) Integrate that library into the Darknet framework

Library
The library will be based on the GPGPU with GLES project by former GSoC contributor Jakub Duchniewicz. The idea of using OpenGL ES for GPU computing is to draw a rectangle off-screen and upload the data in the form of a 2D texture. The data is represented in GPU memory as color pixels; we perform calculations on those pixels in a fragment shader, and then read the results back into CPU memory.

Implementation
Currently, the Darknet framework relies on CUDA and OpenMP to accelerate computation; both are used for inference and training. However, since this project aims to accelerate inference rather than training, I will add an OpenGL ES backend to the Darknet framework that accelerates neural network layers for inference only. Here is an example of what my API will look like:

 #ifdef GLES
 void forward_maxpool_layer_gles(maxpool_layer l, network net);
 void forward_upsample_layer_gles(const layer l, network_state state);
 void forward_activation_layer_gles(layer l, network_state state);
 void forward_convolutional_layer_gles(convolutional_layer layer, network_state state);
 #endif

Experience and approach
This project requires knowledge and experience in deep learning, graphics programming, parallel computing, and embedded Linux. I have experience with OpenGL, which is an API for graphics programming. I am also learning CUDA to understand more about how neural networks are implemented on GPUs. In addition, I have knowledge of digital image processing, which helps me understand how neural networks process images. For embedded Linux, I have experience with the Raspberry Pi and with using Buildroot to build the kernel and file system for embedded Linux. As this is a complex project, I can work up to 35 hours per week on it.

Contingency
If I get stuck on my project and my mentor isn’t around, I will use the following resources:
 * The BeagleBoard forum
 * Reddit
 * Former GSoC contributor Jakub Duchniewicz
 * Beagle AI-64 Documentation

Benefit
The outcome of this project is that users can run the tiny-yolov3 model from the Darknet framework on the GPU of a BeagleBoard to make predictions. Other BeagleBoards, as well as other open-source hardware boards with a GPU, can also run the tiny-yolov3 model on their GPU with few modifications to the library.