BeagleBoard/GSoC/2023 Proposal/Pratham Deshmukh


=BeagleBoard/GSoC/2023_Proposal/OpenGLES_acceleration for DL=

About Student: Pratham Deshmukh
Mentors: Shreyas Atre
Wiki: Beagle AI-64
GSoC: TBD

=Status=

This project is currently just a proposal.

=Proposal=
 * Completed all the requirements listed on the ideas page.
 * Pull request for the cross-compilation task: task
 * Solved 2 issues in the repository: issue_1, issue_2

=About you=

IRC: @whoknow123:matrix.org
Github:
College: Veermata Jijabai Technological Institute
Country: India
Primary language: English, Hindi, Marathi
Typical work hours: 9am to 5pm
Previous GSoC participation: This is my first time participating in GSoC.

=About your project=

Project name: OpenGLES acceleration for DL

==Overview==

The aim of this project is to accelerate as many neural-network layers as possible using the OpenGLES-enabled GPU on the BeagleBoard X15/AI-64. I will use shaders to run computations on the GPU. Shaders are user-defined programs that run on the board's GPU. Using shaders for computation can yield significant speedups, since GPUs are designed to process large amounts of data in parallel. Of the various shader types available, I will be using compute shaders.

Compute shaders can be used to accelerate the YOLO model in the Darknet CNN framework. They perform parallel processing, which helps with the heavy computations in deep learning algorithms by allowing many calculations to run simultaneously, much as CUDA and OpenCL do. To use compute shaders, we first need to identify which layer types in the YOLO model can be accelerated.

I will start with the convolutional layer. This layer is worth targeting because it learns and extracts hierarchical representations of the input data, such as images. Additionally, convolutional layers are computationally intensive yet highly parallelizable, making them ideal candidates for acceleration with OpenGLES shaders. By accelerating convolutional layers with compute shaders, we can significantly improve the performance of deep learning models.

Once we have identified the layer types that can be accelerated with compute shaders, we can develop optimized shader programs that perform these computations on the OpenGLES-enabled GPU. These shader programs need to take the specific GPU architecture into account and optimize the computations for maximum parallelism. Next, we integrate the compute shaders into the Darknet CNN framework, which requires modifying the existing code to support compute shaders for these layer types. We also need to verify that the implementation is correct and benchmark the performance gains achieved by the shader-accelerated layers.

==Mission==

As discussed with my mentor, the goal of this project is to port the tiny-yolov3 model running on the Darknet framework to the GPU. The ported tiny-yolov3 model uses the GPU for inference: it loads pre-trained weights and runs on the GPU to make predictions. According to its configuration file, the tiny-yolov3 architecture uses convolutional, maxpool, upsampling, and activation layers. I will therefore focus on speeding up these four layer types, and once they are done, integrate them into the Darknet framework.

==Expected result==

I expect the tiny-yolov3 model to run at least 20% faster on the GPU than on the CPU. Ideally, GPU inference time will be half of the CPU inference time.

==Tasks==

There are 2 tasks I have to do:
 * 1) Create a library that uses OpenGL ES to accelerate layers
 * 2) Implement that library in the Darknet framework

==Library==

The library will be based on the project GPGPU with GLES API by former GSoC contributor Jakub Duchniewicz. The idea behind computing on the GPU with OpenGLES is to draw a rectangle off-screen and upload the data in the form of a 2D texture. The data is represented in GPU memory as color pixels; we perform calculations on those pixels in a fragment shader and then read the results back into CPU memory.

==Implementation==

Currently, the Darknet framework relies on CUDA and OpenMP to accelerate computation, for both inference and training. However, since this project aims to accelerate inference rather than training, I will add OpenGLES support to the Darknet framework to accelerate neural network layers at inference time. Here is an example of what my API will look like:

 #ifdef GLES
 void forward_maxpool_layer_gles(maxpool_layer l, network net);
 void forward_upsample_layer_gles(const layer l, network_state state);
 void forward_activation_layer_gles(layer l, network_state state);
 void forward_convolutional_layer_gles(convolutional_layer layer, network_state state);
 #endif

=Experience and approach=

This project requires knowledge of and experience in deep learning, graphics programming, parallel computing, and embedded Linux. I have experience with OpenGL, an API for graphics programming, and I am learning CUDA to better understand how neural networks are implemented on GPUs. I also have a background in digital image processing, which helps me understand how neural networks process images. On the embedded Linux side, I have worked with the Raspberry Pi and have used Buildroot to build the kernel and file system for embedded Linux. As this is a complex project, I can work up to 35 hours per week on it.

=Contingency=

If I get stuck on my project and my mentor isn't around, I will use the following resources:
 * BeagleBoard forum
 * Reddit
 * Former GSoC contributor Jakub Duchniewicz
 * Beagle AI-64 documentation

=Benefit=

The outcome of this project is that users can run the tiny-yolov3 model from the Darknet framework on the GPU of a BeagleBoard to make predictions. Other BeagleBoards, as well as other open-source hardware boards with a GPU, can also run the tiny-yolov3 model on their GPU with few modifications to the library.