BeagleBoard/GSoC/2023 Proposal/OpenGLES acceleration for DL

=Proposal for OpenGLES acceleration for DL=
 * Student: Pratham Deshmukh
 * Code  : darknet
 * Mentors: Shreyas Atre
 * Proposal: OpenGLES acceleration for DL
 * Wiki  : NA
 * GSoC  : Proposal Request

=Status=
This project is currently just a proposal.

=Proposal=
 * Completed all the requirements listed on the ideas page.
 * The pull request for the cross-compilation task: task

=About you=
 * IRC Nickname: Pratham
 * Github: Pratham Deshmukh
 * College: Veermata Jijabai Technological Institute
 * Country: India
 * Primary languages: English, Hindi, Marathi
 * Typical work hours: 9am to 5pm
 * Previous GSoC participation: This is my first time participating in GSoC.

=About your project=
Project name: OpenGLES acceleration for DL

Overview
The aim of the project is to accelerate as many neural network layers as possible using the OpenGLES-enabled GPU on the BeagleBoard X15/AI-64. The computation will run on the GPU as shaders, which are user-defined programs executed by the GPU. Using shaders for computation can result in a significant speedup, as GPUs are designed to process large amounts of data in parallel. Of the various shader types available, I will be using compute shaders.

Compute shaders can be used to accelerate the YOLO model in the Darknet CNN framework. They execute many invocations in parallel, which suits the heavy computations in deep learning algorithms. This allows multiple calculations to be performed simultaneously, in a way similar to GPGPU frameworks such as CUDA and OpenCL. To use compute shaders, we first need to identify which types of layers in the YOLO model can be accelerated.

I will be adapting the convolution layer in this project. The reason to target this layer is that it is what lets the network learn and extract hierarchical representations of input data such as images. Additionally, convolution layers consist of many independent multiply-accumulate operations and can be highly parallelized, making them ideal for acceleration using OpenGLES shaders. By accelerating convolution layers using compute shaders, we can significantly improve the performance of deep learning models.
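To make the parallelism concrete, here is a minimal CPU sketch in C of a direct convolution. It is not Darknet's implementation; the function names, the CHW memory layout and the zero-padding behaviour are assumptions made for illustration. The point is that `conv_output_at` computes a single output element without depending on any other output element, which is the unit of work one compute shader invocation could take over.

```c
#include <assert.h>
#include <stddef.h>

/* Output size along one dimension: (in + 2*pad - ksize) / stride + 1. */
int conv_out_size(int in_size, int ksize, int stride, int pad)
{
    assert(stride > 0 && ksize > 0);
    return (in_size + 2 * pad - ksize) / stride + 1;
}

/*
 * Computes ONE output element for filter f at row oy, column ox.
 * Assumed layouts (hypothetical, CHW):
 *   input[c][y][x], weights[f][c][ky][kx]
 * Reads that fall outside the input are treated as zero padding.
 */
float conv_output_at(const float *input, const float *weights,
                     int channels, int in_h, int in_w,
                     int ksize, int stride, int pad,
                     int f, int oy, int ox)
{
    float acc = 0.0f;
    for (int c = 0; c < channels; ++c) {
        for (int ky = 0; ky < ksize; ++ky) {
            for (int kx = 0; kx < ksize; ++kx) {
                int iy = oy * stride + ky - pad;
                int ix = ox * stride + kx - pad;
                if (iy < 0 || iy >= in_h || ix < 0 || ix >= in_w)
                    continue; /* zero padding contributes nothing */
                size_t in_idx = ((size_t)c * in_h + iy) * in_w + ix;
                size_t w_idx =
                    (((size_t)f * channels + c) * ksize + ky) * ksize + kx;
                acc += input[in_idx] * weights[w_idx];
            }
        }
    }
    return acc;
}

/* CPU reference: visits every output element one after another.
 * On a GPU, each (f, oy, ox) triple could be a separate invocation. */
void conv_forward_ref(const float *input, const float *weights, float *output,
                      int channels, int in_h, int in_w,
                      int filters, int ksize, int stride, int pad)
{
    int out_h = conv_out_size(in_h, ksize, stride, pad);
    int out_w = conv_out_size(in_w, ksize, stride, pad);
    for (int f = 0; f < filters; ++f)
        for (int oy = 0; oy < out_h; ++oy)
            for (int ox = 0; ox < out_w; ++ox)
                output[((size_t)f * out_h + oy) * out_w + ox] =
                    conv_output_at(input, weights, channels, in_h, in_w,
                                   ksize, stride, pad, f, oy, ox);
}
```

The three outer loops in `conv_forward_ref` carry no dependencies between iterations, so they are the natural candidates to be replaced by a GPU dispatch, while the inner accumulation stays inside each invocation.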

Once we have identified the layer types that can be accelerated using compute shaders, we can develop optimized shader programs that perform these computations on the OpenGLES-enabled GPU. These shader programs would need to take into account the specific architecture of the GPU and optimize the computations for maximum parallelism. Next, we would integrate the compute shaders into the Darknet CNN framework, which would require modifying the existing code to support the use of compute shaders for these layer types. We would also need to verify that the implementation is correct and benchmark the performance gains achieved by the compute shader-accelerated layers.
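The verification and benchmarking step could be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, and the use of an absolute tolerance is my assumption, chosen because GPU floating-point results may differ slightly from the CPU reference (for example due to a different accumulation order), so an exact comparison would likely be too strict.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute element-wise difference between two buffers. */
float max_abs_diff(const float *ref, const float *test, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = ref[i] - test[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Returns 1 when every element of test is within tol of ref, else 0. */
int outputs_match(const float *ref, const float *test, size_t n, float tol)
{
    assert(tol >= 0.0f);
    return max_abs_diff(ref, test, n) <= tol;
}

/* Speedup of the accelerated path relative to the CPU baseline. */
double speedup(double baseline_seconds, double accelerated_seconds)
{
    assert(accelerated_seconds > 0.0);
    return baseline_seconds / accelerated_seconds;
}
```

In use, `ref` would hold the output of the existing CPU layer and `test` the output read back from the GPU for the same input and weights; a suitable value for `tol` would have to be determined experimentally.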

Implementation Details

 * Implementing this project requires knowledge of deep learning, neural networks (in particular convolutional neural networks), the YOLO model, the Darknet framework and the OpenGLES API.
 * The reason to use YOLOv3 is that it is one of the fastest object detection algorithms with high detection accuracy. It uses Darknet-53, which has 53 convolution layers, making it powerful.
 * YOLOv3 is also easy to implement and can run on a variety of platforms, including GPUs. It can detect a wide range of objects and can handle complex environments.
 * The next step is to identify the layers to accelerate using OpenGLES shaders. There are various layers that can build Convolutional Neural Networks as mentioned here. As stated earlier, I will be targeting convolution layers in this project.
 * The third step is to develop and optimize compute shader programs for the targeted layers. Compute shaders are shader programs for general-purpose computation on the GPU: many invocations run in parallel, each operating on its own portion of the data.
 * Then, I will integrate the optimized shaders into the YOLOv3 model pipeline using OpenGLES APIs.
 * Finally, I will test and evaluate the performance of the accelerated YOLO model based on its accuracy, speed, and memory usage. Comparing the accelerated model with the original model will show how effective the optimizations are.
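As an illustration of the integration step, the sketch below computes how many work groups a convolution layer's output would need. In OpenGL ES 3.1, `glDispatchCompute` takes the number of work groups in x, y and z, while the work-group size is declared in the shader. The mapping used here (one invocation per output pixel, one z-group per filter) and the names `group_count` and `conv_dispatch_size` are assumptions for illustration, not a fixed design.

```c
#include <assert.h>

/* Number of work groups along each axis (hypothetical helper type). */
typedef struct {
    unsigned x, y, z;
} group_count;

/* Integer division rounded up, so partial tiles still get a group. */
unsigned div_round_up(unsigned n, unsigned d)
{
    assert(d > 0);
    return (n + d - 1) / d;
}

/*
 * Assumed mapping: each invocation produces one output pixel,
 * work groups tile the output plane with local_x * local_y invocations,
 * and each filter (output channel) gets its own slice along z.
 */
group_count conv_dispatch_size(unsigned out_w, unsigned out_h,
                               unsigned filters,
                               unsigned local_x, unsigned local_y)
{
    group_count g;
    g.x = div_round_up(out_w, local_x);
    g.y = div_round_up(out_h, local_y);
    g.z = filters;
    return g;
}
```

For a 13x13 output with 255 filters and an 8x8 work-group size, this gives 2 x 2 x 255 groups. Because 2 * 8 = 16 exceeds 13, the shader would need a bounds check so that the extra invocations do not write outside the output buffer.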

Experience and approach
This project requires knowledge of neural networks, convolution, C/C++, the Linux kernel and OpenGLES.
 * I have previously worked on the GPGPU-WITH-GLES project. Hence, I have a good understanding of OpenGLES APIs, shaders and the Linux kernel.
 * I am well-versed in the different types of shaders that can run on a GPU and I am aware of which of them are suitable for this project.
 * I have been exploring neural networks and convolutions and have gained sufficient knowledge to start the implementation.
 * I also have a BeagleBone (PocketBeagle) and have tried running the Darknet framework on it.
 * I am a passionate open source enthusiast and will do the work wholeheartedly. I am committed to GSoC and will do everything in my power to finish the project within the allotted time.
 * I will keep contributing to the project after GSoC and will interact with the community often.

Contingency
If I get stuck, I will rely on the following:
 * I have a list of resources available online that I will refer to.
 * I will use Beagle Slack to communicate with other mentors.

Benefit

 * The performance of the YOLOv3 model is improved, which will lead to faster object detection.
 * Many layers can be accelerated at a time, hence the efficiency of the model is improved.
 * Memory usage is reduced by offloading the computations to the GPU as discussed here.