BeagleBoard/GSoC/2023 Proposal/OpenGLES acceleration for DL

=Proposal for OpenGLES acceleration for DL=
 * Student: Pratham Deshmukh
 * Code  : darknet
 * Mentors: Shreyas Atre
 * Proposal: OpenGLES acceleration for DL
 * Wiki  : NA
 * GSoC  : Proposal Request

=Status=
This project is currently just a proposal.

=Proposal=
 * Completed all the requirements listed on the ideas page.
 * The pull request for the cross-compilation task.

=About you=
 * IRC Nickname: Pratham
 * Github: Pratham Deshmukh
 * College: Veermata Jijabai Technological Institute
 * Country: India
 * Primary language: English, Hindi, Marathi
 * Typical work hours: 9am to 5pm
 * Previous GSoC participation: This is my first time participating in GSoC.

=About your project=
Project name: OpenGLES acceleration for DL

Overview
Deep learning is a subset of machine learning that uses neural networks with multiple layers. These networks consist of layers of interconnected nodes, each building upon the previous layer to refine and optimise the prediction.

The main goal of the project is to accelerate as many layer types as possible using OpenGLES, with Darknet as the deep learning framework.

Shaders
Shaders are user-defined programs that run on the GPU of the board. Using shaders for computation can yield a significant speedup, because GPUs are designed to process large amounts of data in parallel. Of the various shader types that can run on the GPU, I will be using compute shaders. They can perform parallel computations such as matrix multiplication and convolution, which are common in deep learning applications. Compute shaders are written in the GLSL programming language and are executed on the GPU with the glDispatchCompute function, which is part of the OpenGL ES API from version 3.1 onward.
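As a small illustration of the host side of a compute dispatch, the sketch below computes how many work groups are needed to cover a given number of elements. The helper name num_groups and the 8x8 local size used in the note after it are assumptions made for illustration, not part of Darknet or of the OpenGL ES API.

```c
#include <assert.h>

/* Hypothetical helper: number of work groups needed so that
 * groups * local_size >= items (ceiling division). */
unsigned int num_groups(unsigned int items, unsigned int local_size)
{
    assert(local_size > 0);
    return (items + local_size - 1) / local_size;
}
```

For a 13x13 feature map and an assumed 8x8 local size, this gives 2 groups per axis, so the dispatch would be glDispatchCompute(2, 2, 1). The invocations that fall outside the 13x13 area would need to be skipped by a bounds check inside the shader.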

Darknet
Darknet is an open-source neural network framework written in C and CUDA. It is fast, compatible, easy to install, and supports CPU and GPU computation. Darknet is used in the project to implement the YOLO object detection and recognition model.

You Only Look Once (YOLO) is a state-of-the-art, real-time object detection algorithm that uses a single neural network to detect objects. The original YOLO model consists of multiple convolutional layers that extract features from the input image, followed by fully connected layers that produce the output of the model; later versions such as YOLOv3 are fully convolutional. These layers have many parameters that need to be optimised during training to achieve high accuracy in object detection and recognition.

In this project, Darknet is used as the deep learning framework to implement the YOLO model and optimise its performance.

Pipeline
Of the various YOLO versions (YOLO, YOLOv2, YOLOv3, etc.), I will be adapting YOLOv3 in this project. YOLOv3 is extremely fast and accurate: in mAP measured at 0.5 IOU, YOLOv3 is on par with Focal Loss but about 4x faster. Moreover, speed and accuracy can easily be traded off simply by changing the size of the model.

The YOLOv3 model consists of various layers such as convolution, route, up-sampling, region, and maxpool layers. To accelerate the model, we will run the computations required by certain of these layers on the OpenGLES-enabled GPU of the target hardware platform, where parallel processing can greatly reduce the processing time.

1. Identifying the layers that can benefit from GPU acceleration
Convolution layer:

The convolution layer is a fundamental building block in deep neural networks. The convolution operation slides a filter (kernel) over an input image and computes dot products between the filter and local patches of the image to produce a feature map. The convolution layer is used extensively in the backbone network to extract high-level features from the input image. Adapting it for acceleration with OpenGLES shaders can significantly reduce the computation time and improve the overall performance of the YOLOv3 model on resource-constrained devices.
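The sliding dot product described above can be written as a plain CPU reference, which would also be useful for checking shader output. This is a minimal sketch under stated assumptions: a single image in channel-major (CHW) layout, stride 1, no padding, and no kernel flip. The function name conv2d_ref is hypothetical and is not a Darknet function.

```c
#include <assert.h>

/* Reference 2D convolution on the CPU (assumptions: CHW layout,
 * stride 1, no padding).
 *   in : [in_c][in_h][in_w]
 *   w  : [out_c][in_c][k][k]
 *   out: [out_c][in_h - k + 1][in_w - k + 1]
 */
void conv2d_ref(const float *in, int in_c, int in_h, int in_w,
                const float *w, int out_c, int k, float *out)
{
    assert(k > 0 && in_h >= k && in_w >= k);
    int out_h = in_h - k + 1;
    int out_w = in_w - k + 1;

    for (int oc = 0; oc < out_c; ++oc) {
        for (int oy = 0; oy < out_h; ++oy) {
            for (int ox = 0; ox < out_w; ++ox) {
                /* Dot product between one filter and one local patch. */
                float sum = 0.0f;
                for (int ic = 0; ic < in_c; ++ic)
                    for (int ky = 0; ky < k; ++ky)
                        for (int kx = 0; kx < k; ++kx)
                            sum += in[(ic * in_h + oy + ky) * in_w + ox + kx]
                                 * w[((oc * in_c + ic) * k + ky) * k + kx];
                out[(oc * out_h + oy) * out_w + ox] = sum;
            }
        }
    }
}
```

Every output element depends only on the inputs and weights, never on another output element, which is why one GPU invocation per output element is a natural mapping.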

Route layer:

The route layer is another candidate for OpenGLES acceleration in the YOLOv3 pipeline. It concatenates two or more feature maps from different layers along the channel dimension, which enables the network to combine features learned at different layers and extract more complex features.
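For a single image stored in channel-major (CHW) layout, concatenation along the channel dimension reduces to copying the feature maps back to back. The sketch below assumes exactly two inputs with the same height and width; route_concat is a hypothetical name, not a Darknet function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Concatenate two CHW feature maps along the channel dimension.
 *   a  : [a_c][h][w]
 *   b  : [b_c][h][w]
 *   out: [a_c + b_c][h][w]
 */
void route_concat(const float *a, int a_c, const float *b, int b_c,
                  int h, int w, float *out)
{
    assert(a_c >= 0 && b_c >= 0 && h > 0 && w > 0);
    size_t plane = (size_t)h * (size_t)w;   /* elements per channel */
    size_t a_len = (size_t)a_c * plane;
    size_t b_len = (size_t)b_c * plane;

    memcpy(out, a, a_len * sizeof(float));          /* channels of a first */
    memcpy(out + a_len, b, b_len * sizeof(float));  /* then channels of b  */
}
```

Since this is pure data movement, any gain from running it on the GPU would mainly come from keeping the feature maps in GPU memory between layers rather than from the copy itself.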

Up-Sampling layer:

Upsampling layers can be used in the YOLO pipeline to increase the resolution of the feature maps before passing them to subsequent layers. Upsampling can be implemented using various techniques such as bilinear or nearest-neighbor interpolation, or transposed convolution.
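Of the techniques listed, nearest-neighbor interpolation is the simplest to sketch: each output pixel copies the input pixel it maps back to. The code below assumes a single image in CHW layout and an integer scale factor; upsample_nearest is a hypothetical name used only for illustration.

```c
#include <assert.h>

/* Nearest-neighbor up-sampling by an integer factor.
 *   in : [c][h][w]
 *   out: [c][h * stride][w * stride]
 */
void upsample_nearest(const float *in, int c, int h, int w,
                      int stride, float *out)
{
    assert(stride > 0);
    int out_h = h * stride;
    int out_w = w * stride;

    for (int ch = 0; ch < c; ++ch)
        for (int oy = 0; oy < out_h; ++oy)
            for (int ox = 0; ox < out_w; ++ox)
                /* Integer division maps the output pixel to its source. */
                out[(ch * out_h + oy) * out_w + ox] =
                    in[(ch * h + oy / stride) * w + ox / stride];
}
```

With a factor of 2, a 2x2 input becomes a 4x4 output in which every input value is repeated in a 2x2 block.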

Region layer:

The region layer is an important layer in the YOLOv3 model that is responsible for predicting the object bounding boxes and associated class probabilities.

Maxpool layer:

The maxpool layer is used in the YOLO pipeline to downsample the feature maps and reduce their spatial resolution. It keeps the maximum value from each local region of the input feature map, which retains the strongest features while shrinking the map and thus reduces the computational cost of subsequent layers.
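A CPU reference for max pooling could look like the sketch below. It assumes a single image in CHW layout, a square window, and no padding; maxpool_ref is a hypothetical name, and Darknet's own maxpool layer may handle padding differently.

```c
#include <assert.h>
#include <float.h>

/* Max pooling with a size x size window (assumption: no padding).
 *   in : [c][h][w]
 *   out: [c][(h - size) / stride + 1][(w - size) / stride + 1]
 */
void maxpool_ref(const float *in, int c, int h, int w,
                 int size, int stride, float *out)
{
    assert(size > 0 && stride > 0 && h >= size && w >= size);
    int out_h = (h - size) / stride + 1;
    int out_w = (w - size) / stride + 1;

    for (int ch = 0; ch < c; ++ch) {
        for (int oy = 0; oy < out_h; ++oy) {
            for (int ox = 0; ox < out_w; ++ox) {
                float best = -FLT_MAX;
                /* Scan the window and keep the largest value. */
                for (int ky = 0; ky < size; ++ky) {
                    for (int kx = 0; kx < size; ++kx) {
                        float v = in[(ch * h + oy * stride + ky) * w
                                     + ox * stride + kx];
                        if (v > best)
                            best = v;
                    }
                }
                out[(ch * out_h + oy) * out_w + ox] = best;
            }
        }
    }
}
```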

2. Writing the shader code using the OpenGLES API to perform the computations required by the selected layers on the GPU.
The shader code will need to be optimized for parallel processing. Here is an example of shader code for a convolution operation using the OpenGLES API:
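One possible shape for such a shader is sketched below as a GLSL ES 3.10 compute shader embedded in a C string, the way it would be passed to the driver for compilation. This is an unoptimized sketch under assumptions, not the final implementation: it requires OpenGL ES 3.1 support on the target GPU, and the buffer bindings, uniform names, the 8x8 local size, and the helper conv_out_dim are all illustrative choices. It uses CHW layout, stride 1, and no padding.

```c
#include <assert.h>

/* Hypothetical helper: output width or height of a stride-1,
 * unpadded convolution, used to size the output buffer and dispatch. */
int conv_out_dim(int in_dim, int k)
{
    assert(k > 0 && in_dim >= k);
    return in_dim - k + 1;
}

/* One invocation computes one output element (ox, oy) of channel oc.
 * Assumed dispatch: x and y cover the output map in 8x8 tiles,
 * z equals the number of output channels. */
const char *conv_shader_src =
    "#version 310 es\n"
    "layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;\n"
    "layout(std430, binding = 0) readonly buffer InBuf { float in_data[]; };\n"
    "layout(std430, binding = 1) readonly buffer WBuf { float w_data[]; };\n"
    "layout(std430, binding = 2) writeonly buffer OutBuf { float out_data[]; };\n"
    "uniform int in_c, in_h, in_w, k;\n"
    "void main() {\n"
    "    int ox = int(gl_GlobalInvocationID.x);\n"
    "    int oy = int(gl_GlobalInvocationID.y);\n"
    "    int oc = int(gl_GlobalInvocationID.z);\n"
    "    int out_h = in_h - k + 1;\n"
    "    int out_w = in_w - k + 1;\n"
    "    /* Skip invocations that fall outside the output map. */\n"
    "    if (ox >= out_w || oy >= out_h) return;\n"
    "    float sum = 0.0;\n"
    "    for (int ic = 0; ic < in_c; ++ic)\n"
    "        for (int ky = 0; ky < k; ++ky)\n"
    "            for (int kx = 0; kx < k; ++kx)\n"
    "                sum += in_data[(ic * in_h + oy + ky) * in_w + ox + kx]\n"
    "                     * w_data[((oc * in_c + ic) * k + ky) * k + kx];\n"
    "    out_data[(oc * out_h + oy) * out_w + ox] = sum;\n"
    "}\n";
```

The indexing inside the shader is the same as in a straightforward CPU loop over a CHW tensor, so its output can be compared element by element against a CPU reference. Optimizations such as shared-memory tiling are deliberately left out of this sketch.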

3. Integrate the shader code into the Darknet CNN framework, which is used to build the YOLOv3 model.
Integrating the shader code into the Darknet CNN framework involves modifying the existing codebase to support the OpenGLES API calls. The modified code allows the selected layers to be executed on the GPU using the optimized shader code. The goal is efficient execution of these layers on the GPU, resulting in faster object detection with the YOLOv3 model.
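One way such an integration could be structured is a per-layer function pointer that selects the GPU path when OpenGLES is available and falls back to the CPU otherwise. The sketch below uses a simplified, hypothetical sketch_layer struct and a ReLU activation as a stand-in; it is not Darknet's actual layer definition, and the GPU path is only a placeholder.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_layer;
typedef void (*forward_fn)(const struct sketch_layer *l,
                           const float *in, float *out);

/* Hypothetical, heavily simplified stand-in for a layer record. */
typedef struct sketch_layer {
    int n;               /* number of elements in the layer */
    forward_fn forward;  /* backend chosen at setup time */
} sketch_layer;

/* CPU path: element-wise ReLU. */
void relu_forward_cpu(const sketch_layer *l, const float *in, float *out)
{
    for (int i = 0; i < l->n; ++i)
        out[i] = in[i] > 0.0f ? in[i] : 0.0f;
}

/* GPU path placeholder: a real version would upload `in`, dispatch the
 * shader, and read back `out`. It delegates to the CPU here so that
 * the sketch stays runnable without a GPU. */
void relu_forward_gles(const sketch_layer *l, const float *in, float *out)
{
    relu_forward_cpu(l, in, out);
}

/* Pick the backend once, so the rest of the code only calls l->forward. */
void select_backend(sketch_layer *l, int gles_available)
{
    assert(l != NULL);
    l->forward = gles_available ? relu_forward_gles : relu_forward_cpu;
}
```

Keeping the CPU path alongside the GPU path would also make it possible to compare the two outputs layer by layer during testing.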

4. Compile and build the modified Darknet code with the integrated OpenGLES shaders
 * Install dependencies such as CUDA, OpenCV, etc.
 * Modify and build the Darknet code, which involves adding code to the existing Darknet files or creating new files.
 * Test and deploy the modified code.

Timeline
=Experience and approach=
This project requires knowledge of neural networks, convolution, C/C++, the Linux kernel and OpenGLES.
 * I have previously worked on the GPGPU-WITH-GLES project. Hence, I have a good understanding of OpenGLES APIs, shaders and the Linux kernel.
 * I am well-versed in the different types of GPU-capable shaders and I am aware of which of them would be suitable for this project.
 * I have also performed operations such as matrix multiplication and matrix transpose.
 * I have been exploring neural networks and convolutions and have gained sufficient knowledge to start the implementation.
 * I also have a BeagleBone (PocketBeagle) and have tried implementing the Darknet framework on it.
 * I am a passionate open source enthusiast and I will do the work wholeheartedly. I am committed to GSoC and will do everything in my power to finish the project within the allotted time.
 * I will keep contributing to the project after GSoC and will interact with the community often.

=Contingency=
If I run into any problems, I will refer to the following resources:
 * I have a list of resources available online, so if I get stuck I will refer to those resources.
 * I will use the Beagle Slack to communicate with other mentors.

=Benefit=
 * The performance of the YOLOv3 model is improved, which will lead to better object detection.
 * Many layers can be accelerated at a time, hence the efficiency of the model is improved.
 * Memory usage is reduced by offloading the computations to the GPU, as discussed here.