BeagleBoard/GSoC/2023 Proposal/OpenGLES acceleration for DL

From eLinux.org

Proposal for OpenGLES acceleration for DL

Status

This project is currently just a proposal.

Proposal

  • Completed all the requirements listed on the ideas page.
  • Submitted a PR for the cross-compilation task.

About you

About your project

Project name: OpenGLES acceleration for DL

Description

Overview

Deep learning is a subset of machine learning that uses neural networks with multiple layers. Neural networks consist of multiple layers of interconnected nodes, each layer building on the output of the previous one to refine and optimise the prediction.

The main goal of the project is to accelerate as many layer types as possible using OpenGLES, with Darknet as the deep learning framework.

Shaders

Shaders are user-defined programs that run on the GPU of the board. Using shaders for computation can yield significant speedups, because GPUs are designed to process large amounts of data in parallel. Of the various shader types available on the GPU, I will be using compute shaders.
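Compute shaders (available from OpenGL ES 3.1) are launched in workgroups: the host calls glDispatchCompute(num_groups_x, num_groups_y, num_groups_z), and each workgroup contains the number of invocations declared with layout(local_size_x = ..., local_size_y = ...) in the shader. The sketch below computes the dispatch grid for one invocation per output element. The 8x8 local size is an assumption, not a tuned value, and the helper names are hypothetical.

```c
/* Hypothetical workgroup size; it must match the shader's
   "layout(local_size_x = 8, local_size_y = 8) in;" declaration. */
enum { LOCAL_SIZE_X = 8, LOCAL_SIZE_Y = 8 };

/* Workgroups needed to cover n items, i.e. ceil(n / local). */
static unsigned groups_for(unsigned n, unsigned local)
{
    return (n + local - 1) / local;
}

/* Grid to pass to glDispatchCompute(gx, gy, 1) for a width x height output,
   one shader invocation per output element. */
static void dispatch_grid(unsigned width, unsigned height,
                          unsigned *gx, unsigned *gy)
{
    *gx = groups_for(width, LOCAL_SIZE_X);
    *gy = groups_for(height, LOCAL_SIZE_Y);
}
```

Because the grid is rounded up, a 13x13 feature map gets a 2x2 grid (256 invocations for 169 outputs), so the shader should return early when gl_GlobalInvocationID falls outside the feature map.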

Darknet

Darknet is an open-source neural network framework written in C and CUDA. It is fast, easy to install, and supports both CPU and GPU computation.

You Only Look Once (YOLO) is a state-of-the-art, real-time object detection algorithm that uses a single neural network to detect objects. The original YOLO model consists of convolutional layers that extract features from the input image, followed by fully connected layers that produce the output; YOLOv2 and YOLOv3 drop the fully connected layers and are fully convolutional. These layers have many parameters that are optimised during training to achieve high accuracy in object detection and recognition.

In this project, Darknet is used as the deep learning framework to implement the YOLO model and optimise its performance.

Pipeline

Of the various YOLO versions (YOLO, YOLOv2, YOLOv3, etc.), I will be adapting YOLOv3 in this project. YOLOv3 is extremely fast and accurate: in mAP measured at 0.5 IOU, YOLOv3 is on par with Focal Loss but about 4x faster. Moreover, you can easily trade off speed against accuracy simply by changing the size of the model.
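Because YOLOv3 predicts at three scales, the input size directly determines how much work the detection layers do. The sketch below computes the per-scale grids and the total number of predicted boxes, assuming the standard yolov3.cfg setup: strides of 32, 16 and 8, three anchors per scale and the 80 COCO classes. The helper names are illustrative and not part of Darknet.

```c
/* Strides of YOLOv3's three detection scales (coarse to fine). */
static const int YOLOV3_STRIDES[3] = {32, 16, 8};

/* Output channels of one detection layer:
   anchors * (4 box coordinates + 1 objectness + classes). */
static int yolo_channels(int anchors_per_scale, int classes)
{
    return anchors_per_scale * (4 + 1 + classes);
}

/* Total predicted boxes for a square input (size must be a multiple of 32). */
static int yolo_total_boxes(int input_size, int anchors_per_scale)
{
    int total = 0;
    for (int s = 0; s < 3; ++s) {
        int grid = input_size / YOLOV3_STRIDES[s];  /* e.g. 416 / 32 = 13 */
        total += grid * grid * anchors_per_scale;
    }
    return total;
}
```

For a 416x416 input the grids are 13x13, 26x26 and 52x52, giving 10647 candidate boxes and 255 output channels per detection layer; at 608x608 the count grows to 22743, which is where the speed/accuracy trade-off comes from.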

The YOLOv3 model consists of several layer types, such as convolutional, route, upsample, region (called "yolo" in Darknet's YOLOv3 configuration) and maxpool (used in the YOLOv3-tiny variant) layers. To accelerate the YOLOv3 model, we will perform the computations of these layers on the OpenGLES-enabled GPU of the target hardware platform. The GPU can execute these computations in parallel, which can greatly reduce processing time.

Implementation Details

1. Identify the layers that can benefit from GPU acceleration

Convolution layer:

The convolution layer is a fundamental building block of deep neural networks. The convolution operation slides a filter (kernel) over the input image and computes dot products between the filter and local patches of the image to produce a feature map. In YOLOv3, convolution layers are used extensively in the backbone network to extract high-level features from the input image, and they account for most of the model's computation. Accelerating the convolution layer with OpenGLES shaders can therefore significantly reduce computation time and improve the overall performance of the YOLOv3 model on resource-constrained devices.
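On the CPU, Darknet computes a convolution layer by unfolding the input with im2col and then calling a GEMM (matrix multiply), which is also a natural starting point for a GPU version. The sketch below shows that formulation for a CHW float tensor with zero padding. im2col_sketch and gemm_naive are illustrative names, not Darknet's own functions, and the naive triple loop stands in for an optimized GEMM.

```c
/* Spatial output size of a convolution (same formula for height and width). */
static int conv_out_size(int in, int ksize, int stride, int pad)
{
    return (in + 2 * pad - ksize) / stride + 1;
}

/* Unfold a CHW image into a (channels*ksize*ksize) x (out_h*out_w) matrix, so
   that convolution becomes weights (filters x K) times cols (K x N). */
static void im2col_sketch(const float *im, int channels, int height, int width,
                          int ksize, int stride, int pad, float *cols)
{
    int out_h = conv_out_size(height, ksize, stride, pad);
    int out_w = conv_out_size(width, ksize, stride, pad);
    int rows = channels * ksize * ksize;
    for (int r = 0; r < rows; ++r) {
        int kx = r % ksize;              /* kernel column */
        int ky = (r / ksize) % ksize;    /* kernel row */
        int c = r / (ksize * ksize);     /* input channel */
        for (int oy = 0; oy < out_h; ++oy) {
            for (int ox = 0; ox < out_w; ++ox) {
                int iy = oy * stride + ky - pad;
                int ix = ox * stride + kx - pad;
                int inside = iy >= 0 && iy < height && ix >= 0 && ix < width;
                /* Out-of-bounds taps read as zero (zero padding). */
                cols[(r * out_h + oy) * out_w + ox] =
                    inside ? im[(c * height + iy) * width + ix] : 0.0f;
            }
        }
    }
}

/* Naive GEMM: out[f][n] = sum over k of w[f][k] * cols[k][n]. */
static void gemm_naive(int filters, int K, int N,
                       const float *w, const float *cols, float *out)
{
    for (int f = 0; f < filters; ++f) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += w[f * K + k] * cols[k * N + n];
            out[f * N + n] = acc;
        }
    }
}
```

With a 3x3 all-ones kernel, stride 1 and padding 1 on a 3x3 image holding 1..9, the centre output is 45 and the top-left output is 1 + 2 + 4 + 5 = 12.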

Route layer:

The route layer is another candidate for OpenGLES acceleration in the YOLOv3 pipeline. It concatenates two or more feature maps from different layers along the channel dimension, which lets the network combine features learned at different depths and extract more complex features.
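Assuming batch size 1 and Darknet's CHW memory layout, each input's channels form one contiguous block, so channel-wise concatenation reduces to copying the blocks back to back. route_concat below is a hypothetical helper that illustrates this; it is not Darknet's route implementation.

```c
#include <stddef.h>
#include <string.h>

/* Concatenate n_inputs CHW feature maps along the channel axis.
   All inputs must share the same height and width.
   Returns the total number of output channels. */
static int route_concat(const float *const *inputs, const int *channels,
                        int n_inputs, int height, int width, float *out)
{
    size_t offset = 0;
    int total_channels = 0;
    for (int i = 0; i < n_inputs; ++i) {
        size_t count = (size_t)channels[i] * (size_t)height * (size_t)width;
        memcpy(out + offset, inputs[i], count * sizeof(float));
        offset += count;               /* next input starts after this block */
        total_channels += channels[i];
    }
    return total_channels;
}
```

Since the operation is a pure copy, on the GPU it may be cheaper to have the next layer read from both input buffers directly than to run a separate shader pass.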

Up-Sampling layer:

Upsampling layers can be used in the YOLO pipeline to increase the resolution of the feature maps before passing them to subsequent layers. Upsampling can be implemented using various techniques such as bilinear or nearest-neighbor interpolation, or transposed convolution.
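Darknet's upsample layer uses nearest-neighbor interpolation: each input value is replicated into a stride x stride block, which maps onto a very simple shader. A CPU sketch of that operation, assuming a CHW float tensor and an integer stride (upsample_nearest is a hypothetical name), could look like this:

```c
/* Nearest-neighbor upsampling of a CHW tensor by an integer stride. */
static void upsample_nearest(const float *in, int channels, int height, int width,
                             int stride, float *out)
{
    int out_h = height * stride;
    int out_w = width * stride;
    for (int c = 0; c < channels; ++c)
        for (int y = 0; y < out_h; ++y)
            for (int x = 0; x < out_w; ++x)
                /* Integer division maps each output pixel to its source pixel. */
                out[(c * out_h + y) * out_w + x] =
                    in[(c * height + y / stride) * width + x / stride];
}
```

In YOLOv3 the stride is 2, doubling the resolution of a coarse feature map (for example 13x13 to 26x26) before it is concatenated by a route layer.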

Region layer:

The region layer is an important layer in the YOLOv3 model that is responsible for predicting the object bounding boxes and associated class probabilities.

Maxpool layer:

The maxpool layer can be used in the YOLO pipeline to downsample the feature maps by keeping only the maximum value in each local window. This retains the strongest response in each region while reducing the spatial resolution, and thus the computational cost of subsequent layers.
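A CPU sketch of max pooling on a CHW tensor is shown below; maxpool_sketch is a hypothetical name. It assumes no padding, whereas Darknet's maxpool layer pads by size - 1 by default (which is how YOLOv3-tiny's size=2, stride=1 layer keeps a 13x13 output), so the output-size formula would need adjusting to match Darknet exactly.

```c
#include <float.h>

/* Max pooling with a size x size window and the given stride (no padding). */
static void maxpool_sketch(const float *in, int channels, int height, int width,
                           int size, int stride, float *out)
{
    int out_h = (height - size) / stride + 1;
    int out_w = (width - size) / stride + 1;
    for (int c = 0; c < channels; ++c)
        for (int oy = 0; oy < out_h; ++oy)
            for (int ox = 0; ox < out_w; ++ox) {
                float best = -FLT_MAX;
                for (int ky = 0; ky < size; ++ky)
                    for (int kx = 0; kx < size; ++kx) {
                        int iy = oy * stride + ky;
                        int ix = ox * stride + kx;
                        float v = in[(c * height + iy) * width + ix];
                        if (v > best)
                            best = v;  /* keep the largest value in the window */
                    }
                out[(c * out_h + oy) * out_w + ox] = best;
            }
}
```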


2. Write the shader code using the OpenGLES API to perform the computations required by the selected layers on the GPU.

The shader code will need to be optimized for parallel processing. Here is an example of a 3x3 convolution written as an OpenGL ES 2.0 fragment shader (a compute-shader version would require OpenGL ES 3.1 or later):

 // OpenGL ES 2.0 fragment shader (GLSL ES 1.00): 3x3 convolution.
 // Fragment shaders have no default float precision, so one must be declared.
 #ifdef GL_FRAGMENT_PRECISION_HIGH
 precision highp float;
 #else
 precision mediump float;
 #endif
 
 uniform float uKernel[9];     // 3x3 weights, row by row, starting at (x - 1, y - 1)
 uniform sampler2D uSampler;   // input feature map stored as a texture
 uniform vec2 uTextureSize;    // texture size in texels
 
 varying vec2 vTexCoord;       // texture coordinate of the current output texel
 
 void main(void)
 {
     vec4 sum = vec4(0.0);
     vec2 stepSize = 1.0 / uTextureSize;  // one texel in texture coordinates
 
     // Row y - 1
     sum += texture2D(uSampler, vTexCoord + vec2(-stepSize.x, -stepSize.y)) * uKernel[0];
     sum += texture2D(uSampler, vTexCoord + vec2(0.0, -stepSize.y)) * uKernel[1];
     sum += texture2D(uSampler, vTexCoord + vec2(stepSize.x, -stepSize.y)) * uKernel[2];
 
     // Row y
     sum += texture2D(uSampler, vTexCoord + vec2(-stepSize.x, 0.0)) * uKernel[3];
     sum += texture2D(uSampler, vTexCoord) * uKernel[4];
     sum += texture2D(uSampler, vTexCoord + vec2(stepSize.x, 0.0)) * uKernel[5];
 
     // Row y + 1
     sum += texture2D(uSampler, vTexCoord + vec2(-stepSize.x, stepSize.y)) * uKernel[6];
     sum += texture2D(uSampler, vTexCoord + vec2(0.0, stepSize.y)) * uKernel[7];
     sum += texture2D(uSampler, vTexCoord + vec2(stepSize.x, stepSize.y)) * uKernel[8];
 
     sum.a = 1.0;  // results live in RGB; keep the output fully opaque
 
     gl_FragColor = sum;
 }
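To check the shader's output, a CPU reference of the same 3x3 operation is useful. The sketch below assumes the input texture uses GL_CLAMP_TO_EDGE wrapping, so out-of-range neighbours repeat the edge texel, and processes one channel at a time (the shader applies the same kernel to R, G and B independently). Names are hypothetical; since 8-bit render targets quantise and clamp results, GPU and CPU outputs should be compared with a tolerance rather than for exact equality.

```c
/* Clamp an index into [0, n - 1], mirroring GL_CLAMP_TO_EDGE sampling. */
static int clamp_index(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* CPU reference for the 3x3 shader on one channel, row-major image.
   kernel[0] multiplies the texel at (x - 1, y - 1), as uKernel does. */
static void conv3x3_reference(const float *img, int width, int height,
                              const float kernel[9], float *out)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int sy = clamp_index(y + dy, height);
                    int sx = clamp_index(x + dx, width);
                    sum += img[sy * width + sx] * kernel[(dy + 1) * 3 + (dx + 1)];
                }
            out[y * width + x] = sum;
        }
}

/* Largest absolute difference between two result buffers. */
static float max_abs_diff(const float *a, const float *b, int n)
{
    float worst = 0.0f;
    for (int i = 0; i < n; ++i) {
        float d = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
        if (d > worst)
            worst = d;
    }
    return worst;
}
```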


3. Integrate the shader code into the Darknet CNN framework, which is used to build the YOLOv3 model.

This may involve modifying the existing Darknet codebase to support OpenGLES API calls, so that the selected layers execute on the GPU using the optimized shader code. The goal of this integration is efficient execution of those layers on the GPU, resulting in faster object detection with the YOLOv3 model.
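Darknet's layer struct stores a forward function pointer, so one possible integration point is to select an OpenGLES forward function per layer at network-load time and fall back to the existing CPU path otherwise. The sketch below illustrates that idea with simplified, hypothetical stand-in types; it does not use Darknet's real structs, and which layer types get a GLES kernel is an assumption.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for Darknet's layer types. */
typedef enum { LAYER_CONVOLUTIONAL, LAYER_ROUTE, LAYER_UPSAMPLE, LAYER_YOLO } layer_kind;
typedef enum { BACKEND_NONE, BACKEND_CPU, BACKEND_GLES } backend_kind;

typedef struct layer_sketch {
    layer_kind kind;
    void (*forward)(struct layer_sketch *l);
    backend_kind backend;  /* records which path ran, for illustration */
} layer_sketch;

/* The existing Darknet CPU code would run here. */
static void forward_cpu(layer_sketch *l)  { l->backend = BACKEND_CPU; }
/* Would upload inputs, run the shader and read back the results. */
static void forward_gles(layer_sketch *l) { l->backend = BACKEND_GLES; }

/* Assumed set of layer types that have a shader implementation. */
static bool has_gles_kernel(layer_kind k)
{
    return k == LAYER_CONVOLUTIONAL || k == LAYER_ROUTE || k == LAYER_UPSAMPLE;
}

/* Choose the GLES path only when a GL context exists and a kernel is available. */
static void select_forward(layer_sketch *l, bool gles_available)
{
    l->forward = (gles_available && has_gles_kernel(l->kind)) ? forward_gles
                                                              : forward_cpu;
}
```

Keeping the CPU path as the fallback means unsupported layers, or boards without a usable GL context, still run the unmodified Darknet code.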


4. Compile and build the modified Darknet code with the integrated OpenGLES shaders

1. Installing dependencies such as OpenCV and the OpenGL ES and EGL development libraries (CUDA is not needed for the OpenGLES path).
2. Modifying and building the Darknet code, either by adding code to existing Darknet files or by creating new files.
3. Testing and deploying the modified code.

5. Test the performance of the modified YOLOv3 model with and without GPU acceleration to measure the speed-up achieved by the GPU acceleration.
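The measurement step can be sketched with a monotonic clock and a warm-up run, since the first GPU call often includes shader compilation and data uploads. OpenGL ES calls are asynchronous, so the timed GPU function should end with glFinish() (or read back its results) before the clock is stopped. The helper names are hypothetical and this is only an outline of the method.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Monotonic wall-clock time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Average time of `runs` calls to fn(arg), after one untimed warm-up call. */
double time_avg_ms(void (*fn)(void *), void *arg, int runs)
{
    fn(arg);  /* warm-up: excludes one-time setup costs */
    double start = now_ms();
    for (int i = 0; i < runs; ++i)
        fn(arg);
    return (now_ms() - start) / runs;
}

/* Speed-up of the GPU path relative to the CPU path. */
static double speedup(double cpu_ms, double gpu_ms)
{
    return cpu_ms / gpu_ms;
}
```

Running the same input image through the CPU and GPU builds with time_avg_ms and passing both averages to speedup gives the comparison described in this step.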

Timeline

Date Status Details
Apr 4 Application Deadline
  • Submitting Proposal to the mentors
  • Building an understanding of convolutional neural networks
  • Understanding the Darknet interface
Apr 4 - May 4 Selection Phase
  • Catching up with the community and getting familiar with its work culture.
  • Familiarize with the X15/AI board, the OpenGLES GPU, and the Darknet CNN framework.
May 4 - May 10 GSoC Acceptance
  • Community Bonding and discussing implementation details with mentors.
  • Clearing up any doubts regarding the project.
  • Getting familiar with the work culture.
May 10 - May 31 College Exams
  • There are college exams during this period, so I will focus on exams.
June 1 - June 13 Milestone #1
  • Introductory YouTube video
  • Developing conceptual knowledge
  • Identifying layers that can be accelerated using OpenGLES and starting to learn about benchmarking.
June 14 - June 25 Coding Starts
  • Optimising the shader code to improve performance.
  • Implementing the code base and becoming thoroughly familiar with it.
  • Improving the implementation efficiency.
July 26 - August 5 Milestone #2
  • Benchmarking and verifying the Implementation.
  • Verifying and testing the obtained results.
August 6 - August 15 Phase 1 Submission
  • Starting the documentation and submitting the work product to the mentors.
August 15 - August 30
  • Present the project to the mentors and receive feedback.
  • Work on the Feedback received and make necessary changes.
August 30 - September 15 Final Submission
  • Completing the documentation and summarising the whole project.
  • Submitting the final work for the final mentor evaluation.

Experience and approach

This project requires knowledge of neural networks, convolution, C/C++, the Linux kernel and OpenGLES.

  • I have previously worked on the GPGPU-WITH-GLES project, so I have a good understanding of the OpenGLES APIs, shaders and the Linux kernel.
  • I am well-versed in the different types of GPU shaders and know which of them are suitable for this project.
  • I have also performed operations such as matrix multiplication and matrix transposition.
  • I have been exploring neural networks and convolutions and have gained sufficient knowledge to start the implementation.
  • I also own a PocketBeagle and have tried running the Darknet framework on it.
  • I am a passionate open-source enthusiast and will do the work wholeheartedly. I am committed to GSoC and will do everything in my power to complete the project within the allotted time.
  • I will keep contributing to the project after GSoC and will be interacting with the community often.

Contingency

If I run into any problems, I will use the following resources:

  • I have a list of online resources that I will refer to if I get stuck.
  • I will use the Beagle Slack to communicate with other mentors.

Benefit

  • The performance of the YOLOv3 model improves, enabling faster object detection.
  • Multiple layer types can be accelerated, which improves the overall efficiency of the model.
  • Memory usage is reduced by offloading the computations to the GPU, as discussed here.