BeagleBoard/GSoC/2023 Proposal/OpenGLES acceleration for DL Minh Le

From eLinux.org
Latest revision as of 17:19, 2 April 2023



About Student: Minh Le
Mentors: Shreyas Atre
Wiki: Beagle AI-64
GSoC: TBD

==Status==

This project is currently just a proposal.

==Proposal==

* Completed all the requirements listed on the ideas page.
* The pull request for the cross-compilation task: task

==About you==

IRC: @whoknow123:matrix.org
Github: yourcomrade
School: Saxion University of Applied Sciences
Country: Netherlands
Primary language: English, Vietnamese
Typical work hours: 8am-12pm and 5pm-12am Central European Summer Time (UTC+02:00)
Previous GSoC participation: This is the first time I have participated in GSoC

==About your project==

Project name: OpenGLES acceleration for DL

===Description===

====Overview====

The goal of this project is to accelerate many types of neural network layers by using the GPU on the BeagleBoard X15/AI-64. Currently, there are three ways to program the GPU: OpenGL ES, OpenCL, or Vulkan. I chose OpenGL ES because it is supported by more GPUs than OpenCL and Vulkan. OpenGL ES has a special feature called the '''compute shader''', which allows programmers to use the GPU for general-purpose computing beyond normal GPU work such as rendering and drawing. However, compute shaders are only supported by OpenGL ES version 3.1 and upward, which creates a barrier for the older BeagleBoard AI as well as other BeagleBoards that want to use the GPU to accelerate neural network inference. Therefore, I chose to use the '''fragment shader''', which is available from OpenGL ES 2.0 and upward.

The framework in which I will implement OpenGL ES as a backend is the [https://github.com/pjreddie/darknet/pulls Darknet framework], as it is the smallest and simplest neural network framework I know. Darknet is an open-source neural network framework written in C and CUDA. It is easy to install and fast at training and running neural networks on both CPU and GPU, and it is used to implement many variations of YOLO models.


====Mission====

As discussed with my mentor, the goal behind this project is to port the tiny-yolov3 model running on the Darknet framework to the GPU. The tiny-yolov3 model will use the GPU for inference, which means it loads pre-trained weights and runs on the GPU to make a prediction. According to the tiny-yolov3 configuration file, the model architecture uses convolution, maxpool, up-sampling, and activation layers. Therefore, I will focus on speeding up these four layer types. After finishing them, I will start to integrate them into the Darknet framework.

====Expected performance====

I expect the tiny-yolov3 model to run at least 20% faster on the GPU than on the CPU. Ideally, [https://blog.tensorflow.org/2020/08/faster-mobile-gpu-inference-with-opencl.html the inference time on the GPU can be half of the inference time on the CPU].

====Tasks====

There are two tasks I have to do:

# Create a library that uses OpenGL ES to accelerate layers
# Integrate that library into the Darknet framework

=====Library=====

The library will be implemented based on the GPGPU with GLES API project by former GSoC contributor Jakub Duchniewicz. The idea behind using OpenGL ES for GPU computing is that we draw a rectangle off-screen and transfer the data in the form of a 2D texture. The data are represented in GPU memory as color pixels, we perform calculations on those pixels using a '''fragment shader''', and then we read the results back into CPU memory.
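To make the texture-as-memory idea concrete, here is a small CPU-side sketch of the index arithmetic involved: with an RGBA float texture, four values fit in each texel, so a flat tensor index maps to a texel coordinate plus a color channel. The helper names below are hypothetical illustrations, not part of any existing library:

```c
#include <stddef.h>

/* Four floats (R, G, B, A) fit in one RGBA texel, so a tensor of n floats
 * needs ceil(n / 4) texels laid out on a 2D grid of width w texels. */
typedef struct { int x, y, channel; } texel_pos;

/* Map a linear tensor index to its texel coordinate and color channel.
 * w is the texture width in texels (a free choice, e.g. 256). */
static texel_pos index_to_texel(size_t i, int w) {
    texel_pos p;
    p.channel = (int)(i % 4);              /* which of R, G, B, A */
    size_t texel = i / 4;                  /* which texel holds it */
    p.x = (int)(texel % (size_t)w);
    p.y = (int)(texel / (size_t)w);
    return p;
}

/* Inverse mapping: texel coordinate and channel back to the linear index. */
static size_t texel_to_index(texel_pos p, int w) {
    return ((size_t)p.y * (size_t)w + (size_t)p.x) * 4 + (size_t)p.channel;
}
```

The same arithmetic runs inside the fragment shader (in texture coordinates) to decide which input texels each output pixel should read.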

======Detailed implementation of the library======

Each neural network layer in the library will be implemented using GLSL shaders to speed up performance.

# Activation layer: This is a fundamental layer that every neural network must have. An activation is a mathematical function applied to a neuron's weighted sum plus bias to decide whether the neuron should be activated. In the tiny-yolov3 model's configuration file, the activation functions are the '''leaky-relu''' function and the '''linear''' function. As activation functions are easy to implement, they will be implemented first. The input tensor is passed as a texture to the shader, and the activation function is applied element-wise to the input values. The resulting values are stored in the output texture, which represents the output of the activation layer.
# Maxpool layer: The maxpool layer conducts pooling operations. The pooling operation calculates the maximum value for patches of a feature map and uses it to create a downsampled feature map. The input feature map is passed as a texture to the shader, and the pooling operation is performed by sampling local neighborhoods of the input texture and taking the maximum value. The resulting values are stored in the output texture, which represents the downsampled feature map of the maxpool layer.
# Up-sampling layer: The up-sampling layer increases the spatial resolution of an input feature map. There are a few methods to implement this operation. Based on [https://stackoverflow.com/questions/60333349/what-is-the-upsampling-technique-used-in-yolov3-upsampling-layers-no-resources the implementation of the up-sampling layer] in the Darknet framework, the naive method will be used. The result can be retrieved from the output texture.
# Convolution layer: This is the key layer in many neural networks for image recognition. Convolutional layers are designed to extract local features from the input data, such as edges, textures, and other patterns. The input image is passed as a texture to the shader, and the convolution operation is performed using a set of convolutional kernels. Each kernel is represented as a 2D texture, and the convolution is performed by sampling the input texture with the kernel texture and computing the dot product of the sampled values. The resulting values are accumulated into the output texture, which represents the output feature map of the convolutional layer.
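Before porting these operations to fragment shaders, it helps to have plain C reference versions to validate the shader output against. The sketch below mirrors what each shader would compute per output texel; the function names are hypothetical, and the 0.1 negative slope for leaky ReLU follows Darknet's convention:

```c
#include <stddef.h>

/* Leaky ReLU as used by tiny-yolov3: slope 0.1 for negative inputs. */
static float leaky_relu(float x) {
    return x > 0.0f ? x : 0.1f * x;
}

/* 2x2 max pooling with stride 2 over a w x h single-channel map.
 * out must hold (w/2)*(h/2) floats; w and h are assumed even. */
static void maxpool_2x2(const float *in, float *out, int w, int h) {
    for (int y = 0; y < h / 2; y++) {
        for (int x = 0; x < w / 2; x++) {
            float m = in[(2 * y) * w + 2 * x];          /* top-left of window */
            if (in[(2 * y) * w + 2 * x + 1] > m)     m = in[(2 * y) * w + 2 * x + 1];
            if (in[(2 * y + 1) * w + 2 * x] > m)     m = in[(2 * y + 1) * w + 2 * x];
            if (in[(2 * y + 1) * w + 2 * x + 1] > m) m = in[(2 * y + 1) * w + 2 * x + 1];
            out[y * (w / 2) + x] = m;
        }
    }
}

/* Naive 2x nearest-neighbour upsampling: each input pixel becomes a 2x2 block.
 * out must hold (2*w)*(2*h) floats. */
static void upsample_2x(const float *in, float *out, int w, int h) {
    for (int y = 0; y < 2 * h; y++)
        for (int x = 0; x < 2 * w; x++)
            out[y * (2 * w) + x] = in[(y / 2) * w + (x / 2)];
}
```

In the shader versions, each output pixel performs exactly one of these per-element computations, so the GPU evaluates all output positions in parallel.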
=====Implementation in Darknet framework=====

Currently, the Darknet framework relies on CUDA and OpenMP to accelerate computation; both are used for inference and for training models. However, since this project aims to accelerate model inference, not training, I will add OpenGL ES to the Darknet framework to accelerate the neural network layers used for inference. Here is an example of what my API may look like in the Darknet framework:

<pre>
#ifdef GLES
void forward_maxpool_layer_gles(maxpool_layer l, network net);
void forward_upsample_layer_gles(const layer l, network_state state);
void forward_activation_layer_gles(layer l, network_state state);
void forward_convolutional_layer_gles(convolutional_layer layer, network_state state);
#endif
</pre>

These API functions will be used by the Darknet neural network when it starts to create and connect layers to build a model. During inference, they will load the layer's weights and the image input into the fragment shaders, which are part of the OpenGL ES acceleration library, compute each layer, and then return the result back to memory. The result will either become the input of the next layer or be used to make a prediction, depending on the state of the neural network.
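To illustrate how such entry points could plug into the framework, here is a minimal sketch of compile-time backend selection, mirroring the `#ifdef` pattern Darknet already uses for its CUDA path. The types are simplified placeholders (not Darknet's real definitions), and the functions return a backend label purely for illustration:

```c
/* Simplified stand-ins for Darknet's layer and network types --
 * placeholders for illustration, not Darknet's real definitions. */
typedef struct { int out_w, out_h; } layer;
typedef struct { float *input; } network_state;

static const char *forward_maxpool_layer_cpu(layer l, network_state s) {
    (void)l; (void)s;   /* real code would run the CPU maxpool here */
    return "cpu";
}

#ifdef GLES
static const char *forward_maxpool_layer_gles(layer l, network_state s) {
    (void)l; (void)s;   /* real code would upload textures and issue a draw call */
    return "gles";
}
#endif

/* The wrapper the network's forward pass calls: the backend is chosen
 * at compile time, so a build without -DGLES falls back to the CPU path. */
static const char *forward_maxpool_layer(layer l, network_state s) {
#ifdef GLES
    return forward_maxpool_layer_gles(l, s);
#else
    return forward_maxpool_layer_cpu(l, s);
#endif
}
```

Building the framework with `-DGLES` would switch every wrapped layer to the GPU path without changing any call sites.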


====Deliverable results====

At the end of the project, I will deliver these outcomes:

# The library that uses OpenGL ES to accelerate layers
# Documentation about the library
# A blog post about this project
# The Darknet framework with an OpenGL ES backend for tiny-yolov3
# A comparison chart of the performance on CPU and GPU

===Timeline===

{| class="wikitable"
|-
! Date !! Status !! Result
|-
| March 20 || Applications open, students register with GSoC, work on a proposal with mentors ||
|-
| Apr 2 || Proposal complete, submitted to https://summerofcode.withgoogle.com ||
|-
| May 4 || Proposal accepted or rejected ||
|-
| May 29 || Coding officially begins! ||
|-
| Jun 10 || Milestone #1 || Introductory YouTube video, create an introductory blog post, set up hardware
|-
| Jun 20 || Milestone #2 || Finish the implementation of the initialization and activation layer for the OpenGL ES acceleration library
|-
| Jun 27 || Milestone #3 || Finish the implementation of the maxpool layer for the OpenGL ES acceleration library
|-
| Jun 28 - 30 || College examination || Pause the implementation temporarily
|-
| July 6 || Milestone #4 || Finish the implementation of the up-sampling layer for the OpenGL ES acceleration library; start to write documentation for the library and implement the convolution layer
|-
| July 10 || Begin submitting Phase 1 evaluations ||
|-
| July 14 18:00 UTC || Milestone #5 || Phase 1 Evaluation deadline
|-
| July 16 || Milestone #6 || Finish the implementation of the convolution layer for the OpenGL ES acceleration library; start to add the OpenGL ES backend to Darknet
|-
| August 20 || Milestone #7 || Test and successfully run the tiny-yolov3 model on the GPU
|-
| August 21 - 28 18:00 UTC || Final week: students submit their final work product and their final mentor evaluation, complete a video demonstration ||
|-
| August 28 - Sep 4 18:00 UTC || Mentors submit final student evaluations || Finish documentation
|}


===Experience and approach===

This project requires knowledge and experience in deep learning, graphics programming, parallel computing, and embedded Linux. I have experience with OpenGL, which is an API for graphics programming. I am also learning CUDA to understand more about the implementation of neural networks on GPUs. In addition, I have knowledge of digital image processing, which helps me understand how neural networks process images. For embedded Linux, I have experience with the Raspberry Pi, and I have used Buildroot to build kernels and file systems for embedded Linux. As this is a complex project, I can work up to 35 hours per week on it. Moreover, I am a hard-core open-source enthusiast, and I will continue to contribute to this project to accelerate other layers after GSoC.

===Contingency===

If I get stuck on my project and my mentor is not around, I will use the following resources:

* The BeagleBone forum
* StackOverflow
* The OpenGL forum
* Reddit
* Former GSoC contributor Jakub Duchniewicz

===Benefit===

The outcome of this project is that users can run the tiny-yolov3 model from the Darknet framework on the GPU of the BeagleBoard AI to make predictions. Other BeagleBoards, as well as other open-source hardware boards that have a GPU, can also run the tiny-yolov3 model on the GPU with few modifications to the library. The performance of tiny-yolov3 inference on the GPU will be better than on the CPU.