Beagleboard:GSoC 2018 Proposal:Beaglebone GPU Offload

=Proposal for Beaglebone GPU offload=

This project aims to use the GPU capabilities present on the Beagleboard.

Name: Sidharth Mohla
Student: UserSidharth
Mentors: Hunyue Yau, Robert Manzke
Wiki: https://elinux.org/BeagleBoard/GSoC/Ideas#BeagleBone_GPU_offload

=Status=

This project is currently a proposal. It is planned in three stages:

1. Set up the environment and document the capabilities of the GPU: run benchmarks, find the supported extensions, and document Imagination Technologies' proprietary APIs such as the BufferClass API and the PVR2D API.
2. Build a skeleton library supporting very basic GPGPU constructs (those covered in the legacy GPGPU tutorials: http://gpgpu.org/developer/legacy-gpgpu-graphics-apis), plus an implementation of linear-algebra utilities on the GPU (BLAS and LAPACK), which will be needed in the next stage.
3. Use the above libraries to stream video as textures to the GPU and reconstruct a depth map from them. The video streams can later be replaced by a live camera feed, aiding near-real-time object detection; since the GPU has power comparable to 2005-era PC-grade GPUs, this is feasible, as shown by research papers from around that time.

=About you=
IRC: Sidharth

School: Indian Institute of Technology Hyderabad
Country: India
Primary language: English
Typical work hours: 08:00 to 21:00 IST
Previous GSoC participation: None. I want to take part in GSoC primarily because I am excited to collaborate with an open-source community and produce work that is useful to as many people as possible. I am also enthusiastic about working with GPUs, and this project is an opportunity to do exactly that.

=About your project=
Project name: Beaglebone GPU offload

==Description==
Every BeagleBone device's TI ARM SoC includes a GPU made by Imagination Technologies; on the AM335x it is a PowerVR SGX530, with roughly the processing power of a 2005-era graphics card. This can be particularly useful where code has components that can execute in parallel: offloading such computations to the GPU can be efficient, and it keeps the CPU free to perform other tasks. Of course, the GPU is too weak for modern ML or intensive computer-vision algorithms, but many older computer-vision algorithms and basic linear-algebra calculations can be sped up, so people who wish to use the BeagleBone for robotics or automation can use the resulting libraries.

Also, since OpenGL ES has been standardized by the Khronos Group, users, aided by the generated benchmarks and documentation, can write portable GPU code (where it is worth it; the benchmarks let you judge that) which works the same whether on a mobile device, a Raspberry Pi, or a BeagleBoard, allowing code developed for the Raspberry Pi GPU to be reused, albeit at much lower performance. Finally, this can be useful for users who wish to drive small LCD displays, since it will be more efficient.

However, this has not already been done because libEGL is provided as a binary blob, and because PVR2D acceleration does not support X11, requiring the use of Wayland. The library will provide the mentioned functionality as C++ classes. It should be noted that the main aim of this project is to document and demonstrate the use of the GPU for computation, so the library itself will be basic, as will the tools made for the demo. This can, in theory at least, be mitigated by drawing on the large amount of OpenGL ES code available online, one example being an OpenCL library for the Raspberry Pi.
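On GLES 2.0-class hardware like the SGX530, GPGPU data lives in textures and results are read back from a render target. Without a float-texture extension, a common trick is to pack each float into one RGBA8 texel. A minimal sketch of that packing, runnable on the CPU side (function names are mine, not part of any planned API):

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Pack a float in [0, 1) into four 8-bit channels (one RGBA8 texel).
// A fragment shader would perform the mirror-image dot-product unpack.
std::array<std::uint8_t, 4> packFloat(float v) {
    std::array<std::uint8_t, 4> rgba{};
    float enc = v;
    for (int i = 0; i < 4; ++i) {
        enc *= 256.0f;                 // shift next base-256 digit up
        float digit = std::floor(enc); // extract it
        enc -= digit;
        rgba[i] = static_cast<std::uint8_t>(digit);
    }
    return rgba;
}

// Recover the float from the four channels (precision below 2^-24 is lost
// because the arithmetic is done in single-precision float).
float unpackFloat(const std::array<std::uint8_t, 4>& rgba) {
    float v = 0.0f;
    float scale = 1.0f / 256.0f;
    for (int i = 0; i < 4; ++i) {
        v += rgba[i] * scale;
        scale /= 256.0f;
    }
    return v;
}
```

Whether this workaround is needed at all depends on which extensions the Week 2 benchmarking finds on the actual driver.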

==Timeline==
Before Week 1: Collect and go through source code for the planned features. Get the board working with the Weston display compositor.

Week 1: Get the GPU working headlessly (most likely via offscreen rendering using Weston). Possible problems: getting Weston to work, and then working out how it can be initialized without a display.

Week 2: Run benchmarks on the BeagleBone to find which extensions are supported (glmark2, dEQP, piglit). Document the proprietary IMG extensions and APIs such as the BufferClass API and PVR2D, measuring CPU usage as well (https://archive.fosdem.org/2015/schedule/event/gl_testing/attachments/slides/670/export/events/attachments/gl_testing/slides/670/slides.pdf). Possible problems: getting the drivers to work with the benchmarks could be a major issue here, but stable drivers should avert that; trying more than one utility is key.

Week 3: Get a "Hello world" application of sorts running (Hello GPGPU). Start work on a framebuffer encapsulation, based loosely on an FBO class object, though it will also depend on the extensions supported. Possible problems: Hello GPGPU should not be a problem at all, especially after the benchmark runs. This week is not very troublesome, so it can also be used to finish off any outstanding problems.

Week 4: Test the class abstraction, benchmark performance by writing two similar programs operating on the same data, and document and clean up. Possible problems: none expected.
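The Week 4 CPU-versus-GPU comparison needs a small timing harness. A hypothetical sketch (the name and interface are illustrative, not a committed design):

```cpp
#include <chrono>
#include <functional>

// Mean wall-clock time of a callable in milliseconds over `iters` runs,
// with one untimed warm-up call, since the first GL calls often pay
// one-off driver and shader-compilation costs.
double timeMs(const std::function<void()>& fn, int iters = 10) {
    fn();  // warm-up run, not counted
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) fn();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count() / iters;
}
```

For the GPU path, the timed callable would have to end with something like glFinish() (or a glReadPixels readback) so that asynchronously queued work is actually included in the measurement.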

Deliverables for this period: benchmark results and FBOClass.

Week 5: Start working on the skeleton library, taking tutorial 1 on legacy GPGPU as a reference. Possible problems: this will not be a full-fledged library, more a simple parser that outputs shader code bound to specific functions with different parameters. Even so, care must be taken to report errors when compiling shaders. Time may need to be borrowed from previous weeks to complete the testing.

Week 6: Use the tools built above to implement a simple BLAS library, and compare the results with similar code running on the CPU. Possible problems: should be easy enough; testing will take time, but not much. It would be better to finish this week and the next as fast as possible to leave time for stereo correspondence, since there is no direct GPU implementation of it in OpenGL.

Week 7: Do the same for a LAPACK library.

Week 8: Document the libraries and finish.
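The Week 6 comparison against the CPU needs a trusted reference to check the GPU BLAS results against. A naive row-major SGEMM reference, offered as an illustrative sketch only:

```cpp
#include <vector>

// Naive row-major SGEMM reference: C = alpha * A * B + beta * C,
// where A is m x k, B is k x n, and C is m x n, all stored row-major.
void sgemmRef(int m, int n, int k, float alpha,
              const std::vector<float>& A, const std::vector<float>& B,
              float beta, std::vector<float>& C) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

A GPU version would store each matrix in a texture and compute the inner product per output texel in a fragment shader; comparisons against this reference should use a tolerance, since shader arithmetic and RGBA8 packing both lose precision.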

Deliverables for this period: basic library, plus BLAS and LAPACK tools.

Week 9: Stream video as a texture, as done here. This will serve as preparation for stereo vision. Possible problems: none, and it should not overflow the allocated time.

Week 10: Implement a real-time stereo correspondence algorithm on the GPU. The OpenMP and OpenCV implementations can be taken as references, along with published research papers.

Week 11: Finish the program. Possible problems: there should be enough time to implement the algorithms in OpenGL and test them. For faster testing, this can first be tried on a different device (such as an Android phone) and then on the BeagleBone.

Week 12: Document the stereo vision application and convert it into a function.
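The stereo correspondence step can be validated against a simple CPU reference before porting it to shaders. A minimal winner-take-all SAD matcher on one rectified scanline pair (an illustrative sketch under my own naming, not the planned implementation):

```cpp
#include <climits>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Winner-take-all block matching on one rectified scanline pair:
// disparity[x] = the shift d that minimises the sum of absolute
// differences (SAD) between a window around left[x] and right[x - d].
std::vector<int> scanlineDisparity(const std::vector<std::uint8_t>& left,
                                   const std::vector<std::uint8_t>& right,
                                   int maxDisp, int half = 1) {
    const int n = static_cast<int>(left.size());
    std::vector<int> disp(n, 0);
    for (int x = half; x < n - half; ++x) {
        int best = 0, bestCost = INT_MAX;
        for (int d = 0; d <= maxDisp && x - d - half >= 0; ++d) {
            int cost = 0;
            for (int w = -half; w <= half; ++w)
                cost += std::abs(left[x + w] - right[x - d + w]);
            if (cost < bestCost) { bestCost = cost; best = d; }
        }
        disp[x] = best;
    }
    return disp;
}
```

A GLES port would run one fragment-shader pass per candidate disparity (or pack several candidates per pass) and keep the winning cost in a ping-pong framebuffer, which is exactly where the earlier FBO encapsulation comes in.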

Deliverables for this period: a stereo vision application that can compute a depth map from the video of two fixed cameras.

=Experience and approach=
Experience: I started with embedded systems in our IoT course, where I built a smart room sensor as a project. Since then I have come a long way: I have used the Arduino Uno and Nano, the Raspberry Pi, and the mbed LPC1768, accessing the low-power modes of the LPC board by manipulating registers as described in its manual. I also know VHDL, in which I implemented a simple processor that could execute Brainf**k natively (no pipeline, debugging, or cache, though). On the software side I have experience with OpenGL and Unity, C/C++, Python, and MATLAB. I am interested in ML and CV, and am currently taking courses in both. I am therefore confident I can do this project: I have the required knowledge and no other commitments for the summer holidays.

Approach: strictly as per the timeline. Since I have highlighted both the likely problems and the sources I will use, I should be able to avoid most issues while working on the project. I will also try to have stable drivers for the platform by starting on that now, and I will go through the mentioned sources thoroughly beforehand, which should let me pinpoint specific issues before being ambushed by them.

=Contingency=
What will you do if you get stuck on your project and your mentor isn't around? I will first search Google (I can handle many tabs at once!), along with Stack Overflow and Reddit. If I still can't get it working, I will ask my brother, my professors, or his professors.

=Benefit=
There will be enough information for users to establish whether or not to use the GPU, as well as how to use it. Mentor feedback from IRC:

 Mar 11 07:18:26 a basic demo is a bunch of GL calls to set things up and create a shader, followed by a send-texture, render, read-texture loop... that part isn't that complex
 Mar 11 07:18:41 but putting it together to show it being useful has value
 Mar 11 07:22:09 the GLES stuff is a bunch of binary blobs... on paper, it should be possible to use GLES with framebuffers, Wayland, X, Android, etc.
 Mar 11 07:22:26 in reality, only a few of those work (due to the binary libEGL)
 Mar 11 07:22:45 so it would be wise to plan on time to figure out which of those work well enough for this
 Mar 11 07:26:01 if anything - getting it to work w/ framebuffer and X and other EGL flavors would be potentially useful
 Mar 11 07:26:19 I am personally interested in framebuffer as I suspect it is the lowest overhead

=Suggestions=
Is there anything else we should have asked you?