BeagleBoard/GSoC/2023 Proposal/Khushi-Balia

Proposal for Building an LLVM Backend for PRU

 * Student: Khushi Balia
 * Mentors: Vedant Paranjape, Shreyas Atre
 * Proposal : https://elinux.org/BeagleBoard/GSoC/2023_Proposal/Khushi-Balia
 * GSoC: Proposal Request

= Status =
 * This project is currently just a proposal.

= Proposal =
 * Completed the prerequisites
 * Created a PR for the task https://github.com/jadonk/gsoc-application/pull/175

About You

 * IRC Name: Khushi Balia
 * GitHub: https://github.com/Khushi-Balia
 * College: Veermata Jijabai Technological Institute (VJTI)
 * Country: India
 * Primary language: English, Hindi, Gujarati
 * Typical work hours: 9 AM - 7 PM Indian Standard Time
 * Experience:
 * In https://github.com/Khushi-Balia/le-transpiler I built a transpiler that converts code written in a custom language, PYLOX, into equivalent C code.
 * My areas of interest are compiler development and embedded systems.
 * I am familiar with git and with working in a Linux environment.
 * I am a core member of my institute's robotics club, the Society of Robotics and Automation (SRA).
 * I am participating in GSoC for the first time.

= About Your Project =


 * Project name: Building an LLVM Backend for PRU

Why LLVM?

 * A compiler architected on the LLVM model has a major benefit: because of the modularity and the well-defined boundaries of each stage, new source languages, target architectures, and optimization passes can be added or modified mostly independently of each other.
 * LLVM is different from most traditional compiler projects because it is not just a collection of individual programs, but rather a collection of libraries. These libraries are all designed using object-oriented programming and are extendable and modular. This along with its three-phase approach and its modern code design makes it a very appealing compiler infrastructure to work with.



THE LLVM BACKEND (Code Generator Design)

 * The code generator framework provides many classes, methods, and tools to help translate the LLVM IR code into target-specific assembly or machine code. The two main target-specific components that comprise a custom backend are the abstract target description, and the abstract target description implementation.


 * TableGen: necessary for writing the abstract target description. This tool translates a target description file (.td) into C++ code that is used in code generation. Its main goal is to reduce large, tedious descriptions into smaller and flexible definitions that are easier to manage and structure.
 * We’ll be using TableGen to define each of the registers in the PRU architecture. The AsmWriter TableGen backend then generates the C++ code that helps print the target-specific assembly code.




 * Clang and llc: Clang is the front end for LLVM which supports C, C++, and Objective-C/C++. The llc tool is the LLVM static compiler. Each custom backend written for LLVM is linked into llc, which then compiles LLVM IR code into the target-specific assembly or machine code.

Custom Target Implementation: The custom LLVM backend inherits from and extends many of the LLVM classes. Most of the backend's files will be placed in a new lib/Target/PRU/ directory in the LLVM tree. The “entry point” for the PRU LLVM backend will be within PRUMCTargetDesc.cpp. This is where the backend is registered with the LLVM TargetRegistry so that LLVM can find and use the backend.

Abstract Target Description: The majority of the abstract target description is written in TableGen format. The major components of the PRU backend that will be written in TableGen form are the register information, calling convention, special operands, instruction formats, and the complete instruction definitions.


 * Register Information: The register information will be defined in PRURegisterInfo.td. This file will define the register set of the PRU as well as different register classes.
 * Calling Conventions: The calling convention definitions describe how arguments and return values are passed between functions. They’ll be defined in PRUCallingConv.td.
 * Instruction Formats: The instruction formats will describe the instruction word formats used by PRU instructions. These formats will be defined in PRUInstrFormats.td.
 * Complete Instruction Definitions: The complete instruction definitions inherit from the instruction format classes to complete the TableGen Instruction base class. These complete instructions will be defined in PRUInstrInfo.td.

Instruction Selection: The instruction selection stage of the backend is responsible for translating the LLVM IR code into target-specific machine instructions. The instruction selector runs through the following phases: SelectionDAG construction, legalization, selection, and scheduling.
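As a rough illustration of the selection phase, the sketch below walks a tiny expression DAG and emits PRU-style instructions, folding small constants into the instruction as an immediate operand. It is a standalone toy in plain C++, not LLVM's SelectionDAG API: `Node`, `Selector`, the `%v` virtual register names, and the 0 to 255 immediate range are all assumptions made for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression DAG node; a stand-in for LLVM's SDNode, not its real API.
enum class Op { Reg, Constant, Add, Sub };

struct Node {
  Op op;
  int value;                       // register number or constant value
  std::shared_ptr<Node> lhs, rhs;  // operands of Add/Sub, null otherwise
};

std::shared_ptr<Node> reg(int n) {
  return std::make_shared<Node>(Node{Op::Reg, n, nullptr, nullptr});
}
std::shared_ptr<Node> imm(int v) {
  return std::make_shared<Node>(Node{Op::Constant, v, nullptr, nullptr});
}
std::shared_ptr<Node> bin(Op op, std::shared_ptr<Node> l,
                          std::shared_ptr<Node> r) {
  return std::make_shared<Node>(Node{op, 0, std::move(l), std::move(r)});
}

// Hypothetical selector: matches each node against a few fixed patterns.
struct Selector {
  std::vector<std::string> out;  // selected instructions, in emission order
  int nextVReg = 0;

  std::string newVReg() { return "%v" + std::to_string(nextVReg++); }

  // Returns the name of the register that holds N's value.
  std::string select(const Node &N) {
    switch (N.op) {
    case Op::Reg:
      return "r" + std::to_string(N.value);
    case Op::Constant: {
      // Pattern: a constant used on its own is materialized with a load.
      std::string D = newVReg();
      out.push_back("ldi " + D + ", " + std::to_string(N.value));
      return D;
    }
    case Op::Add:
    case Op::Sub: {
      assert(N.lhs && N.rhs && "binary node needs two operands");
      std::string L = select(*N.lhs);
      std::string R;
      // Pattern: (op x, small-constant) folds the constant into the
      // instruction. The 0..255 range is an assumption of this sketch.
      if (N.rhs->op == Op::Constant && N.rhs->value >= 0 &&
          N.rhs->value <= 255)
        R = std::to_string(N.rhs->value);
      else
        R = select(*N.rhs);
      std::string D = newVReg();
      out.push_back(std::string(N.op == Op::Add ? "add " : "sub ") + D +
                    ", " + L + ", " + R);
      return D;
    }
    }
    return "";
  }
};
```

For the expression (r1 + 5) - 1000, the selector folds 5 into the add but has to load 1000 into a register first, because it does not fit the assumed immediate range.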

Register Allocation: This phase of the backend is responsible for eliminating all of the virtual registers from the list of machine instructions and replacing them with physical registers.
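The idea of replacing virtual registers with physical ones can be sketched with a simplified linear scan over live intervals. This is a standalone toy, not one of LLVM's allocators: `LiveInterval` here is a hypothetical struct, the register pool is passed in by the caller, and running out of registers simply asserts where a real allocator would spill.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical live interval: the instruction index where a virtual
// register is defined and the index of its last use.
struct LiveInterval {
  std::string vreg;
  int start;
  int end;
};

// Assigns each virtual register a physical register from freeRegs.
// Simplified linear scan: intervals are visited in order of start point,
// and a register is recycled once its interval's last use is at or before
// the start of the current interval (the use is read before the new
// definition is written).
std::map<std::string, std::string>
allocateRegisters(std::vector<LiveInterval> intervals,
                  std::vector<std::string> freeRegs) {
  std::sort(intervals.begin(), intervals.end(),
            [](const LiveInterval &a, const LiveInterval &b) {
              return a.start < b.start;
            });

  std::map<std::string, std::string> assignment;
  std::vector<LiveInterval> active;

  for (const LiveInterval &cur : intervals) {
    // Expire finished intervals and return their registers to the pool.
    for (auto it = active.begin(); it != active.end();) {
      if (it->end <= cur.start) {
        freeRegs.push_back(assignment[it->vreg]);
        it = active.erase(it);
      } else {
        ++it;
      }
    }
    assert(!freeRegs.empty() &&
           "out of registers; a real allocator would spill here");
    assignment[cur.vreg] = freeRegs.back();
    freeRegs.pop_back();
    active.push_back(cur);
  }
  return assignment;
}
```

With two free registers and three intervals, the third virtual register can reuse a register because the first two intervals have ended by the time it is defined.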

Code Emission: The final phase of the backend is to emit the machine instruction list as either target-specific assembly code (emitted by the assembly printer) or machine code (emitted by the object writer).


 * Assembly Printer and Object Writer: Printing assembly code requires implementing several custom classes, while the machine code is emitted in the form of an object file.



=Implementation Details=


 * I’ll make a new directory for PRU inside the target, lib/Target/PRU/ which will contain the following custom files:
 * 1) PRU.td
 * 2) PRUCallingConv.td
 * 3) PRUInstrFormats.td
 * 4) PRUInstrInfo.td
 * 5) PRURegisterInfo.td
 * 6) PRURegisterInfo.h
 * 7) PRURegisterInfo.cpp
 * 8) PRUInstrInfo.h
 * 9) PRUInstrInfo.cpp
 * 10) PRUFrameLowering.h
 * 11) PRUFrameLowering.cpp
 * 12) PRUISelDAGToDAG.cpp
 * 13) PRUISelLowering.cpp
 * 14) PRUISelLowering.h
 * 15) PRUMCInstLower.cpp
 * 16) PRUMCInstLower.h
 * 17) PRUMachineFunctionInfo.cpp
 * 18) PRUMachineFunctionInfo.h
 * 19) PRUSubtarget.cpp
 * 20) PRUSubtarget.h
 * 21) PRUTargetMachine.cpp
 * 22) PRUTargetMachine.h
 * 23) PRUAsmPrinter.cpp
 * 24) PRUAsmPrinter.h

I’ll also create the following:
 * lib/Target/PRU/InstPrinter/, which will have
 * 1) PRUInstPrinter.h
 * 2) PRUInstPrinter.cpp


 * lib/Target/PRU/MCTargetDesc/, which will have
 * 1) PRUAsmBackend.cpp
 * 2) PRUELFObjectWriter.cpp
 * 3) PRUFixupKinds.h
 * 4) PRUMCAsmInfo.cpp
 * 5) PRUMCAsmInfo.h
 * 6) PRUMCCodeEmitter.cpp
 * 7) PRUMCTargetDesc.cpp
 * 8) PRUMCTargetDesc.h

 * lib/Target/PRU/TargetInfo/, which will have PRUTargetInfo.cpp

I’ll use the existing targets as a reference when writing these files.



 * The datatypes are aligned to an 8-bit boundary:

    std::string dataLayout = "";
    dataLayout += "e";        // Little-endian
    dataLayout += "-m:e";     // ELF style name mangling
    dataLayout += "-p:32:8";  // Set 32-bit pointer size with 8-bit boundary
    // Integer and float entries use the form <type><size>:<abi>, so the
    // field after the size is the ABI alignment in bits.
    dataLayout += "-i8:8";    // Align i8 to 8-bit
    dataLayout += "-i16:8";   // Align i16 to 8-bit
    dataLayout += "-i32:8";   // Align i32 to 8-bit
    dataLayout += "-i64:8";   // Align i64 to 8-bit
    dataLayout += "-f32:8";   // Align f32 to 8-bit
    dataLayout += "-f64:8";   // Align f64 to 8-bit
    dataLayout += "-n8";      // Set native integer width to 8-bits
    // Result: "e-m:e-p:32:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8"
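The same layout string can also be built from a table of components, which keeps each field next to its meaning. The sketch below is plain C++ with no LLVM headers; `LayoutField` and `buildPRUDataLayout` are hypothetical names, and it assumes the `<type><size>:<abi>` form in which the field after the size is the ABI alignment in bits, so an 8-bit aligned i16 is written `i16:8`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One component of an LLVM data layout string plus a note on its meaning.
// The note is documentation only and is not part of the emitted string.
struct LayoutField {
  std::string spec;
  std::string meaning;
};

// Hypothetical helper: joins the components with '-' in order.
std::string buildPRUDataLayout() {
  const std::vector<LayoutField> fields = {
      {"e", "little-endian"},
      {"m:e", "ELF style name mangling"},
      {"p:32:8", "32-bit pointers, 8-bit ABI alignment"},
      {"i8:8", "i8 aligned to 8 bits"},
      {"i16:8", "i16 aligned to 8 bits"},
      {"i32:8", "i32 aligned to 8 bits"},
      {"i64:8", "i64 aligned to 8 bits"},
      {"f32:8", "f32 aligned to 8 bits"},
      {"f64:8", "f64 aligned to 8 bits"},
      {"n8", "native integer width of 8 bits"},
  };

  std::string layout;
  for (const LayoutField &f : fields) {
    assert(!f.spec.empty() && "layout component must not be empty");
    if (!layout.empty())
      layout += '-';
    layout += f.spec;
  }
  return layout;
}
```

Calling `buildPRUDataLayout()` yields one string that could be handed to the target machine's data layout; whether 8 is the right native integer width for the PRU is left as in the proposal.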
 * Target Machine: Once I have the LLVM IR, I will move on to describing the characteristics of the PRU by creating a subclass of the TargetMachine class in the PRUTargetMachine.cpp and PRUTargetMachine.h files.


 * Target Registration: I’ll register the target with the TargetRegistry, which is what other LLVM tools use to look up and use the target at runtime. This means declaring a global Target object that represents the target (PRU) during registration.
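The registration pattern can be modelled in a few lines of standalone C++. This is a toy and not LLVM's TargetRegistry API: in LLVM the registration goes through the RegisterTarget template inside an LLVMInitialize...TargetInfo function, whereas `ToyTargetRegistry`, `getThePRUTarget`, and `initializePRUTargetInfo` below are hypothetical names used only to show one global Target object being registered and later looked up by name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Minimal stand-in for a target description.
struct Target {
  std::string name;
  std::string description;
};

// Toy registry: maps a target name to its global Target object.
class ToyTargetRegistry {
public:
  static void registerTarget(Target &T, const std::string &Name,
                             const std::string &Desc) {
    assert(!Name.empty() && "a target needs a name to be looked up");
    T.name = Name;
    T.description = Desc;
    targets()[Name] = &T;
  }

  // Returns the registered target, or nullptr if the name is unknown.
  static const Target *lookupTarget(const std::string &Name) {
    auto It = targets().find(Name);
    return It == targets().end() ? nullptr : It->second;
  }

private:
  static std::map<std::string, Target *> &targets() {
    static std::map<std::string, Target *> Registry;
    return Registry;
  }
};

// The single global Target object that represents the PRU.
Target &getThePRUTarget() {
  static Target ThePRUTarget;
  return ThePRUTarget;
}

// Plays the role of the backend's TargetInfo initialization hook.
void initializePRUTargetInfo() {
  ToyTargetRegistry::registerTarget(getThePRUTarget(), "pru",
                                    "TI PRU (toy model)");
}
```

A tool would call the initialization hook once at startup and then ask the registry for "pru"; before the hook runs, the lookup returns a null pointer.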


 * Register Set and Register Classes: Describing the register set of the PRU and using TableGen to generate code for register definitions, register aliases, and register classes from a target-specific PRURegisterInfo.td input file. I’ll also write additional code for PRURegisterInfo, a subclass of the TargetRegisterInfo class, which represents the register file data used for register allocation and also describes the interactions between registers.

    class PRUReg<bits<16> Enc, string n, list<string> altNames = []>
        : Register<n, altNames> {
      let HWEncoding = Enc;
      let Namespace = "PRU";
    }

    class PRUCtrlReg<bits<16> Enc, string n> : Register<n> {
      let HWEncoding = Enc;
      let Namespace = "PRU";
    }

    let Namespace = "PRU", FallbackRegAltNameIndex = NoRegAltName in {
      def RegNamesRaw : RegAltNameIndex;
    }

    def R0 : PRUReg< 0, "r0">, DwarfRegNum<[0]>;
    def R1 : PRUReg< 1, "r1">, DwarfRegNum<[1]>;

    let RegAltNameIndices = [RegNamesRaw] in {
      def SP : PRUReg< 2, "sp", ["r2"]>, DwarfRegNum<[2]>;
      def LR : PRUReg< 3, "lr", ["r3"]>, DwarfRegNum<[3]>;
      def AP : PRUReg< 4, "ap", ["r4"]>, DwarfRegNum<[4]>;
    }

    def R5  : PRUReg< 5, "r5">,  DwarfRegNum<[5]>;
    def R6  : PRUReg< 6, "r6">,  DwarfRegNum<[6]>;
    def R7  : PRUReg< 7, "r7">,  DwarfRegNum<[7]>;
    def R8  : PRUReg< 8, "r8">,  DwarfRegNum<[8]>;
    def R9  : PRUReg< 9, "r9">,  DwarfRegNum<[9]>;
    def R10 : PRUReg<10, "r10">, DwarfRegNum<[10]>;
    def R11 : PRUReg<11, "r11">, DwarfRegNum<[11]>;
    def R12 : PRUReg<12, "r12">, DwarfRegNum<[12]>;
    def R13 : PRUReg<13, "r13">, DwarfRegNum<[13]>;
    def R14 : PRUReg<14, "r14">, DwarfRegNum<[14]>;
    def R15 : PRUReg<15, "r15">, DwarfRegNum<[15]>;
    def R16 : PRUReg<16, "r16">, DwarfRegNum<[16]>;
    def R17 : PRUReg<17, "r17">, DwarfRegNum<[17]>;
    def R18 : PRUReg<18, "r18">, DwarfRegNum<[18]>;
    def R19 : PRUReg<19, "r19">, DwarfRegNum<[19]>;
    def R20 : PRUReg<20, "r20">, DwarfRegNum<[20]>;
    def R21 : PRUReg<21, "r21">, DwarfRegNum<[21]>;
    def R22 : PRUReg<22, "r22">, DwarfRegNum<[22]>;
    def R23 : PRUReg<23, "r23">, DwarfRegNum<[23]>;
    def R24 : PRUReg<24, "r24">, DwarfRegNum<[24]>;
    def R25 : PRUReg<25, "r25">, DwarfRegNum<[25]>;
    def R26 : PRUReg<26, "r26">, DwarfRegNum<[26]>;
    def R27 : PRUReg<27, "r27">, DwarfRegNum<[27]>;
    def R28 : PRUReg<28, "r28">, DwarfRegNum<[28]>;
    def R29 : PRUReg<29, "r29">, DwarfRegNum<[29]>;
    def R30 : PRUCtrlReg<30, "r30">, DwarfRegNum<[30]>;
    def R31 : PRUCtrlReg<31, "r31">, DwarfRegNum<[31]>;
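TableGen turns register records like these into C++ tables and lookup functions. The hand-written sketch below approximates only the name lookup, to show what the primary and alternate ("raw") names mean in practice; it is not actual TableGen output, and `getPRURegisterName` is a hypothetical name. It assumes the encodings and names from the definitions above, with sp, lr, and ap as the primary names of r2, r3, and r4.

```cpp
#include <cassert>
#include <string>

// Returns the assembly name for a register encoding (0..31).
// With UseRawName set, the plain "rN" spelling is always used, which
// corresponds to the RegNamesRaw alternate name index above.
std::string getPRURegisterName(unsigned Enc, bool UseRawName = false) {
  assert(Enc < 32 && "PRU register encodings run from 0 to 31");
  if (!UseRawName) {
    switch (Enc) {
    case 2:
      return "sp";  // stack pointer
    case 3:
      return "lr";  // link register
    case 4:
      return "ap";  // argument pointer
    default:
      break;
    }
  }
  return "r" + std::to_string(Enc);
}
```

An instruction printer built on such a lookup could print `sp` by default and fall back to `r2` when raw register names are requested.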


 * Instruction Set: Describing the instruction set of the target, using TableGen to generate code for target-specific instructions from PRUInstrFormats.td and PRUInstrInfo.td. I’ll also write additional code for PRUInstrInfo, a subclass of the TargetInstrInfo class, to represent the machine instructions supported by the target machine.

    class InstPRU<dag outs, dag ins, string asmstr, list<dag> pattern>
        : Instruction {
      field bits<32> Inst;
      let Namespace = "PRU";
      dag OutOperandList = outs;
      dag InOperandList = ins;
      let AsmString = asmstr;
      let Pattern = pattern;
    }

    // PRU pseudo instructions format
    class PRUPseudoInst<dag outs, dag ins, string asmstr, list<dag> pattern>
        : InstPRU<outs, ins, asmstr, pattern> {
      let isPseudo = 1;
      let isCodeGenOnly = 1;
    }


 * Instruction Selector: Describing the selection and conversion of the LLVM IR from a Directed Acyclic Graph (DAG) representation of instructions to native target-specific instructions. Using TableGen to generate code that matches patterns and selects instructions based on additional information in a target-specific version of PRUInstrInfo.td. Writing code for PRUISelDAGToDAG.cpp to perform pattern matching and DAG-to-DAG instruction selection. Also writing code in PRUISelLowering.cpp to replace or remove operations and data types that are not supported natively in a SelectionDAG.


 * Assembly Printer: Writing code for an assembly printer that converts LLVM IR to a GAS format for the PRU. Adding assembly strings to the instructions defined in PRUInstrInfo.td. Also writing code for a subclass of AsmPrinter that performs the LLVM-to-assembly conversion and a trivial subclass of MCAsmInfo (PRUMCAsmInfo).

    namespace {
    class PRUAsmPrinter : public AsmPrinter {
      // Lowers MachineInstr objects to MCInst before they are printed.
      PRUMCInstLower MCInstLowering;

    public:
      explicit PRUAsmPrinter(TargetMachine &TM,
                             std::unique_ptr<MCStreamer> Streamer)
          : AsmPrinter(TM, std::move(Streamer)),
            MCInstLowering(OutContext, *this) {}

      virtual StringRef getPassName() const {
        return StringRef("PRU Assembly Printer");
      }

      void EmitFunctionEntryLabel();
      void EmitInstruction(const MachineInstr *MI);
      void EmitFunctionBodyStart();
    };
    } // end anonymous namespace


 * Machine Code: Adding JIT support and creating a machine code emitter. This involves writing a PRUCodeEmitter.cpp file that will contain a machine function pass that transforms target-machine instructions into relocatable machine code, and a PRUJITInfo.cpp file (PRUJITInfo, a subclass of TargetJITInfo) that will implement the JIT interfaces for target-specific code-generation activities, such as emitting machine code and stubs. PRUTargetMachine would then provide a TargetJITInfo object through its getJITInfo method. Note that TargetJITInfo belongs to LLVM's legacy JIT, which was removed in LLVM 3.6; on current LLVM the equivalent encoding work goes into the MC layer, in the PRUMCCodeEmitter.cpp file listed under MCTargetDesc/.
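At its core, a machine code emitter packs instruction fields into a fixed-width word and writes the bytes in target byte order. The sketch below shows that step for a 32-bit little-endian word. The field layout (opcode in bits 31 to 24, then src2, src1, dst one byte each) is entirely hypothetical and chosen for readability; it is not the real PRU encoding, which has to be taken from TI's documentation. `ToyInst`, `encodeToyInst`, and `toLittleEndianBytes` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical three-operand instruction; NOT the real PRU format.
struct ToyInst {
  std::uint8_t opcode;
  std::uint8_t dst;
  std::uint8_t src1;
  std::uint8_t src2;
};

// Packs the fields into one 32-bit instruction word using the
// made-up layout: [31:24] opcode, [23:16] src2, [15:8] src1, [7:0] dst.
std::uint32_t encodeToyInst(const ToyInst &I) {
  assert(I.dst < 32 && I.src1 < 32 && I.src2 < 32 &&
         "register number out of range");
  return (std::uint32_t(I.opcode) << 24) | (std::uint32_t(I.src2) << 16) |
         (std::uint32_t(I.src1) << 8) | std::uint32_t(I.dst);
}

// Splits the word into bytes, least significant byte first, matching the
// little-endian ("e") data layout.
std::array<std::uint8_t, 4> toLittleEndianBytes(std::uint32_t Word) {
  return {{static_cast<std::uint8_t>(Word & 0xFF),
           static_cast<std::uint8_t>((Word >> 8) & 0xFF),
           static_cast<std::uint8_t>((Word >> 16) & 0xFF),
           static_cast<std::uint8_t>((Word >> 24) & 0xFF)}};
}
```

In an actual backend, TableGen can generate the field packing from the `Inst` bits declared in the instruction formats, so hand-written code like this would mostly be needed for operands with special encodings.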

Experience and approach
This project requires a good background in compiler development.
 * I have been exploring compiler development for quite some time, have a good working knowledge of compilers, and have worked on a transpiler project.
 * I have completed LLVM's Kaleidoscope tutorial and read through the LLVM backend documentation, so I have a good understanding of both.
 * I have some experience with the ESP32 microcontroller and with compiler development; this project requires exactly that blend of hardware and software.
 * I am an open source enthusiast, passionate about technology, and I hold my work to a high standard. I have no major commitment other than GSoC during the summer break and will do my best to complete the project in the given time frame.
 * I plan to keep working on this project after GSoC and to stay engaged with the community.

Contingency

 * I have prepared a document with all the links I referred to during my preparation phase, and if I get stuck anywhere I will rely on those resources first.
 * Moreover, the BeagleBoard community is extremely helpful and active in resolving doubts, which makes it a great place to turn to for project resources and clarification.

Benefit

 * The PRU target (AM335x) will have LLVM support, so that clang can be used instead of pru-gcc.
 * Clang is generally fast and memory-efficient compared with GCC, and it provides clear and concise diagnostics.
 * LLVM support will bring better compatibility with LLVM-based tools, along with LLVM's optimizations.

Misc
Cross-compilation task, sent a PR to the upstream: