Tims Notes on ARM memory allocation
This page has a collection of notes that I made while working on the stack limit patches for Sony. This was a set of patches that tried to map kernel memory areas as 4K pages, so that an individual page of the stack could be unmapped, and a page fault generated, when stack space was running low. (This was related to testing an implementation of 4K stacks on ARM).
These notes are placed here in the hopes that they will be useful for someone working on ARM memory management.
At start up the kernel (usually) automatically determines the physical memory areas. However, the user can manually specify one or more physical memory areas using "mem=..." kernel command line options. When used, these override any automatically determined values. These are parsed by early_mem().
The initial description of physical memory regions are stored in the global 'meminfo' structure, and each region is described as a 'bank'. Some initial physical memory is utilized by the bootmem allocator. It is from this pool of physical memory, that the page tables are built, which allows the MMU to be turned on, and the kernel switched over to virtual memory. Once this process is done, the bootmem pool is freed and all system pages are turned over to the various page and slab allocators of the system. A good reference for this is Mel Gorman's excellent book on the topic: Understanding the Linux Virtual Memory Manager The prior link is to the PDF, here's a link to html: http://www.kernel.org/doc/gorman/html/understand/ Chapter 5 talks about the bootmem allocator.
The page tables are, unsurprisingly, initialized by paging_init(). ARM uses a somewhat weird way of mapping the Linux page tables onto the ARM hardware tables. This method is described in comments in the file arch/arm/include/asm/pgtable.h, with additional macros defined in pgtable-hwdef.h and page.h. Basically, Linux supports 4 levels (pgd, pud, pmd, and pte), and ARM maps this onto 2 levels (pgd/pmd and pte). The nomenclature in the code is hard to follow, because Linux generic code thinks that pgd is the top level of page tables, but internally the ARM code uses pmd macros to refer to the top hardware page table.
Originally, Linux used 4KB mappings for ARM, but they have converted over to mostly 1MB mappings (at least for the Linux kernel). According to my colleague, Frank Rowand, bad things happen if a physical page is represented in the page table by more than one entry (for example, if a physical page has both an entry as a small page in a second-level page table, and is inside a region covered by a large "section" page entry in a first-level page table.
At the hardware level, ARM supports two page table trees simultaneously, using the hardware registers TTBR0 and TTBR1. A virtual address is mapped to a physical address by the CPU depending on settings in TTBRC. This control register has a field which sets a split point in the address space. Addresses below cutoff value are mapped through the page tables pointed to by TTBR0, and addresses above the cutoff value are mapped through TTBR1. TTBR0 is unique per-process, and is in current->mm.pgd (That is, current->mm.pgd == TTBR0 for that process). TTBR1 is global for the whole system, and represents the page tables for the kernel. It is referenced in the global kernel variable swapper_pg_dir. Note that both of these addresses are virtual addresses. You can find the physical address of the first-level page table by using virt_to_phys() functions on these addresses.
I found the following ARM reference material helpful in trying to understand the page table layout: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0333h/I1029222.html - this is the chapter on the MMU in the ARM1176JZ-S Technical Reference Manual.
Note that in particular, the second-level page table (PTE) layout is weird. The ARM hardware supports 1K tables at this level (256 entries of 4bytes each). However, the Linux kernel needs some bits that are not provided on some hardware (like DIRTY and YOUNG). These are synthesized by the ARM MMU code via permissions and faulting, effectively making them software-managed. They have to be stored outside the hardware tables, but still in a place where the kernel can get to them easily. In order to keep things aligned into units of 4K, the kernel therefore keeps 2 hardware second-level page tables and 2 parallel arrays (totalling 512 entries) on a single page. When a new second-level page table is created, it is created 512 entries at a time, with the hardware entries at the top of the table, and Linux software entries (synthesized flags and values) at the bottom.
The kernel has a mixture of 1MB mappings and 4KB mappings. This relieves some pressure on the TLB, which is used by the CPU to cache virtual-to-physical address translations.