Frank's Evolving Overlay Thoughts

Top Device Tree page

Overview
This wiki page is a work in progress, reflecting some of the current thinking of me (Frank Rowand, frowand (dot) list (at) gmail (dot) com). It is not a complete compendium of the state and direction of devicetree overlays. I will update this page as my opinions change and as discussion occurs. Other people may have different opinions.

Please do not edit this page to reflect your opinions. If you think that there is a factual error on this page, please email me asking that I correct it.

If you have a differing opinion, you can either
 * email it to me and I will add it to this page, noting that the item is your opinion by attaching your initials to it, or
 * use the Discussion tab to start a conversation (to be honest, I have never used the Discussion feature in this wiki, so I do not know how well that will work)
 * I will attempt to incorporate information from the Discussion tab into this page, again with initials attached to indicate the source of differing opinions

I am likely to disagree with myself. I don't know how I will notate that when I edit this page to reflect when I change my opinion. Maybe strikeouts, maybe place dates on various conflicting opinions -- I'll figure that out as needed.

Current Status
In my opinion, the devicetree overlay implementation in the Linux kernel is a proof of concept. I do not consider it to be even alpha complete.

issues and what needs to be completed -- Not an exhaustive list

 * kernel support of property and/or node changes at run-time
 * The current process is to be very restrictive about what overlays can touch, and carefully become less restrictive. We can't have it be wide open and then lock things down later and break users.


 * overlay manager
 * do not create this function while underlying problems still exist
 * might to able to phase in pieces of an overlay manager as portions of the underlying issues are resolved


 * refcounting issues
 * consider whether refcounting can be moved to the changeset or overlay level


 * memory leaks
 * node reference count bugs
 * adding or modifying properties to a node existing in the base devicetree
 * some possible corner case bugs
 * may lead to devicetree access API that does not expose any devicetree internal pointers


 * use after free bugs
 * node reference count bugs
 * may lead to devicetree access API that does not expose any devicetree internal pointers


 * locking is broken for the use case of overlays


 * how to detect whether an overlay is compatible with the base devicetree


 * reduce overhead in base FDT needed to support overlays


 * possible interactions with drivers / device binding mid-overlay application
 * driver was deferred, overlay provides resource driver was blocked on
 * driver could be loaded as a module at any point in time


 * ensuring that subsystem(s) and driver(s) affected by an overlay are overlay aware
 * this need might be minimized if subsystem(s) and driver(s) affected by an overlay are unbound before applying or removing the overlay
 * the current un-documented rule is that related drivers must be unbound before an overlay is applied or removed
 * could a driver support re-probe of a modified node instead of unbind/bind?
 * how to determine what software is impacted by an overlay apply or remove
 * software for devices outside the overlay may reference the overlay via a phandle
 * software for devices outside may have walked the tree and found objects in the overlay
 * etc
 * subsystems
 * (Related: of_reconfig_notifier_register/of_reconfig_notifier_unregister, of_overlay_notifier_register/of_overlay_notifier_unregister) Clue: search for for_each_available_child_of_node and similar functions
 * gpio
 * of_gpio_notify
 * i2c
 * of_i2c_notify
 * IRQs
 * mdio PHY
 * of_mdiobus_register, as of v5.7.8 https://elixir.bootlin.com/linux/v5.7.8/source/drivers/of/of_mdio.c#L290
 * /* auto scan for PHYs with empty reg property */ for_each_available_child_of_node(np, child) {
 * This may change, see thread at: https://lore.kernel.org/r/20200706181331.x2tn5cl5jn5kqmhx@gilmour.lan
 * memory
 * mfd
 * Adding a child node to an mfd node [see mfd_add_devices]
 * pinctrl
 * platform drivers
 * of_platform_notify
 * spi
 * of_spi_notify
 * thermal zones -- not overlay aware October 10, 2017
 * etc


 * pre-removal check needs to ensure relevent driver(s) unbind
 * (related: of_reconfig_notifier_register/of_reconfig_notifier_unregister)


 * overlay removal rule of last in, first out be relaxed?
 * need to avoid leaking phandle range on removal


 * decide what the overlay semantics are and document them
 * should pre-boot semantics be the same as when overlay is applied after boot?
 * should bootloader semantics match kernel semantics?
 * should an overlay be allowed to modify existing properties
 * are there special properties that can not be modified?
 * phandle (this already is not allowed)
 * #address-cells
 * #size-cells
 * others?
 * can an overlay add new properties to an existing node?
 * can an overlay delete existing nodes/properties?
 * an overlay with a NOPed property is not a valid overlay if a local fixup exists for the property
 * should this be allowed?
 * should an overlay be allowed to modify a property that was modified by a previous overlay?
 * etc


 * dtc features
 * deprecate hand-coded overlay internal data and prohibit in Linux tree
 * change format of overlay internal metadata in FDT
 * add property delete encoding in FDT
 * create a mechanism for dtc checks for overlays to be aware of the base devicetree (or a schema) that the overlay will apply to instead of disabling problematic checks


 * should the Linux kernel devicetree overlay code be converted to a new FDT format before any in-tree overlays are accepted?


 * some, but not all, overlay metadata is exposed in /proc/device-tree as if it is actual devicetree data
 * need to remove this before it becomes userspace API


 * Create connector specification
 * lot of issues hidden in here...


 * Create Overlay Specification information in Devicetree Specification document


 * Consider overlay(s) application in early boot, probably as part of base FDT unflattening or immediately following base FDT unflattening (examples: memreserve, console)
 * Issues:
 * Some code accesses the base FDT before unflattening occurs.
 * Should objects accessed by this code be prohibited from change by overlay?
 * Should there be a process to decide whether new code should be allowed to access the base FDT before unflattening?
 * What if new code needs to access the base FDT before unflattening but there are already overlays containing that data?
 * etc...
 * Need mechanism to provide overlay(s) to kernel in early boot.
 * In a file system?
 * Built into the kernel image?
 * Appended to the kernel image?
 * Appended to the base FDT?
 * kexec reads /sys/firmware/fdt to create an FDT to be passed to the kernel that it will boot
 * kexec will also need the overlay information that was used to the existing kernel
 * Advantages
 * Drivers and subsystems do not need to be overlay aware (other than code that accesses the base FDT before unflattening)
 * Alternatives to achieve same result
 * Boot loader applies overlay into base FDT before passing the base FDT to the kernel
 * U-Boot has added this feature.


 * As an overlay is being applied or removed, /proc/device-tree/ will reflect the changes. How can user space avoid inconsistent devicetrees during overlay application and removal?
 * Should this be a documented feature instead of creating a synchronization method?


 * Enumerate the possible use cases of overlays (being on this list does not imply that it is not currently implemented)
 * A use case being listed does not mean that it will be supported
 * Partial list
 * conceptual models:
 * hot-swap vs cold swap
 * single purpose connector (eg contains only a SPI bus) vs exposes a significant fraction of the pins of an SOC with pinmux involved (eg beaglebone)
 * Hot swappable hardware
 * undiscoverable hardware behind a USB plugged bridge
 * hot swappable memory
 * seedstudios Grove 4 pin connectors ?
 * expansion slots and external connectors (not hot-swappable, may be swapped when the board is powered off)
 * Arduino
 * beaglebone
 * C.H.I.P.
 * minnowboard
 * raspberry pi
 * seedstudios Grove 4 pin connectors
 * others...
 * FPGAs programmed after kernel begins execution
 * Combinatorial explosion of .dts / .dtb files
 * physical form factor compatible parts substitution but not software compatible
 * New Android dts file partitioning scheme
 * cpu module is the pluggable card into a slot on the baseboard


 * extend the devicetree validation project to handle overlay specific issues


 * full review of devicetree core overlay related code


 * list of aliases is not updated when a DT overlay that adds an alias is loaded or unloaded (June 30, 2015 -- need to verify if still true)


 * "kill switch" - a mechanism to disallow any overlay apply or release until after the next boot. This is to address a concern of the security community

what has been completed

 * initial overlay apply API and implementation


 * overlay apply API converted to use an FDT as input
 * lifetime of copy of the overlay FDT and expanded overlay devicetree created from the overlay FDT managed by the devicetree core
 * lifetime of changeset describing overlay managed by the devicetree core


 * initial overlay removal API and implementation


 * overlay.c refactored


 * resolver.c refactored


 * allow stacked overlay phandle references


 * FPGAs programmed after kernel begins execution
 * implemented in drivers/fpga/of-fpga-region.c
 * outstanding issues that are generic may still also impact the FPGA use case


 * many overlay tests in drivers/of/unittest.c


 * memory leaks
 * check for some possible memory leaks added in Linux v5.0:
 * commit 144552c78692 ("of: overlay: add tests to validate kfrees from overlay removal")
 * memory leaks in unittest and in the overlay code have been detected by kmemleak:
 * https://bugzilla.kernel.org/show_bug.cgi?id=206203
 * https://lore.kernel.org/r/877dyqlles.fsf@mpe.ellerman.id.au
 * There appear to be additional leaks, but many have been fixed in Linux v5.7-rc2:
 * 29acfb65598f of: unittest: kmemleak in duplicate property update
 * 478ff649b1c8 of: overlay: kmemleak in dup_and_fixup_symbol_prop
 * 145fc138f9aa of: unittest: kmemleak in of_unittest_overlay_high_level
 * 216830d2413c of: unittest: kmemleak in of_unittest_platform_populate
 * b3fb36ed694b of: unittest: kmemleak on changeset destroy
 * ensuring that subsystem(s) and driver(s) affected by an overlay are overlay aware
 * subsystems
 * GPIO
 * gpio hogs not processed
 * RFC fix 30 Dec 2019:
 * https://lore.kernel.org/linux-devicetree/20191230133852.5890-1-geert+renesas@glider.be/
 * v1 fix 20 Feb 2020:
 * https://lore.kernel.org/linux-devicetree/20200220130149.26283-1-geert+renesas@glider.be
 * v2 patch of unittests to detect problem and verify fix 20 Feb 2020:
 * https://lore.kernel.org/linux-devicetree/1582224021-12827-1-git-send-email-frowand.list@gmail.com/
 * dtc features
 * dtc support of overlay fragments
 * dtc support of sugar syntax if base tree has the required labels
 * auto-generate overlay internal metadata
 * overlay target type of label
 * overlay target type of path