Frank's Evolving Overlay Thoughts

Top Device Tree page

Overview
This wiki page is a work in progress, reflecting some of the current thinking of me (Frank Rowand, frowand (dot) list (at) gmail (dot) com). It is not a complete compendium of the state and direction of devicetree overlays. I will update this page as my opinions change and as discussion occurs. Other people may have different opinions.

Please do not edit this page to reflect your opinions. If you think that there is a factual error on this page, please email me asking that I correct it.

If you have a differing opinion, you can either
 * email it to me and I will add it to this page, noting that the item is your opinion by attaching your initials to it, or
 * use the Discussion tab to start a conversation (to be honest, I have never used the Discussion feature in this wiki, so I do not know how well that will work)
 * I will attempt to incorporate information from the Discussion tab into this page, again with initials attached to indicate the source of differing opinions

I am likely to disagree with myself. I don't know how I will notate that when I edit this page to reflect when I change my opinion. Maybe strikeouts, maybe place dates on various conflicting opinions -- I'll figure that out as needed.

Current Status
In my opinion, the devicetree overlay implementation in the Linux kernel is a proof of concept. I do not consider it to be even alpha complete.

issues and what needs to be completed -- Not an exhaustive list

 * kernel support of property and/or node changes at run-time
 * The current process is to be very restrictive about what overlays can touch, and carefully become less restrictive. We can't have it be wide open and then lock things down later and break users.


 * overlay manager
 * do not create this function while underlying problems still exist
 * might to able to phase in pieces of an overlay manager as portions of the underlying issues are resolved


 * refcounting issues
 * consider whether refcounting can be moved to the changeset or overlay level


 * memory leaks
 * node reference count bugs
 * adding or modifying properties to a node existing in the base devicetree
 * some possible corner case bugs
 * may lead to devicetree access API that does not expose any devicetree internal pointers


 * use after free bugs
 * node reference count bugs
 * may lead to devicetree access API that does not expose any devicetree internal pointers


 * locking is broken for the use case of overlays


 * how to detect whether an overlay is compatible with the base devicetree
 * linux kernel tree build of fdtoverlay (5.12-rc1)
 * linux kernel build rules to build overlay FDT .dtbo from overlay source .dtso .dts (5.12-rc1)
 * bike shedding continues on whether overlay source should be .dts or .dtso
 * build rules to use fdtoverlay to apply overlays to base under review Feb 2021
 * convention to use includes to build overlays with base targetted Feb 2021
 * this uses the dtc compiler to check syntax and report errors clearly
 * discussion of using dtc to apply overlays to base tree ongoing Feb 2021
 * NOTE: Android project has yet another program to apply overlays to base tree
 * Instead of acting on FDT (as fdtoverlay does), base tree is unflattened before applying overlays


 * extend the devicetree validation project to handle overlay specific issues
 * convention to use includes to build overlays with base targetted Feb 2021
 * This uses the validation tools to detect semantic errors


 * reduce overhead in base FDT needed to support overlays


 * possible interactions with drivers / device binding mid-overlay application
 * driver was deferred, overlay provides resource driver was blocked on
 * driver could be loaded as a module at any point in time


 * ensuring that subsystem(s) and driver(s) affected by an overlay are overlay aware
 * this need might be minimized if subsystem(s) and driver(s) affected by an overlay are unbound before applying or removing the overlay
 * the current un-documented rule is that related drivers must be unbound before an overlay is applied or removed
 * could a driver support re-probe of a modified node instead of unbind/bind?
 * how to determine what software is impacted by an overlay apply or remove
 * software for devices outside the overlay may reference the overlay via a phandle
 * software for devices outside may have walked the tree and found objects in the overlay
 * etc
 * subsystems
 * (Related: of_reconfig_notifier_register/of_reconfig_notifier_unregister, of_overlay_notifier_register/of_overlay_notifier_unregister) Clue: search for for_each_available_child_of_node and similar functions
 * gpio
 * of_gpio_notify
 * i2c
 * of_i2c_notify
 * IRQs
 * mdio PHY
 * of_mdiobus_register, as of v5.7.8 https://elixir.bootlin.com/linux/v5.7.8/source/drivers/of/of_mdio.c#L290
 * /* auto scan for PHYs with empty reg property */ for_each_available_child_of_node(np, child) {
 * This may change, see thread at: https://lore.kernel.org/r/20200706181331.x2tn5cl5jn5kqmhx@gilmour.lan
 * memory
 * mfd
 * Adding a child node to an mfd node [see mfd_add_devices]
 * pinctrl
 * platform drivers
 * of_platform_notify
 * spi
 * of_spi_notify
 * thermal zones -- not overlay aware October 10, 2017
 * etc


 * pre-removal check needs to ensure relevent driver(s) unbind
 * (related: of_reconfig_notifier_register/of_reconfig_notifier_unregister)


 * overlay removal rule of last in, first out be relaxed?
 * need to avoid leaking phandle range on removal


 * decide what the overlay semantics are and document them
 * should pre-boot semantics be the same as when overlay is applied after boot?
 * should bootloader semantics match kernel semantics?
 * should an overlay be allowed to modify existing properties
 * are there special properties that can not be modified?
 * phandle (this already is not allowed)
 * #address-cells
 * #size-cells
 * others?
 * can an overlay add new properties to an existing node?
 * can an overlay delete existing nodes/properties?
 * an overlay with a NOPed property is not a valid overlay if a local fixup exists for the property
 * should this be allowed?
 * should an overlay be allowed to modify a property that was modified by a previous overlay?
 * etc


 * dtc features
 * deprecate hand-coded overlay internal data and prohibit in Linux tree
 * change format of overlay internal metadata in FDT
 * Frank to document previous discussions on updating FDT format late Feb 2021
 * Frank to start discussion on updating FDT format early Mar 2021
 * add property delete encoding in FDT
 * create a mechanism for dtc checks for overlays to be aware of the base devicetree (or a schema) that the overlay will apply to instead of disabling problematic checks
 * convention to use includes to build overlays with base targetted Feb 2021
 * discussion of using dtc to apply overlays to base tree ongoing Feb 2021


 * should the Linux kernel devicetree overlay code be converted to a new FDT format before any in-tree overlays are accepted?


 * some, but not all, overlay metadata is exposed in /proc/device-tree as if it is actual devicetree data
 * need to remove this before it becomes userspace API


 * Create connector specification
 * lot of issues hidden in here...
 * proof of concept implementation started summer 2020, then stalled
 * Frank plans to resume proof of concept late Mar 2021


 * Create Overlay Specification information in Devicetree Specification document


 * Consider overlay(s) application in early boot, probably as part of base FDT unflattening or immediately following base FDT unflattening (examples: memreserve, console)
 * Issues:
 * Some code accesses the base FDT before unflattening occurs.
 * Should objects accessed by this code be prohibited from change by overlay?
 * Should there be a process to decide whether new code should be allowed to access the base FDT before unflattening?
 * What if new code needs to access the base FDT before unflattening but there are already overlays containing that data?
 * etc...
 * Need mechanism to provide overlay(s) to kernel in early boot.
 * In a file system?
 * Built into the kernel image?
 * Appended to the kernel image?
 * Appended to the base FDT?
 * kexec reads /sys/firmware/fdt to create an FDT to be passed to the kernel that it will boot
 * kexec will also need the overlay information that was used to the existing kernel
 * Advantages
 * Drivers and subsystems do not need to be overlay aware (other than code that accesses the base FDT before unflattening)
 * Alternatives to achieve same result
 * Boot loader applies overlay into base FDT before passing the base FDT to the kernel
 * U-Boot has added this feature.


 * As an overlay is being applied or removed, /proc/device-tree/ will reflect the changes. How can user space avoid inconsistent devicetrees during overlay application and removal?
 * Should this be a documented feature instead of creating a synchronization method?


 * Enumerate the possible use cases of overlays (being on this list does not imply that it is not currently implemented)
 * A use case being listed does not mean that it will be supported
 * Partial list
 * conceptual models:
 * hot-swap vs cold swap
 * single purpose connector (eg contains only a SPI bus) vs exposes a significant fraction of the pins of an SOC with pinmux involved (eg beaglebone)
 * Hot swappable hardware
 * undiscoverable hardware behind a USB plugged bridge
 * hot swappable memory
 * seedstudios Grove 4 pin connectors ?
 * expansion slots and external connectors (not hot-swappable, may be swapped when the board is powered off)
 * Arduino
 * beaglebone
 * C.H.I.P.
 * minnowboard
 * raspberry pi
 * seedstudios Grove 4 pin connectors
 * others...
 * FPGAs programmed after kernel begins execution
 * Combinatorial explosion of .dts / .dtb files
 * physical form factor compatible parts substitution but not software compatible
 * New Android dts file partitioning scheme
 * cpu module is the pluggable card into a slot on the baseboard


 * full review of devicetree core overlay related code


 * list of aliases is not updated when a DT overlay that adds an alias is loaded or unloaded (June 30, 2015 -- need to verify if still true)


 * "kill switch" - a mechanism to disallow any overlay apply or release until after the next boot. This is to address a concern of the security community


 * devlink
 * devlink has become quite extensive, but has no knowledge of overlays. What is required for devlink and overlay to co-exist?

what has been completed

 * initial overlay apply API and implementation


 * overlay apply API converted to use an FDT as input
 * lifetime of copy of the overlay FDT and expanded overlay devicetree created from the overlay FDT managed by the devicetree core
 * lifetime of changeset describing overlay managed by the devicetree core


 * initial overlay removal API and implementation


 * overlay.c refactored


 * resolver.c refactored


 * allow stacked overlay phandle references


 * FPGAs programmed after kernel begins execution
 * implemented in drivers/fpga/of-fpga-region.c
 * outstanding issues that are generic may still also impact the FPGA use case


 * many overlay tests in drivers/of/unittest.c
 * commit in robh dt/next branch, not yet pulled by Linus (5.19-rc1?):
 * cca549335f5e of: unittest: re-implement overlay tracking
 * overlay notifier tests
 * commit in robh dt/next branch, not yet pulled by Linus:
 * 992b0dc5c38a of: overlay: unittest: add tests for overlay notifiers


 * drivers/of/overlay.c
 * 48d499bd8919 of: unittest: overlay: ensure proper alignment of copied FDT
 * overlay notifier issues
 * commit in robh dt/next branch, not yet pulled by Linus (5.19-rc1?):
 * 5f756a2eaa44 of: overlay: do not break notify on NOTIFY_{OK|STOP}


 * memory leaks
 * check for some possible memory leaks added in Linux v5.0:
 * commit 144552c78692 ("of: overlay: add tests to validate kfrees from overlay removal")
 * memory leaks in unittest and in the overlay code have been detected by kmemleak:
 * https://bugzilla.kernel.org/show_bug.cgi?id=206203
 * https://lore.kernel.org/r/877dyqlles.fsf@mpe.ellerman.id.au
 * Leaks fixed in Linux v5.5-rc1:
 * 637392a8506a of: overlay: add_changeset_property memory leak
 * There appear to be additional leaks, but many have been fixed in Linux v5.7-rc2:
 * 29acfb65598f of: unittest: kmemleak in duplicate property update
 * 478ff649b1c8 of: overlay: kmemleak in dup_and_fixup_symbol_prop
 * 145fc138f9aa of: unittest: kmemleak in of_unittest_overlay_high_level
 * 216830d2413c of: unittest: kmemleak in of_unittest_platform_populate
 * b3fb36ed694b of: unittest: kmemleak on changeset destroy
 * additional memory leak detected by added overlay notify unittests, fixed:
 * commit in robh dt/next branch, not yet pulled by Linus (5.19-rc1?):
 * 421f4d14bc03 of: overlay: do not free changeset when of_overlay_apply returns error
 * kfree issues in overlay.c
 * commit in robh dt/next branch, not yet pulled by Linus (5.19-rc1?):
 * 067c098766c6 of: overlay: rework overlay apply and remove kfrees


 * ensuring that subsystem(s) and driver(s) affected by an overlay are overlay aware
 * subsystems
 * GPIO
 * gpio hogs not processed
 * RFC fix 30 Dec 2019:
 * https://lore.kernel.org/linux-devicetree/20191230133852.5890-1-geert+renesas@glider.be/
 * v1 fix 20 Feb 2020:
 * https://lore.kernel.org/linux-devicetree/20200220130149.26283-1-geert+renesas@glider.be
 * v2 patch of unittests to detect problem and verify fix 20 Feb 2020:
 * https://lore.kernel.org/linux-devicetree/1582224021-12827-1-git-send-email-frowand.list@gmail.com/
 * dtc features
 * dtc support of overlay fragments
 * dtc support of sugar syntax if base tree has the required labels
 * auto-generate overlay internal metadata
 * overlay target type of label
 * overlay target type of path
 * how to detect whether an overlay is compatible with the base devicetree
 * linux kernel tree build of fdtoverlay (5.12-rc1)
 * fdtoverlay can be used to test static overlay apply
 * linux kernel build rules to build overlay FDT .dtbo from overlay source .dtso .dts (5.12-rc1)
 * bike shedding continues on whether overlay source should be .dts or .dtso