LCNA 2015 Device Mainlining BOF meeting notes

A Birds-of-a-Feather session for Device Mainlining was held at LinuxCon North America on August 19, 2015 in Seattle, Washington.

The leaders of the session were:
 * Tim Bird, LF CE Workgroup
 * Mark Brown, Linaro
 * Kate Stewart, Linux Foundation

There were 21 attendees.

[post-meeting notes by attendees are in brackets, like this]

Presentation
Tim gave an overview of the problem and the current activities of the device mainlining project. The slides are at: [[Media:Device-mainlining-LCNA-BOF-2015-08.pdf]]

Please note that the slides after the "DISCUSSION" slide were included as reference material for the discussion, and were not explicitly part of the presentation.

These BOFs are held because the project continues to look for ideas and discussion of different concepts for the project.

Getting source code for source analysis
There are myriad ways of getting source code from product vendors. Everyone has their own system. One elinux page has links to where to get source for different phones. The site also has some diffs posted, so you don’t have to grab the source yourself. See Phones Processors and Download Sites

Anyone can add links, which will help when we want to gather stats for the next round of phones.

[Tim: Would it be worth providing recommendations for publishing code? (e.g. Git tree is preferred over tarballs)]

Technical projects already identified
There are some specific technical projects already started:

1. Wireless drivers:

The Broadcom wireless driver in the tree is not product grade. Google mandates use of the chipset, and integrates an out-of-tree code base for their downstream partners. Broadcom has no incentive to push. Tim proposed that CEWG fund a project to backport the Broadcom (brcm80211) wireless driver to 3.14, for use in the next generation of Android devices. Tim found that SuSE is running a backporting project (Luis’ project, which uses Coccinelle). This project seems advanced. It would be good to convince product vendors to test this driver and report issues so that the mainline driver can be improved. [Tim: this seems like an action item]

2. USB:

Integration with the charger isn’t in mainline. Linaro has posted patches for the interface between the charger and the USB driver. Another issue is that some USB pins (ID, VBUS) are not hooked up to the USB controller, so the mainline drivers don’t actually work for OTG switching. extcon seems to be the preferred upstream method to fix this. Sony is working on some enhancements here.

There are institutional barriers to contributing, as well as process issues:

Looking for more technical areas
Looking for help identifying deficiencies. Tim encouraged other users to run the upstream-analysis-tools, so they can see which areas affect newer kernels (3.10 or 3.14).

Trend of mainline status (positive or negative?)
John Stultz: 3.10 vs. 3.4 - the amount of out-of-tree code is currently getting worse. Tim: definitely want to re-run stats on later kernels and see if the areas are the same. Mark B: Vendors appear to be shipping one kernel version per SoC version. That is, they stay on one kernel version for a particular SoC, and switch when they introduce a new SoC to their customers.

Fork and forward port tends to be the pattern from manufacturers. [Tim: This means they’re carrying a lot of patches from each release to the next. This is true for Sony and I’m sure other product vendors have the same problem.]

Next, different stats for measuring out-of-tree-ness were discussed. Someone noted that raw lines of code might be a misleading stat, because a single source tree supports a lot of different processors.

A better stat might be to find the lines that are actually used in the source tree, based on the kernel config for the product, and only diff those, i.e., strip a tree to just the source code used, and do a diff on it. However, if DT is used, the kernel may contain code for lots of different processors.
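A minimal sketch of the config-filtered diff idea, assuming the set of files actually built for the product is already known (for example, derived from the build output). The function name, the in-memory tree representation (path mapped to file text), and the sample paths in the usage note are illustrative assumptions, not part of the upstream-analysis-tools.

```python
import difflib


def changed_lines(vendor_tree, mainline_tree, used_files):
    """Count added plus removed lines between two trees, for used_files only.

    vendor_tree and mainline_tree map a relative path to the file's text.
    used_files is the set of paths actually built for the product config
    (hypothetical input; how it is obtained is outside this sketch).
    A file missing from one tree is treated as empty in that tree.
    """
    total = 0
    for path in sorted(used_files):
        old = mainline_tree.get(path, "").splitlines()
        new = vendor_tree.get(path, "").splitlines()
        matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                # Lines removed from mainline plus lines added by the vendor.
                total += (i2 - i1) + (j2 - j1)
    return total
```

Calling `changed_lines(vendor, mainline, used)` with `used` limited to the files built for one product gives a smaller number than running it over every path in the tree, which is the point of the proposed stat. As noted above, a DT-based kernel may still build code for many processors, so the filtered number is only an approximation.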

Mark B: It’s worth differentiating between just device support, and infrastructure. Solving these requires different approaches. Tim: It seems like many SoCs don’t have low level stuff upstream (things like clocks, regulators, interprocessor communication systems, pinctrl, etc.), and hence, there’s not a good foundation to build on. [Tim: note after the fact - this would be good to actually research to see for each category of low-level support where each SoC is with mainline support] Mark B: Qualcomm tried with NFD. [NFD? Tim: I’m trying to remember what this discussion was] Tim: I hate to admit it, but some drivers in mainline now can’t run, because dependencies are missing. But hopefully they’ll be able to run soon when a bit more infrastructure code is there.

Tim: My hunch is that many SoCs don’t have good low-level interprocessor communication support upstream. rpmsg may not be adequate for communication that these chips need.

Other problem areas
What is biggest problem people are seeing?

John Stultz - philosophy of mainline: doing it right; vendors don't have mainlining in the budget. Tim: Obsolescence cycle. Vendors may believe that the processor will be out-of-date before the code can get mainlined.

Discussion about the Free Electrons example: 9 people make a lot of progress. Thomas Petazzoni has a diagram showing the value of doing the work. Thomas’ point was that because IP blocks get reused across an SoC family, the porting work does pay off over time. Some vendors have a different team for each SoC, and thus don’t see the long-term cost savings of mainlining.

Another problem is drivers written and maintained in a style that is not mainlinable.

Some out-of-tree drivers are written to a vendor framework so they can have one driver that works on multiple OSes (e.g. Windows and Linux). Mainline won’t take these drivers. They have needless abstraction layers. The vendor's rationale is that they can fix bugs or add features in one place for multiple OSes.

However, it may actually be better to just maintain the driver separately for each of the OSes. Linux-specific drivers tend to be much smaller and have better performance. A driver that is 1/10th the size costs you less. That's the community argument, but is there data to support this? There are many anecdotes about shrinking a driver when mainlining it, but no known study of this.

Documenting benefits
Tim: Can this be measured to produce hard numbers to show the benefit of a Linux-specific/mainline driver? For example, take a sampling of 10 out-of-tree drivers, and then measure their size in the tree after mainlining them.
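A rough sketch of how such a size comparison could be computed, assuming the out-of-tree and mainlined source text for each sampled driver is available. The function names and the choice of non-blank lines as the size measure are assumptions made for illustration; no real driver data is included.

```python
def code_lines(source_text):
    """Count non-blank lines; a crude size measure that still includes comments."""
    return sum(1 for line in source_text.splitlines() if line.strip())


def size_reduction(samples):
    """Return (per-driver ratios, overall ratio) of mainlined size to original size.

    samples maps a driver name to (out_of_tree_text, mainlined_text).
    A ratio of 0.25 means the mainlined driver is a quarter of the original size.
    A driver whose original is empty gets a ratio of None.
    """
    ratios = {}
    before_total = 0
    after_total = 0
    for name, (before, after) in samples.items():
        b = code_lines(before)
        a = code_lines(after)
        ratios[name] = a / b if b else None
        before_total += b
        after_total += a
    overall = after_total / before_total if before_total else None
    return ratios, overall
```

The overall ratio weights large drivers more heavily than the per-driver ratios do, so reporting both would probably give a fairer picture across a sample of 10 drivers.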

Value is not just lines of code reduction. Also a Linux-specific driver often performs much better. But this would be a hard stat to generalize across driver types.

Next discussed was the cost of bugfixing multi-OS drivers vs. mainline drivers, and the security risk of out-of-tree code. Someone brought up the fact that bug fixes have to be applied on all versions of the driver. Vendors assume that one bugfix will cost less if they have a single (multi-OS) driver. However, this may not be the case. Some bugs will only show up on a single OS. Also, you still have to do QA on each OS, for bugs you fix. The QA costs don’t go away with a single driver.

Abstraction in a driver adds risk as well (it increases the risk surface for bugs). Figuring out may not impact others, etc. Removing the abstraction sometimes fixes bugs the developers didn’t know were there.

Need a cogent argument. Something like: “Linux specific drivers are this much smaller. If you delete this code, and just use the framework, it will fix this bug. Etc.”

[Tim: would be nice to have data on specific bugs fixed when drivers were mainlined.]

Idea - multi-OS drivers are assumed to be easier, because the developers only have to know the single vendor abstraction layer, and not each individual OS's frameworks. However, this may be false. You still might have to know the intricacies of each OS, in order to avoid bugs or interact with the idiosyncrasies of each OS. This may mean it is actually more bug-prone to write multi-OS drivers.

Other obstacles that project could help with
Kernel documentation is the pits. Developers don’t maintain docs very well. Sometimes the initial documentation provided for a framework is good, but it gets stale because people don’t maintain it. The docs are there to help newcomers. [Tim: can also help old-timers who switch between systems.]

More on metrics
Hypothesis to measure: the number of patches that a vendor has to apply to a driver to make it useful for a product. That is, it would be interesting to see if developers have to fix out-of-tree code more than they have to fix mainline code. The metric would be patches applied to out-of-tree drivers compared to patches applied to in-tree drivers.
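One way this metric could be tallied, sketched under the assumption that each vendor patch is available as a list of the file paths it touches and that out-of-tree drivers live under known directory prefixes. The function name, the input format, and the example prefix are hypothetical.

```python
from collections import Counter


def patch_counts(patches, out_of_tree_prefixes):
    """Classify vendor patches by whether they touch out-of-tree driver code.

    patches is a list of lists of file paths, one inner list per patch.
    out_of_tree_prefixes lists directory prefixes (e.g. "drivers/vendor/",
    a made-up example) that hold drivers which do not exist in mainline.
    A patch touching any out-of-tree path counts as "out_of_tree";
    every other patch counts as "in_tree".
    """
    prefixes = tuple(out_of_tree_prefixes)
    counts = Counter()
    for paths in patches:
        if any(path.startswith(prefixes) for path in paths):
            counts["out_of_tree"] += 1
        else:
            counts["in_tree"] += 1
    return counts
```

Raw counts would likely need to be normalized, for example per thousand lines of driver code, before the two groups can be compared fairly.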

One example was provided by Sony: Synaptics Touch Screen Driver. It was about a 100K diff. Sony Mobile has lots of patches, but Synaptics wouldn’t take them and obviously they weren’t applicable to mainline. [Tim: Note that one reason was that the release of driver that Sony patched against was not current to Synaptics.] Sony had no place to send patches and had to maintain the patches themselves.

Mark Gross: Why didn’t Sony mainline this driver? Tim: It was on our list, but was lower priority than SoC stuff. Also, the motivation for doing stuff not your IP is real low.

NFC as a problem area
Situation with NFC support in Linux kernel is pretty bad. NFC drivers for android usually consist of driver in userspace talking to a small in-kernel I2C shim. The shim was a 600 line driver, but it was impossible to get it mainlined. The maintainers said that allowing the shim would take away people’s incentive to do the right thing. However, the vendor was now demotivated from doing anything in that area of the kernel.

Talk about shaming
How to incentivize management to worry about upstream? How about a wall of shame? [Tim: We discussed that at other meetings, and decided a reward rather than punishment would be better. Maybe we need a metric for “good mainline status”?]

Maybe produce a list of upstream supported parts, rather than a wall of shame, … Back to the wall of shame: Idea: Could focus on a vendor’s out-of-tree items and try to demonstrate that they are a bigger security risk. Vendors might try to get their name off of a ‘security risk’ list.

Community is doing a pretty good job of backports.

What about closed source user space software? There is code that should be in the kernel, but people are using kernel helpers with drivers in user space. For now, this is outside the scope of this project. Examples of this are media drivers. These are mostly user space drivers, as are GPUs and COMM processors. Note that an open source driver for the Adreno GPU is coming along nicely.

Tim: Plan is to attack the tractable problems first.

Get positive examples instead
Making sure that people use good examples heavily. Successfully use - with handset vendors. It works better. Less hassle.

Illustrate how much it actually costs over a period of time. Useful data for vendors to see.

Commit trends by company
We looked at the amount of contributions (commit counts) for various companies. Many companies have made big improvements (e.g. Samsung). Intel has big numbers, but most of the commits are not for mobile SoCs. TI turned up in Jon Corbet’s top 10 list of contributors, but this may have been deceptive, as it may have been a few “key people” rather than an institutional directive. Free Electrons example discussed. They have few people, but a lot of commits.

Tim: We have device tree armageddon coming. Petazzoni’s slide shows that review is lagging behind DT submissions for review. Mark: It may not be as bad as it looks. Possibly more DT things are coming through that don’t need DT review. [Tim: this implies that DT is stabilizing, which would be great. However, personally I’m not sure there’s enough evidence of this yet.]

In order to get good commit numbers, having key people is the way to do it. Need to be able to widen the pool. Mark: Companies having good commit numbers, and getting stuff upstream, depends on what you spend your time and effort on. It's about actually caring enough to do it.

The method of outsourcing to Free Electrons doesn’t scale. If a company outsources their mainlining, the SoC vendor's developers miss out on interaction with mainline. Development teams and product teams often don't have time to mainline stuff because of the product treadmill.

We looked more at the mobile chipset commit trends. Intel has lots of commits, but a lot of them are not for embedded processors. Intel SoCs targeted at the mobile space have some of the same types of things out of tree as ARM processors. (Mark Gross agrees).

Suggestion to use staging
One suggestion: leverage staging more. Short term window while devices in market.

MediaTek - wrote lots of multiOS code, built own frameworks and abstractions.

Tim: Is it worth putting multi-OS drivers in staging if MediaTek is not on board? The general consensus was yes.

Tim: Some people might object to focusing forum efforts on a single vendor’s driver (Broadcom). Even Broadcom might object, if they prefer to work on their out-of-tree driver.

Broadcom - make it better against their wishes…. Tim said he hadn’t talked to Broadcom. Someone said don’t rule them out. (they might be interested in improving the mainline driver.)

Mark B: The boot bit is really important - a massive hurdle overcome. It's important to get a product to just boot to a UART console on mainline, then work from that base.