Session:The Kernel Report ELC 2012

Session Details

 * Event : ELC 2012
 * Date : February 15, 2012
 * Presenter : Jonathan Corbet
 * Organization: LWN.net
 * Slides : The Kernel Report ELC 2012
 * Video : here (linux foundation) and here (free-electrons)
 * Duration : 58 minutes

Abstract
The Linux kernel is at the core of any Linux system; the performance and capabilities of the kernel will, in the end, place an upper bound on what the system as a whole can do. This talk will review recent events in the kernel development community, discuss the current state of the kernel and the challenges it faces, and look forward to how the kernel may address those challenges. Attendees of any technical ability should gain a better understanding of how the kernel got to its current state and what can be expected in the near future.

Biography
Jonathan Corbet got his first look at the BSD Unix source back in 1981, when an instructor at the University of Colorado let him "fix" the paging algorithm. He has been digging around inside every system he could get his hands on ever since, working on drivers for VAX, Sun, Ardent, and x86 systems on the way. He got his first Linux system in 1993, and has never looked back. Mr. Corbet is currently the co-founder and executive editor of Linux Weekly News; he lives in Boulder, Colorado with his wife and two children.

Transcript

 * Transcribed by: Chris Dudding

0:00 - 1:00:

[ELC Slide - Thank you to our sponsors]

>> INTRODUCER: So, I'd like to welcome you all out to this year's Embedded Linux conference. Very happy to see you all here and I hope you're as anxious as I am.

We're.. well erm. anxious is not the right word but excited as I am about the great sessions we've got planned for this week. Um..

This is always erm.. the best part of getting ready for a conference is when you actually get the thing under way and there's been a lot of prep work behind the scenes and we're very excited to have you here and excited about the programme we've got.

[ELC Slide - Mobile App]

I've got just a couple of quick announcements I'd like to make:

First, about the mobile application so in the guide it talks about a mobile app um.. the actual name you look for in the marketplace is wrong in the guide

1:00 - 2:00:

You need to look under Linux Foundation conference. And that's available on the Android Market. There's actually a kind of a funny story why its not available for iPhone and that was because when it was first submitted to the iPhone erm. I don't know what they call their..

>> AUDIENCE: App Store

[Laughter] Well, so. Yes, thank you. It actually it was for.. It had Android listed in the description because it covers both conferences: Android Developers.. [rephrases] Android Builders Submit and ELC and they rejected it because of the word Android. So, go figure!

So, not for lack of trying. I'm sorry we don't have an iPhone app for you. Get yourself a more open phone! [Laughter]

[ELC Slide - Intel Atom Processor Giveaway]

Anyway, also I'd like to talk about this. So this um.. [holds Intel Atom development board] Ignore the antennas. Inside is a little development board

2:00 - 3:00:

that is being manufactured by Intel and it contains an Atom processor. Its actually called the.. I love Intel code names.. the board is called a Fish River Island 2 and its erm. E6XX Tunnel Creek Processor and they will be manufacturing these.. these are not available yet. They will be manufacturing 'em and sending them out to attendees. If you are interested in getting one.. we have.. unfortunately we don't have enough for everyone but what they are doing is they are taking proposals you can register at the Intel desk just outside in the lobby. And kind of just give an idea what you would plan to do with the board and the first 120 good ideas will get one shipped to you sometime in April or May. So, that's pretty nice. Let's thank Intel for that. [Claps] So that's pretty nice. Um.. So that's pretty cool

3:00 - 4:00:

Nice little development board for you [inaudible comment from audience] [laughs] No, too low.

and let's see I want to mention the YOCTO reception, hosted by YOCTO and Intel is reception tonight. Should be a lot of fun. Its over at the hiller aviation museum and there's a little.. cute little boarding pass thing you got talking about that also we've got little stickers done by jambe that you can get if you want to proudly proclaim that you came to ELC

and the last thing to do is to introduce our keynote speaker for this morning. So, I've known Jon Corbet for a number of years. He's kind of.. I don't know if this is the right term because he has no beard.. he's one of the grey beards in the Linux industry and an incredible incredible asset

4:00 - 5:00:

At these events it is very customary for us to put in a plug for LWN.net and this event is no exception. In my opinion one of the premier sources for information about Linux, the Linux kernel and the industry and open source.

If you are not a subscribing member of LWN.net, you should be one.. shame on you.. because this is an asset. Really a community asset that we should support and Jon very graciously accepted to give our kernel report and he'll tell us all about what's going on with the kernel in the last little bit and maybe a little bit about what's coming up in the future

So without further ado, let me introduce Jon Corbet.

5:00 - 6:00: [Slide 1 - The kernel report]

>> JON CORBET: Hi. Thanks a lot. Good morning everybody. How many of you have seen me give one of these talks before? [Laughter] A fair number.

[Slide 2 - The plan]

Well you'll be glad to hear I've reorganised it.

The plan remains the same, which is to look back over a years worth of kernel developments with an eye towards what's going on in the future. I've changed the way things are done. Hopefully it will work out well.

[Slide 3 - Starting off 2011]

We're actually going to start just over a year ago with.. at the beginning of 2011 we saw the release of the 2.6.37 kernel.

What better to start a new year than with a new kernel? This was a fairly big release with well over 11,000 changesets in it.

This kernel brought in the first of a set of scalability patches for the virtual filesystem layer which added a fair amount to the complexity of that layer but also if you had the right kind of workload brought about something of the order of 30% performance improvement if you were doing lots of opens and closes that sort of thing. So that was good to have.

Block I/O bandwidth controller.. its actually a second bandwidth controller working at a higher level in the I/O scheduler stack allowing

6:00 - 7:00:

the placement of absolute limits on block I/O bandwidth

Finally got some support for the point to point tunnelling protocol in the mainline kernel.

Basic support for the parallel NFS protocol and

Wakeup sources which is an interesting.. on its own wakeup sources is just an accounting mechanism for tracking devices in the system that can wake the system from a sleep state but its part of a bigger effort to replicate the android opportunistic suspend mechanism and provide an implementation of that mechanism that works well within the mainline kernel. So we are still seeing pieces of that going in and pieces of it under discussion but at some point we may have a solution for that

So that was 2.6.37 - a lot went in there.

[Slide 4 - What have we done since then?]

And a lot has happened since then. So what has gone on since 2.6.37?

We've made 5 more kernel releases. I can call them out as i come to them going on the year.

We've merged almost 60,000 change sets it was over the course of the last year. These have come from over 3000 developers and at this point we have over 400 companies that we can identify that have contributed to the kernel

7:00 - 8:00:

So, I've put up numbers like this before. We've seen them before.

We know at this point that the kernel is a very active, very fast moving project. Perhaps the biggest on the planet, hard to say and it continues to move on and it shows no real signs of slowing down

[Slide 5- February]

So February of 2011

[Slide 6 - Greg K-H quote]

One of the things that happened early on in february was a little note from Greg Kroah-Hartman congratulating ralink saying as you can see ralink has stopped dumping drivers on us and instead is now working on patching the driver that we already have in the upstream kernel and trying to make that driver support their new hardware this he said shows a huge willingness to learn how to work with the kernel community and they need to be praised for this change in attitude

This is..i mean its a nice note but there is nothing all that special in a way its something that we go through with a lot of companies they take a little while to figure out how to work with us and then they do

8:00 - 9:00:

and so we see a lot of progress as companies figure out how does mainline kernel process work, how can we get our code in there, why is it in our interest to do so and then they figure how to do it and they become part of the machine. And we see this happening over and over again

[Slide 7 - Employer Contributions]

So this seems a good as time as any to put up this slide. This is a variant of a slide that i've been putting up for a while showing the top contributors to the kernel over the course of the last year from the beginning of the 2.6.38 development cycle through the 3.2 release

So, we see as always, volunteers top of the list at just under 14%. The percentage of changes coming in from people working on their own time has actually slowly fallen over the years and why that is is hard to say. One could take a pessimistic view and say that the kernel is getting too big and too complex, the easy projects are done and so we are putting off our volunteers that way. On the other hand, one could look at this and one could say well anybody who has shown any ability to actually get code into the kernel in any kind of reliable way

9:00 - 10:00:

tends not to stay a volunteer for very long unless they really really want to because they tend to get buried in job offers.

so that of course is not going to be a bad thing.

other than that we see a lot of the same companies that we've been seeing for quite a long time

we can see companies that are not only competing fiercely in other areas of the market but in fact at this point they all seem to be suing each other elsewhere [Laughter] but they are still working together quite well at this level and um.. the situation hasn't changed a whole lot

but here's one change I want to call out

this is.. was.. alright.. we'll do it the old fashioned way. no we won't..

[Slide 9 - Kernel changeset contributions by employer]

speak to me.. come on.. um.. alright. well. this is the one i was going to get to eventually. um.. what is going on here.. alright.. we'll get there

10:00 - 11:00:

never done that before

This is just a plot of the percentage contributions from the top companies. The same companies we saw on the last slide and we see.. in a noisy sort of way.. we have horizontal lines. Right? The companies who were contributing back around 2.6.20 which is when this slide starts are still contributing heavily now.

The big change I want to call out now is those two heavy lines at the bottom. Those two lines correspond to TI and Samsung. Right? And they correspond to a trend we seeing in general which is a much increased of level of involvement of the mobile and embedded community as mobile Linux grows in importance and grows in deployments these companies are figuring out that they need to work with the mainline and they are doing so.

So we're seeing this strong upward trend in contributions from that part of the market. I think we will continue to see that for quite some time

[Slide 10 - Also in February]

One other thing that happened in February, just thought I'd point out quickly,

11:00 - 12:00:

This didn't actually happen then but this when it came out and became well known. Red Hat for their enterprise kernel stopped shipping individual patches with their kernel and now you just get one big blog with all their changes. All 7699 changes in their current kernel kind of mixed in together into a single thing. So its become much harder for people who want to look in there and see what are the individual changes that Red Hat has actually made to the kernel. And why have they done that. And that's kind of a sad statement on the nature of the competition at that level of the marketplace. And there's not much to be done for it. It is of course entirely compliant with the licensing and so on but its not something that our community has really welcomed

[Slide 11 - March]

[Slide 12 - 2.6.38]

Alright.. Let's move on to March.. Well. Another month. Another kernel release. That's when 2.6.38 came out

A couple of very interesting patches that came out then. Including per-session group scheduling. This is the famous 200 line kernel patch that people were talking about for a while that turned out into about 800 line kernel patch by the time it actually went into the mainline

12:00 - 13:00:

that allows the kernel to employ the group scheduling mechanism automatically and partition processes into different groups and schedule them.. schedule the groups against each other this can give you much nicer interactivity in desktop situations. Its useful in a lot of other situations as well, to keep different groups of processes from interfering with each other in the scheduler and the nice thing about this particular patch is not that it added this feature because we've had this feature for years. It sort of made it just work. There's a whole lot of nice things that we have in the kernel that people don't really make a whole lot of use of because you have to do fiddly things with them to make them work. If you can make them simply work for people then things are a whole lot better

Transparent huge pages, which is also on the list, is another example of that. The huge page mechanism in most processors allows the memory management unit to work with much larger pages. You get a nice performance improvement out of that for just about any workload you can think of because you save a lot of pressure on your translation look aside buffer and such.

Transparent huge pages used to require you set up huge TLBFS to the side

13:00 - 14:00:

make changes to your application to make use of it and so on. So there weren't too many people actually using it. Now we're all using it because it simply works in the system and we all get the benefits from that

So we had various other things. Including the other half of the virtual file system scalability patch set I mentioned before. Transmit packet steering which is a networking scalability change that came out of Google and improvements to the block i/o bandwidth controller and of course a whole bunch of other stuff but its those two patches i think that were really the significant features of 2.6.38

[Slide 13 - Linus quote]

Something else that happened in March was that Linus threw a temper tantrum. This is not necessarily all that surprising, it happens but this was a fairly big one

He got a bunch of pull requests from ARM community and he decided that he was really just fed up with them so he stopped actually pulling them and said "ok somebody needs to get a grip in the ARM community". So, what's going on here

[Slide 14 - What is the "ARM problem"?]

14:00 - 15:00:

I probably don't need to tell the people in this room that the ARM architecture is interesting, in that it doesn't really come with the platform defintion the way some other architectures do. Every ARM system is unique in its own very special way. And so, there's a lot less commonality in terms of the code needed to support those systems

When i say the "Embedded" mindset what i mean is.. we have a lot of people working on their own special little projects working under intense time pressures and so on and so they've tended to solve their specific problem and if they put code upstream at all they've worked in their own piece of the tree and not really thought about making the kernel better for everybody involved.. so we've ended up with.. um.. with a lot of stuff.. there's really been nobody being overseeing this tree to try to make this better. so we get a lot of little sub-trees with a lot of duplicated code and a fairly big ugly mess has resulted in that tree

15:00 - 16:00:

[Slide 16 - Why is this happening]

Another way of putting this is that for years we've been asking the embedded community to contribute back to the kernel. Please please give us your code back. And the fact is that they are now doing that. Right? What we have here is a problem that is a result of our own success. We've gotten what we wished for. You should always be careful what you wish for. But in fact what is happening is really a good thing. The only real problem is that we haven't developed the processes within the community to manage all of this stuff

[Slide 17 - Cleaning up the mess]

So, what's doing.. what's happening? We're trying to clean things up. More high level oversight now. Arnd is running a tree that funnels a lot of the system on chip stuff into the kernel and so on. Trying to keep a better grip on it. There's a lot of cleanup going on at various levels. Consolidating code that was duplicated across all the various different ARM sub architectures and a move towards the device tree mechanism that can hopefully someday help us get rid of these board files and such

16:00 - 17:00:

and maybe someday i doubt we'll ever have a single kernel that runs on every ARM system but we can maybe have a kernel that runs a lot of the more popular ones without having to build it specifically for whatever system you're working on at any given time. we're heading in that direction. things are getting a whole lot better

And ARM is not only becoming a proper first tier architecture its actually starting to drive things in interesting ways. If you look on LWN today we've just posted an article from an ARM developer about a concept called big dot little. We've got system on chips now that have two different classes of ARM processors on them. There's the big fast power hungry one and the little slow power efficient one and you try to figure out which one of those you should be using for any given workload at any given time. This is going to stress the scheduler in very interesting ways. The scheduler has not been written with that kind of stuff in mind so we now have stuff coming up from the mobile embedded world that is going to be driving the future directions of the core kernel and its going to be very interesting to watch

17:00 - 18:00:

[Slide 18 - April]

Moving on to April. That's what April looks like in Utah. Nice place although the network connectivity is kind of poor, but you've got to take it.

[Slide 19 - Native Linux KVM Tool]

So, we had an interesting fight that came about in April. As a result of the posting a thing called the native linux KVM tool. This is just a small replacement for QMU. Its a hardware emulator aimed mostly at kernel developers don't want to mess around with a big full functional system

The sticking point is that they wanted to merge this code (which is user-space code) into the kernel source tree.

[Slide 20 - Ingo Molnar quote]

So, we've had people actually pushing this idea pretty hard. This is Ingo Molnar who says "some day somebody is going to integrate into a single tree and make the killer distribution that pushes everything else aside". So, not everybody agrees with this needless to say. This is an idea that is being pushed.

[Slide 21 - User-space code in the kernel tree?]

The people that push the idea of putting user space into the kernel tree say a number of advantages

18:00 - 19:00:

The code become whole lot more visible because you can patch it and you end up in the kernel change log. Tend to get more developers. They say you can develop kernel ABI and users together so that the two can develop into a much.. sort of.. homogenous whole. It encourages developers to think across this kernel user-space boundary which is a pretty hard boundary that people usually lurk on one side of or the other. The idea is you are supposed to get better integration across the whole system if you do this

[Slide 22 - User-space code in the kernel tree?]

On the other hand, people complain that it makes the kernel tree bigger. The kernel tree.. source tree.. is not all that small now. There are claims that you get ABI stability problems because if you can change user space ABI and its primary user at same time because you've got them all together in the same tree you can in theory create stability problems for other users of that ABI. This is not suppposed to happen. We are not supposed to do that in the kernel but there are people who claim this does happen

19:00 - 20:00:

They claim that other out of tree projects are disadvantaged because they just don't get the same visibility and they ask where does it end? Do we put our desktop environment into the kernel tree? Do we throw libre office in there? You never quite know what's going to go on there. So this is a fight that going to go on for a while. these people not hoing to give up on this. Linus seems to be holding the line about pulling more user space stuff in for now. but i don't think we're done with that discussion

[Slide 23 - April]

one other thing. sorry mark wherever you are.. but um. but this is a quote that really stuck with me at the time. that for all the progress that we have made about getting companies to work with the development process. We still have pockets of resistance where they say we need to work with proprietary drivers and binary only code

20:00 - 21:00:

and this is something that we have been fighting for years. in the end it seems like we almost always win and that companies figure out that it is not in their own interest to ship their code in this way but it continues to happen and we're seeing it certaintly in the mobile area with graphics drivers and some others. I hope that once again we can get past this and get to where we have truely free platforms that we can work on from top to bottom

[Slide 24 - May]

Alright. Moving on to May

One thing that happened in May was the posting of a mechanism called secure computing or an enhancement to this. the idea here was to add a feature where a process could install a install a bitmap in the kernel saying only the system calls that are indicated in this bitmap will be accessible to me from here on out. It was meant for use with the Chromium browser to as part of their sandboxing scheme.. just trying to reduce the attack surface of the kernel.. to take a whole bunch of system calls they know they won't use and take them off limits so they cannot be accessed at all

21:00 - 22:00:

so people look at this and they say "ok, cool.. but how about we enhance this a bit we can add a filtering mechanism we maybe hook it into the tracepoints mechanism because we've got all these points in the kernel"

at which point the tracepoint people jump in and say "hey wait a minute you're not doing that to my tracepoints" and a big fight results and the end result after the dust is settled is that nothing gets merged and the developer goes away

so one can point at this and can say it shows the kernel development process at its absolute worse. right?

we had this big fight and we had a promising young developer trying to do cool stuff got discouraged because people tried to throw their own irons in the fire and nothing happened. on the other hand, last month or so he came back with a completely reworked version of this patch using the berkeley packet filter which works a whole lot better much nicer design and it looks like it probably will go into the kernel perhaps as soon as 3.4

so one can say in fact this shows we did things right. we push back of things until we something that suits us..

22:00 - 23:00:

that solves the problem correctly and we wait until we get the right solution in there so you know you can interpret it either way

[Slide 26 - Yet another kernel release]

also we had yet another kernel release in May when 2.6.39 came out.

various interesting sorts of additions including directed yield which is a virtualisation optimisation, ipset mechanism for managing a large number of ip addresses and the firewalling mechanism, transcendent memory is an interesting approach to memory management used again mostly in the xen area.. the core went in at this point not the users of it.. user namespaces are part of the solution to the containers problem. containers can be thought of as a form of lightweight virtualisation where you wall off a group of processes but they are all running on the host kernel, right? you're not running on a virtual machine, you're running on the host kernel. this is actually a much harder problem to solve than um. full virtualisation is

23:00 - 24:00:

they've been working on it for many years and they will continue to be for a little while yet but we got another one of the pieces in there

the media controller is a response to hardware complexity. we've now got.. in this case.. video acquisitation devices, web cam devices, or your phone camera devices that now have multiple hardware components. you are trying to patch the data between them in various sorts of ways. an exported interface to user space that actually allows this to be used. this is complex stuff and i don't think we've solved this problem very well yet but we're trying to do our best

[Slide 27 - Big Kernel Lock]

one other nice thing that happened.. in May

[Laughter] [Clapping]

only took us 15 years. thanks to Arnd who really carried that across the finishing line although a whole lot of people worked on it to get it there. given enough time we can solve just about any problem we have to

so we got 2.6.39 out

24:00 - 25:00:

[Slide 28 - During the 2.6.40 merge window]

and then linus started hearing voices.. right after that.. during the merge window. saying the numbers are getting too big and maybe he's going to call this thing 2.8.0. because when the voices speak, he listens

so of course when you say something like that, one of the first things that going to happen is you going to get more voices.

[Slide 29 - During the 2.6.40 merge window] so greg kroah-hartman comes out and says well if you do this i'm going to buy you a bottle of whatever whisky you want

well it must have worked because, as we all now know, linus did this

he didn't called it 2.8.0 he called it 3.0. this was, as i think pretty much everybody knows at this point, just a numbering change. there was nothing particularly special about the 3.0 kernel that merited a major number change.. just we were getting really really tired of these big version numbers. so we went on from there

[Slide 30 - June]

so the week after that greg did in fact present him with a bottle of whisky on stage.. with cups. he didn't open it.. although what happened later that evening.. was.. um.. not necessarily well advised

[Laughter]

25:00 - 26:00:

and so we now have the 3.x kernel series instead of 2.6 and everybody is happy ever after until we get to 3 dot something big

[Slide 31 - Ext4 snapshots posted]

alright.. in june we saw another posting of an interesting feature for the ext4 filesystem.. being the snapshotting feature

snapshots of course are a nice feature. you save a copy of the state of the filesystem at some particular point. you can use it for things like rolling back from a failed system upgrade, you can use it to create a [...] version of the filesystem for backups, you could also actually make use of a feature like this in an embedded device for a factory reset kind of functionality where you simply have a snapshot being the initial factory state of the device and all you've got to do is go back to that snapshot and you've done your reset

nice feature. everybody likes it

[Slide 32 - Quote from Josef Bacik]

but we still had some complaints about it. this [...] is from from Josef Bacik who is a btrfs developer

and he's saying why is it that we shoehorning these big features into ext4 which is supposed to be a stable filesystem right?

26:00 - 27:00:

meanwhile we've got btrfs that already does this all this stuff and we're trying to stabilise it

other people have asked very similar questions, why are we doing stuff like this to ext4 when ext4 was meant to be our stopgap until the real next generation filesystems

[Slide 33 - What's up with ext4?]

well, as it happens, there is an awful lot going on with ext4

with 3.2 we saw the addition of a significant new feature called bigalloc allows it to finally move away from allocating blocks in 4kb chunks and you can allocate much larger chunks if you've got a file sysem mostly holding large files then what you end up with is much more efficient operations on those files this is useful within in google for example where they're trying to do this where a lot of the driving force came from and other places as well. a significant speed up. various other things in the works, the snapshot feature that i mentioned. inline data which is an optimisation for very small files

27:00 - 28:00:

secure erase.. if you delete a file it actually goes away from the media completely. checksumming and metadata.. trying to defend against corruptions on the media and so on

[Slide 34 - In other words]

so what's going on is that ext 4 rather than sitting there and gathering dust and fading away looks to develop and grow for some time yet because there are people adopting it and using it and who want to do other stuff with it so as long as people keep hacking on it and adding new features that stuff is going to go in i don't think ext4 going to just kind of fade away anytime real soon

[Slide 35 - UEFI secure boot]

also in june people started talking about this feature called UEFI secure boot. the objective of this feature of course is to create hardware that will only give control of the system to a trusted boot loader this is a boot loader that has been signed by a key that is known to the system level bios. its actually a useful feature you can use it to thwart attacks on the BIOS or the bootloader of the system. very low level root kind stuff

28:00 - 29:00:

29:00 - 30:00:

30:00 - 31:00:

31:00 - 32:00:

32:00 - 33:00:

33:00 - 34:00:

34:00 - 35:00:

35:00 - 36:00:

36:00 - 37:00:

37:00 - 38:00:

38:00 - 39:00:

39:00 - 40:00:

40:00 - 41:00:

41:00 - 42:00:

42:00 - 43:00:

43:00 - 44:00:

44:00 - 45:00:

45:00 - 46:00:

46:00 - 47:00:

47:00 - 48:00:

48:00 - 49:00:

49:00 - 50:00:

50:00 - 51:00:

51:00 - 52:00:

52:00 - 53:00:

53:00 - 54:00:

54:00 - 55:00:

55:00 - 56:00:

56:00 - 57:00:

57:00 - 58:00:

58:00 - 59:00:

59:00 - 60:00: