Here are some messages from Linus Torvalds about using "git bisect".
Message 1 - 10/06/06
Subject: Re: Simple script that locks up my box with recent kernels
> The first kernel where I know for sure it caused lockups is
> 2.6.18-git15. I've also tested 2.6.18-git16, 2.6.18-git21 and
> 2.6.19-rc1-git2 and those 3 also lock up solid.
Can I bother you to just bisect it?
Even if you decide that it's too painful to bisect to the very end, "git bisect" will give great results after just as few reboots as four or five, and hopefully narrow down the thing a _lot_.
So, for example, while my git tree doesn't contain the stable release numbers, you can trivially just get my tree, and then point "git fetch" at the stable git tree and get v184.108.40.206 that way.
Then you can do just
git bisect start
git bisect good v220.127.116.11
git bisect bad $(cat patch-2.6.18-git15.id)
and off you go - it will pick a half-way point for you to test, and then if that one was good, you just say "git bisect good", and it will pick the next one..
(that "patch-2.6.18-git15.id" thing is from kernel.org - it's how you can get the exact git state of any particular snapshot, even if it's not tagged in any real tree - that particular one seems to have SHA1 ID 1bdfd554be94def718323659173517c5d4a69d25..)
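To see the whole loop without building a kernel, here is a minimal sketch on a throwaway repository. Everything in it is made up for illustration: the 16 commits, the `state` file, and the rule that commit 11 introduces the "bug". The `grep` stands in for building, booting and running the real test.

```shell
#!/bin/sh
# Sketch: bisect a throwaway repository by hand (hypothetical data).
set -eu
export LC_ALL=C                      # keep git's messages in English for the check below
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Build 16 linear commits; the "bug" appears in commit 11 and stays.
i=1
while [ "$i" -le 16 ]; do
    if [ "$i" -ge 11 ]; then echo "broken $i" > state; else echo "fine $i" > state; fi
    git add state
    git commit -q -m "commit $i"
    i=$((i + 1))
done

git bisect start > /dev/null
git bisect good "$(git rev-list --max-parents=0 HEAD)" > /dev/null   # root commit is fine
git bisect bad HEAD > /dev/null                                      # newest commit is broken

# Test whatever git checked out, report the verdict, repeat until git names the culprit.
steps=0
while [ "$steps" -lt 10 ]; do        # safety cap so the sketch cannot loop forever
    steps=$((steps + 1))
    if grep -q broken state; then verdict=bad; else verdict=good; fi
    out=$(git bisect "$verdict")
    case $out in
        *"is the first bad commit"*) break ;;
    esac
done

culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "found '$culprit' after $steps tests"
git bisect reset > /dev/null 2>&1    # leave bisect mode and return to the original branch
```

On a real kernel tree the only part that changes is the test itself: each round is a build, a reboot and a run of the reproducer, followed by the same "git bisect good" or "git bisect bad".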
"git bisect" really does kick ass. Don't worry if it says "10374 commits to test after this" - because it does a binary search, it basically cuts the commits to test in half each time, and so if you do just five bisections, you'll have cut down the 10,000 commits to just a few hundred. At that point, maybe we even have a clue, or we might ask you to test a few more times to narrow things down even more.
Message 2 - 10/07/06
On Sat, 7 Oct 2006, Jesper Juhl wrote:
> > Can I bother you to just bisect it?
>
> Sure, but it will take a little while since building + booting +
> starting the test + waiting for the lockup takes a fair bit of time
> for each kernel
Sure. That said, we've tried to narrow down things that took hours or days (under real loads, not some nice test-script) to reproduce, and while it doesn't always work, the real problem tends to be if the problem case isn't really reproducible. It sounds like yours is pretty clear-cut, and that will make things much easier.
> and also due to the fact that my git skills are pretty
> limited, but I'll figure it out (need to improve those git skills
> anyway) :-)
"git bisect" in particular isn't that hard to use, and it will really do a lot of heavy lifting for you.
Although since it will just select a random commit (well, it's not "random": it's strictly as half-way as it can possibly be, but it's automated without any regard for anything else), you can sometimes hit a situation where git will ask you to test a kernel that simply doesn't work at all, and you can't even test whether it reproduces your particular bug or not.
For example, "git bisect" might pick a kernel that just doesn't compile, because of some stupid bug that was fixed almost immediately afterwards. In those cases, the total automation of "git bisect" ends up being something that has to be helped along by hand, and then it definitely helps to know more about how git works.
Anyway, the quick tutorial about "git bisect" is that once you've given it the required first "good" and "bad" points, it will create a new branch in the repository (called "bisect", in case you care), and after that point it will do a search in the commit DAG (aka "history tree" - it's not a tree, it's a DAG, since merges will join branches together) for the next commit that will neatly "split" the DAG into two equal pieces. It will keep splitting the commit history until you get fed up, or until it has pinpointed the single commit that caused the problem.
The nicest tool to use during bisection is to just do a
git bisect visualize
that simply starts up "gitk" (the default git history visualizer) to show what the current state of bisection is. Now, if there are thousands and thousands of commits, you'll have a really hard time getting a visual clue about what is going on, but especially once you get to a smaller set of commits, it's very useful indeed.
And it's _especially_ useful if you hit one of the problem spots where you can't test the resulting tree for some unrelated reason. When that happens, you should _not_ mark the problematic commit as being "bad", because you really don't know - the "badness" of that commit is probably not related to the "badness" that you're actually searching for.
Instead, you should say "ok, I refuse to test this commit at all, because it's got other problems, and I will select another commit instead". The bisection algorithm doesn't care which commit you pick, as long as it's within the set of "unknown" commits that you'll see with the visualization tool.
Of course, for efficiency reasons, the _closer_ you get to the half-way mark, the better. So it's useful to try to pick a commit that is close to the one that "git bisect" originally chose for you, but that's not a correctness issue, that's just an issue of "if we have a thousand potential commits, we're better off bisecting it 400/600 rather than 1/999, even if the exact half-way point isn't testable".
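The 400/600 versus 1/999 point comes down to how many commits can still be left after one test in the worst case. A small sketch of that arithmetic follows; the helper name `worst_case_left` is made up for this note.

```shell
#!/bin/sh
# worst_case_left SIDE TOTAL
# Prints how many candidate commits remain after one test if the answer
# lands on the larger side of a SIDE / (TOTAL - SIDE) split.
worst_case_left() {
    other=$(( $2 - $1 ))
    if [ "$1" -gt "$other" ]; then echo "$1"; else echo "$other"; fi
}

worst_case_left 500 1000   # exact half-way point: prints 500
worst_case_left 400 1000   # nearby testable commit: prints 600
worst_case_left 1 1000     # commit right at the edge: prints 999
```

A 400/600 split costs at most a fraction of one extra test compared with the exact midpoint, while a 1/999 split can waste a whole build and reboot on almost no information.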
So if you need to decide to pick another point than the one "git bisect" chose for you automatically, just select that commit in the visualizer (which will cut the SHA1 name of it), and then do
git reset --hard <paste-sha1-here>
to reset the "bisect" branch to that point instead. And then compile and test that kernel instead (and then if that's good or bad, you can do the "git bisect good" or "git bisect bad" thing to mark it so, and git will continue to bisect the set of commits).
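Git versions released after this message bisect on a detached HEAD rather than a branch named "bisect", and they added "git bisect skip" for exactly this situation; "git bisect run" treats exit code 125 from the test command as "cannot test, skip this one". The sketch below is a made-up example on a throwaway repository: the 16 commits, the `state` file, the "unbuildable" commit 8 and the bug starting at commit 11 are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch: let git step around an untestable commit (hypothetical data).
set -eu
repo=$(mktemp -d)
check=$(mktemp)
cd "$repo"
git init -q
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# 16 linear commits: commit 8 "does not build", commits 11..16 carry the bug.
i=1
while [ "$i" -le 16 ]; do
    case $i in
        8)      echo "unbuildable $i" > state ;;
        1[1-6]) echo "broken $i" > state ;;
        *)      echo "fine $i" > state ;;
    esac
    git add state
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Test command for "git bisect run": 125 = cannot test, 1 = bad, 0 = good.
cat > "$check" <<'EOF'
grep -q unbuildable state && exit 125
grep -q broken state && exit 1
exit 0
EOF

git bisect start > /dev/null
git bisect good "$(git rev-list --max-parents=0 HEAD)" > /dev/null
git bisect bad HEAD > /dev/null
git bisect run sh "$check" > /dev/null

culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $culprit"
git bisect reset > /dev/null 2>&1
rm -f "$check"
```

Picking a neighbouring commit by hand with "git reset --hard", as described above, still works; skipping just lets git choose the replacement commit itself.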
It can be a bit boring, but damn, it's effective. I've used "git bisect" several times when I've been too lazy to try to really think about what is going on - I'll happily brute-force bug-finding even if it might take a little longer, if it's guaranteed to find it (and if the bug is reproducible, git bisect definitely guarantees to find what made it appear, even if that may not necessarily be the deeper _cause_ of the bug).