Kernel Debugging Tips

From eLinux.org
Revision as of 15:58, 13 June 2008 by Tim Bird

Here are some miscellaneous tips for debugging a kernel:

Accessing the printk buffer after a silent hang on boot

Sometimes, if the kernel hangs early in the boot process, you get no messages on the console before the hang. However, there may still be messages in the printk buffer, which can give you an idea of where the problem is.

The kernel starts putting messages into the printk buffer as soon as it starts. They stay buffered there until the console code has a chance to initialize the console device (often the serial port for embedded devices). Even though these messages are not printed before the hang, it is still possible in some circumstances to dump the printk buffer and see the messages.

Quinn Jensen writes:

Something I've found handy when the console is silent is to dump the printk buffer from the boot loader. To do it you have to know how your boot loader maps memory compared to the kernel.

RedBoot example on a Freescale ADS board

Quinn says: Here's what I do with RedBoot on i.MX31:

fgrep printk_buf System.map

This shows the link address of printk_buf, e.g.:

c02338f0 b printk_buf.16194

The address "c02338f0" is a kernel virtual address, which, in the case of the i.MX31 ADS, RedBoot will have mapped to 0x802338f0. So, after resetting the target board, but without letting it try to boot again, enter the following at the RedBoot prompt:

dump -b 0x802338f0 -l 10000

And you see the printk buffer that never got flushed to the UART. Knowing what's there can be very useful in debugging your console.
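The address translation above can be sketched in a few lines of Python. This is only an illustration, not part of Quinn's procedure: it assumes the kernel's linear mapping starts at 0xc0000000 (a common 32-bit ARM setting) and that RAM starts at physical 0x80000000, which is what the two i.MX31 ADS addresses above imply. The function names are hypothetical, and the right offsets for another board must come from that board's memory map.

```python
# Assumed values, inferred from the addresses quoted in the text:
# kernel linear mapping base and i.MX31 ADS RAM base.
PAGE_OFFSET = 0xC0000000
PHYS_OFFSET = 0x80000000


def virt_to_phys(virt, page_offset=PAGE_OFFSET, phys_offset=PHYS_OFFSET):
    """Translate a linearly mapped kernel virtual address to a physical one."""
    if virt < page_offset:
        raise ValueError("address is below the kernel's linear mapping")
    return virt - page_offset + phys_offset


def parse_system_map_line(line):
    """Split a System.map line into (address, symbol name)."""
    addr, _symbol_type, name = line.split()
    return int(addr, 16), name


addr, name = parse_system_map_line("c02338f0 b printk_buf.16194")
# Print the boot loader command to type at the RedBoot prompt.
print(f"{name}: dump -b {virt_to_phys(addr):#010x} -l 10000")
```

Calling virt_to_phys(0xc0352d88, phys_offset=0x10000000) reproduces the OSK mapping used in the U-Boot example below.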

U-Boot example on an OMAP OSK board

Tim Bird tried these steps and they worked:

grep __log_buf System.map

or

grep __log_buf /proc/kallsyms

These show:

c0352d88 B __log_buf

In the case of the OSK, this address maps to 0x10352d88. So I reset the target board and use the following:

OMAP5912 OSK # md 10352d88
10352d88: 4c3e353c 78756e69 72657620 6e6f6973    <5>Linux version
10352d98: 362e3220 2e32322e 612d3631 6e5f706c     2.6.22.16-alp_n
10352da8: 7428206c 64726962 6d697440 6b736564    l (tbird@timdesk
10352db8: 2e6d612e 796e6f73 6d6f632e 67282029    .am.sony.com) (g
10352dc8: 76206363 69737265 33206e6f 342e342e    cc version 3.4.4
10352dd8: 34232029 45525020 54504d45 65755420    ) #4 PREEMPT Tue
...
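For a long buffer, reading the ASCII column 16 bytes at a time gets tedious. If the serial session is captured to a file, the text can be rebuilt from the hex words. The following is a minimal sketch, assuming the dump is in the 32-bit word format shown above and that the words are little-endian (which matches the ASCII column in this dump). The decode_md name is hypothetical.

```python
import re

# One "md" line: 8-digit address, a colon, then up to four 8-digit hex words,
# each preceded by a single space. The ASCII column that follows is ignored.
MD_LINE = re.compile(r"\s*([0-9a-fA-F]{8}):((?: [0-9a-fA-F]{8}){1,4})")


def decode_md(text):
    """Rebuild raw bytes from U-Boot 'md' output (little-endian words assumed)."""
    out = bytearray()
    for line in text.splitlines():
        match = MD_LINE.match(line)
        if not match:
            continue  # skip prompts, "..." and anything else that is not a dump line
        for word in match.group(2).split():
            out += int(word, 16).to_bytes(4, "little")
    return bytes(out)


sample = """\
OMAP5912 OSK # md 10352d88
10352d88: 4c3e353c 78756e69 72657620 6e6f6973    <5>Linux version
10352d98: 362e3220 2e32322e 612d3631 6e5f706c     2.6.22.16-alp_n
"""
print(decode_md(sample).decode("ascii", errors="replace"))
```

Run on the first two lines of the dump above, this prints "<5>Linux version 2.6.22.16-alp_n".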

Debugging very early boot problems

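If the kernel fails before the serial console is enabled, you can enable CONFIG_DEBUG_LL. This builds in low-level output routines such as printascii(), which write characters directly to the serial port without going through a console driver.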

Here is an e-mail exchange seen on the linux-embedded mailing list (with answer by George Davis):

> When we try to boot a 2.6.21 kernel after uncompressing the kernel the boot process dies somehow.
> We've figured out so far that the kernel dies somewhere between  the gunzip and start_kernel.

Try enabling DEBUG_LL to see if it's a machine ID error. If you see:

Error: unrecognized/unsupported processor variant.

or:

Error: unrecognized/unsupported machine ID...

Then you either don't have proper processor support enabled for your target
or your bootloader is passing in the wrong machine number.

If you still don't see anything, try hacking printk.c to call
printascii() (enabled for the DEBUG_LL case) to print directly to the
serial port without a driver, etc. You can find more details on these
low-level debugging hacks via a little googling...