Kernel Debugging Tips
Here are some miscellaneous tips for debugging a kernel:
Debugging early boot problems
Accessing the printk buffer after a silent hang on boot
Sometimes, if the kernel hangs early in the boot process, you get no messages on the console before the hang. However, there may still be messages in the printk buffer, which can give you an idea of where the problem is.
The kernel starts putting messages into the printk buffer as soon as it starts. They stay buffered there until the console code has a chance to initialize the console device (often the serial port for embedded devices). Even though these messages are not printed before the hang, it is still possible in some circumstances to dump the printk buffer and see the messages.
Quinn Jensen writes:
Something I've found handy when the console is silent is to dump the printk buffer from the boot loader. To do it you have to know how your boot loader maps memory compared to the kernel.
Redboot example on a Freescale ADS board
Quinn says: Here's what I do with Redboot on i.MX31:
fgrep printk_buf System.map
This shows the linked address of printk_buf, e.g.:
c02338f0 b printk_buf.16194
The address "c02338f0" is a kernel virtual address, which, in the case of the i.MX31 ADS, Redboot will have mapped to 0x802338f0. So, after resetting the target board, but without letting it try to boot again, at the Redboot prompt, enter:
dump -b 0x802338f0 -l 10000
And you see the printk buffer that never got flushed to the UART. Knowing what's there can be very useful in debugging your console.
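The translation from the System.map address to the address used at the boot loader prompt is plain offset arithmetic. Here is a minimal host-side sketch in Python. It assumes that the kernel's linear mapping starts at 0xC0000000 and that the boot loader sees RAM at 0x80000000; both values are inferred from the i.MX31 numbers above and must be checked for any other board. The function names are illustrative only.

```python
# Assumed values, inferred from the example addresses above; check them
# against your own board and kernel configuration.
KERNEL_PAGE_OFFSET = 0xC0000000   # assumed start of the kernel's linear mapping
BOOTLOADER_RAM_BASE = 0x80000000  # assumed start of RAM as the boot loader maps it


def find_symbol(system_map_text, name):
    """Return the address of the first symbol whose name starts with `name`."""
    for line in system_map_text.splitlines():
        fields = line.split()
        # System.map lines look like: "<hex address> <type> <symbol>"
        if len(fields) == 3 and fields[2].startswith(name):
            return int(fields[0], 16)
    raise KeyError(name)


def to_bootloader_address(kernel_virtual,
                          page_offset=KERNEL_PAGE_OFFSET,
                          ram_base=BOOTLOADER_RAM_BASE):
    """Translate a kernel virtual address to the address the boot loader uses."""
    return kernel_virtual - page_offset + ram_base


if __name__ == "__main__":
    sample_map = "c02338f0 b printk_buf.16194\n"
    addr = find_symbol(sample_map, "printk_buf")
    print(hex(to_bootloader_address(addr)))  # prints 0x802338f0
```

On a board whose RAM starts elsewhere, pass a different ram_base; for example, 0x10000000 turns c0352d88 into 0x10352d88.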
U-boot example on an OMAP OSK board
Tim Bird tried these steps and they worked:
grep __log_buf System.map
or, on a running target:
grep __log_buf /proc/kallsyms
c0352d88 B __log_buf
In the case of the OSK, this address maps to 0x10352d88. So I reset the target board and use the following:
OMAP5912 OSK # md 10352d88
10352d88: 4c3e353c 78756e69 72657620 6e6f6973    <5>Linux version
10352d98: 362e3220 2e32322e 612d3631 6e5f706c     2.6.22.16-alp_n
10352da8: 7428206c 64726962 6d697440 6b736564    l (tbird@timdesk
10352db8: 2e6d612e 796e6f73 6d6f632e 67282029    .am.sony.com) (g
10352dc8: 76206363 69737265 33206e6f 342e342e    cc version 3.4.4
10352dd8: 34232029 45525020 54504d45 65755420    ) #4 PREEMPT Tue
...
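A long dump is easier to read once the hex words are turned back into one continuous string. The following Python sketch does that for a captured serial log. It assumes the layout shown above (an address, a colon, then up to four 32-bit words per line) and a little-endian target, which is what the ASCII column in the example implies. The function name is illustrative.

```python
import re


def decode_md_dump(dump_text):
    """Rebuild the text held in a dump of 32-bit words, one dump row per line."""
    data = bytearray()
    for line in dump_text.splitlines():
        # Expect "<address>: <word> <word> <word> <word> ..."; skip other lines
        # such as the prompt or a trailing "...".
        match = re.match(r"\s*[0-9a-fA-F]+:((?:\s+[0-9a-fA-F]{8}){1,4})", line)
        if not match:
            continue
        for word in match.group(1).split():
            # Each word is printed most-significant byte first; on a
            # little-endian CPU the bytes sit in memory in the reverse order.
            data += int(word, 16).to_bytes(4, "little")
    return data.decode("ascii", errors="replace")


if __name__ == "__main__":
    captured = (
        "10352d88: 4c3e353c 78756e69 72657620 6e6f6973\n"
        "10352d98: 362e3220 2e32322e 612d3631 6e5f706c\n"
    )
    print(decode_md_dump(captured))  # prints: <5>Linux version 2.6.22.16-alp_n
```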
Using CONFIG_DEBUG_LL and printascii()
If the kernel fails before the serial console is enabled, you can use CONFIG_DEBUG_LL to change the way the printk code outputs characters.
Here is an e-mail exchange seen on the linux-embedded mailing list (with answer by George Davis):
> When we try to boot a 2.6.21 kernel after uncompressing the kernel the boot process dies somehow.
> We've figured out so far that the kernel dies somewhere between the gunzip and start_kernel.
Try enabling DEBUG_LL to see if it's an machine ID error. If you see:
Error: unrecognized/unsupported processor variant.
or:
Error: unrecognized/unsupported machine ID...
Then you either don't have proper processor support enabled for your target or your bootloader is passing in the wrong machine number. If you still don't see anything, try hacking printk.c to call printascii() (enabled for the DEBUG_LL case) to print directly to the serial port w/o a driver, etc.,. You can find more details on these low-level debugging hacks via a little googling...
Triggering a kernel event
Overloading the sync system call
Sometimes it is useful to trigger an event in the kernel from user space. Instead of creating infrastructure to handle a /proc event or an ioctl(), or adding a new syscall, it can be quick and easy to overload an existing function. One function that is not used very often is sync. (I have found that the sync system call is not normally called by user-space programs or during standard Linux booting.)
It is quite easy to put a hook to your own kernel code in the sys_sync() routine (located in fs/sync.c) and cause it to execute by issuing 'sync' from the shell command line. This is handy as a temporary mechanism for testing things that you have put in the kernel.
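The shell's sync command is only one way to fire such a hook; any program that issues the sync system call will reach it. Here is a minimal sketch of a test driver, assuming a Python 3 interpreter is available on the target. os.sync() is the standard-library wrapper for sync(2), and the function name is illustrative.

```python
import os


def trigger_sync_hook(times=1):
    """Issue the sync system call `times` times and return the count."""
    for _ in range(times):
        os.sync()  # enters the kernel's sync system call on Linux
    return times


if __name__ == "__main__":
    trigger_sync_hook()
```

Calling it in a loop is a simple way to exercise the hooked code repeatedly from a test script.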