Today I was testing the CD-ROM device driver when my machine suddenly hung. I couldn't reach it anymore over my SSH connection, or even from the serial console (it runs Serial Over LAN; it's a POWER5 box connected through the HMC). The machine still answered pings, though, so the kernel was still alive and could probably tell me more about where the hang happened.
When debugging the Linux kernel, you can use the famous Magic SysRq key combination to talk directly to the kernel: it can give you lots of information and even accept commands, such as killing processes or rebooting the machine.
So, first, how do you enable Magic SysRq in your kernel? It's easy: build the kernel with CONFIG_MAGIC_SYSRQ=y and, after booting it, enable the feature through /proc (0 disables it, 1 enables it):
evalap ~ # echo 1 > /proc/sys/kernel/sysrq
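If you want to check the current setting from a script, here's a minimal Python sketch of my own (not from sysrq.txt). It reads /proc/sys/kernel/sysrq and, as an assumption you should check against your kernel's version of the sysrq documentation, treats values other than 0 and 1 as a bitmask of allowed function groups, which is how newer kernels interpret them. The helper names are hypothetical.

```python
from pathlib import Path

# Bit values for /proc/sys/kernel/sysrq as described in newer kernels'
# sysrq documentation (assumption: older kernels may only know 0 and 1).
SYSRQ_BITS = {
    2: "console log level control",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def describe_sysrq(value: int) -> list[str]:
    """Return a readable list of what a sysrq setting allows (hypothetical helper)."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    # Any other value: report every bit that is set, in ascending order.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current setting; the file holds a single decimal integer."""
    return int(Path(path).read_text().strip())
```

On a live box, `describe_sysrq(read_sysrq())` then tells you what the keyboard combinations are currently allowed to do.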
Once it's enabled, you can send any SysRq combination to get what you want.
Here are some useful commands (taken from Documentation/sysrq.txt):
- h – shows a short help listing all the commands
- b – reboots your machine (without syncing your disks)
- i – kills all processes, except init
- e – sends SIGTERM to all processes, except init
- m – dumps the current memory info
- s – syncs all disks
- t – prints the task list
- f – calls oom_kill to kill a memory-hog process
- 0 – 9 – sets the console log level
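If you still have a shell but no usable console, the same commands can be issued without the keyboard: writing the command letter to /proc/sysrq-trigger (as root) has the same effect as the key combination. Here's a minimal Python sketch of that; the helper name and the restriction to the keys listed above are my own choices, not part of the kernel interface.

```python
# Command keys from the list above (a subset; the kernel supports more).
SYSRQ_COMMANDS = {
    "h": "show help",
    "b": "reboot without syncing disks",
    "i": "kill all processes except init",
    "e": "send SIGTERM to all processes except init",
    "m": "dump memory info",
    "s": "sync all disks",
    "t": "print the task list",
    "f": "call oom_kill on a memory hog",
    **{str(level): f"set console log level to {level}" for level in range(10)},
}


def sysrq_trigger(key: str, path: str = "/proc/sysrq-trigger") -> str:
    """Write one SysRq command key to the trigger file (hypothetical helper).

    Needs root on a real system; `path` is a parameter only so the
    function can be exercised against an ordinary file.
    """
    if key not in SYSRQ_COMMANDS:
        raise ValueError(f"unknown or unlisted SysRq key: {key!r}")
    with open(path, "w") as trigger:
        trigger.write(key)
    return SYSRQ_COMMANDS[key]
```

For example, `sysrq_trigger("t")` asks for the task list, which ends up in the kernel log (read it with dmesg).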
OK, now how do you use the commands? It's easy, but it depends on your machine and how you're accessing it (again from sysrq.txt):
- On x86 – hold down Alt + SysRq (also known as Print Screen) + <command>
- On PowerPC – Alt + Print Screen (or F13) + <command>
- On PowerPC, but using Serial Over LAN – Ctrl + o + <command>
This can help a lot when debugging the kernel, but it can also save you when your machine hangs: you can kill all processes, look at the task list to see what's happening, sync your disks, kill the process that's eating your memory, or reboot the machine 🙂
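When recovering a hung box, order matters: b reboots without syncing, so it's worth asking for a sync first. The sketch below (Python again, writing to /proc/sysrq-trigger as root) sends e, i, s, b one by one with a pause in between. That particular order and the 2-second pause are my own choice, not something sysrq.txt prescribes, and `recover` is a hypothetical helper.

```python
import time
from typing import Callable, Sequence

# terminate, kill, sync, reboot: all commands from the list above.
# Sync ('s') must come before 'b', since 'b' reboots without syncing.
RECOVERY_KEYS = ("e", "i", "s", "b")


def write_trigger(key: str, path: str = "/proc/sysrq-trigger") -> None:
    """Send one key to the kernel via the trigger file (requires root)."""
    with open(path, "w") as trigger:
        trigger.write(key)


def recover(
    keys: Sequence[str] = RECOVERY_KEYS,
    delay: float = 2.0,
    write: Callable[[str], None] = write_trigger,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Send each key in order, pausing between them so sync can finish.

    `write` and `sleep` are injectable so the sequence can be tested
    without actually rebooting anything.
    """
    if "b" in keys and "s" in keys and keys.index("s") > keys.index("b"):
        raise ValueError("sync ('s') must come before reboot ('b')")
    sent = []
    for position, key in enumerate(keys):
        write(key)
        sent.append(key)
        if position < len(keys) - 1:  # no pause needed after the last key
            sleep(delay)
    return sent
```

Injecting `write` and `sleep` keeps the ordering logic testable on a machine you don't want to reboot.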