rsyslogd stuck at eating 100% (or more) CPU after upgrading to Ubuntu Natty Narwhal

This might also happen after upgrading to Maverick, so don’t ignore the explanation even if you’re a version or two behind (or reading this at a much later time, when we’ve all switched to implants).

Apparently rsyslogd gets stuck because of a mismatch between how the kernel exposes its log to rsyslogd and what rsyslogd expects. If rsyslogd fails to get access to entries in the proc file system (/proc/kmsg was suggested in a bug thread), it locks up and spews out error messages at a great rate.

From /var/log/syslog

Apr 29 08:04:08 ubuntu kernel: Cannot read proc file system: 1 - Operation not permitted.
Apr 29 08:05:08 ubuntu kernel: last message repeated 13208405 times
Apr 29 08:06:08 ubuntu kernel: last message repeated 13297682 times
Apr 29 08:07:08 ubuntu kernel: last message repeated 14241325 times
Apr 29 08:08:09 ubuntu kernel: last message repeated 14397034 times
Apr 29 08:08:43 ubuntu kernel: last message repeated 7302035 times

Yes, that’s about 62 million error messages in less than 5 minutes. This demands quite a bit of CPU.
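
To check that total against your own log, the repeat counters can be summed directly. This is a minimal sketch, assuming the lines look like the excerpt above ("last message repeated N times"); sum_repeats is a made-up helper name, not something rsyslog ships.

```shell
# Sum the "last message repeated N times" counters from syslog-style input.
# sum_repeats is a hypothetical helper name, not part of rsyslog.
sum_repeats() {
    awk '/last message repeated [0-9]+ times/ {
             # the counter is the next-to-last field on the line
             total += $(NF - 1)
         }
         END { printf "%d\n", total }' "$@"
}
```

Running `sum_repeats /var/log/syslog` (or piping lines into it) prints the total; for the five "repeated" lines in the excerpt above it comes to 62446481.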

The reason for this is that the kernel API changed somewhere between the kernel I was running (2.6.31) and the one shipped with the current Ubuntu version (2.6.38 in Natty, and possibly the one in Maverick). When the new rsyslogd runs under the older kernel, everything goes haywire. The solution is to make sure your kernel is upgraded to the most recent version, and that you’re actually running it.
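
A quick way to see whether you are in this situation is to compare the running kernel with the newest image on disk. This is only a sketch: it assumes kernel images are named /boot/vmlinuz-<version> as on a stock Ubuntu install, and newest_kernel is a made-up helper name. On hosted machines where the provider supplies the kernel, /boot may tell you nothing.

```shell
# Print the highest kernel version found as vmlinuz-<version> in a directory.
# newest_kernel is a hypothetical helper; the directory defaults to /boot.
newest_kernel() {
    ls "${1:-/boot}"/vmlinuz-* 2>/dev/null \
        | sed 's|.*/vmlinuz-||' \
        | sort -V \
        | tail -n 1
}

running=$(uname -r)
newest=$(newest_kernel)
echo "running kernel:   $running"
echo "newest installed: ${newest:-none found}"
if [ -n "$newest" ] && [ "$running" != "$newest" ]; then
    echo "The running kernel is not the newest installed one."
fi
```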

First, stop rsyslogd to make your system a bit more responsive again:

sudo service rsyslog stop

Updating Ubuntu should already have installed the newest kernel versions, but you might have told Ubuntu to keep the existing grub configuration file instead of overwriting it when you updated (I do that almost automagically, which left me a couple of kernel versions behind). You can re-run this process and get grub to use an updated kernel version:

sudo update-grub

This might ask you again whether you want to overwrite the current configuration file, and will also allow you to inspect the differences between the currently installed file and the one that update-grub wants to install. See if there are any significant changes (pay attention to information such as which partitions to use for booting), and if it looks OK, allow the file to be replaced.

update-grub will then update your boot sequence with the new configuration file, and after rebooting (press ESC if you need to see the grub menu to make any changes), your new kernel should be running smoothly and rsyslogd should hopefully behave properly again.
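
Once the machine is back up, something like the following can confirm that rsyslogd has calmed down. It is a rough sketch rather than an official check: check_rsyslogd is a made-up helper name, and the error count covers the whole log file, so it also includes messages written before the reboot.

```shell
# Post-reboot sanity check. check_rsyslogd is a hypothetical helper name.
check_rsyslogd() {
    # pcpu is CPU time divided by the process lifetime, so give rsyslogd a
    # minute or so after boot before trusting the figure.
    if ps -C rsyslogd -o pid=,pcpu=,etime= 2>/dev/null; then
        echo "rsyslogd is running (columns: PID, %CPU, elapsed time)"
    else
        echo "rsyslogd does not appear to be running"
    fi
    # Count lines carrying the error from the log excerpt; the log path can be
    # passed as an argument and defaults to /var/log/syslog.
    count=$(grep -c 'Cannot read proc file system' "${1:-/var/log/syslog}" 2>/dev/null || true)
    echo "proc read errors logged: ${count:-unknown}"
}

check_rsyslogd
```

If the error count keeps growing between two runs, the kernel and rsyslogd are still mismatched.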

5 thoughts on “rsyslogd stuck at eating 100% (or more) CPU after upgrading to Ubuntu Natty Narwhal”

  1. Yeah, this seems an ugly regression testing bug. It was reported and fixed before (see #523610, http://j.mp/lg1rg0).

    Problem is: in some cases (especially when you’re not using grub, like when using Slicehost/RackSpace Cloud servers), do-release-upgrade will upgrade your whole system to Natty, everything EXCEPT your kernel (2.6.32.12-rscloud in my case), which doesn’t support the new rsyslogd…

    This creates a tricky situation where your rsyslogd gets stuck on the loop you described, but you can’t easily upgrade your kernel unless you go through the non-trivial (and unsupported) process with pv-grub. Not fun :(

    I’m still researching the best solution, but I guess that downgrading rsyslogd will be the best option (or maybe taking a few hours to go through pv-grub + manually compiled kernel)..

  2. Quick update (for others arriving here through Google): I just solved it by removing the new rsyslog (4.6) and going back to 4.2.0-2ubuntu8. Now rsyslogd is back to 0% CPU and there are no more millions of error messages in the log files.

  3. I resolved this issue after upgrading to Natty with Gui Ambros’s suggestion. Afterwards I set the rsyslog package to “hold” so it wouldn’t update until they get this fixed.

    (as root) echo "rsyslog hold" | dpkg --set-selections

  4. Hey, thanks a lot Math!

    I’ve encountered the same problem on a VDS where the kernel is provided by the hoster while the rest of the system is under my control.

    Looking forward to solving it by either upgrading the kernel or downgrading my rsyslog.
