downtime
So, redefine was down yesterday, and today. Yesterday, it was down because I made a DNS-related boo-boo. Today, however, the downtime was an entirely different story.
Last week, I attended the Bay Area FreeBSD Users Group meeting, and "drank the kool-aid" (so to speak), singing the praises of the next release of FreeBSD on my blog. However, the very next day, redefine crashed.
So much for the stability of FreeBSD 6, I reckon.
The error that I saw on the console was quite strange. It was a stream of recurring messages that looked like this:
swap_pager indefinite wait buffer: bufobj: 0, blkno ####, size 4096
The number following blkno would change among a small set (3-4) of values. The really strange thing is that the machine was still able to NAT and route IP packets for me. I didn't notice that anything was wrong until I tried to hit a web page or SSH in. In addition, I was able to use the keyboard to change virtual terminals, but I couldn't type any characters whatsoever. The whole thing was very strange. At the time, I was getting ready to go to Seattle, so I didn't have time to do any troubleshooting. So, I just rebooted and took off.
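For the curious, here is a minimal sketch of how one might tally which blkno values keep showing up, assuming the console messages have been captured as lines of text somewhere. The function name count_blknos and the sample block numbers are made up for illustration, and the pattern is only based on the message as I saw it (with optional colons in case other versions print them).

```python
import re
from collections import Counter

# Pattern modelled on the console message quoted in this post. The optional
# colons are an assumption, in case other systems format the fields that way.
SWAP_WAIT = re.compile(
    r"swap_pager:? indefinite wait buffer: "
    r"bufobj: (\d+), blkno:? (\d+), size:? (\d+)"
)


def count_blknos(lines):
    """Return a Counter mapping each blkno to how many times it was seen."""
    counts = Counter()
    for line in lines:
        match = SWAP_WAIT.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; the block numbers are invented.
    sample = [
        "swap_pager indefinite wait buffer: bufobj: 0, blkno 1024, size 4096",
        "swap_pager indefinite wait buffer: bufobj: 0, blkno 2048, size 4096",
        "swap_pager indefinite wait buffer: bufobj: 0, blkno 1024, size 4096",
        "some unrelated console line",
    ]
    for blkno, seen in count_blknos(sample).most_common():
        print(f"blkno {blkno}: seen {seen} time(s)")
```

If the same handful of block numbers dominate the tally, that would at least match what I saw scrolling by on the console.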
But then it happened again today. I didn't notice that it was down until I got to work, so I had to leave it down the whole day, until after I got home. Now that I have seen the same error twice, it is time for action. My first observation is that this behavior only occurs when the machine is under some form of load. I know this because this machine had been up for over a month before I swapped it over to being the main redefine. And during that time, I did stress it - I did at least one "make world", several kernel builds, and I compiled a ton of packages.
Yet, I think that redefine is under more stress now that it is on the network. In particular, the blogs here are getting hit pretty hard with comment spam, which can be quite stressful for the machine. In addition, I am passing a lot more data through the network interfaces, which means that the network stack is getting stressed more.
But before I report this issue to the FreeBSD gods, I thought it would make sense to upgrade to the very latest code in the STABLE branch. That way, if the issue happens again, I'll know that it is a bug that hasn't been fixed in the month since I last refreshed my system.
So now, we play the waiting game.
-Andy.
Technorati Tags: FreeBSD, UNIX, Open Source