Computers: April 2006 Archives
So, redefine was down yesterday, and today. Yesterday, it was down because I made a DNS-related boo-boo. Today, however, the downtime was an entirely different story.
Last week, I attended the Bay Area FreeBSD Users Group meeting, and "drank the kool-aid" (so to speak), singing the praises of the next release of FreeBSD on my blog. However, the very next day, redefine crashed.
So much for the stability of FreeBSD 6, I reckon.
The error that I saw on the console was quite strange. It was a stream of recurring messages that looked like this:
    swap_pager indefinite wait buffer: bufobj: 0, blkno ####, size 4096
The number following blkno would change, among a small set (3-4) of values. The really strange thing is that the machine was still able to NAT and route IP packets for me. I didn't notice that anything was wrong until I tried to hit a web page or SSH in. In addition, I was able to use the keyboard to change virtual terminals, but I couldn't type any characters whatsoever. The whole thing was very strange. At the time, I was getting ready to go to Seattle, so I didn't have time to do any troubleshooting. So, I just rebooted, and took off.
But then it happened again today. I didn't notice that it was down until I got to work, so I had to leave it down the whole day, until after I got home. Now that I have seen the same error twice, it is time for action. My first observation is that this behavior only occurs when the machine is under some form of load. I know this because this machine had been up for over a month before I swapped it over to being the main redefine. And during that time, I did stress it - I did at least one "make world", several kernel builds, and I compiled a ton of packages.
Yet, I think that redefine is under more stress now that it is on the network. In particular, the blogs here are getting hit pretty hard with comment spam, which can be quite stressful for the machine. In addition, I am passing a lot more data through the network interfaces, which means that the network stack is getting stressed more.
But, before I report this issue to the FreeBSD gods, I thought it would make sense to upgrade to the very latest code in the STABLE branch. That way, if the issue happens again, I'll know that it is a bug that hasn't been fixed in the month since I last refreshed my system.
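If it does happen again, it would help to know which block numbers keep showing up before filing a report. Here is a rough shell sketch that tallies the messages by blkno. The function name count_swap_waits is made up for illustration, and it assumes the message text matches what I saw on the console (with or without a colon after "blkno"):

```shell
# Tally "swap_pager indefinite wait buffer" messages by block number.
# Reads log text on stdin and prints "<count> <blkno>" for each distinct
# block number, sorted by block number.
# Assumption: the message layout matches the console output quoted above.
count_swap_waits() {
    awk '
        /swap_pager/ && /indefinite wait buffer/ {
            for (i = 1; i < NF; i++) {
                if ($i ~ /^blkno:?$/) {
                    blk = $(i + 1)
                    sub(/,$/, "", blk)   # drop the trailing comma
                    count[blk]++
                }
            }
        }
        END { for (b in count) print count[b], b }
    ' | sort -k2,2n
}
```

Running something like `count_swap_waits < /var/log/messages` would then print one line per block number. Whether the console messages actually end up in that file is an assumption on my part.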
So now, we play the waiting game.
-Andy.
Technorati Tags: FreeBSD, UNIX, Open Source
FreeBSD 5 vs. 6
- transition, not that 5 is bad, it was just transition
- some rough edges, in usability and performance of FS
- focus over last year has been on polishing and fixing bugs
- 6 is usable for desktops and servers and appliances
- 6 has proper threading support, with libthr. Works quite well in 6.1. Equivalent to pthreads at the moment. Two different libraries adhere to the same API, so you can choose which one you want to use, via libmap. Allows mapping different libraries to applications, on a per-application basis. Similar to LD_PRELOAD.
- empirically, stability of 6 is on par if not better than 4.10 -> some VM bugs in 4.8, 9, 10, got fixed in FreeBSD 4.11
- And of course, all of this stuff is fixed in FreeBSD 6.
- Definitely increased stability from 5 to 6.
- Definitely not worse than FreeBSD 4, some peripheral things that 4 might still be doing better (random USB device, soundcard, old ATA device)
- Been doing a lot more stress testing
- Yahoo! and Ironport are moving to FreeBSD 6
- hub.freebsd.org now running 6.1-PRERELEASE (12 cpu machine)
- 6 is working much better with multiple processors -- the threading is doing much better.
- Have been back-porting features from -CURRENT into 6.1
- For example, tons of amd64 work from Yahoo! - can go 64bits, but keep custom 32bit applications running.
- Drivers added
- Large update to ATA disk driver, supports more software RAID, and some of the pseudo-hardware RAID
- SAS - Serial Attached SCSI support, same connector as SATA, but SCSI protocol - starting to get support in FreeBSD 6.1
- nullfs works much better now, can be used in jails
- Nate Lawson talking about ACPI next month
- Nawaf Bitar, June bafug talk, was head of system development at SGI, giving talk on multithreading and multiprocessor schedulers.
The response:
- What I saw could be a reporting error, with thread changes, been some reporting bugs concerning display of who is using what cpu
- things display as running on cpu 0, but might have been running on other processor, but when they release, they drive back to CPU 0
- should loadavg be 2 if you max out a dual processor box? (I can confirm that this was the case on FreeBSD 4.x, but I'll need to look to see if it is the case on 6.x).
- On FreeBSD, there is nothing to affect processor affinity. On "nice to do list".
- Julian was able to dredge up some knowledge of some sysctls - kern.sched - some tunables, one called "followon" in particular. If you are a thread in a single process, about to give up the CPU, find another thread from the same process and continue (try to keep multiple threads on the same CPU). This is turned off by default.
- The upcoming ULE scheduler might have some affinity stuff
- There is a kernel option, KTR, which enables fine-grained kernel tracing. It is different from ktrace or truss, in that KTR allows for an instrumentation view inside of the kernel.
- A lot of events have a KTR flag on them. At compile time, you compile in a mask of bits, then use a sysctl to activate them. It logs entries into a KTR buffer in memory; you can drop into the kernel debugger and "show ktr" to show the list, or run a userland program (ktrdump?) to dump the buffer out to a file. One of the bits is for the scheduler, which produces thousands of entries per second.
- Google for "KTR, scheduling, graph" -> utility to draw scheduling graph and show which thread on which CPU.
- ktrace -p <pid> will attach to an existing pid, and record its syscall information (into ktrace.out by default, which kdump then prints).
- ktrace -dp <pid> will attach to an existing pid, and all of its running children, and record the aggregate syscall information.
- ktrace -dip <pid> does the same as above, but also captures information for children that spawn while ktrace is running.
- csup works for checking out a particular tag
- csup doesn't do mirroring of source tree
- csup has just been checked into current
- we think csup does as good a job bandwidth-wise as CVSup
- Doing FreeBSD 5.5 release to tide people over that can't move to 6 yet.
- HT is a performance gain if you are doing floating point work
- Xen virtualization will appear in 6.2, both as domain 0 and running in a Xen VM. Each Xen VM will be able to claim a CPU, to better utilize multi-core boxes. The design of Xen requires architectural things in the kernel that FreeBSD doesn't have yet.
- There was a question about the state of alternative schedulers in FreeBSD. The answer, via Scott, is that ULE is still fairly neglected, some work done a few months ago, still experimental. Not sure of the ETA.
- For wireless, there is something called "WPA supplicant" (wpa_supplicant); it also does WEP, and can be used to configure a wireless NIC before running dhclient
- rcorder - determine order that scripts will launch, based upon rc.conf and dependencies in the new RC system.
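The ktrace notes above are easier to remember with a worked example. This is a minimal sketch, not something shown at the meeting: the function name trace_pid and its defaults are made up, and it assumes the stock ktrace and kdump utilities are on the path.

```shell
# Trace a running process, its current children (-d) and any children it
# spawns later (-i) for a few seconds, then print the decoded records.
# Usage: trace_pid <pid> [seconds] [trace-file]
# trace_pid and its defaults are hypothetical; the flags are ktrace's own.
trace_pid() {
    tp_pid=$1
    tp_secs=${2:-5}
    tp_out=${3:-ktrace.out}

    if ! command -v ktrace >/dev/null 2>&1; then
        echo "ktrace not found on this system" >&2
        return 127
    fi

    # Start tracing; records are written to the trace file, not the terminal.
    ktrace -f "$tp_out" -d -i -p "$tp_pid" || return 1
    sleep "$tp_secs"
    # Clear the trace points again so the file stops growing.
    ktrace -c -d -p "$tp_pid"
    # kdump turns the binary trace file into readable output.
    kdump -f "$tp_out"
}
```

For example, `trace_pid 1234 10` would trace pid 1234 and its children for ten seconds (1234 being a placeholder pid).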
-Andy.
Technorati Tags: bafug, FreeBSD, UNIX, Open Source
A quick Google pointed out this helpful blog post, which said to set an "API Password", and all would be well.
Well, I did that, and things weren't well. Things haven't been well for days. So this evening, I finally sat down with "print STDERR" and figured out the problem. As it turns out, there is a bug in Movable Type (in my opinion). In the file 'cgi-bin/mt/lib/MT/XMLRPCServer.pm', there is this function (right at the top of the file):
    sub mt_new {
        my $cfg = $ENV{MOD_PERL} ?
            Apache->request->dir_config('MTConfig') :
            $MT::XMLRPCServer::MT_DIR . '/mt.cfg';
        my $mt = MT->new( Config => $cfg )
            or die MT::XMLRPCServer::_fault(MT->errstr);
        $mt;
    }
The bug is in the 3rd line of the function. In Movable Type 3.2, the name of the configuration file was changed from 'mt.cfg' to 'mt-config.cgi'. Unfortunately, this code still references the old filename. Normally, this isn't a problem, because you will have one file or the other. If the 'mt.cfg' file doesn't exist, some other part of MT will "do the right thing" and find your configuration file anyway.
However, I did an upgrade from a really old version of Movable Type. Thus, I had both the old 'mt.cfg' and 'mt-config.cgi' files in my 'mt' directory, and of course the old file had an invalid configuration, which was causing the above code to fail spectacularly. The fix is to either change the third line of code to look like this:
    $MT::XMLRPCServer::MT_DIR . '/mt-config.cgi';
Or, to get rid of the old 'mt.cfg' file (which is the route that I took). What really stinks about this whole episode is that the logging/debugging facilities in MT appear to be really poor. This code was failing in an odd way, and it didn't leave any trail for me to follow in order to figure out what was going on.
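Since the root cause was a stale 'mt.cfg' sitting next to 'mt-config.cgi', a quick check of the 'mt' directory would have flagged the situation right away. Here is a minimal shell sketch of such a check. The function name mt_config_state is my own invention and not part of Movable Type:

```shell
# Report which Movable Type config files exist in a directory.
# Prints one of: both, new, old, none.
# "both" is the situation that bit me: an upgraded install with a
# leftover mt.cfg next to the newer mt-config.cgi.
# mt_config_state is a hypothetical helper, not part of Movable Type.
mt_config_state() {
    mt_dir=$1
    if [ -f "$mt_dir/mt-config.cgi" ] && [ -f "$mt_dir/mt.cfg" ]; then
        echo both
    elif [ -f "$mt_dir/mt-config.cgi" ]; then
        echo new
    elif [ -f "$mt_dir/mt.cfg" ]; then
        echo old
    else
        echo none
    fi
}
```

Something like `mt_config_state cgi-bin/mt` printing "both" is the hint to move the old file out of the way.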
-Andy.
Technorati Tags: MarsEdit, Movable Type
I have been "working" on some major upgrades to my home computing infrastructure for several months now (and by "working", I mean mostly playing Mario Kart with Kevin). My main server, redefine, was getting a little stale - the OS was stuck on FreeBSD 4.x, I was clinging to Apache 1.3.x, and Movable Type was stuck on version 2.6, and was really spam-prone as a result. Plus, the hardware was getting pretty noisy, and I wasn't doing so well about having RAID and backups of my disks.
So, I have finally stitched together enough time to get all of these things fixed. Back in January, I bought two 400GB SATA drives and a RAID controller. I slapped this new hardware into one of my older machines (dual Pentium Pros upgraded to Dual Pentium II's, baby!), and installed FreeBSD 6. I then installed the latest Apache, Movable Type, etc., and got everything configured. And above all, I am running on mirrored drives now. Disks and RAID controllers are so cheap nowadays that it really didn't make much sense to have a single point of failure with my most important data.
The biggest challenge was in getting all of my data moved over from the old setup to the new setup. Good old rsync handled bringing all of my static data in the home directories over. However, the Movable Type upgrade wanted me to switch from BerkeleyDB to MySQL, which required getting MySQL configured (not necessarily my forte, but I got it to go), and then converting all of the blog data. I had some issues, but luckily the Movable Type forums and documentation are pretty solid.
So, there may be broken-ness abound for the next few days. Right now, for example, my blog will take comments, but they appear to be going into some moderation system that I don't fully understand. In fact, comments appear to be a lot different in Movable Type 3.2. It looks like I have a lot more reading to do...
-Andy.