Mammoth DSL Maintenance Still In Progress

     The Mammoth DSL maintenance that was supposed to be completed by 8 AM Mountain time (7 AM Pacific) is still in progress.  Swinging the circuits is taking them longer than anticipated.

     This affects most Seattle DSL customers on CenturyLink DSL circuits.  They are working to restore out-of-service circuits as fast as possible.

Reverted to 2.6 kernels

Had to revert to the old kernels because there is an incompatibility between the nfs-utils provided with CentOS 6, specifically idmapd, and the later 3.x kernels.

Unfortunately, I was unable to build a modern version of nfs-utils because of missing libraries I was unable to chase down this evening.

So I will continue to try to chase down the missing libraries, and in the meantime I will probably build a 2.6.39 kernel (the last 2.6 kernel) with a lot of the garbage pulled out and pre-emptive scheduling enabled. That way I will at least get some, but not all, of the performance gains.

It really takes the wind out of my sails to spend so much time on something, think I have it working, and then have to backtrack.

Maintenance This Evening 8pm-midnight

     I will be rebooting the host machines.  Rebooting any given host will cause all of its Linux guests to freeze for about 15-20 minutes.

     The 3.19.1 kernel improved performance so much over the 2.6.x kernels on the guest machines that it makes sense to get it on the hosts as well.  Because of my wife’s work schedule, tonight is one of the few nights where I have the car and can go do this, so tonight it is.

     The 3.19.1 kernel cut the memory usage for a given workload almost in half, made things go faster, and reduced the overall load on the machines I’ve put it on.  The memory usage reduction is the most significant gain, as it allows more RAM to be used as cache so less disk access is required.  Whenever anything is slow, it almost always comes down to waiting on disk I/O.
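
     For anyone who wants to watch the cache effect on their own Linux machine, here is a minimal sketch that works out how much RAM is currently serving as disk cache.  It assumes the usual /proc/meminfo layout with MemTotal, Buffers, and Cached fields; the function names are my own, made up for illustration, and it is not the tool I used for the numbers above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts with no unit.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_share(fields):
    """Fraction of total RAM currently used as page cache plus buffers."""
    total = fields["MemTotal"]
    cached = fields.get("Cached", 0) + fields.get("Buffers", 0)
    return cached / total
```

     On a Linux host, read /proc/meminfo, pass its contents to parse_meminfo(), and hand the result to cache_share().  Comparing that fraction under the same workload before and after a kernel change gives a rough idea of how much memory was freed up for caching.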

Linux 3.19.1

     Linux 3.19.1 is the latest stable kernel, but every distribution out there is at least three minor point releases behind, and in the case of CentOS 6, they’re still on a 2.6 kernel.

     I managed to compile a monolithic (all needed modules compiled in) 3.19.1 kernel under CentOS 6 and successfully got it to boot and run.  There are a few minor glitches, such as nfs-utils needing to be updated, but the new kernel falls back to the old behaviour, so it still works even though it emits a few complaints.
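
     A quick way to check how monolithic a kernel build really is: count the options in its .config file.  In that file "=y" means built in, "=m" means built as a loadable module, and "# CONFIG_X is not set" means left out.  The sketch below is only an illustration with a function name of my own; it is not part of my build process.

```python
def count_config_options(text):
    """Count built-in (=y) and modular (=m) options in kernel .config text.

    Returns a (builtin, modular) tuple.  Commented-out options and
    options with string or numeric values are ignored.
    """
    builtin = 0
    modular = 0
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("CONFIG_") or "=" not in line:
            continue
        value = line.split("=", 1)[1]
        if value == "y":
            builtin += 1
        elif value == "m":
            modular += 1
    return builtin, modular
```

     Feed it the contents of the .config from the kernel source tree.  A fully monolithic build should report zero modular options.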

     The newer kernels have some capabilities that allow Apache, BIND, Postfix, and some other programs to operate more efficiently.  Building a kernel from scratch also allows me to remove a lot of unneeded cruft that just wastes memory and CPU cycles, and to adjust parameters to optimize for our hardware environment and workloads.

     I’ve attempted this in the past but this is the first time I’ve been successful with 3.x kernels.

Spring Forward

     Remember everyone, tonight is the night we Spring Forward, technically tomorrow morning at 2AM.  Set your clocks before you go to bed tonight and get ready for not enough sleep but a bit more evening daylight.

Too Quiet

Too quiet today, not a single phone call.  I’m feeling like the Maytag repairman.  Do I need to break something?

Firebug

     For those of you who are web developers, there is one tool that goes with Firefox that you should have.  It’s called “Firebug”, and what it does, among other things, is measure the time every component of your web page takes to download.

     Many of you will find web hosting offers for $3-4 per month that might sound good on the surface.  I recommend you use this tool to load a page hosted on one of these sites (and not the site’s home page, because they will often put that on a different high-end server), and compare the response time to sites hosted here.

     There are two important metrics to note.  The first is the time it takes the first component to start loading.  This is known as time to first data, and it is important because Google assigns a higher rank to sites that get to first data in under 200ms.  The other is the time it takes the whole page to load, because that is what the user experiences.
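
     If you would rather script the comparison, here is a rough sketch using only the Python standard library.  It times a single URL, not every component of the page the way Firebug does, and the "first data" figure includes DNS lookup and connection setup.  The function names are my own invention, and the 200ms default is just the figure mentioned above.

```python
import http.client
import time
from urllib.parse import urlsplit


def time_request(url, timeout=10.0):
    """Return (seconds until response headers arrive, seconds until body is read)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    start = time.perf_counter()
    try:
        conn.request("GET", path)       # connects, then sends the request
        response = conn.getresponse()   # returns once status line and headers arrive
        first = time.perf_counter() - start
        response.read()                 # pull down the whole body
        total = time.perf_counter() - start
    finally:
        conn.close()
    return first, total


def under_threshold(first_seconds, threshold_ms=200):
    """True if the time to first data is below the threshold in milliseconds."""
    return first_seconds * 1000.0 < threshold_ms
```

     Call time_request() a few times against the same page and look at the typical values rather than a single run, since one slow DNS lookup can skew the result.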

     If you run this on our home page, the result will be the same as for any equivalent user page, because we use the exact same servers everywhere, based upon Apache 2.4.x (whatever the most current release is), and I’ve spent quite a lot of time optimizing and fine-tuning it.  It is, of course, still possible to write bad PHP or JavaScript code and slow a page even on a fast server.

     I’ve also spent time optimizing the supporting infrastructure: disk I/O, caching, etc.  I want my customers to be successful, and to that end I do my best to give you every advantage.  Most of these low-end providers only want to minimize their cost per customer in order to maximize their profits.

Mammoth – The Remaining Three Customers

This will affect the remaining three customers not affected by the first outage.  Again, it will result in being moved to a faster backbone circuit, so the end result is worthwhile.

Date: 3/11/2015
Start time: 5:00 am MST
End time: 8:00 am MST
Affected: 74/OBGJ/090579//ACSO

Detail:

Maintenance is being performed in order to groom the OC3 circuits to the new OC12
circuit. All ATM customers on: 74/OBGJ/090579//ACSO will be affected. Customers will
go down and come up as their circuits are groomed to the new service, some customers
may need to reboot to restore services. Estimated downtime is roughly 2 hours but
may be as short as 5 minutes for some customers.

Mammoth Networks Maintenance To Affect Our DSL Customers in CenturyLink Territory

I received this note from Mammoth.  The work they are doing will cause an interruption for all but two of our customers.  The good news is that you’ll be on a faster circuit when the work is completed.  Mammoth has always been good about provisioning as necessary for the traffic they carry.

Date: 3/10/2015
Start time: 5:00 am MST
End time: 8:00 am MST
Affected: 74/OBGJ/088866//ACSO

Detail:

Maintenance is being performed in order to groom the OC3 circuits to the new OC12
circuit. All ATM customers on: 74/OBGJ/088866//ACSO will be affected. Customers will
go down and come up as their circuits are groomed to the new service, some customers
may need to reboot to restore services. Estimated downtime is roughly 2 hours but
may be as short as 5 minutes for some customers.