Maintenance Tonight – Midnight until ???

     I am planning on doing some maintenance at the co-lo facility tonight that will involve rebooting three of the host servers.  These are the machines that have your /home directories and /var/spool/mail mail spools, as well as various virtual machines.

     The last time I did this we were down for several hours.  I have come to a good understanding of what caused the issues last time so they can be avoided this time.  I expect a downtime of about 1/2 hour for the server that hosts /home directories and about 10-15 minutes for the others.

     The reason for these reboots is to load a 4.8 Linux kernel.  There was substantial work done to NFS in version 4.8 that improves performance by correcting a few sections of slow critical code and by using more aggressive caching.  Since our whole service is heavily dependent upon NFS to mount file systems remotely from one machine to another, this should improve the overall performance of our network.  Mail, Web, and the shell servers should all run faster after this reboot.

OpenSuse Down

    The upgrade of opensuse.eskimo.com resulted in a non-bootable machine.  Since I really didn’t have much in the way of apps on this box other than what comes with the distribution, I am going to install Leap from scratch.  OpenSuse may be down for a few days.  It was getting very little use anyway, so I doubt it will be badly missed in the interim.  I am anxious to see what has changed in Leap.  One thing that has changed for sure is that it’s systemd based, like most other modern Linux distributions, and problems with the systemd scripts appear to be what is preventing it from booting properly after the upgrade.  Hopefully a clean install will work better.

OpenSuse Upgrade in Progress

     The shell server opensuse.eskimo.com is being upgraded from OpenSuse 13.2 to OpenSuse Leap 42.2.  Although it is usable during the upgrade, programs may be randomly killed, so I do not advise using it for anything serious until the upgrade completes, which will probably be another two hours or so.

WordPress Images

     For some time WordPress image upload was failing here.  I do not know when this started; yesterday was the first time I received a complaint about this problem.

     The graphicsmagick image manipulation library, which is a fork of the imagemagick image manipulation library, is completely broken under Ubuntu 16.04.1 LTS and core dumps.  This broke PHP code that manipulates images, such as the code that generates the resized image and thumbnails after you upload an image.

     I removed the graphicsmagick libraries and replaced them with the older and functional imagemagick libraries and now I am able to upload images properly.
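
     A core dump like the one described above shows up to the calling program as a process killed by a signal.  The following is a minimal Python sketch, not the check actually used here, of how one might test whether a command-line tool dies that way.  The function name is hypothetical, and it assumes a POSIX system, where subprocess reports death by signal as a negative return code.

```python
import signal
import subprocess


def crashed_by_signal(cmd):
    """Run cmd and return the name of the signal that killed it, or None.

    On POSIX, subprocess reports a child killed by signal N as
    returncode == -N, which is what a segfault or abort looks like.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode < 0:
        return signal.Signals(-result.returncode).name
    return None


# Example call (assumes the GraphicsMagick "gm" binary is on PATH):
#     crashed_by_signal(["gm", "version"])
```

     A result such as SIGSEGV or SIGABRT points at the binary or its libraries rather than at the PHP code calling them.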

Authentication Fixed

     This was an operator error.  In an attempt to tighten security and stop the potential for someone to obtain encrypted passwords via the NIS database, I firewalled the NIS servers off from everything.  Unfortunately, this included the local NIS clients so no server was able to talk to the NIS servers to obtain authentication information.  This has been fixed.
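
     The mistake above amounts to an allow list that left out the local clients.  As a rough illustration only, here is a small Python sketch of a sanity check that lists which known clients a proposed rule set would lock out.  The networks and addresses are documentation placeholders, not our real ones, and the function names are hypothetical.

```python
import ipaddress

# Placeholder network: the real NIS client ranges are not given in this post.
ALLOWED_NIS_CLIENTS = [
    ipaddress.ip_network("192.0.2.0/24"),  # stand-in for the local server subnet
]


def may_query_nis(client_ip):
    """Return True if client_ip falls inside one of the allowed networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NIS_CLIENTS)


def blocked_clients(clients):
    """Return the clients from the given list that would be locked out."""
    return [c for c in clients if not may_query_nis(c)]
```

     Running blocked_clients over the list of local servers before applying a firewall change should return an empty list; anything else is a machine that will lose authentication.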

Server Failed

     After updates, our server hosting the /home directories failed to boot.  It was a combination of updates gone bad and operator error.  The operator error was a typo in the /etc/fstab file.  When the machine would not remount the root partition / read-write, even though it was clean on an fsck and I could mount it manually, I assumed it was an issue with systemd (and there have been many).  So in desperation I switched to upstart, but that deleted a large number of necessary software packages in the process, and I ended up with a machine I could not even boot into single user mode.

     So I brought the machine back home, made myself a server rescue disk, booted from that, got things cleaned up enough to boot, spent another eight hours or so installing missing software and finally got it working again.

     Total downtime was from about 12:30AM – 4:15PM on November 20th.
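
     Since a typo in /etc/fstab was part of the failure, here is a minimal Python sketch of a pre-reboot check for the most common kinds of fstab mistakes.  It is an illustration under simple assumptions, not a full parser: it only checks the field count, that the mount point looks sane, and that the dump and pass fields are numeric.  The device names in the sample are hypothetical.

```python
def check_fstab(text):
    """Return a list of (line_number, message) for suspicious fstab lines."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Fields: device, mount point, type, options, then optional dump and pass.
        if len(fields) < 4 or len(fields) > 6:
            problems.append((lineno, f"expected 4 to 6 fields, found {len(fields)}"))
            continue
        mount_point = fields[1]
        if mount_point not in ("none", "swap") and not mount_point.startswith("/"):
            problems.append((lineno, f"mount point {mount_point!r} is not an absolute path"))
        for name, value in zip(("dump", "pass"), fields[4:6]):
            if not value.isdigit():
                problems.append((lineno, f"{name} field {value!r} is not a number"))
    return problems


# Hypothetical sample with two deliberate typos (lines 3 and 4).
SAMPLE = """\
# <device>  <mount>  <type>  <options>  <dump>  <pass>
/dev/sda1   /        ext4    defaults   0       1
/dev/sda2   home     ext4    defaults   0       2
/dev/sda3   /var     ext4    defaults   O       2
"""

for lineno, message in check_fstab(SAMPLE):
    print(f"line {lineno}: {message}")
```

     A check like this will not catch a wrong device name or a bad mount option, so it complements rather than replaces a test mount before rebooting.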