MySQL Database

     MySQL continues to run for a while and then exhausts its file descriptors, even though the limit is set to more than a million.  We are running a version that was previously stable.  I have restored the changes to the startup script that the upgrade removed.

     I am still trying to identify the cause and correct it.
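
     One way to narrow this down is to compare the number of descriptors a running process holds against the limit the kernel actually applied to it, since the effective limit can differ from the configured one.  The sketch below is a minimal illustration, assuming a Linux host with /proc mounted; the function names are hypothetical and are not part of any MySQL tooling.

```python
import os
from typing import Optional, Tuple


def _to_limit(value: str) -> Optional[int]:
    """Convert a limits field to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Returns (None, None) if the row is missing; None also stands for 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', <soft>, <hard>, 'files']
            return _to_limit(fields[3]), _to_limit(fields[4])
    return None, None


def fd_usage(pid: int) -> Tuple[int, Optional[int]]:
    """Return (open descriptor count, soft limit) for a process (Linux only)."""
    open_count = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        soft, _hard = parse_max_open_files(fh.read())
    return open_count, soft
```

     Calling fd_usage() with the PID of the database server at intervals would show whether the count climbs steadily toward the soft limit.  Reading another user's /proc/<pid>/fd generally requires root.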

 

Continued Database Problems

     We continued to experience database problems.  I found that the upgrade had erased a change I made to the systemd script to increase the system limit on file descriptors, so MySQL was running out of descriptors.  I will probably upgrade again and re-apply my fixes to the script, which the upgrade will no doubt erase again, but I am going to let it run with this fix for a while first to make sure it is stable.
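
     A possible way to keep the limit from being erased by future upgrades is a systemd drop-in file under /etc/systemd/system/<unit>.d/, which package upgrades normally leave alone.  The sketch below only illustrates the idea: the unit name mysqld.service and the value 1048576 are assumptions, and the helper names are hypothetical.

```python
from pathlib import Path


def render_dropin(nofile_limit: int) -> str:
    """Return the text of a systemd drop-in that sets LimitNOFILE."""
    if nofile_limit <= 0:
        raise ValueError("nofile_limit must be positive")
    return f"[Service]\nLimitNOFILE={nofile_limit}\n"


def write_dropin(unit: str, nofile_limit: int,
                 root: Path = Path("/etc/systemd/system")) -> Path:
    """Write <root>/<unit>.d/limits.conf and return its path.

    'limits.conf' is an arbitrary file name chosen for this sketch;
    systemd reads any *.conf file in the drop-in directory.
    """
    dropin_dir = root / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    target = dropin_dir / "limits.conf"
    target.write_text(render_dropin(nofile_limit))
    return target


if __name__ == "__main__":
    # Assumed unit name and limit; adjust to the real service before use.
    print(write_dropin("mysqld.service", 1048576))
```

     After writing the file, systemd needs "systemctl daemon-reload" and a restart of the service before the new limit takes effect.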

Mail Server / Web Server

     At 12:49 AM the client mail server, which provides SMTP, IMAP, and POP3 for clients and web mail, crashed.

     This is a virtual machine sitting on a physical host.  The physical host was still running clean.  This is the first time this particular machine has crashed since it was loaded in 2013; the software is stable.  However, CentOS 6 did release a kernel upgrade, in spite of the age of the software, with absolutely no notes as to what was changed or fixed.  I have applied this upgrade.

     MySQL has also been somewhat unstable on the web server since an upgrade several days ago.  I have downgraded MySQL to 5.7.15, the version prior to the recent upgrade that introduced the instability.

Maintenance Outage 1/18/17 1:30-2:30

     There will be a hopefully brief maintenance outage during this time frame, as the 100 Mbit switch that was temporarily put in service when the gigabit switch failed Monday morning is being replaced with a new gigabit switch.

     If all goes well this should only take a few seconds.  However, the last time we did this, a couple of the machines failed to recognize the speed change without a reboot.  Hopefully this will not be necessary this time.

 

Bad Switch

     The outage this morning from 1:20AM until 5:10AM was caused by a failed switch.  The gigabit switch has temporarily been replaced with a 100 Mbit switch that I had on hand.  Some services relating to mail took somewhat longer to restore because the machine did not recognize the speed change without a reboot, and the reboot then loaded a new kernel that apparently lacked NFS support.

SquirrelMail

     An update replaced PHP 7.0 with PHP 7.1, which SquirrelMail does not work with yet.  I have reverted the PHP version to 7.0 to restore SquirrelMail to service.

Today’s Outage

     I had to make some changes to our router configuration this afternoon.

     The way the software in our router works, you go into a configuration mode, make all the changes you want to make, then commit or save those changes, at which point they become active.

     I did this and hit the SAVE button.  The router reported “Save Failed” and then crashed.

     I had to drive down to the co-location facility and reconfigure the router to bring it back online.

     To do this I had to change the IP address on one of my machines to 192.168.1.2 so that it could communicate with the router at its factory default address and reconfigure it.
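
     The address change is needed because a machine can only talk directly to a factory-reset router if both sit in the same subnet.  The check below is a small sketch of that reasoning; the router address 192.168.1.1 and the /24 prefix used in the example are assumptions, since only 192.168.1.2 is given above.

```python
import ipaddress


def same_subnet(host: str, peer: str, prefix_len: int) -> bool:
    """Return True if peer lies in the subnet implied by host/prefix_len."""
    network = ipaddress.ip_interface(f"{host}/{prefix_len}").network
    return ipaddress.ip_address(peer) in network


if __name__ == "__main__":
    # 192.168.1.1 and /24 are assumed values for a typical factory default.
    print(same_subnet("192.168.1.2", "192.168.1.1", 24))
```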

     That all went well and I had the router back online by 3pm, but when I went to change the IP address back on the machine I used to configure the router, the graphical tools failed and screwed up its configuration past the point where it could be fixed by the graphical client.

[Image: Crash ‘N Burn, No Return]

     I was not familiar with where Ubuntu keeps all of its network configuration, but I found the files and got that machine restored to health as well.  Most things were online by 3PM: mail service, web service, and some of the shell servers.  This particular box hosts a number of shell servers, so some were down until about 4:30pm when I got it completely restored.