Mail Is Up; Future Maintenance

     Mail is back up.  I have a bunch of updates to apply, and then I will need to take it down for about twenty minutes to image the machine.

     Also, the SSL certificates are currently expired; I will fix this shortly.  In the meantime, just tell your mail client to accept the security exception.

Mail

     The physical host is back up.  This means you can access and receive mail from shell servers.

     I am working on restoring the mail virtual machine from an earlier backup, taken before the instabilities crept in.

Mail Server

     The mail server crashed again this afternoon.  In an attempt to fix it rather than just reboot again, I decided to restore from a backup made prior to the time it started acting up.

     During the restoration the physical host hung, so I will need to drive down to the co-location facility to restore the physical server and then finish bringing mail back up.

 

NX and Remmina Support

     Shellx had support for NX and Remmina.  This was lost when I replaced the server with CentOS 6 owing to incompatibilities with the newer LibNX libraries used by x2go.

     I had thought there was little hope of this being fixed, as freenx is no longer actively developed: x2go has replaced its functionality, is available on a broader range of platforms, and has much additional functionality such as sound.

     But I ran across a version of freenx two point releases later on one of the third-party repositories.  I installed and tested it and it worked, so this functionality has returned to centos6, which replaces shellx.

Mail Client Server Stability

     We have been plagued by a rash of hardware and software problems lately.

     A current problem I am seeing is that the older Red Hat 6 servers are mysteriously powering themselves down.

     It may be related to kernel upgrades I attempted recently.  The newer kernels required a newer version of acpid.  The acpid daemon is a program that listens to things like power and reset buttons and then acts accordingly.

     I had to back out the kernels because the mandatory locks, which mail really requires to operate properly, were broken, under 4.8 at least.  I neglected to back out the updates to acpid at the same time.
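
     For reference, here is a rough sketch of how one might check whether mandatory locking is even enabled for a mailbox file.  It assumes the locks in question are the classic Linux kind, which need the filesystem mounted with the mand option and the file to have the setgid bit set with group execute cleared.  The function names and the /var/mail mount point are only illustrative, not taken from the actual server.

```python
import stat


def has_mandatory_lock_mode(mode):
    """True if a file mode marks the file for mandatory locking.

    Linux treats setgid set + group-execute cleared as the marker.
    """
    return bool(mode & stat.S_ISGID) and not (mode & stat.S_IXGRP)


def mount_has_mand(mounts_text, mount_point):
    """True if mount_point is listed with the 'mand' option.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return "mand" in fields[3].split(",")
    return False


# Hypothetical usage on a live system (paths are assumptions):
#   import os
#   mode = os.stat("/var/mail/someuser").st_mode
#   mounts = open("/proc/mounts").read()
#   print(has_mandatory_lock_mode(mode), mount_has_mand(mounts, "/var/mail"))
```

     Both checks have to come back true before the kernel will enforce a lock, so this only rules configuration in or out; it says nothing about whether a given kernel handles the locks correctly.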

     In addition, on mail, sogo, a package we have in place to provide some interoperability with ms-exchange clients, overwrote some of the system python modules and broke some of the admin tools.

     I’ve fixed these things and I’ve turned on some additional logging so that if it happens again it will hopefully provide additional information for troubleshooting.

Web Server

     I deeply apologize for the lengthy downtime of the website and MySQL database tonight.

     Ubuntu kicked out an upgrade to mysql server that seriously broke it.  With the update installed, as soon as someone tried to connect from anywhere but localhost, the server would get stuck in an infinite loop complaining of bad file descriptors and not a socket.

     I tried to revert to the previous and only other version in the Ubuntu repository, but it was also broken.

     In addition, they hard-coded a limit of 65536 file descriptors, which is not adequate for our site's traffic.

     I tried to migrate to MariaDB but there was no clean path from MySQL 5.7 to MariaDB and nothing I tried would work.

     I then attempted to install Community MySQL directly from the site, but the install broke because I had a mysql user in the NIS database.  It did not log the cause or print an error that would give me a clue as to what the problem was, so it took a lot of tracing and plodding around to find the cause and fix it.
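
     A quick way to spot this kind of conflict ahead of time is to check whether an account name resolves through the name service (NIS, LDAP, and so on) without having an entry in the local passwd file.  The sketch below does only that; it does not reproduce whatever the installer actually tripped over, and the function names are made up for illustration.

```python
import pwd


def local_users(passwd_text):
    """Return the set of account names defined in passwd-format text."""
    users = set()
    for line in passwd_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and NIS compat entries such as "+::::::".
        if not line or line.startswith("#") or line[0] in "+-":
            continue
        users.add(line.split(":", 1)[0])
    return users


def comes_from_directory(user, passwd_text, lookup=pwd.getpwnam):
    """True if user resolves via the name service but is not a local account."""
    try:
        lookup(user)  # pwd.getpwnam raises KeyError for unknown names
    except KeyError:
        return False  # not known anywhere
    return user not in local_users(passwd_text)


# Hypothetical usage on a live system:
#   text = open("/etc/passwd").read()
#   print(comes_from_directory("mysql", text))
```

     If this returns true for the account a package wants to create, it is worth sorting that out before running the installer.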

     We are now up and running with Community MySQL Server version 5.7 and it is back to functioning properly.

MySQL Database

     MySQL continues to run for a while and then exhausts file descriptors even though the limit is set to more than a million.  We are running a version that previously was stable.  I’ve restored the changes to the start up script that the upgrade removed.
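
     When a configured limit does not seem to hold, it helps to compare what the running process was actually given against how many descriptors it has open.  On Linux both are visible under /proc.  The following is a minimal sketch with made-up function names; it assumes a Linux /proc layout and enough privilege to read another process's entries.

```python
import os


def parse_nofile_limit(limits_text):
    """Return (soft, hard) 'Max open files' from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: Max open files <soft> <hard> files
            fields = line.split()
            return int(fields[3]), int(fields[4])
    raise ValueError("no 'Max open files' line found")


def count_open_fds(pid):
    """Count descriptors currently open in the given process."""
    return len(os.listdir(f"/proc/{pid}/fd"))


# Hypothetical usage, with the server's PID filled in by hand:
#   pid = 1234
#   soft, hard = parse_nofile_limit(open(f"/proc/{pid}/limits").read())
#   print(count_open_fds(pid), "of", soft)
```

     If the soft limit reported here is far below the configured value, the setting is not reaching the process, which points at the start up configuration rather than at the server itself.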

     I am still trying to identify the cause and correct it.

 

Continued Database Problems

     We continued to experience database problems.  I found that the upgrade had erased a change I made to the systemd script to increase the system limit on file descriptors.  MySQL was running out of descriptors.  I will probably update again and re-apply my fixes to the script, which the update will no doubt erase again, but I am going to let it run with this fix for a while to make sure it is stable first.
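
     One way to keep a limit like this from being wiped out by package upgrades is a systemd drop-in file, which lives beside the packaged unit instead of inside it.  The sketch below only shows the shape of such a file.  The unit name mysql.service, the file name limits.conf, and the value are assumptions, not what is actually deployed here.

```python
from pathlib import Path, PurePosixPath


def dropin_path(unit, name="limits.conf", root="/etc/systemd/system"):
    """Path of a drop-in file for the given unit, e.g. mysql.service."""
    return PurePosixPath(root) / f"{unit}.d" / name


def render_dropin(nofile):
    """Drop-in contents that override only the open-file limit."""
    return f"[Service]\nLimitNOFILE={nofile}\n"


def write_dropin(unit, nofile, root="/etc/systemd/system"):
    """Write the drop-in to disk (needs root for the default location)."""
    path = Path(dropin_path(unit, root=root))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_dropin(nofile))
    return path


# Hypothetical usage (unit name and value are assumptions):
#   write_dropin("mysql.service", 1048576)
# then run "systemctl daemon-reload" and restart the service.
```

     Because the packaged unit file is never edited, an upgrade that replaces it leaves the override in place.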

Mail Server / Web Server

     At 12:49 AM the client mail server, which provides SMTP, IMAP, and POP3 for clients and web mail, crashed.

     This is a virtual machine sitting on a physical host.  The physical host was still running clean.  This is the first time this particular machine has crashed since it was loaded in 2013; the software is stable.  However, CentOS 6 did release a kernel upgrade, in spite of the age of the software, with absolutely no notes as to what they changed or fixed.  I have applied this upgrade.

     MySQL has also been somewhat unstable on the web server since an upgrade several days ago.  I have downgraded MySQL to 5.7.15, the version prior to the recent upgrade that introduced the instability.