ISOMEDIA will be performing scheduled network maintenance on 08/31/2017 from 12:00AM PDT to 04:00AM PDT. During this window, there will be multiple periods of increased latency and packet loss as network protocols re-converge. These periods may last between 5 and 15 minutes.
All times are estimates based on the expected outcome of the work being performed and previous experience performing the same or similar work. There is always the possibility of an unforeseen bug or problem that could extend the maintenance window or cause a disruption in connectivity. If something does occur, administrators will make every effort to correct the problem or implement the back-out plan quickly.
Spammers have figured out how to bypass SpamAssassin's spam scoring, rendering the bulk of our spam filtering capabilities non-functional. I have not yet been able to determine how they are doing this. It is happening on both incoming servers, so it is not a per-server problem. I have found that other people are experiencing this as well, but have not found any solutions elsewhere either.
We are experiencing intermittent and at times extreme packet loss where our equipment is co-located. We have reported the issue to our data provider there.
Upon closer investigation, what appeared to be a denial of service attack triggering rate limiting and crashes on our name servers was in fact self-inflicted.
At some point I accidentally copied the virtual domain configuration file for the slave name servers onto the master name server, so it was effectively no longer a master. The master is a hidden server (so it cannot be attacked directly), and since the slaves still held valid copies of the zone data, everything continued to work until the zones on the slaves expired.
At that point all the slaves contacted the master trying to refresh each domain. Since the master had no data to serve, they could not, so they kept retrying until they triggered rate limiting on the master. The slaves did not know how to handle that and simply died.
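The refresh-and-expiry behavior described above follows standard SOA timer semantics. The sketch below illustrates the slave-side logic; the timer values are illustrative assumptions, not our actual zone settings:

```python
# Sketch of the slave-side SOA timer logic described above.
# Timer values are illustrative, not our actual zone settings.
SOA_REFRESH = 7200    # seconds between refresh attempts against the master
SOA_RETRY = 900       # shorter retry interval after a failed refresh
SOA_EXPIRE = 604800   # stop serving the zone after this long without a refresh

def zone_expired(last_good_refresh: float, now: float) -> bool:
    """A slave must stop answering for a zone once EXPIRE seconds have
    passed since its last successful refresh from the master, which is
    why the mistake only surfaced when the zones expired."""
    return (now - last_good_refresh) > SOA_EXPIRE

def next_attempt(last_attempt: float, last_succeeded: bool) -> float:
    """After a failed refresh the slave retries on the shorter RETRY
    interval, which is how repeated attempts piled up against the
    master until rate limiting kicked in."""
    return last_attempt + (SOA_REFRESH if last_succeeded else SOA_RETRY)
```

The point is that nothing looked wrong during the first week: the slaves served cached data until the EXPIRE timer ran out, and only then did the retry storm begin.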
Once I discovered this, I was able to restore the virtual domain configuration file from backups; the slaves then updated their zone files successfully, and all was good once again.
Someone launched denial of service attacks earlier that repeatedly caused three of our public name servers to crash, but the downtime was less than a minute each time. I have scripts in place that check each name server for proper operation once a minute and relaunch it if it is inoperative, and at no time were all name servers simultaneously out of service.
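A watchdog along those lines can be sketched as follows. The probe and restart commands shown are assumptions for illustration, not the actual scripts:

```python
import subprocess

def server_responding(probe_cmd: list, timeout: float = 2.0) -> bool:
    """Return True if the probe command exits 0 within the timeout.
    In production the probe might be something like
    ['dig', '@127.0.0.1', 'example.com', 'SOA', '+short'] (an assumption)."""
    try:
        result = subprocess.run(probe_cmd, capture_output=True, timeout=timeout)
        return result.returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False

def watchdog(probe_cmd: list, restart_cmd: list) -> bool:
    """Run once a minute (e.g. from cron): relaunch the daemon if the
    probe fails. Returns True if a restart was triggered."""
    if not server_responding(probe_cmd):
        subprocess.run(restart_cmd)
        return True
    return False
```

Running a check like this every minute bounds any single-server outage to roughly one check interval, which is why each crash cost less than a minute of downtime.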
The ticket system is now finally fixed and working properly under PHP 7.0.
Maintenance is complete. I still need to check all the shell servers to make sure they have properly remounted their NFS file systems, but I will do that when I get home.
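One quick way to verify the remounts is to compare /proc/mounts against the expected mount points; a minimal sketch (the mount points shown are hypothetical):

```python
def nfs_mount_points(mounts_text: str) -> set:
    """Parse /proc/mounts-style text and return the mount points whose
    filesystem type is nfs or nfs4."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.add(fields[1])
    return points

def missing_mounts(mounts_text: str, expected: set) -> set:
    """Return the expected NFS mount points that are not currently mounted."""
    return expected - nfs_mount_points(mounts_text)

# On a shell server one would read the live table, e.g.:
#   missing_mounts(open('/proc/mounts').read(), {'/home', '/mail'})
```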
Sorry this took so long. It was a combination of bugs in Ubuntu's start-up scripts and operator error; at one point I powered down the wrong server.
I am going to be rebooting our servers tonight, probably around 10:30pm, in order to load microcode that fixes a bug in the Intel i7-6700K and i7-6850K processors used in our servers.
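One way to confirm the new microcode took effect after a reboot is to read the microcode field from /proc/cpuinfo. A minimal sketch of that check (the revision value in the usage comment is illustrative):

```python
def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the microcode revision reported for each logical CPU in
    /proc/cpuinfo-style text; a healthy host reports one revision."""
    revs = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("microcode"):
            revs.add(line.split(":", 1)[1].strip())
    return revs

# After the reboot one would check the live file, e.g.:
#   microcode_revisions(open('/proc/cpuinfo').read())  # e.g. {'0xba'}
```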