Outlook E-mail Outage

     Some of our customers have had e-mail to addresses hosted by Outlook fail and bounce with an error along the lines of “cannot look up sender address”.

     Microsoft claims everything is fine on their end, i.e., they have no issue, but Down Detector says otherwise and I’ve had reports from other sites experiencing the same problem.  So just a heads up: this issue is affecting mail from multiple sites to Outlook addresses or addresses hosted by Outlook, and the issue is not on our end.

Denial of Service

     Our web server is presently undergoing what is known as a “Slow and Low” denial of service attack.  In this type of attack someone initiates a large number of connections from sources which are very slow.  This limits our server’s ability to finish a connection, so the attack eats up all available connections.  To counter this we’ve greatly increased the number of connections available, but they still eat up a lot of memory, forcing cached data out so the system must go to disk for most requests, which slows things down.  Unfortunately our router decided to pick this time for a firmware upgrade, so traffic analysis is not available until the upgrade completes and we can’t readily identify and lock out the source.
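
     For readers curious how the source of such an attack might be spotted once connection data is available, here is a minimal sketch in Python.  It is not what we run; the input format (tuples of remote address, connection age in seconds, and bytes received) and all of the thresholds are assumptions made up for illustration.

```python
from collections import Counter


def suspect_sources(connections, min_age=30.0, max_rate=50.0, min_count=20):
    """Return {ip: count} for sources holding many old, slow connections.

    connections: iterable of (remote_ip, age_seconds, bytes_received).
    All thresholds are illustrative assumptions, not tuned values.
    """
    slow = Counter()
    for ip, age, nbytes in connections:
        # A connection counts as "slow" if it has been open a long time
        # but has transferred very few bytes per second.
        if age > 0 and age >= min_age and nbytes / age <= max_rate:
            slow[ip] += 1
    # Only report sources that hold many such connections at once.
    return {ip: n for ip, n in slow.items() if n >= min_count}


# Hypothetical sample: one source trickling data over 25 old connections,
# and one ordinary source with 40 short, fast connections.
sample = [("203.0.113.5", 120.0, 300)] * 25 + [("198.51.100.7", 2.0, 40000)] * 40
print(suspect_sources(sample))
```

     In this made-up sample only 203.0.113.5 is reported, since the second source's connections are young and fast.  A real deployment would need thresholds chosen from observed traffic.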

Flash Content

     If you wish to play the Flash games in the Games section of our website, or on Ybbored.com, Defender-games.net, or any other Flash site, you can still do so now that Adobe has discontinued Flash by using Firefox with an add-on called “Flash Player 2022”.  This works with pretty much any OS that Firefox supports, including Linux.

Downtime Friday 11PM – 1AM

     It took me nearly two hours to get everything back up after booting into the new kernels tonight.

     The issue revolves around some recently added ufw rules intended to improve security.  Even though I have explicit rules permitting the machines that need to see each other’s portmapper (rpcbind) to do so, the rules aren’t working.  When ufw is enabled, none of the machines can see each other’s portmapper, which breaks NFS and NIS.

     So I’ll have to do some further investigation as to why this is the case, but that was the cause of the long downtime.
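
     As an illustration of the kind of check involved, here is a minimal Python sketch that tests whether a machine’s portmapper port can be reached over TCP.  Port 111 is the standard rpcbind port; the function name and the timeout are my own choices for the example, and it only tests TCP, whereas rpcbind also listens on UDP.

```python
import socket


def port_open(host, port=111, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    Port 111 is the standard portmapper (rpcbind) port. This only tests
    TCP reachability; it does not speak the RPC protocol or test UDP.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
```

     Running this from each machine against the others, with ufw enabled and then disabled, would show which pairs are being blocked.  A result of False with the firewall up and True with it down points at the rules rather than at rpcbind itself.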

     This affected all eskimo.com services including our Fediverse services.

Web Based Terminal and Console Access

     The existing web-based terminal and console access is broken because the installed version of Guacamole requires some features not present in OpenSSL 3.1, so I will have to compile and install a newer version.  I hope to get this corrected later this evening.

 

Tonight’s Outage

     Tonight’s outage was not the result of a hardware or software error, but rather of an operator error.  I had built a new kernel and intended to try it on my workstation before deployment, but I also had a window open on the main file server, because that is where I store and distribute kernels from and where I keep the configuration files.  I went to reboot my workstation but was in the wrong terminal and rebooted the server instead.  Because I hadn’t shut down the virtual machines on it properly, it did not come up cleanly.  In particular, the kernel NFS server was snarled and restarting it did not correct the problem, so a second reboot was necessary.

     We will be performing a kernel upgrade to 6.1.9 this Friday.  This is not because there are any obvious issues with 6.1.7, which, operator errors aside, has been very stable, but because I made an error and misconfigured it.  I’ve corrected this on the web server, which is most sensitive to it, but I really need to fix it on all machines.  And since 6.1.9 does have some minor fixes, we might as well get that in place.

     I am most looking forward to the release of 6.2, because it has some fixes that largely recover the performance lost to the various security work-arounds for the Intel Skylake chips, and two of our physical servers are based upon this architecture.