Our web server is currently slow because it is being hit with a denial-of-service attack from Amazon cloud server nodes. So far fail2ban has locked out 454 addresses, the majority of them Amazon nodes. As it locks out attacking addresses, the server load is slowly coming down.
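For anyone curious how a ban count like the one above can be tallied, here is a minimal Python sketch. It assumes the common fail2ban.log action format ("... NOTICE [jail] Ban 203.0.113.7"); the jail name and the addresses in the sample are made up for illustration, and the log format on a given install may differ.

```python
import re

# Assumed fail2ban.log action line (format may differ by version/config):
# 2017-10-20 03:12:45,123 fail2ban.actions [1234]: NOTICE [apache-auth] Ban 203.0.113.7
ACTION_RE = re.compile(
    r"\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)\s+(?P<ip>\S+)\s*$"
)

def currently_banned(lines):
    """Return the set of (jail, ip) pairs that were banned and not yet unbanned."""
    banned = set()
    for line in lines:
        match = ACTION_RE.search(line)
        if not match:
            continue  # not a Ban/Unban action line
        key = (match.group("jail"), match.group("ip"))
        if match.group("action") == "Ban":
            banned.add(key)
        else:
            banned.discard(key)
    return banned

# Hypothetical sample lines using documentation-only IP ranges.
sample = [
    "2017-10-20 03:12:45,123 fail2ban.actions [1234]: NOTICE [apache-auth] Ban 203.0.113.7",
    "2017-10-20 03:12:46,001 fail2ban.filter [1234]: INFO [apache-auth] Found 198.51.100.9",
    "2017-10-20 03:13:02,456 fail2ban.actions [1234]: NOTICE [apache-auth] Ban 198.51.100.9",
    "2017-10-20 03:23:45,789 fail2ban.actions [1234]: NOTICE [apache-auth] Unban 203.0.113.7",
    "2017-10-20 03:24:10,012 fail2ban.actions [1234]: NOTICE [apache-auth] Ban 192.0.2.55",
]

banned_now = currently_banned(sample)
print(len(banned_now), "addresses currently banned")
```

Feeding the function the lines of the real log file instead of the sample list would give the running total, under the same format assumption.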
The outage this afternoon was caused by a failed attempt to install ossn, an open source social network program, on our web site. Things seemed to work until I turned the cache on; then it went to a blank screen. I gave up and went to bed (about 6am).
This afternoon I discovered our website wasn’t responding and was complaining about MySQL descriptors. Apparently ossn got stuck in some sort of loop and ate them all up. I’ve removed the program from the server until I can determine what is wrong.
One of our long-term customers, Greg Wickenburg, is in need of your help. You can read his story here: Fundraiser by Greg Wickenburg. Greg has been with us for close to two decades, and I’m hoping we can help him heal and get on with a better life.
I upgraded our Apache web server to 2.4.29 today.
Changes with Apache 2.4.29
*) mod_unique_id: Use output of the PRNG rather than IP address and pid, avoiding sleep() call and possible DNS issues at startup, plus improving randomness for IPv6-only hosts. [Jan Kaluza]
*) mod_rewrite, core: Avoid the 'Vary: Host' response header when HTTP_HOST is used in a condition that evaluates to true. PR 58231 [Luca Toscano]
*) mod_http2: v0.10.12, removed optimization for mutex handling in bucket beams that could lead to assertion failure in edge cases. [Stefan Eissing]
*) mod_proxy: Fix regression for non decimal loadfactor parameter introduced in 2.4.28. [Jim Jagielski]
*) mod_authz_dbd: fix a segmentation fault if AuthzDBDQuery is not set. PR 61546. [Lubos Uhliarik <luhliari redhat.com>]
*) mod_rewrite: Add support for starting External Rewriting Programs as non-root user on UNIX systems by specifying username and group name as third argument of RewriteMap directive. [Jan Kaluza]
*) core: Rewrite the Content-Length filter to avoid excessive memory consumption. Chunked responses will be generated in more cases than in previous releases. PR 61222. [Joe Orton, Ruediger Pluem]
*) mod_ssl: Fix SessionTicket callback return value, which does seem to matter with OpenSSL 1.1. [Yann Ylavic]
I apologize for the downtime. This time it came down to operator error.
When I got to the co-location facility, I discovered that when I moved all the virtual machines off of the failed hardware, I neglected to set the option to start the virtual machines automatically at boot, so they were waiting on a manual start.
I also got a new BIOS from Asus that was supposed to fix the HME vulnerability, but when I attempted to install it the motherboard said, “Not a proper BIOS”. So it's back to the drawing board on that one.
I’m not sure if it hung going down or coming up, but the mail server did not reboot properly. It pings, but I cannot connect to any services, so I am heading down to the co-location facility to resolve it. This will take about 45 minutes.
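A host that answers ping but refuses service connections can be confirmed from outside with a simple TCP check. The Python sketch below is only an illustration of that kind of test, not the tool actually used here; the function names are made up for this example.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

def check_services(host, ports, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For a mail server, calling check_services with a placeholder host name such as "mail.example.com" and ports 25, 110 and 143 (SMTP, POP3, IMAP) would show at a glance which services are accepting connections. A machine in the state described above would return False for every port even though ping still works.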
I will be interrupting services briefly tonight to reboot the physical hosts and a number of the guest machines for kernel upgrades. These interruptions should be relatively short in duration, around ten minutes.
osTicket has been upgraded from 1.10 to 1.10.1. These are not the versions I thought we were on, but it’s the most recent stable version.
They’ve apparently moved things away from open source, and GitHub now has old versions rather than the current versions. I'm not happy with this move, as it gives the community less opportunity to contribute to its development.
Please refrain from creating trouble tickets until further notice this evening. In the meantime please either e-mail email@example.com or call 206-812-0051.
I will be upgrading osTicket this evening. osTicket has completely changed database schemas, so I will not be able to carry over old tickets. However, right now there are no open tickets, which makes it an ideal time to do this update.
All services have been moved off of failing hardware. There should be no further unscheduled interruptions.
There will be some interruptions of service, lasting about 20-30 minutes per service, between 10pm and 6am over the next few days. These are needed to make backups of the newly configured machines so that, if there is a failure, a restore returns them to the current configuration.