Eskimo’s Web

     The outage this afternoon was caused by a failed attempt to install ossn, an open source social network program, onto our web site.  Things seemed to work until I turned the cache on; then the site went to a blank screen.  I gave up and went to bed (about 6am).

     This afternoon I discovered our website wasn’t responding and was complaining about MySQL descriptors.  Apparently ossn got stuck in some sort of loop and ate them all up.  I’ve removed the program from the server until I can determine what is wrong.

Apache upgraded to 2.4.29

     I upgraded our Apache web server to 2.4.29 today.

Changes with Apache 2.4.29

  *) mod_unique_id: Use output of the PRNG rather than IP address and
     pid, avoiding sleep() call and possible DNS issues at startup,
     plus improving randomness for IPv6-only hosts.  [Jan Kaluza]

  *) mod_rewrite, core: Avoid the 'Vary: Host' response header when HTTP_HOST
     is used in a condition that evaluates to true. PR 58231 [Luca Toscano]

  *) mod_http2: v0.10.12, removed optimization for mutex handling in bucket
     beams that could lead to assertion failure in edge cases.
     [Stefan Eissing] 

  *) mod_proxy: Fix regression for non decimal loadfactor parameter introduced
     in 2.4.28.  [Jim Jagielski]

  *) mod_authz_dbd: fix a segmentation fault if AuthzDBDQuery is not set.
     PR 61546.  [Lubos Uhliarik <luhliari redhat.com>]

  *) mod_rewrite: Add support for starting External Rewriting Programs
     as non-root user on UNIX systems by specifying username and group
     name as third argument of RewriteMap directive.  [Jan Kaluza]

  *) core: Rewrite the Content-Length filter to avoid excessive memory
     consumption. Chunked responses will be generated in more cases
     than in previous releases.  PR 61222.  [Joe Orton, Ruediger Pluem]

  *) mod_ssl: Fix SessionTicket callback return value, which does seem to
     matter with OpenSSL 1.1. [Yann Ylavic]
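
     To illustrate the mod_rewrite change above: an External Rewriting Program is a long-running process that reads one lookup key per line on standard input and answers with one line on standard output.  Below is a minimal sketch in Python, not something we run here.  The dash-to-underscore mapping, the script path, and the user and group names in the comment are all made up for illustration, and the user:group form of the third argument is my reading of the mod_rewrite documentation.

```python
import sys
from typing import TextIO

# Hypothetical configuration line (path, map name, user and group are examples):
#   RewriteMap d2u "prg:/path/to/dash2under.py" someuser:somegroup


def map_key(key: str) -> str:
    """Return the mapped value for one key, or "NULL" when there is no match."""
    if not key:
        return "NULL"
    # Example mapping only: turn dashes into underscores.
    return key.replace("-", "_")


def serve(inp: TextIO, out: TextIO) -> None:
    """Answer each input line with exactly one output line."""
    for line in inp:
        out.write(map_key(line.rstrip("\n")) + "\n")
        # Flush after every reply; a buffered answer leaves the server waiting.
        out.flush()


# A real map script would end by calling: serve(sys.stdin, sys.stdout)
```

     The point of the new third argument is that this process no longer has to inherit root from the server's startup, which matters because it handles request-derived input.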

Everything Is Back Up

     I apologize for the downtime.  It turned out to be operator error this time.

     When I got to the co-location facility, I discovered that when I moved all the virtual machines off of the failed hardware, I neglected to set the boot option to start the virtual machines on boot up, so they were waiting on a manual start.
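
     A check like the following could catch this kind of miss after a migration.  It is only a sketch and assumes the hosts are managed with libvirt and its virsh command, which this post does not actually say; the function names are invented for the example.

```python
import subprocess


def virsh_names(*flags: str) -> set[str]:
    """Return the domain names printed by `virsh list --name` plus extra flags.

    Assumes libvirt's virsh is installed and usable by the current user.
    """
    result = subprocess.run(
        ["virsh", "list", "--name", *flags],
        check=True,
        capture_output=True,
        text=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def missing_autostart(all_vms: set[str], autostart_vms: set[str]) -> list[str]:
    """Return, sorted, the VMs that are defined but not flagged to start at boot."""
    return sorted(all_vms - autostart_vms)


def report() -> list[str]:
    """Ask virsh for both lists and return the VMs that would wait for a manual start."""
    return missing_autostart(
        virsh_names("--all"),
        virsh_names("--all", "--autostart"),
    )
```

     Under the same assumption, anything report() returns could then be fixed with `virsh autostart <name>`.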

     I also got a new BIOS from Asus that was supposed to fix the HME vulnerability, but when I attempted to install it the motherboard said, “Not a proper BIOS”.  So back to the drawing board on that one.

 

Ticket System Upgraded

     osTicket has been upgraded from 1.10 to 1.10.1.  Those are not the versions I thought we were on, but 1.10.1 is the most recent stable version.

     They’ve apparently moved things away from open source, and GitHub now has old versions rather than the current versions.  I’m not happy with this move, as it gives the community less opportunity to contribute to its development.

Tickets

     Please refrain from creating trouble tickets until further notice this evening.  In the meantime please either e-mail support@eskimo.com or call 206-812-0051.

     I will be upgrading osTicket this evening.  osTicket has completely changed database schemas, so I will not be able to carry over old tickets.  However, right now there are no open tickets, which makes it an ideal time to do this update.

All Services Moved Off of Failing Hardware

     All services have been moved off of failing hardware.  There should be no further unscheduled interruptions.

     There will be some interruptions lasting about 20-30 minutes per service between 10pm and 6am over the next few days while I make backups of the newly configured machines, so that if there is a failure, restoration returns to the current configuration.

Sick Host Machine

     I think the motherboard is damaged in the host that holds the mail spool, mail.eskimo.com, mx2.eskimo.com, debian.eskimo.com, and scientific7.eskimo.com.

     In addition to random reboots, the machine is sometimes reporting disk errors, but the SMART status shows no issues with the drives and no errors recorded, which leaves the on-board controllers as the likely culprit.

     So tonight various services mentioned above will be down for a period of time as I move them off of this failing machine so I can take it out of service and replace the motherboard.

     When the BIOS lost its fan settings, which shut down a chassis fan, the machine got quite warm, but it’s hard to say whether the heat damaged it or existing damage caused the BIOS to lose its settings.

     At any rate, I am moving things off so I can take it out of service for several days to replace the motherboard and then burn it in properly (extensive testing with mprime while monitoring temperatures, etc.).  This is both to make sure it is stable and to find the minimum voltage at which the CPU will operate stably.  The lower the voltage, the less the heat and the longer the life.
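
     The voltage search described above can be sketched roughly as follows.  This is an illustration rather than the actual procedure: in practice each step means changing a BIOS setting and running mprime for hours, so the is_stable callback is a hypothetical stand-in for that manual test, and the millivolt figures are invented.  The heat estimate uses the usual approximation that dynamic CPU power scales with the square of the voltage at a fixed clock speed.

```python
from typing import Callable, Optional


def lowest_stable_mv(
    start_mv: int,
    floor_mv: int,
    step_mv: int,
    is_stable: Callable[[int], bool],
) -> Optional[int]:
    """Step the voltage down from start_mv and return the last setting that passed.

    is_stable stands in for a full burn-in run at the given voltage.
    Returns None if even the starting voltage fails.
    """
    last_good = None
    for mv in range(start_mv, floor_mv - 1, -step_mv):
        if not is_stable(mv):
            break  # first failure: stop and keep the previous setting
        last_good = mv
    return last_good


def relative_dynamic_power(old_mv: int, new_mv: int) -> float:
    """Approximate power ratio at a fixed clock, using P proportional to V squared."""
    return (new_mv / old_mv) ** 2


# Pretend the CPU fails its burn-in below 1100 mV (made-up threshold).
best = lowest_stable_mv(1200, 1000, 25, lambda mv: mv >= 1100)
ratio = relative_dynamic_power(1200, best)
print(f"lowest stable setting: {best} mV, about {ratio:.0%} of the original dynamic power")
```

     In real use one would normally settle a step or two above the lowest passing value, to leave some margin for aging and warmer days.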