I made a bad error tonight: I accidentally powered down one of the virtual host machines and had to make a trip to the co-location facility, 22 miles southeast of here, to power it back up.
As a consequence, I did not get all the work done I had intended to complete tonight, so I will be continuing tomorrow between 10 PM and 2 AM.
I will be rebooting all EL6-based servers to load a new kernel that fixes an exploit which does not lead to privilege escalation but can allow someone to remotely crash a server with properly crafted packets. This will include Iglulik (the main file server and host for some virtual machines), Virtual (another virtual machine server), the incoming mail servers mx1 and mx2, and mail, shellx, scientific, and radius1.
I will also be imaging these machines while they are down so that if we have to restore a system the restoration image will have the fixes for Ghost in place. If time allows I may take some other machines down for imaging as well.
Because Iglulik has the user files and mail spool, I will be rebooting it just after midnight and during this time everything else will freeze. This process takes about 20 minutes.
Servers which are replicated will be done earlier in the day. The shell servers will start at 10 PM, and I will take them down one at a time so that others remain available for your use while any given server is being worked on. I will do the client mail server and web server when I have completed the shell servers.
We suffered a denial-of-service attack this morning that used DNS query packets with source addresses forged to look like our mail servers, causing our fail2ban scripts to firewall our DNS servers off from our mail servers. The attack started around 7:30 AM and caused intermittent inability to receive mail until I modified the fail2ban configuration to ignore these packets, restoring DNS service to the mail servers and, with it, incoming mail. The servers rapidly processed the backlog, and service returned to normal by approximately 10:30 AM. The configuration has been fixed so that this particular mode of attack is no longer possible.
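The usual way to make fail2ban disregard traffic that appears to come from your own infrastructure is its standard `ignoreip` option. The sketch below is hypothetical, not the actual configuration used here, and the addresses are placeholders rather than the real mail-server IPs:

```ini
# /etc/fail2ban/jail.local -- hedged sketch only.
# Whitelist our own mail servers so DNS packets with source addresses
# forged as those servers can no longer trick fail2ban into banning them.
[DEFAULT]
# 192.0.2.10 and 192.0.2.11 are placeholder addresses for illustration.
ignoreip = 127.0.0.1/8 192.0.2.10 192.0.2.11
```

Entries in `ignoreip` are exempt from banning in every jail that inherits from `[DEFAULT]`, which is why a single whitelist line can protect the mail servers across all of fail2ban's filters at once.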
All machines which were vulnerable to the Ghost exploit have been updated and rebooted.
There will be some brief interruptions of various services including the shell servers because of a serious vulnerability in glibc.
I will be applying updates to various systems as they become available and then rebooting so that old code no longer runs.
This will cause a brief interruption of all services. The main file server and host machines take longer to reboot, so I will do those after 10 PM tonight. There will be about twenty minutes between 10 PM and midnight where virtually everything grinds to a halt while this is done.
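For context, the Ghost vulnerability (CVE-2015-0235) affects glibc versions from 2.2 up to but not including 2.18, where it was fixed upstream. The sketch below is an illustrative version check only; note that EL6 ships glibc 2.12 with the fix backported, so on Red Hat-family systems the rpm changelog, not the bare version number, is what actually tells you whether a box is patched:

```shell
#!/bin/sh
# Hedged sketch: classify a glibc version string against the GHOST
# (CVE-2015-0235) affected range: introduced in 2.2, fixed in 2.18.
ghost_vulnerable() {
    ver="$1"
    major=${ver%%.*}              # text before the first dot
    minor=${ver#*.}               # text after the first dot...
    minor=${minor%%.*}            # ...trimmed to the minor number
    if [ "$major" -eq 2 ] && [ "$minor" -ge 2 ] && [ "$minor" -lt 18 ]; then
        echo vulnerable
    else
        echo fixed
    fi
}

ghost_vulnerable 2.12   # EL6's version; patched via backport in practice
ghost_vulnerable 2.20
```

On a real EL6 host you would confirm the backport with something like `rpm -q --changelog glibc | grep CVE-2015-0235`, and rebooting after the update matters because long-running daemons keep the old vulnerable library mapped until they restart.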
If there is any interest in resuming user meetings in April, we need to find a location. Also what time and day of the week would people like to meet?
Please join the discussion regarding a new meeting site on our forum.
I’m sick and have little to no voice; sometimes I can manage just a whisper, sometimes an intermittent frog voice with great difficulty.
If you need help and it’s something that can be handled via e-mail to support or fax, that’s much preferred at the moment. If you need to send card information, fax is best, or log in to webmail here and send it to support from there; be sure to use https, not http.
Start time: 11:00 pm PST
End time: 1:00 am PST
Affected: ATM Terminations on Seattle Redback
Maintenance is being performed in order to move the Redback to a new rack
within our space at Colo centers. Estimated downtime for this is 1 hour while
the rack is moved and re-wired. Some affected customers will need to reboot
their equipment to restore service.
This will affect Western Washington DSL customers in CenturyLink territory.
One of the physical hosts wedged during a copy of a virtual machine. The only shell servers available at present are shellx.eskimo.com and eskimo.com.
I may have to boot and run a file system check on the other as well, so everything may be down for about 20 minutes, starting roughly 45-60 minutes from now.
There may be some points where things are a bit sluggish today as I migrate some virtual machines from one box to another. This involves copying images of around 100GB. With the old 100 Mb/s switch this would pretty much stop things. I’m hoping not with the 1 Gb/s switch; still, it’s going to tax disk I/O and other resources on the machines pretty heavily.
The purpose of migrating these is load balancing, and to provide better redundancy when a physical host is down by spreading functionality across multiple physical boxes.
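The difference the switch upgrade makes is easy to estimate with back-of-the-envelope arithmetic. These figures assume ideal line-rate throughput; real transfers will be slower due to protocol overhead and the disk I/O contention mentioned above:

```shell
#!/bin/sh
# Ideal-case transfer time for a 100 GB image:
# 100 GB * 8 bits/byte = 800 Gb = 800,000 Mb of data to move.
old=$((800000 / 100))    # seconds at 100 Mb/s
new=$((800000 / 1000))   # seconds at 1 Gb/s (1000 Mb/s)
echo "100 Mb/s switch: ${old} s (over 2 hours per image)"
echo "1 Gb/s switch:   ${new} s (about 13 minutes per image)"
```

At roughly two hours per image on the old switch, a migration would saturate the link for most of a workday, which is why it would "pretty much stop things"; at around thirteen minutes on gigabit, disk I/O rather than the network becomes the limiting factor.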