Done with reboots for the night.
These reboots loaded fixes for various software flaws that Wikileaks revealed the CIA used to gain access to Linux systems.
In addition, I took some time to debug some of the systemd startup scripts, which is why I rebooted some of the servers several times. There are still MANY bugs in these scripts to fix, but a lot are fixed in Ubuntu 16.10, and probably even more in 17.04.
I am working on upgrading the failed server, although I haven't decided which CPU to go with yet: either an i7-7700K, which is a lot like the i7-6700K except it produces less heat, or a Xeon of some sort. In the past I've avoided Xeons because the registered ECC memory was both expensive and slow as snails. There's no point in a fast CPU if the memory system can't keep up. But now there are Xeon processors with a memory system based upon the X99 chipset that use normal DDR4 memory and fly. So I'm considering going that route.
When I get the server together with the new hardware, I will try loading 17.04 on it, and if it works well, upgrade the existing machines.
I am going to be rebooting all of the Intel-based physical servers, as well as the virtual servers, to load new kernels that address security issues. The reboots will also make sure that no old code is still running after the hundreds of upgrades addressing various security concerns related to the CIA hacking recently revealed in Wikileaks documents. This will result in a brief interruption of all services.
When the mail spool server failed the other day, that machine was also one of the NIS slaves. I failed to remove it from the yp.conf file on the client mail server, mail.eskimo.com, and as a result, at some point ypbind tried to rebind to the dead server.
During that interval, anyone who repeatedly tried to authenticate (Macs and iPhones do this automatically) got locked out by fail2ban for failed authentication attempts.
I discovered this today while trying to resolve a problem for my mother-in-law that I could also reproduce on my tablet (but not on the workstation that I use all day, go figure).
This has been corrected, and all the banned IPs have been unbanned. If anyone else is still having a problem, contact me at 206-812-0051 and I will chase it down.
Tonight, close to midnight, I will be rebooting our primary web and FTP server. This will take about five minutes.
The reason for the reboot is to load a new kernel that addresses some security issues.
The reason it takes five minutes is that this is a 32GB machine set up to make extensive use of caching in order to provide the fastest possible web response.
This means there is often a lot to be written to disk before a reboot, and that, even with fast modern drives, takes some time.
Some of our dial access numbers in the Sacramento area are O1 numbers. I received the following notice. It's not the first one regarding this outage, but I thought I would post it publicly in case others are having issues:
Dear O1 Customer,
Currently there is an outage affecting LATA 726 (Sacramento). Dialup users may experience busy signals, or trouble connecting while maintenance is being performed.
We apologize for this inconvenience and appreciate your patience in this matter.
As with all technical contacts, if you experience any issues that you believe may be related to the above message, or have any questions concerning this email, please contact our 24/7 Network Operations Center at (888-444-1111), OPT #2, or via email at firstname.lastname@example.org.
Since you are not direct O1 customers, you cannot use these contacts to get help. However, here are two Sacramento numbers on an alternate network you can use if the primary access numbers in Sacramento are down: 916-282-0155 (Sacramento Main) and 916-609-0155 (Sacramento North). These are both MegaPOP numbers and so should be unaffected by this outage. I believe this relates to an earlier fiber reroute made necessary by flooding.
I am about a month and a half behind in accounting, getting notices out, expiring accounts, etc. If you know your account is near time for renewal, feel free to contact me and I'll be happy to look up the information for you. As account expirations come up, if you have not yet been notified, your account will not be turned off until you've been notified of payment due and given an appropriate amount of time to respond.
This has come about through a number of things happening: failed equipment screwing up my sleep schedule, minor illnesses increasing my sleep requirements, and a lot of software issues being discovered that required me to devote more time to securing things.
Actually, Eskimo did not crash; something just went wrong with sshd. I was able to log in to the console fine.
Eskimo crashed, even though I've moved it to different hardware (I am still trying to repair the old hardware), so I will head down to the co-lo in a bit to reboot it.
Scientific Linux has been slow since moving to new hardware. Neither the load nor the speed of the new hardware is the issue, because CentOS 6 is served off of the same hardware and does not have this problem.
Both are derived from the same Red Hat 6.8 code, but I'm getting some weird errors on Scientific that I am not getting on CentOS 6. I'm going to try recreating the virtual machine in a bit, and if that does not work, perhaps reload it from scratch. I did have to do this with CentOS 6 (shellx became so corrupted it was unmaintainable). The RPM system that these Red Hat-derived operating systems use tends to be easily corrupted and difficult to fix beyond a certain point.
But this problem seems to relate to the virtual machine as it worked fine on the old host.
I've moved all the services off of the failed hardware. However, some machines still have file systems mounted from it and will not let me umount them, not even with the force option. I'm going to have to reboot a number of machines to correct that, so there will unfortunately be further interruptions, although they should be relatively brief.