Everything was back up by 12:35 except the machine I use for system accounting. For some reason its virtual host network interface would not work. I ended up deleting and re-installing the network interface, which finally got it talking again with a new MAC address. I don’t really have any idea why the old one stopped working, but after an hour and a half of hair pulling I got it running again.
The mail server is fixed, though I still have to check the NFS mount points of various servers. However, the CAUSE of the issue with mail was not, as I had suspected, the new kernel.
Something has changed the behavior of the system such that the order of sources in nsswitch.conf is being ignored, and if there is a conflict between a NIS UID and the local password file, NIS overrules. As it happens, we had a postfix entry in NIS from a gazillion years ago when postfix was running on SPARC servers, and that was conflicting with the local password file. The order in nsswitch.conf was files then NIS, so the NIS entry should not have won. Further, the conflict did not show in ls; the output of ls was correct, but postfix used the NIS version.
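One way to confirm a conflict like this is to compare the passwd entry in the local file with what NIS serves and with what the name service switch returns. The following is only a rough sketch, not the procedure actually used here: the helper names passwd_uid, compare_uids and check_account are made up for illustration, and it assumes getent is available and that ypmatch is present wherever the NIS client tools are installed.

```shell
#!/bin/sh
# Hypothetical helpers for spotting a UID conflict between passwd sources.

# Print the UID (third colon-separated field) of one passwd-format line.
passwd_uid() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Compare the UIDs of two passwd-format lines and report the result.
compare_uids() {
    uid_a=$(passwd_uid "$1")
    uid_b=$(passwd_uid "$2")
    if [ "$uid_a" = "$uid_b" ]; then
        echo "match: $uid_a"
    else
        echo "conflict: $uid_a vs $uid_b"
    fi
}

# Report how each source resolves one account name.
check_account() {
    user=$1
    local_line=$(grep "^${user}:" /etc/passwd)
    nss_line=$(getent passwd "$user")
    echo "files vs NSS: $(compare_uids "$local_line" "$nss_line")"
    # ypmatch only exists where the NIS client tools are installed.
    if command -v ypmatch >/dev/null 2>&1; then
        nis_line=$(ypmatch "$user" passwd)
        echo "files vs NIS: $(compare_uids "$local_line" "$nis_line")"
    fi
}
```

Running check_account postfix on the affected host would show whether the local file and NIS disagree. Since getent follows the normal lookup order, it may well report the local entry, just as ls did, even while a daemon picks up the NIS one.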
I booted off the old kernel and it still exhibited the same behavior, so it was not caused by the new kernel as I had suspected. Removing the entry from the NIS database, which is no longer legitimately used anyway, resolved the issue so that the server works properly. Still, the issue remains that nsswitch.conf is being ignored in some contexts, and this is not good.
I am having problems with the mail server being unstable with the new kernel, so I am going to try 5.13.5 (which came out since we installed 5.13.4) to see if it resolves them. This will take about 20 minutes to compile and install.
Going to do another kernel upgrade this Friday starting at 11PM, provided a crash doesn’t force one sooner.
I didn’t want to move to 5.13 so early in the release cycle, and actually didn’t want to move to it at all, but was forced to by the Sequoia exploit.
There are lots of bugs, most of them minor networking bugs affecting edge cases, but there are a few that are relevant, plus some memory leaks. This early in the release cycle this is unfortunately the norm.
This upgrade will not require any futzing with drivers since it is only a minor point release, so everything should get done by 11:30PM or so. There might be some NFS issues to clean up, although after I last rebooted the mail server ALL of the NFS clients rebound properly, so I think they are finally getting NFS stable.
It has been a long grueling upgrade, made particularly difficult by one server having drivers not officially supported under 5.13.x kernels, but I got it working.
All NFS mounts and NIS bindings have been verified.
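For anyone wanting to do a similar check, here is a minimal sketch of one way to spot NFS mounts that did not come back after a reboot. It is an illustration only, not the check used here. It assumes the expected mounts are listed in /etc/fstab and compares them with /proc/mounts, which shares the same first three columns (device, mount point, filesystem type). The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for finding NFS mounts that failed to come back.

# List the mount points of all nfs/nfs4 entries in an fstab- or
# mounts-format file, skipping comment lines.
nfs_mount_points() {
    awk '$1 !~ /^#/ && $3 ~ /^nfs4?$/ { print $2 }' "$1" | sort
}

# Print every NFS mount point listed in file $1 (expected mounts)
# that is absent from file $2 (current mounts).
missing_nfs_mounts() {
    expected=$(nfs_mount_points "$1")
    mounted=$(nfs_mount_points "$2")
    for mp in $expected; do
        printf '%s\n' "$mounted" | grep -Fxq "$mp" || printf '%s\n' "$mp"
    done
}
```

On a live system this would be called as missing_nfs_mounts /etc/fstab /proc/mounts, and no output means nothing is missing. Entries marked noauto in fstab will be reported even though they are not expected to be mounted. For the NIS side, ypwhich prints the server a client is currently bound to.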
Another issue this time around: at some point some bug(s) were introduced into systemd, causing it to be unreliable at starting some services during boot. But everything is up and running now.
I’m behind in getting receipts and expiration notices out but should be able to get caught up on that this weekend, provided there are no other major disasters. Thanks to the folks at Qualys for not waiting for the kernel folks to really adequately test things before releasing an exploit for the Sequoia vulnerability (and yes, I do mean this sarcastically).
I have removed all tickless kernels prior to 5.13.4 because they are vulnerable to a recently announced root exploit that goes by “Sequoia”, CVE-2021-33909.
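To see whether a machine running one of these kernels is already on 5.13.4 or later, the release string from uname -r can be compared against that version. The helpers below are a sketch with made-up names (release_base, version_ge). They assume a plain dotted release with an optional suffix after a dash. Only the 5.13.4 threshold comes from this post.

```shell
#!/bin/sh
# Hypothetical helpers for checking a kernel release against 5.13.4.

# Strip any suffix after the first dash, e.g. "5.13.4-client" -> "5.13.4".
release_base() {
    printf '%s\n' "${1%%-*}"
}

# Succeed if dotted version $1 is greater than or equal to $2.
# Only the first three numeric components are compared.
version_ge() {
    have=$1
    want=$2
    old_ifs=$IFS
    IFS=.
    set -- $have
    h1=${1:-0}; h2=${2:-0}; h3=${3:-0}
    set -- $want
    w1=${1:-0}; w2=${2:-0}; w3=${3:-0}
    IFS=$old_ifs
    if [ "$h1" -gt "$w1" ]; then return 0; fi
    if [ "$h1" -lt "$w1" ]; then return 1; fi
    if [ "$h2" -gt "$w2" ]; then return 0; fi
    if [ "$h2" -lt "$w2" ]; then return 1; fi
    [ "$h3" -ge "$w3" ]
}
```

A typical call would be version_ge "$(release_base "$(uname -r)")" 5.13.4 && echo "5.13.4 or later". This is only meaningful for mainline 5.13.x builds like the ones offered here; distribution kernels on older branches may carry the fix under a lower version number.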
5.13.4 tickless kernels are available for all Debian based and now for all Red Hat based Intel and AMD x86-64 systems (any 64-bit Intel or AMD CPU). Previously I only made ‘.deb’ packages available, but because I had some Red Hat based systems for which the distributor has not made 5.13.4 available yet, I compiled for this platform as well.
There are two kernels provided. The ‘client’ kernel is meant for home systems and workstations and is optimized for low latency; it is tickless with a 1000 Hz clock and fully preemptive (tickless means clock interrupts cause context switches only when there is work scheduled, which saves a lot of wasted CPU cycles). The ‘server’ kernel is meant for servers and is optimized for maximal throughput; it has a 100 Hz clock and is non-preemptive. Like the client kernel, it is also tickless.
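The difference between the two builds comes down to a handful of kernel configuration options, which can be read back from the config file a kernel was built with. The sketch below assumes the config is installed at the conventional /boot/config-$(uname -r) path and uses the standard mainline option names (CONFIG_HZ, CONFIG_NO_HZ_IDLE, CONFIG_NO_HZ_FULL, CONFIG_PREEMPT, CONFIG_PREEMPT_NONE). Which tickless option these particular kernels enable is not stated here, so the script checks both. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one option from a kernel config file.
config_value() {
    # $1 = option name (e.g. CONFIG_HZ), $2 = path to the config file
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Summarise the timer and preemption settings of a kernel config file.
describe_kernel() {
    cfg=$1
    echo "HZ=$(config_value CONFIG_HZ "$cfg")"
    if [ "$(config_value CONFIG_NO_HZ_IDLE "$cfg")" = "y" ] ||
       [ "$(config_value CONFIG_NO_HZ_FULL "$cfg")" = "y" ]; then
        echo "tickless=yes"
    else
        echo "tickless=no"
    fi
    if [ "$(config_value CONFIG_PREEMPT "$cfg")" = "y" ]; then
        echo "preempt=full"
    elif [ "$(config_value CONFIG_PREEMPT_NONE "$cfg")" = "y" ]; then
        echo "preempt=none"
    else
        echo "preempt=other"
    fi
}
```

Calling describe_kernel "/boot/config-$(uname -r)" on a running system prints three lines. Going by the description above, a client-style build would be expected to show HZ=1000 with full preemption and a server-style build HZ=100 with none.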
On servers, the tickless feature is particularly useful where a large number of virtual machines are hosted as each machine adds to the host CPU load and all those clock interrupt reschedules add up to a lot of wasted CPU cycles.
On laptops the tickless kernel is useful because saving CPU cycles extends battery life.
To install these kernels, first download ALL three ‘.deb’ files for Debian based distributions or both ‘.rpm’ files for RPM distributions, located in the client or server directory. You can download via the web at https://www.eskimo.com/kernel, or via ftp at ftp://ftp.eskimo.com/pub/kernel using anonymous login. If you are behind NAT and using ftp, type passive after you log in to put the server in passive mode.
Then on Debian based systems, install with dpkg -i *.deb. On Red Hat based systems, install with rpm -i *.rpm. If on Red Hat you get a complaint about headers conflicting with an existing header package, remove the existing package with rpm --nodeps -e existing (whatever the existing package name is). The --nodeps is important here, otherwise removing the headers will remove some 300-odd dependent packages.
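Since a partial download is an easy mistake to make, a small pre-check before installing can help. The sketch below counts the package files in a download directory and prints the install command only if the full set is there: three ‘.deb’ files or two ‘.rpm’ files, as described above. The function names and the idea of a separate download directory are assumptions for illustration. The script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical pre-install check for a kernel download directory.

# Count the files in directory $1 that end in extension $2.
count_files() {
    set -- "$1"/*."$2"
    # If nothing matched, the glob is left unexpanded and names no real file.
    [ -e "$1" ] || set --
    echo $#
}

# Print the install command for the packages in directory $1,
# or fail if neither full set (3 .deb or 2 .rpm) is present.
plan_install() {
    dir=$1
    debs=$(count_files "$dir" deb)
    rpms=$(count_files "$dir" rpm)
    if [ "$debs" -eq 3 ]; then
        echo "dpkg -i $dir/*.deb"
    elif [ "$rpms" -eq 2 ]; then
        echo "rpm -i $dir/*.rpm"
    else
        echo "incomplete download: $debs .deb, $rpms .rpm" >&2
        return 1
    fi
}
```

For example, plan_install ~/kernel would print the dpkg or rpm command to run as root, or complain if a file is missing. The directory should contain only the files for one kernel flavor, client or server, not both.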
If you run into any issues, please generate a ticket at: https://www.eskimo.com/support/osTicket/. Thank you.
All public facing servers except the private virtual servers have been upgraded to 5.13.4 which has the Sequoia exploit fixed.
I will work on getting the virtual private servers done today, so your VPS, if you have one here, will be rebooted some time today.
Tonight at 11pm I will upgrade the file servers. I hope this will be done within an hour, but it may take longer than usual owing to the special drivers involved in the iomemory flash used for our MariaDB.
I’m not sure what broke Manjaro, but it will not boot into a mode where I can log in, not even on the console, so I am restoring it from backup.
Because of the severity of the Sequoia security bug, kernel upgrades for shell servers and the web and mail servers, basically anything exposed to the public, will happen tonight instead of Friday as originally planned, and non-active servers will be upgraded sooner than 11pm. The downtime for these will only be about 1 minute each. The file servers that do not have public exposure will still be upgraded tomorrow. VPSs may or may not get upgraded this evening.
There are some worthwhile fixes to the kernel, and as a result I will be doing a kernel upgrade on July 20th, 11PM-Midnight. Most services will experience an outage of about ten minutes during this interval.