Kernel upgrades for virtual private servers 1-8 and the host machine, “ice”, have been completed. All are now running 5.10.33. Hopefully stability will be better.
We have been having stability issues on the machine that hosts private virtual servers vps1-vps8 since moving to kernel 5.8. That release brought major changes that significantly improved load and responsiveness, hence my unwillingness to revert to 5.4. I have filed a bug report with the kernel Bugzilla covering various issues that have shown up in crash dumps, but so far the developers have not directly addressed them.
This morning this server spontaneously rebooted. It did not print a crash dump, only a watchdog timer timeout. Since last Friday, however, the kernel team has put out a new release that addresses a memory allocation issue which, under certain circumstances, could cause already-allocated memory to be re-allocated. They have also made many fixes to the KVM code, and since KVM is the hypervisor we use to serve private virtual servers, the release could be relevant.
I am going to do a kernel upgrade on just this server at 11PM tonight. This will cause a brief interruption of all virtual private servers lasting 5-10 minutes. Since no NFS/NIS mounts are involved, I do not expect it to take as long as normal updates do.
Maintenance is complete and everything is back up and running. Both the mail server and the web server failed to automatically mount all of their disk file systems and swap partitions, so I had to go around to each machine and check these by hand.
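For anyone who runs their own virtual machine and wants to make the same kind of post-reboot check, here is a minimal sketch in Python. It is not the procedure used on our servers. The function names are made up for illustration, and it assumes a Linux system where /etc/fstab lists what should be mounted and /proc/mounts and /proc/swaps show what actually is. Because fstab often names swap by UUID while /proc/swaps shows device paths, the sketch only compares the number of swap areas, not their names.

```python
def fstab_targets(fstab_text):
    """Return (mount_points, swap_entry_count) that fstab expects at boot."""
    mount_points = set()
    swap_count = 0
    for raw in fstab_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",") if len(fields) > 3 else []
        if "noauto" in options:
            continue  # not expected to be mounted automatically
        if fs_type == "swap":
            swap_count += 1
        elif mount_point != "none":
            mount_points.add(mount_point)
    return mount_points, swap_count


def mounted_points(proc_mounts_text):
    """Return the set of mount points listed in /proc/mounts content."""
    result = set()
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            result.add(fields[1])
    return result


def active_swap_count(proc_swaps_text):
    """Count active swap areas in /proc/swaps content (first line is a header)."""
    lines = [l for l in proc_swaps_text.splitlines() if l.strip()]
    return max(len(lines) - 1, 0)


def check(fstab_text, proc_mounts_text, proc_swaps_text):
    """Return (missing_mount_points, number_of_swap_areas_not_active)."""
    expected_mounts, expected_swaps = fstab_targets(fstab_text)
    missing = sorted(expected_mounts - mounted_points(proc_mounts_text))
    swap_short = max(expected_swaps - active_swap_count(proc_swaps_text), 0)
    return missing, swap_short


def check_local():
    """Hypothetical convenience wrapper that reads the real files on a Linux host."""
    with open("/etc/fstab") as f, open("/proc/mounts") as m, open("/proc/swaps") as s:
        return check(f.read(), m.read(), s.read())
```

Calling check_local() after a reboot returns a list of mount points that fstab expects but that are not mounted, along with how many expected swap areas are not active. An empty list and a zero mean nothing obvious was missed.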
E-mail and all web hosting is operational.
Shell servers are all operational.
We will be performing another round of kernel updates on April 22, 2021 starting at 11pm. At the same time we’ll be taking mail down a little longer in order to increase the amount of RAM. This should only take a few additional minutes. The logs indicate some memory pressure during peak times: not enough to result in swap usage, but enough to reclaim buffers early, which hurts performance.
All of our hosting, shell, and email servers will be affected between 11PM and approximately 11:30PM though the individual downtime for any one of them should be ten minutes or less.
I announced some time ago that I was going to move Nextcloud. This is because, in its old location, its .htaccess conflicted with the WordPress .htaccess when it came to well-known web addresses. To resolve this, I gave Nextcloud its own hostname.
The OLD URL was: https://www.eskimo.com/nextcloud/
The NEW URL is: https://nextcloud.eskimo.com/
Please update your devices and any links to reflect the NEW URL. Some devices may not properly follow the redirect.
I’ve also updated to version 20.0.9 which is the current stable release. I am also installing and configuring a bunch of new applications including connectors to other fediverse social media.
I am planning another kernel upgrade this Friday, April 16th, starting at 11pm. If all goes well it should be concluded by 11:30pm. If it does not, it may be as late as 12:30am.
There is an issue with the current grub that sometimes causes the grub update after installing a new kernel to fail, requiring manual intervention.
One of our servers rebooted at 2:30AM with no oops, crash dump, or error. I thought I had crash-dump enabled but did not, so I corrected that and also upgraded to the latest point release of the kernel.
This server hosts most of the private virtual machines, which is why your virtual machines would have rebooted.
It was not a clean upgrade. Grub got corrupted on one machine, and one server didn’t properly export its file systems until I manually restarted the NFS server. There were quite a few broken NFS mounts. So, some things were back up at 11:16, and the broken systems were gradually restored, with restoration completed at 12:07.
I plan to upgrade kernels on all the servers tonight starting at 11PM. Upgrades should be concluded by midnight. This will result in outages lasting 10-20 minutes for each physical server and perhaps another ten minutes for virtual machines on that server depending upon whether NFS and NIS properly bind after reboots or not.