Kernel Upgrades and Expedited RCU CPU Stalls

     Starting with kernel 5.15, we’ve had issues with expedited RCU CPU stalls on our servers.

     I’ve experimented with the same kernel versions built with the stock Ubuntu configuration, and those kernels do NOT show expedited RCU CPU stalls.

     RCU stands for Read-Copy-Update.  It is a means of allowing reads to run concurrently with updates without requiring a lock, resulting in greater efficiency on modern multi-core CPUs or on multiple-CPU systems.  If you are interested in the details, read https://www.kernel.org/doc/html/latest/RCU/whatisRCU.html.
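
     For readers who prefer code, here is a minimal user-space sketch of the read-copy-update idea in Python.  It is only an analogy and not how the kernel implements RCU: the class name RcuStyleTable and its methods are made up for illustration, and Python’s garbage collector stands in for the grace period the kernel waits out before freeing an old copy.

```python
import threading


class RcuStyleTable:
    """Hypothetical illustration of the read-copy-update pattern."""

    def __init__(self, initial):
        # Readers only ever see a complete snapshot that is never mutated.
        self._current = dict(initial)
        # Writers serialize among themselves; readers never take this lock.
        self._writer_lock = threading.Lock()

    def read(self, key):
        # Read side: grab the current snapshot reference, no lock needed.
        snapshot = self._current
        return snapshot.get(key)

    def update(self, key, value):
        with self._writer_lock:
            new_snapshot = dict(self._current)  # copy
            new_snapshot[key] = value           # update the private copy
            self._current = new_snapshot        # publish by swapping the reference
        # Readers still holding the old snapshot keep a consistent view;
        # the old copy is reclaimed once nobody references it any longer.
```

     A reader that picked up the old snapshot before an update keeps seeing the old values, while new readers see the new ones.  In the kernel, the old copy may only be freed after every pre-existing reader has finished, and a stall warning indicates that this wait has gone on far longer than expected.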

     The kernels I will be putting in place tonight are 5.19.12 and, in a few cases, 5.19.11 (I started the update with 5.19.11, then kernel.org released 5.19.12).  They use a configuration closely resembling the Ubuntu “generic” kernels, which is to say they won’t be entirely tickless, only idle tickless, and they won’t be entirely non-preemptive but will allow voluntary preemption.  This is less efficient than our normal kernels, but a week’s worth of testing has shown it to be stable on four of our busiest servers.

     I will also be testing a kernel that is voluntarily preemptive but completely tickless on the four busiest servers to see whether that is stable.  I’ve been testing this configuration on my workstation and, other than the higher latency you get with non-preemptive kernels, it has been stable.  If this works out, we will adopt this configuration on the rest of the servers next Friday.
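
     If you want to see which of these settings a given kernel was built with, the sketch below classifies the text of a kernel config file by tick mode and preemption model.  The function name describe_kernel_config is made up for illustration, and the sample input is not a copy of our actual configuration; the CONFIG_ names are the standard kernel build options for these choices.

```python
def describe_kernel_config(config_text):
    """Return (tick_mode, preemption_model) for the text of a kernel .config."""
    # Collect the exact names of all options set to "=y".
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and line.endswith("=y"):
            enabled.add(line[:-2])

    tick_modes = [
        ("CONFIG_NO_HZ_FULL", "fully tickless"),
        ("CONFIG_NO_HZ_IDLE", "idle tickless"),
        ("CONFIG_HZ_PERIODIC", "periodic tick"),
    ]
    preemption_models = [
        ("CONFIG_PREEMPT_NONE", "no forced preemption"),
        ("CONFIG_PREEMPT_VOLUNTARY", "voluntary preemption"),
        ("CONFIG_PREEMPT", "full preemption"),
    ]

    def first_enabled(choices):
        # Report the first option in the list that is enabled, if any.
        for option, label in choices:
            if option in enabled:
                return label
        return "unknown"

    return first_enabled(tick_modes), first_enabled(preemption_models)


# Illustrative input resembling the configuration going in tonight.
sample = """
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
"""
tick_mode, preemption_model = describe_kernel_config(sample)
```

     On Ubuntu the config of an installed kernel is commonly found under /boot as a file named config- followed by the kernel release, so you can pass that file’s contents to the function.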

     This will affect all Eskimo services: shared web hosting, shell services, e-mail, and virtual private servers, as well as our Fediverse services, https://friendica.eskimo.com/, https://hubzilla.eskimo.com/, https://nextcloud.eskimo.com/, and http://yacy.eskimo.com/.

     The updates will begin at 11 PM and should be completed by 11:30 PM Pacific Daylight Time (GMT-0700) tonight, Friday, September 30th, 2022.  No single service should be down for more than about ten minutes.