We will be upgrading to kernel 5.19.4 this Friday at 11PM. I don’t know if this will address the CPU stalls or not. The number of changes to the kernel since 5.19.3 are so voluminous that I don’t have time to review all of them to see if any of them impact this problem. I do know that the issue we had is not apparently in the KVM code itself as it has also occurred in the physical hosts and in at least four different parts of the kernel, so probably a function all of these are using or something doing wild writes in memory. The ticket I opened is still not resolved. But since we know that this problem exists at least from 5.19.0-5.19.3 and going back to 5.17 is not an option since it’s past EOF and there are some serious exploits not fixed in that kernel, moving forward is the only viable option at this point.
The expected interval is approximately 1/2 hour with no single service being down for more than about 10-15 minutes.
This will affect ALL of Eskimo’s host services including virtual private servers, web and e-mail hosting, and our fediverse services https://friendica.eskimo.com/, https://hubzilla.eskimo.com/, https://nextcloud.eskimo.com/, and https://yacy.eskimo.com.