Just a reminder that I am going to bring a server down to replace two failed hard drives tonight. This will affect https://friendica.eskimo.com/, https://hubzilla.eskimo.com/, https://eskimo.com/ (but NOT https://www.eskimo.com/), and the shell servers Manjaro.eskimo.com and Debian.eskimo.com. Estimated downtime is about an hour, and it will probably start between 10:30 and 11:30, depending on when I get there.
Web Restored
Everything is back in service in terms of the web service here. For some reason, when I saved all the mysql databases prior to changing the innodb table options, it did not save mysql (the database with all the permissions and grant tables) properly. So I had to resurrect a backup, dump the mysql database from it, and re-import it, but that worked and now everything is back.
Partial Web Outage
Database is taking longer than I expected to recover.
The issue was caused by the fact that innodb-file-per-table was not set to true. This caused all the InnoDB tables to be stored in the system tablespace file (ibdata1). The problem with this is that space is never recovered, so the file grows until the disk is full and then the database crashes. For this reason, MariaDB has shipped with this set by default after version 10.2 (we are on 10.6). However, Ubuntu, in their infinite wisdom, ships it in the distro with it NOT set. I’ve been bitten by this before when I installed the last server, so I should have known to check, but it’s been a few years and memory cells are aging.
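For anyone who wants to check their own server, here is a minimal Python sketch that scans the text of an option file for this setting. The function name and the list of section names are my own choices for illustration. It assumes a bare option with no value means enabled, and it does not follow !include directives, so treat it as a starting point rather than a complete parser.

```python
def file_per_table_setting(option_text):
    """Return True/False if innodb_file_per_table is set in a server
    section of the given option-file text, or None if it is absent."""
    enabled_values = {"1", "on", "true"}
    # Assumed list of sections the server reads; adjust for your setup.
    server_sections = {"mysqld", "mariadb", "server"}
    section = None
    result = None
    for raw_line in option_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and !include / !includedir directives.
        if not line or line[0] in "#;!":
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section not in server_sections:
            continue
        key, _, value = line.partition("=")
        # Dashes and underscores are interchangeable in option names.
        key = key.strip().lower().replace("-", "_")
        if key == "innodb_file_per_table":
            value = value.split("#", 1)[0].strip().lower()
            # Assumption: a bare option with no value enables it.
            result = True if value == "" else value in enabled_values
    return result


if __name__ == "__main__":
    sample = "[mysqld]\ninnodb-file-per-table = 1\n"
    print(file_per_table_setting(sample))  # prints True
```

A result of None only means the file does not mention the option; what the running server actually uses is then down to its built-in default and any other option files it reads.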
To fix this, I first needed to copy the entire /var/lib/mysql directory to a larger disk, since I can’t even start the database with the disk full. With more than 600GB of data, this took a while. Then I’ve got to dump all the databases, and there is around 160GB of legitimate data to dump, so this also takes a while. After that I can delete the ibdata1 file and log, copy all the remaining system files back to the original disk, and then, with this configuration option correctly set, restore the database from the dump. There is no shortcut that I am aware of that doesn’t lose data.
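Before a copy of that size, it is worth confirming the target disk really has room. The following is a rough Python sketch of such a check, not the procedure I actually ran. The function names and the 10% headroom figure are assumptions for illustration, and it counts apparent file sizes, so sparse files may be over-counted.

```python
import os
import shutil
import sys


def directory_size(path):
    """Total apparent size in bytes of the files under path.
    Symbolic links are skipped rather than followed."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def fits_on(source_dir, destination_dir, headroom=1.1):
    """True if destination_dir's filesystem has enough free space for
    source_dir plus some headroom (the 1.1 factor is an assumption)."""
    needed = directory_size(source_dir) * headroom
    free = shutil.disk_usage(destination_dir).free
    return free >= needed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_space.py SOURCE_DIR DESTINATION_DIR
    print(fits_on(sys.argv[1], sys.argv[2]))
```

It would be run with the data directory as the source and a directory on the larger disk as the destination, with the database stopped so the file sizes are not changing underneath it.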
Partial Web Outage
We have a partial web outage now.
This was caused by a misconfiguration of the mariadb database on the new server that caused it to eat itself. I am in the process of correcting that configuration issue and restoring the server. Unfortunately, owing to the size of the database, this will take some time. Perhaps an hour or so.
Wednesday Evening / Night
Sometime Wednesday evening or night I am going to take the new machine, Inuvik.eskimo.com, down to replace the two drives I had intended to use for backup space. Both have suffered a head crash at some point and have media errors that are too numerous to map out with the two spare tracks provided. Because you can’t use the Linux defective-block i-node to map them out manually when the drives are used in a RAID, they are of no use to me, but they may be to someone else, as they are 4TB drives with perhaps 300 defective 512-byte blocks each. They are seven years old, so out of warranty (warranted for five years).
This new machine presently has https://friendica.eskimo.com/, the Debian shell server, debian.eskimo.com, and the Manjaro shell server, manjaro.eskimo.com, on it, so these will be out of service while the drives are replaced.
So if someone wants a couple of 4TB drives with about three hundred bad sectors each that you need to manage manually, let me know. Preference will be given to Eskimo North customers, and after that it is first come, first served. Preferably, be close enough to Shoreline, WA to pick them up, or else be willing to pay for shipping.
Database
Go ahead and make changes; I’ve got more work to do before I’m ready to switch over.
Database Migration
I am migrating the MariaDB database (you may know it as mysql) from the old server to the new server tonight. Please do not make posts or other database changes tonight that you cannot afford to lose. There is 117GB in the database, so this is not something that can happen instantly.
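To give a rough sense of scale, here is a small Python sketch of the best-case time just to move that much data over a network link. The 1 Gbit/s link speed is an assumption for illustration, not a measured figure, and the time to dump and re-import the data comes on top of this.

```python
def transfer_minutes(size_gb, link_gbit_per_s, efficiency=1.0):
    """Best-case minutes to move size_gb (decimal gigabytes) over a link.
    efficiency < 1.0 models protocol overhead; its value is a guess."""
    size_bits = size_gb * 1e9 * 8
    seconds = size_bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 60


if __name__ == "__main__":
    # 117GB over an assumed 1 Gbit/s link at full line rate.
    print(round(transfer_minutes(117, 1.0), 1))  # prints 15.6
```

So even in the ideal case the raw transfer alone is on the order of a quarter of an hour.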
Unplanned Outages
Sorry for the unplanned outages of Debian and Manjaro tonight.
As I mentioned earlier, I had moved these two to the new machine primarily because I have to build kernels on both: on Debian because Debian signs its kernels and won’t load an unsigned kernel built on a non-Debian machine, and on Manjaro because its kernel environment is unique. So moving them to this machine reduces the time it takes to build kernels.
Well, I was installing software to get it ready for its main function, which is web applications. I had ufw installed, and accidentally installed avahi, autoipd, ppp, and firewalld, and somehow this broke networking.
While I was at the co-lo fixing this stuff, I also changed some configuration to make the system NOT depend upon disk-by-uuid because as useful as this feature is, it is not reliable and results in failed boots.
Debian, Manjaro, Kernel Upgrades
Manjaro.eskimo.com is now operational again, including Mate Desktop and workspace switcher.
Debian.eskimo.com is now fully upgraded to Bookworm.
We will be doing a kernel upgrade Saturday July 8th starting at 11pm.
This upgrade will require reboots of all servers and hence interruption of all services, paid and free, including mail, web hosting, shell accounts, and virtual private servers, and https://friendica.eskimo.com/, https://hubzilla.eskimo.com/, https://nextcloud.eskimo.com/, and https://yacy.eskimo.com/.
Most of these services will not be down longer than about ten minutes except for https://yacy.eskimo.com/, because it takes about 45 minutes to rebuild a database. All services should be back up by midnight.
Debian Upgrade – Still In the Works – And More…
The upgrade from Bullseye to Bookworm is turning out to be a MUCH larger upgrade than the upgrade from Buster to Bullseye was. More than 8200 packages are being updated, so it is taking much longer than expected.
Then, when it is finished, I am going to try to move this virtual domain off the existing server to the newly built server. This is one of the hosts I build kernels on, owing to the necessity of them being signed for Debian, and the new machine builds a kernel package almost twice as fast as the one it’s presently on. So when the upgrade gets done, there is going to be some downtime while I copy the virtual machine between machines.