This evening I had to reboot the server that currently serves all of the files because updates required it. It did not boot, and it took me three hours to get it running again. Something in the new nvidia drivers conflicted with nfs-kernel-server, which is used to serve those files, and the system purged the latter, including the /etc/exports file that tells it which machines to make the files available to.
I have solved the video conflict by removing the nvidia drivers and using the Linux nouveau drivers instead. They are slower but it’s not like I am going to play video games on this machine. They are adequate for everything else.
I am now rebuilding the exports file by hand. Hope to have everything operational again in a couple of hours. I will focus on the most used services first.
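For anyone curious what rebuilding an exports file involves, here is a minimal sketch in Python. It assumes the standard /etc/exports layout of one line per exported directory, followed by each client and its options in parentheses. The paths, host names and options are made-up placeholders, not the actual configuration here, and `render_exports` is a hypothetical helper written for this illustration.

```python
# Sketch: build /etc/exports text from a table of shares.
# All paths, hosts and options below are hypothetical examples.

def render_exports(shares):
    """Return exports-file text for a {path: [(client, [options])]} mapping."""
    lines = []
    for path, clients in sorted(shares.items()):
        # Each client is written as host(opt1,opt2,...) with no space
        # between the host and the opening parenthesis.
        entries = " ".join(
            f"{host}({','.join(options)})" for host, options in clients
        )
        lines.append(f"{path} {entries}")
    return "\n".join(lines) + "\n"


EXAMPLE_SHARES = {
    "/home": [
        ("shell1.example.com", ["rw", "sync", "no_subtree_check"]),
        ("web.example.com", ["ro", "sync"]),
    ],
    "/var/mail": [
        ("mail.example.com", ["rw", "sync"]),
    ],
}

if __name__ == "__main__":
    print(render_exports(EXAMPLE_SHARES), end="")
```

Keeping the share list in one table like this means the file can be regenerated rather than retyped if the package gets purged again. After writing the result to /etc/exports, `exportfs -ra` re-reads it.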
Good news! I successfully got SunOS 4.1.4 to boot in a Qemu Sparc emulation of an SS-10. I was told SunOS 4.1.4 would not boot under OpenProm but apparently they’ve made some improvements, because it did. It spewed a number of errors but none of them were show stoppers. I haven’t installed it yet, but this is the first time I’ve even got it to boot. It’s been so long since I installed SunOS that it’s now mostly a human memory problem, not a machine problem, but I’ll get there.
I’ve got three remaining Sparc machines, one of them an LX that is substituting rather poorly for the SS-10 that is eskimo.com. I would still very much like to find a good SS-10 chassis with a non-fried DMA chip, or an image of the SS-10 ROMs so I can try to get Qemu emulation to work. The latter would really be the preferred solution, since then I would not have to maintain antique hardware. So if anyone can help with either of these things it would be much appreciated.
Another advantage of getting Sparc emulation working is that these machines would then become virtual machines, and I could remotely reboot or power cycle them or perform other operations that, with physical machines, require me to drive 22 miles.
I got NIS working, except that one Radius server won’t bind to the NIS servers and I can’t log in remotely to fix it, as it is a physical machine and not a virtual one. I would like to fix this but so far have not been able to get the modern Radius to work. The documentation says it will read the old-style configuration files, but I’ve not been able to get it to function. The examples they give are unfortunately much simpler than the insanely messy setup we have here, which is mostly the result of dealing with a large number of dial-up and DSL providers in the past (now only a handful remain).
We are presently running without one important server, so the load shared by the remaining two is higher than normal.
When I left, things appeared to be working but when I returned, NIS was not functional.
I still can’t seem to get one of the NIS servers operational and haven’t figured out why yet, but the one that is working is enough to allow things to operate.
I have to get some sleep but I will troubleshoot the other NIS server and get the other file server back on line soon so any delay or lag is temporary.
Tonight, Wednesday Night to Thursday Morning, the 24th and 25th:
Going to be returning one server to service, shuffling services around and clearing another server so I can do an operating system reload on it. Unfortunately the Zesty update has failed more often than succeeded and in cases where it seems to have succeeded I’ve had networking problems afterwards. So rather than risk it I’m going to clear one server of all services, reload it, then redistribute all the services to balance loads. This will result in outages of various services lasting from 15-45 minutes each.
There will be some slowness in the mail system and shell servers today as I am making backups of all the virtual machines in preparation for shuffling them around when I return another server to service.
Machines on Igloo include:
- FTP (which is also the web server)
- Mx2 (one of two incoming mail servers)
- Radius1 (a machine used for Radius authentication, not yet in service)
- vps3 (a customer virtual machine)
- vps4 (another customer virtual machine)
The crash didn’t break these but while I was at the co-lo I decided to bring Igloo’s host operating system up to snuff. The latest Ubuntu fixed a number of systemd scripting problems that formerly made remote booting impossible (the scripts would hang at target shutdown reached). I also suspect that it fixes a number of hacks that they have for obvious reasons not publicized. So I brought it up from 16.04 to 16.10, the first step, and all the virtual machines broke.
The reason they broke is that Canonical decided to change the XML machine names and not bother to tell anyone. I discovered this by deleting and re-installing one machine and comparing the resultant XML file. Once I knew that, it was a simple edit to fix the others.
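The kind of edit involved can be sketched in a few lines of Python. This assumes the name that changed is the `machine` attribute on the `<os><type>` element of a libvirt domain definition, which is my guess at an illustration rather than a record of exactly what was edited. The domain name and the two machine type strings are placeholders, and `retarget_machine` is a hypothetical helper.

```python
# Sketch: rewrite the machine type in a libvirt domain XML definition.
# Assumption: the broken value is the machine= attribute of <os><type>.
import xml.etree.ElementTree as ET

# Placeholder domain definition, trimmed to the relevant element.
SAMPLE_DOMAIN = """<domain type='kvm'>
  <name>example-vm</name>
  <os>
    <type arch='x86_64' machine='pc-i440fx-xenial'>hvm</type>
  </os>
</domain>"""


def retarget_machine(domain_xml, new_machine):
    """Return domain_xml with <os><type machine=...> set to new_machine."""
    root = ET.fromstring(domain_xml)
    type_el = root.find("./os/type")
    if type_el is None:
        raise ValueError("no <os><type> element in domain XML")
    type_el.set("machine", new_machine)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder target name; use whatever a freshly created VM shows.
    print(retarget_machine(SAMPLE_DOMAIN, "pc-i440fx-yakkety"))
```

In practice the XML would come from `virsh dumpxml` and go back in with `virsh define`, with the new value copied from a machine freshly created on the upgraded host.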
Now I am doing the second phase of the upgrade, which brings 16.10 up to 17.04. When it finishes it will require another reboot, and who knows what else they might have changed that will break things.
Igloo, which is a physical host machine that hosts a number of virtual machines including the web server as well as home directories, crashed. This is the first time it has ever crashed since it has been an i7-6700k based system (more than a year). Seems to be a pattern lately. At any rate, there was some corruption to MySQL but I believe I’ve got that straightened out and I am upgrading the server as long as I am down here which will require two more reboots before it is finished.
Scientific is now down for maintenance. It will be down for approximately 1/2 hour. All other shell servers and other services remain available.