System Issues Today

     At some point libvirtd failed on igloo, the machine which hosts mail and a number of shell servers.  Libvirtd is the server-side virtualization management daemon; it is responsible for starting and stopping kvm/qemu guests and for arranging their networking, storage, and system resources (it also handles xen, but we aren’t using xen here).

     This affected a number of machines including mail, and because every server NFS-mounts the mail spool from mail, it affected the rest of them indirectly.

     The message that Igloo gave in syslog relating to libvirt was:

        libvirtd[2271]: internal error: wrong nlmsg len

     The “nlmsg” refers to Netlink, so it would appear something went wrong in networking and libvirtd didn’t know how to handle it and crashed.
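
     For anyone curious what a “wrong nlmsg len” complaint amounts to: every Netlink message begins with a 16-byte header whose first field is the total message length.  Below is a minimal Rust sketch of the sort of sanity check a receiver makes on that field.  It is an illustration only, not libvirt’s actual code; the function name nlmsg_len_ok and the sample buffers are made up for the example.

```rust
// Size in bytes of the fixed Netlink message header (struct nlmsghdr).
const NLMSG_HDRLEN: usize = 16;

// Returns true when the header's length field is plausible for `buf`.
// The length is the first 32-bit field, stored in host byte order.
// (Hypothetical helper for illustration; not taken from libvirt.)
fn nlmsg_len_ok(buf: &[u8]) -> bool {
    if buf.len() < NLMSG_HDRLEN {
        return false; // not even a complete header
    }
    let len = u32::from_ne_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    len >= NLMSG_HDRLEN && len <= buf.len()
}

fn main() {
    // Made-up 20-byte message: 16-byte header plus 4 bytes of payload.
    let mut msg = vec![0u8; 20];
    msg[..4].copy_from_slice(&20u32.to_ne_bytes());
    println!("well-formed: {}", nlmsg_len_ok(&msg));

    // Same buffer, but the header now claims 64 bytes.
    msg[..4].copy_from_slice(&64u32.to_ne_bytes());
    println!("bad length:  {}", nlmsg_len_ok(&msg));
}
```

     A message whose length field is smaller than the header or larger than what was actually received is the kind of thing that would be rejected this way.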

     I don’t know exactly how long or how deep the outage was, since things deteriorated gradually after libvirtd crashed.  I was going to add an automatic restart for libvirtd in systemd to prevent this specific failure in the future, but found that one was already in place, though incompletely specified, so perhaps systemd choked.  I have corrected that.

     I received about eight tickets on this issue, and I really appreciate that the ticket system is being used.  With outages of this magnitude, though, a phone call would also be good, because if I’m not actively at the terminal I may not be aware of issues.

Rust

     Rust is a new compiled programming language that uses a new memory management scheme.

     I first learned several assembly languages and then learned C.  Because I learned assembly first and thus really think in terms of what the hardware does, I have not had issues with array bounds or bad pointer dereferences, but a lot of people have.  In fact these errors tend to be what causes the majority of privilege escalation exploits.

     Many languages (Java, Python, Perl, BASIC, etc.) solved this by using a memory management technique known as garbage collection, but this method can have significant performance costs.  First, it can be difficult for the runtime to determine whether a particular variable will ever be accessed again, so memory release may be long delayed, resulting in wasted memory.  More significantly, garbage collection can cause periodic halts in execution that can be very annoying.

     Enter Rust.  Its designers introduced a method of memory management in which you declare to the compiler how memory is used, in what contexts and over what time frames (ownership, borrowing, and lifetimes), and this enables the compiler to manage memory much as you would by hand, without the human error component.

     This makes Rust an ideal replacement for C, for those who are less disciplined, and for critical tasks, because, like C, it can approach assembler in efficiency, doesn’t introduce the periodic lags of garbage collection, and yet protects you against buffer overruns and pointer de-reference errors.  Now if they will only invent a text editor that corrects run-on sentences.
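
     To make the above a little more concrete, here is a minimal sketch of what that looks like in practice.  The function names and sample values are made up for the example; the point is only to show borrowing, checked indexing, and a move of ownership, and it should compile with any of the rustc versions installed here.

```rust
// Sum a slice without taking ownership: the function only borrows the data.
fn total(values: &[u32]) -> u32 {
    values.iter().sum()
}

// Checked indexing: returns None instead of reading past the end of the buffer.
fn read_at(values: &[u32], index: usize) -> Option<u32> {
    values.get(index).copied()
}

// Takes ownership of the vector; its memory is freed when this function returns.
fn consume(values: Vec<u32>) -> usize {
    values.len()
}

fn main() {
    let readings = vec![10, 20, 30];

    // Borrowing leaves `readings` usable afterwards.
    println!("total = {}", total(&readings));

    // An out-of-range index is reported rather than silently read.
    println!("index 1 = {:?}", read_at(&readings, 1));
    println!("index 7 = {:?}", read_at(&readings, 7));

    // Ownership moves into `consume`; the vector is dropped inside it.
    println!("length = {}", consume(readings));

    // Using `readings` again here would be rejected at compile time:
    // println!("{:?}", readings); // error[E0382]: borrow of moved value
}
```

     Nothing here calls free and nothing is garbage collected; the compiler works out from the ownership rules where the vector’s memory can be released.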

     I’ve installed the Rust compiler, rustc, on all of the shell servers and am working on installing it on the other machines, as it will be necessary in the future for kernel compilation.

     The newest version, 1.80, is on Fedora and Rocky8; a slightly older version, 1.75, is on Ubuntu and Zorin; and an even older version, 1.65, is on Debian and MxLinux.

Brief Web Outage 14:24-14:27 July 17th

     The brief web outage today, lasting approximately two minutes, was to apply a security update to the Apache server, bringing it up to 2.4.62 owing to vulnerabilities found in the previous version.

     At the same time, I also upgraded the kernel to 6.10.0.  The 6.10 kernel has some improvements that speed up encryption.

Fedora and Rocky8 Info

     Some update pushed on Rocky8 and Fedora broke rwho and ruptime on those machines.  They still provide user status to other servers but are no longer pulling rwhod data from other servers.

     Further, ruptime requires the ‘daemon’ command which is no longer available on these machines.

     At some point NIS will disappear from Fedora, and when it does we will be forced to retire that machine.  Because we don’t know when this will happen, we cannot predict the date.  Therefore we recommend NOT relying on the Fedora machine for anything.  If you have cron jobs on it, please move them to rocky8 or, if you do not need a Red Hat environment, to one of the other servers.

Don’t Buy Epson!

I am a little bit more than pissed off at what Epson pulled. I had a WF2950 all-in-wonder inkjet printer / scanner / fax. It isn’t officially supported under Linux but it mostly works, the “mostly” being the need to boot Windows for firmware updates. Today it just stopped working: the scanner wasn’t seen anymore in Windows or Linux, but there was a firmware upgrade available, so I installed it. After that the scanner was seen again in Windows but not Linux. However, after the firmware upgrade it refused to do anything except complain about the non-Epson ink cartridges I have in it. These worked just fine prior to the upgrade. If they think I am going to pay as much as the friggin printer for ink that lasts about ten pages, they’ve got another thought coming. Previously I have gone with HP, and although their printers always tell you a non-HP cartridge is installed, they have always worked fine in spite of complaining. Mechanically, however, they aren’t the best built; I’ve had frequent issues with scanner parts breaking. But hell if I’m going to be held hostage by Epson. Once you pay for something you don’t expect them to take away functionality with firmware upgrades, but that is exactly what they did, so EPSON, FUCK YOU! I bought another HP!

This shows where the Asrock X299 Steel Legend motherboard failed.  The i9-10980 CPU ran for ten hours then hung.  I power cycled the machine and smoke came out.  Here is where it originated: the back side of the motherboard, under the cooler backplate.  The name goes on before the quality goes in.

Asrock X299 Steel Legend solder back side by cooler backplate.  Quality solder job went up in smoke.

Piece of Shit

New Server – ARRRRRrRRRRRrrrrggggghhHH!

I’m on my 4th MOTHERBOARD with the new server.  I got this one running last night; it ran all night but crashed at 10:17AM.  That’s not the end of the world, as I’d not completely dialed in the operating parameters yet, but since it was hung I powered it down and back up to reboot, and a puff of smoke came out and that was that.  Some component failed in a spectacular manner.

I have since learned that the Gigabyte board did not boot because it needs a BIOS upgrade for this CPU, so I’m going to re-install it, install the BIOS upgrade and send this Assmoke back to Asrock for repair.

New Server Still Broke

     Replaced the Asus Prime X299 IIA board with a Gigabyte X299 Aorus Master, and this board won’t even POST.  It says bad memory, but I put in just two modules and no matter which two I tried it would not work.  Then I tried some Crucial memory I had on hand, and it won’t POST either, so there is something not right with the motherboard, and it was the last one they had in stock.  So I ordered an Asrock motherboard; maybe the third time is a charm.  I ordered it with two-day delivery, so it should be here Saturday.  In the meantime, this server is going to remain out of service.

     This means Roundcube, Manjaro, Friendica, Hubzilla, Mastodon, and Yacy will be out of service until I get a functional motherboard.  To say this is frustrating is the understatement of the year.

Rocky8

     Since elm successfully compiled on Rocky8, I decided to try metamail.  It successfully compiled as well and is now installed and linked into elm.

     Also, this is a reminder that friendica.eskimo.com, hubzilla.eskimo.com, mastodon.eskimo.com, yacy.eskimo.com, and manjaro.eskimo.com will all be going down for probably around six hours tonight in order to replace the motherboard.

     The motherboard that is presently installed has two dead memory slots, so a quarter of our system memory is not recognized on that machine.