Inuvik

     Our Inuvik server, which hosts manjaro, friendica, hubzilla, mastodon, and yacy, proved to be as unstable under 6.11.4 as it was on 6.11.3, so I am headed over to the co-lo facility to reboot it back onto 6.11.2, which is stable.  Estimated return to service is 01:15 Pacific Daylight Time.

Another Machine Died around 4AM

     This machine was serving as our router; it is the only machine aside from Inuvik that has more than one Intel NIC.  So presently Iglulik is playing router, but it only has one Intel and one Realtek interface.  The Realtek is supposed to be capable of 100 Mb/s, 1 Gb/s, and 2.5 Gb/s, but the Linux drivers for it are seriously broken and only function at 100 Mb/s, hence we are at 1/10th of our normal network speed.
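
     As a rough way to confirm what speed an interface actually negotiated, the sketch below reads the value Linux exposes under /sys/class/net/<interface>/speed (reported in Mb/s).  The interface name in the usage comment and the 1000 Mb/s "normal" speed are assumptions for illustration, not our actual configuration.

```python
from pathlib import Path


def link_speed_mbps(iface, sysfs_root="/sys/class/net"):
    """Return the negotiated link speed in Mb/s, or None if it cannot be read."""
    try:
        value = int((Path(sysfs_root) / iface / "speed").read_text().strip())
    except (OSError, ValueError):
        # Interface missing, link down, or the driver does not report a speed.
        return None
    # The kernel reports -1 when the speed is unknown.
    return value if value > 0 else None


def fraction_of_expected(actual_mbps, expected_mbps=1000):
    """Fraction of the expected speed the link is delivering (assumed 1 Gb/s)."""
    return actual_mbps / expected_mbps


# Hypothetical usage; "enp3s0" is a placeholder interface name:
#   speed = link_speed_mbps("enp3s0")
#   if speed is not None:
#       print(speed, "Mb/s,", fraction_of_expected(speed), "of expected")
```

     A link stuck at 100 Mb/s against an expected 1000 Mb/s gives a fraction of 0.1, which is the 1/10th figure above.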

     The newly broken machine houses some shell servers and all the private virtual machines, and it died too rapidly to copy any data off.  This is the machine that I just replaced the failed drive in the other day.  There are indications that it may just be the BIOS battery: it will not save settings nor let me load defaults.  So I’m going to start by changing that.  If that does not fix it, I am going to put that drive into my workstation, transfer all the files onto a 2TB flash drive, and take them back to the co-lo to load onto the remaining machines.  I will also stop by RE-PC and pick up another Intel NIC, and then go from there.

     Right now the following shell servers are operational:

        popos, rocky, fedora, debian, mxlinux, and manjaro.

Inuvik Too Hot

     Inuvik is running too hot.  This machine was running the small-FFT torture test at 4.8 GHz with 36 threads (2 threads per core) before I brought it over to the co-lo, but it is now exceeding 96C, though only on a couple of cores.

     When you have a couple of cores running hot on a multi-core CPU but the rest are normal, this usually indicates an air bubble between the CPU and cooler, so part of the heat spreader is not being cooled.  This is more pronounced with the i9-109x0 series of CPUs because the heat spreader is soldered to the die.  On most microprocessors there is thermal compound between the die and the heat spreader.  This creates some diffusion that does not occur when the die is soldered to the heat spreader, so any air bubbles are more critical.
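
     The "a couple of cores hot, the rest normal" pattern can be checked mechanically.  The sketch below flags cores that sit well above the median core temperature; the 10C margin and the sample readings are made up for illustration, not measurements from Inuvik.

```python
from statistics import median


def hot_cores(temps_c, margin_c=10.0):
    """Return indices of cores more than margin_c above the median temperature."""
    mid = median(temps_c)
    return [i for i, t in enumerate(temps_c) if t - mid > margin_c]


# Hypothetical per-core readings in degrees C (not real data):
sample = [78, 80, 97, 79, 81, 96, 80, 79]
print(hot_cores(sample))  # cores 2 and 5 stand out from the rest
```

     If every core were hot together, nothing would be flagged, which would point more toward overall cooling capacity than toward a localized gap in the paste.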

     I’ve ordered some more Kryonaut Extreme, which should get here between October 1st and 3rd, at which time we will pull the machine from the co-lo for a few hours to clean the CPU and heat sink and re-paste it.  I will perhaps be just a smidgen more generous with the paste this time.  I am stingy not because of cost but because no matter how conductive thermal paste is, it is less conductive than the metals you are trying to transfer heat between, so you want as thin a layer as you can get away with.  But the worst thermal paste is better than the best air, so a little too much is less bad than not quite enough, which appears to be the case presently.

     Between now and then I’ve reduced the speed of the machine from 4.8 GHz to 4.4 GHz and the CPU voltage from 1.37 V to 1.2 V to reduce heat generation.  This will reduce performance by slightly less than 10%, but given it’s around 97% idle time on the CPUs this should not be a problem, and it’s only temporary.
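
     For a rough sense of what those numbers buy, the sketch below works out the clock reduction and an estimate of the drop in heat.  It assumes the common rule of thumb that dynamic power scales with frequency times voltage squared and ignores leakage, so treat the power figure as a ballpark rather than a measurement.

```python
def clock_reduction(old_ghz, new_ghz):
    """Fractional drop in clock speed."""
    return 1.0 - new_ghz / old_ghz


def dynamic_power_ratio(old_ghz, new_ghz, old_v, new_v):
    """Approximate new/old dynamic power, assuming P is proportional to f * V^2."""
    return (new_ghz / old_ghz) * (new_v / old_v) ** 2


slowdown = clock_reduction(4.8, 4.4)
power = dynamic_power_ratio(4.8, 4.4, 1.37, 1.2)
print(f"clock reduction: {slowdown:.1%}")        # about 8.3%
print(f"dynamic power vs. before: {power:.0%}")  # about 70%
```

     Under that assumption, giving up about 8% of clock speed cuts dynamic power by roughly 30%, most of it from the voltage drop.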

     Right now this is more of an issue than it otherwise would be because there is a bug in the kernel code that writes to the MSR to change the CPU speed in response to excess temperature.  If this bug did not exist the machine would simply have downclocked automatically, but it currently affects these particular CPUs.

New Server – ARRRRRrRRRRRrrrrggggghhHH!

I’m on my 4th MOTHERBOARD with the new server.  I got this one running last night and it ran all night but crashed at 10:17 AM.  That was not the end of the world, as I’d not completely dialed in the operating parameters yet, but since it was hung I powered it down and back up to reboot, and a puff of smoke came out and that was that.  Some component failed in a spectacular manner.

I have since learned that the Gigabyte board did not boot because it needs a BIOS upgrade for this CPU, so I’m going to re-install it, install the BIOS upgrade and send this Assmoke back to Asrock for repair.

New Server Still Broke

     Replaced the Asus Prime X299 IIA board with a Gigabyte X299 Aorus Master, and this board won’t even POST.  It says bad memory, but I put in just two modules and no matter which two I tried it would not work.  Then I tried some Crucial memory I had on hand, and it won’t POST either, so there is something not right with the motherboard, and it was the last one they had in stock.  So I ordered an Asrock motherboard; maybe third time is a charm.  I ordered it with two-day delivery, so it should be here Saturday.  In the meantime, this server is going to remain out of service.

     This means Roundcube, Manjaro, Friendica, Hubzilla, Mastodon, and Yacy will be out of service until I get a functional motherboard.  To say this is frustrating is the understatement of the year.

Rocky8

     Since elm successfully compiled on Rocky8, I decided to try metamail.  It successfully compiled as well and is now installed and linked into elm.

     Also, this is a reminder that friendica.eskimo.com, hubzilla.eskimo.com, mastodon.eskimo.com, yacy.eskimo.com, and manjaro.eskimo.com will all be going down for probably around six hours tonight in order to replace the motherboard.

     The motherboard that is presently installed has two dead memory slots, so a quarter of our system memory is not recognized on that machine.