Moving Domain, Name Server Changes, Hardware Upgrade

I am moving the eskimo.com domain to a new registrar. The complication is that my name servers currently live in the same domain. That creates a circular resolution problem: to resolve a name server address here, say ns1.eskimo.com, you first need to ask the name server for the IP address of ns1.eskimo.com, but you can’t reach it because you don’t yet have that address.

This is presently resolved by the use of glue records. These are records maintained in the upstream name servers that map our name servers to IP addresses, which gets around the circular resolution problem. In the beginning there was only one registrar, Network Solutions, so this was okay. Today there are many, and Network Solutions will eliminate the glue records the instant they release the domain, but the new registrar might take several days to get them into their database, during which time we would have no glue records.
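
As a rough illustration of when glue is needed, the sketch below checks whether a name server's host name lies inside the zone it serves. The function name needs_glue is made up for this example and is not part of any DNS tool; it only compares names and does not query DNS.

```shell
#!/bin/bash
# Sketch only: a name server whose host name is inside the zone it serves
# cannot be found without glue records in the parent zone.
# needs_glue is a hypothetical helper, not a real DNS utility.
needs_glue() {
    # Lower-case both names and strip any trailing dot.
    zone=$(printf '%s' "$1" | tr 'A-Z' 'a-z'); zone=${zone%.}
    ns=$(printf '%s' "$2" | tr 'A-Z' 'a-z'); ns=${ns%.}
    case "$ns" in
        "$zone"|*".$zone") echo yes ;;   # host name is inside the zone
        *) echo no ;;                    # host name lives in another domain
    esac
}

needs_glue eskimo.com ns1.eskimo.com        # prints: yes
needs_glue eskimo.com ns1.yellow-snow.net   # prints: no
```

A name server in a different domain, such as one under yellow-snow.net, can be resolved through that domain's own delegation, which is why it does not depend on eskimo.com's glue.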

This is no good, so I have placed a DNS server in another domain which I own, yellow-snow.net, and it will be one of eskimo.com’s name servers. However, web.com’s website for changing the DNS records is not working, so I have to wait for their team to change it for me. So we are holding off on moving the domain; this may happen this weekend or next depending upon their timing. Anyway, I wanted to make you aware of that potential interruption.

In addition, I am going to set up a name server on my machine off the cable modem so that in the event of a network interruption, eskimo.com’s mail servers will still resolve.  Sending sites will then queue and resend mail rather than bouncing it, so in a future network interruption e-mail will not be lost or bounced, only delayed for the period of the outage.

Another potential interruption is that I am going to be replacing the CPU on the new web server, currently an i9-10900X with 10 cores, with an i9-10980XE, which has 18 cores. I am also upgrading the CPU cooler fan from one that maxes at 1500 RPM to one that maxes at 3000 RPM to help dissipate the huge amount of heat these CPUs generate, and replacing the RAM, currently 8x32GB sticks, with 4x64GB sticks. The reason is that this CPU has difficulty driving two sticks per channel; there is just not enough drive current, which requires that I slow the memory down, but to feed 18 cores I want memory I/O to be as fast as possible.  Right now only social media is on this machine but eventually I will be moving all web services to it.

So probably next weekend there will be a period where this site is down for that hardware upgrade, but the end result should be faster performance.

New Web Applications Server

     The new server continues to be unstable; it tends to spontaneously reboot when the upper reaches of memory get used.  It may be a bad DIMM.  I also accidentally over-volted the CPU at one point: I meant to set 1.17 volts and accidentally set 1.7.  The max for this CPU is supposed to be 1.5V, so I may have broken down a transistor or two.

     When I bought the RAM for this machine I bought 8x32GB DIMMs as 64GB DIMMs were unavailable at the time.  The Intel engineer did warn me that the CPU had difficulty driving two DIMMs per channel and that I probably would not be able to run memory at the memory controller’s rated speed. Nonetheless, until recently it was running OK.

     Since 64GB DIMMs are now available, I’ve ordered 4x64GB to replace the existing 8x32GB. That way the CPU only has to drive one DIMM per channel and can operate at maximum speed.  I also got a steal of a deal on an i9-10980XE, which is basically an 18 core version of the i9-10900X.  So when this all arrives I will be replacing all the RAM and the CPU.  Also, I purchased a 3000 RPM max CPU fan to replace the 1500 RPM fan on the Noctua 15 cooler; with 18 cores I will need all the cooling I can get.  Though this CPU really does not get real hot until you clock the cores at 4.3GHz or above; below that you can keep the voltage down to where it stays reasonably cool even under load.

Manjaro Is Available Again

     Manjaro finally released an ISO that correctly builds x2goserver, nis, and other necessary tools.  It is installed and operational.  If there are software packages you would like installed, let me know and I’ll at least attempt it.  ELM will not build on it, just as on most other modern Linux platforms, so I cannot install that for you.  Pine is there as Alpine, and the bash, csh, tcsh, ksh, and zsh shells are installed.  If you would like others, let me know.

Older Web Server Outage

     Last night I rebooted the old web server to change an IP address that for some reason the networking would not let go of, even with a network restart.  I neglected to check to make sure everything came back up, my bad.  Somehow ownership of the encryption key for mariadb got changed so that mariadb could not read its key.  This caused it to fail to start.
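
A check along the following lines could catch this kind of ownership change after a reboot. It is only a sketch: the key path is a placeholder, since the real location is not given here, and the mysql account name is an assumption based on how MariaDB packages usually run the service. It relies on GNU stat, as found on Linux.

```shell
#!/bin/bash
# owned_by FILE USER: succeed only if FILE exists and is owned by USER.
# Uses GNU stat (-c %U prints the owner's user name).
owned_by() {
    [ -e "$1" ] && [ "$(stat -c %U "$1")" = "$2" ]
}

# Hypothetical key path; substitute the location configured on your system.
KEY_FILE=${KEY_FILE:-/etc/mysql/encryption/keyfile}

# Assumes the database service runs as the "mysql" user.
if owned_by "$KEY_FILE" mysql; then
    echo "key ownership OK"
else
    echo "key missing or not owned by mysql: $KEY_FILE"
fi
```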

Mail

Owing to the ypbind unbound issues causing mail to be returned as no such address, I’ve created the following script, which is run on all the mail servers out of crontab once a minute.  The purpose of this script is to keep track of the status of ypbind: if it is unbound, shut down postfix so that sending mail servers get only a temporary error, then queue and resend.  Then, once a minute, it tries to restart ypbind until it succeeds, at which point it starts postfix again.  This should prevent long outages if an update disables ypbind but forgets to re-enable it when completed.

#!/bin/bash
# Run from cron once a minute: keep postfix stopped while ypbind is unbound.
STATUS_FILE=/opt/status/ypmon.dat

# Load the state recorded by the previous run, if any.
if test -f "$STATUS_FILE"
then
    YPSTAT_PRIOR=$(cat "$STATUS_FILE")
else
    YPSTAT_PRIOR="unknown"
fi

if ypwhich > /dev/null
then
    # ypbind is bound; start postfix only when the state has changed.
    if [[ "$YPSTAT_PRIOR" != "bound" ]]
    then
        echo 'Change from ypbind unbound to bound - starting postfix'
        systemctl start postfix
        echo "bound" > "$STATUS_FILE"
    fi
else
    # ypbind is unbound; stop postfix the first time this is noticed.
    if [[ "$YPSTAT_PRIOR" != "unbound" ]]
    then
        echo 'unbound' > "$STATUS_FILE"
        echo 'Change from ypbind bound to unbound.'
        echo 'Stopping postfix, restarting ypbind.'
        systemctl stop postfix
    fi
    # Try to restart ypbind on every run until it binds again.
    systemctl restart ypbind
fi
exit 0

Sorry for the formatting, the “code” option isn’t working on my copy of WordPress.

FTP Server

      The ftp server is broken. Ubuntu upgrades have replaced libraries with versions that are no longer compatible with the ones it was compiled against. Further, the existing source for wu-ftpd will no longer compile in the modern compile environment, so I cannot fix it as I have done in the past. The last update to the source was in 2006, so this is not likely to be fixed and I will need to move to a newer ftpd; recommendations are welcome. In the meantime please use scp or other file transfer protocols.

Drive Replacement Successful

Drive replacement went extremely smoothly, with a total down time of 23 minutes.  The drive is now replicating the other drive in the RAID array.  Indications are that it will take another seven hours (it’s been going for 45 minutes), so my projection of 6-8 hours seems to be spot on.  The system may be a bit slow during this interval since it’s effectively flushing out the buffer continuously with new data.
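
For anyone curious where an estimate like that comes from: assuming this is Linux software RAID (md), the kernel reports rebuild progress in /proc/mdstat, including a "finish=" estimate in minutes. The sketch below pulls that number out. The sample text is illustrative only, not captured from this server, and resync_minutes is a made-up helper name.

```shell
#!/bin/bash
# resync_minutes: read /proc/mdstat-style text on stdin and print the
# kernel's estimated minutes remaining (the "finish=...min" field), if any.
resync_minutes() {
    sed -n 's/.*finish=\([0-9.]*\)min.*/\1/p'
}

# Illustrative sample, not real output from this machine.
sample='md0 : active raid1 sdb1[2] sda1[0]
      3906886464 blocks super 1.2 [2/1] [U_]
      [=>...................]  recovery =  9.7% (378291200/3906886464) finish=420.5min speed=139800K/sec'

printf '%s\n' "$sample" | resync_minutes    # prints: 420.5

# On a live system:
#   resync_minutes < /proc/mdstat
```

In the sample, 420.5 minutes is about seven hours, in line with the estimate above.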

Maintenance April 7th 2:00AM ~2:30AM

We will be offline, the entire network, for up to about half an hour, starting around 2AM Sunday morning, to replace a failing drive in the machine which is also acting as a router at present.  This drive is part of a RAID array so all data is duplicated and none will be lost.  If things go smoothly, it could be as short as 15 minutes; if not, then maybe half an hour or slightly longer.

The big unknown is that when software RAID comes up in degraded mode, which it will do initially until the new drive is synced, systemd will sometimes hang, necessitating going through emergency mode and bringing things up by hand.  In my experience this happens about 30% of the time.  It usually takes about 6-8 hours for the system to sync a new 4TB drive, but the system can operate while this is in progress; it is just that Poetteringware sometimes adds some challenges.

Mail Back to Normal

Today, after some three hundred updates, the original SPF checker I was using, the python3 version, still was not working, so I installed the perl version of policyd.  I don’t really like perl as I don’t find it very readable relative to python, but presently it is working.

I also found the clamav virus check was dead and re-installed that.  Now all the mail milters, the clamav virus check, spf, dkim, and dmarc, are once again functional, so this should reduce the flood of “we’re going to make your life miserable if you don’t send 50,000 bitcoins to X” messages.

Also, the perl SPF policyd is actually somewhat better in that it checks both the ehlo host and the mail-from: host to make sure both are allowed by the site’s SPF record, while the old checker only checked the mail-from. So this will be somewhat more thorough, requiring a consistency that the old one did not.

I sent myself mail from gmail to make sure incoming was working and also watched the logs a while.