Working on New Server

     Some time ago I started a build for a new server based upon the i9-9900k.  I purchased the motherboard, but subsequent upgrades to Linux improved performance to the point where the new server was not necessary.  I was going to go with the i9-9900k because it was economically doable while the next generation really wasn’t.  I do not want to go to even newer generations because the e-core/p-core design of newer Intel processors has not yet proven entirely stable under Linux.

     Well, in the meantime, web traffic has doubled and the number of virtual private servers has grown to the point where, if one machine failed, the remaining machines would not be able to run all the existing services with good performance.  So it is time to continue the build.

     I managed to score an i9-10900x CPU off of Amazon, new, for under $400, which is about half of what I had expected to pay for the i9-9900k.  This CPU is better in two ways: it has 48 PCIe lanes, so NVMe SSDs can talk faster, and people have been able to get it to clock at 5 GHz without an issue.  This is a 10-core, 20-thread CPU, but the most useful thing about it for us is that it can support 256 GB of RAM and has four memory channels rather than two, providing a whopping 90 GB/s of memory bandwidth.  And best of all, this CPU will work in the motherboard I had purchased for the i9-9900k.

     The ‘x’ part denotes that it is binned by Intel.  Basically, when CPUs are made, the dies nearest the center of the wafer provide the highest clock rates at the lowest voltage and heat dissipation.  Intel tests the chips and those with the highest performance get an ‘x’ part number instead of a ‘k’; otherwise the chips are identical.  This is significant in this application as cooling will be the main challenge.  For that I ordered a Noctua D15 cooler.  I am going to start with its 82 CFM fans, and if those prove insufficient I can order some 300 CFM fans that fit in the same form factor.  Those are noisy, but since this is going into a data center that already sounds like you’re standing behind a 747 at takeoff, that doesn’t really matter.
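
     As a rough sanity check on the memory bandwidth figure of about 90 GB/s, theoretical peak bandwidth is just channels times transfer rate times bytes per transfer.  The sketch below assumes DDR4-2933 memory on a standard 64-bit channel; the memory speed is an assumption, not something stated above, and real-world throughput will come in below the theoretical peak.

```python
def peak_bandwidth_gb_s(channels: int, megatransfers_per_s: int,
                        bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits // 8  # 64-bit channel -> 8 bytes
    return channels * megatransfers_per_s * 1_000_000 * bytes_per_transfer / 1e9


# Assumed memory speed: DDR4-2933 (2933 million transfers per second).
quad = peak_bandwidth_gb_s(channels=4, megatransfers_per_s=2933)
dual = peak_bandwidth_gb_s(channels=2, megatransfers_per_s=2933)

print(f"quad-channel: {quad:.1f} GB/s")  # 93.9 GB/s
print(f"dual-channel: {dual:.1f} GB/s")  # 46.9 GB/s
```

     Under that assumption, four channels give twice the peak bandwidth of two, which is where most of the gain for a memory-hungry VPS host would come from.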

     The old Iglulik machine will then serve primarily as an additional machine for VPSs, with the new machine becoming the main web server; it will probably also host ubuntu.

     If anyone knows any good relevant Inuit / Eskimo words, I am open to suggestions for the new server’s name.  Shorter words are better.  Thanks.

Kernel Upgrade 3/3/23 11PM

     We are planning to upgrade all the systems to Linux 6.1.14 on Friday 3/3/2023, starting at 11PM.  If all goes well, this should be completed by 11:30PM; if not, it may be another 1-1/2 hours (if a physical host does not reboot properly).

     This will affect all services including shell accounts, web hosting packages, e-mail, etc.  It also will affect our free fediverse services, https://friendica.eskimo.com/, https://hubzilla.eskimo.com/, https://nextcloud.eskimo.com/, and https://yacy.eskimo.com/.

     No individual service is expected to be down for more than about ten minutes, except yacy, which takes about 30-40 minutes to rebuild its database after a reboot.

Outlook E-mail Outage

     Some of our customers have had e-mail to sites hosted by Outlook fail and bounce with an error along the lines of being unable to look up the sender address.

     Microsoft claims everything is fine on their end, i.e., that they have no issue, but Down Detector says otherwise and I’ve had reports from other sites experiencing the same.  So just a heads up: this issue is affecting mail from multiple sites to Outlook addresses or addresses hosted by Outlook, and the issue is not on our end.

Denial of Service

     Our web server is presently undergoing what is known as a “Slow and Low” denial of service attack.  In this type of attack, someone initiates a large number of connections from sources which are very slow.  This limits our server’s ability to finish a connection, so the attack eats up all available connections.  To counter this we’ve greatly increased the number of connections available, but that still eats up a lot of memory, forcing cached data out, so the system must go to disk for most requests, which slows things down.  Unfortunately our router decided to pick this time for a firmware upgrade, so traffic analysis is not available until the upgrade completes and we can’t readily identify and lock out the source.
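
     Without the router’s traffic analysis, one rough way to look for a source is to count established connections per remote address on the server itself.  The sketch below parses text in the column layout printed by `ss -tn`; the sample addresses are made-up documentation addresses, and the function name is hypothetical.  It is only a heuristic: an attack spread thinly across many sources would not stand out this way.

```python
from collections import Counter


def connections_per_peer(ss_output: str) -> Counter:
    """Count ESTAB connections per remote host in `ss -tn` style output."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Columns: State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue  # skips the header line and non-established sockets
        host, _, _port = fields[4].rpartition(":")
        counts[host.strip("[]")] += 1  # IPv6 peers are shown in brackets
    return counts


# Made-up sample data using documentation address ranges.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB  0      0      192.0.2.10:443      198.51.100.7:51234
ESTAB  0      0      192.0.2.10:443      198.51.100.7:51240
ESTAB  0      0      192.0.2.10:443      203.0.113.9:40022
ESTAB  0      0      192.0.2.10:443      198.51.100.7:51302
"""

for host, n in connections_per_peer(SAMPLE).most_common(3):
    print(host, n)
```

     On a live machine the same function could be fed the real output of `ss -tn`, and any address holding far more connections than the rest would be a candidate to block.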

Flash Content

     If you wish to play the Flash games in the Games section of our website, Ybbored.com, Defender-games.net, or any other Flash site, now that Adobe has discontinued Flash, you can do so with Firefox and an add-on called “Flash Player 2022”.  This works with pretty much any OS that Firefox works with, including Linux.

Downtime Friday 11PM – 1AM

     It took me nearly two hours to get everything back up after booting into the new kernels tonight.

     The issue revolves around some recently added ufw rules to improve security.  Even though I have explicit rules permitting machines that need to see each other’s portmapper (rpcbind) to do so, they aren’t working, and when ufw is enabled none of the machines can see each other’s portmapper.  This breaks NFS and NIS.

     So I’ll have to do some further investigation as to why this is the case, but that was the cause of the long downtime.
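
     One simple way to narrow this kind of problem down is to test, from each machine, whether a peer’s rpcbind port answers with ufw enabled and again with it disabled.  The sketch below is a minimal version of that check, not the procedure actually used here.  It tests only TCP port 111; rpcbind also listens on UDP 111, which this does not cover.  The host list is a placeholder.

```python
import socket

RPCBIND_PORT = 111  # well-known port for rpcbind / portmapper


def tcp_port_open(host: str, port: int = RPCBIND_PORT,
                  timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, filtered (timeout), unreachable, lookup failure
        return False


# Placeholder list; substitute the NFS/NIS peers that need to see each other.
PEERS = ["127.0.0.1"]

for peer in PEERS:
    state = "reachable" if tcp_port_open(peer) else "blocked or not listening"
    print(f"{peer}: rpcbind TCP/{RPCBIND_PORT} {state}")
```

     If a peer is reachable with the firewall off and not with it on, the rules are the culprit; a timeout rather than an immediate refusal usually points at packets being silently dropped.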

     This affected all eskimo.com services including our Fediverse services.

Web Based Terminal and Console Access

     The existing web-based terminal and console access is broken because the installed version of Guacamole requires some features not present in OpenSSL 3.1, so I will have to compile and install a newer version.  I hope to get this corrected later this evening.

Tonight’s Outage

     Tonight’s outage was not the result of a hardware or software error, but rather the result of an operator error.  I had built a new kernel and had intended to try it on my workstation before deployment, but I also had a window open on the main file server because that is where I store and distribute kernels from, and also where I have the configuration files.  I went to reboot my workstation but was in the wrong terminal and rebooted the server instead.  Because I hadn’t shut down the virtual machines on it properly, it did not come up cleanly.  In particular, the kernel NFS server was snarled and restarting it did not correct the problem, so a second reboot was necessary.

     We will be performing a kernel upgrade to 6.1.9 this Friday.  This is not because there are any obvious issues with 6.1.7, which, operator errors aside, has been very stable, but because I made an error and misconfigured it.  I’ve corrected this on the web server, which is most sensitive to this, but I really need to fix it on all machines.  And since 6.1.9 does have some minor fixes, we might as well get that in place.

     I am most looking forward to the release of 6.2, because it has some fixes that largely recover the performance lost to the various security workarounds for the Intel Skylake chips, and two of our physical servers are based upon this architecture.

Kernel Upgrade

     I made an error when I configured the last kernel.  While 6.1.7 does appear to be stable AND it appears to have fixed the long-standing NFS bug for which I enabled the extra debugging, I accidentally compiled it with preemption, which I do not want on a server as it adds additional context-switching overhead and decreases overall efficiency.  Thus I am going to be making a new kernel tonight, at least for the web server (which is most affected), and will be doing a kernel upgrade next Friday just to fix this on the rest of the servers.  In the meantime, things may at times get a little slow.
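
     A mistake like this can be caught before deployment by checking the kernel’s build configuration for the preemption model.  The sketch below looks for the standard CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT options in the text of a kernel config file.  The function name and sample text are hypothetical, where the config file lives varies by system (many distributions put it at /boot/config-<version>), and the sketch ignores CONFIG_PREEMPT_DYNAMIC, which lets the model be changed at boot.

```python
def preemption_model(config_text: str) -> str:
    """Report the preemption model selected in a kernel .config file."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # "# CONFIG_X is not set" lines and blanks
        key, _, value = line.partition("=")
        if value == "y":
            enabled.add(key)
    if "CONFIG_PREEMPT" in enabled:
        return "full"       # fully preemptible kernel (low-latency desktop)
    if "CONFIG_PREEMPT_VOLUNTARY" in enabled:
        return "voluntary"  # voluntary preemption points only
    if "CONFIG_PREEMPT_NONE" in enabled:
        return "none"       # no forced preemption (server)
    return "unknown"


# Made-up excerpt of a config built with full preemption by mistake.
SAMPLE = """\
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
"""

print(preemption_model(SAMPLE))  # full
```

     For a server build, anything other than "none" from a check like this would be a reason to stop and fix the configuration before rolling the kernel out.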