Stability or the Lack Thereof

This morning I finally figured out the source of the instability we’ve had since we changed out the bad NIC.

We kept having incidents where I’d go to the co-lo and think I had
everything working, but within minutes, hours, or sometimes a few days it would
just stop talking to the Internet.

One of those occurred this morning. I went down and looked at the settings; nothing appeared to have changed, but it wasn’t routing. I rebooted the server and it
started routing again. I went back home and couldn’t ping anything.

And I’m really half asleep, and my workstation is busted because the night
before I tried to upgrade the OS, it failed, and so I tried to restore from
backup, but afterwards I could not boot. And at this point I have had
approximately two hours of sleep in the past 48 hours.

So I drove back down to the co-lo center again, and keep in mind it’s
22 miles each way and pretty close to rush hour, so not at all a pleasant
drive. This time I rebooted again but still no route. I tried several more
times, still the same.

So at this point I got the NOC involved. Neither of us could ping the
other end of the wire, and that should not have been difficult since it’s just
a single Ethernet-to-Ethernet cable. But we couldn’t, so I got the idea of
unplugging the LAN interface, and after doing that the WAN interface immediately
came up. This pointed to something wrong on my end. I didn’t know what, but it
had to be something on my end.

So I took a look at the routing table and it quickly became apparent what
was wrong: there were not one but TWO default routes. Two competing default
routes like this is not a workable setup under Linux, so how did it come about?

Well, on the WAN interface I had the correct gateway address, but on the
LAN side I was pointed at my own machine instead of his router. I did this
because if I had a router in between my machines on the LAN and his router,
my router’s IP would be the gateway IP we point all the local machines to.

So I changed this to the correct gateway IP, everything came up from a
routing perspective, and it has remained that way ever since.
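
For anyone who wants to check for the same mistake, here is a minimal sketch that counts default routes in text copied from `ip route show`. The addresses and interface names in the sample (`wan0`, `lan0`, the 192.0.2.x and 10.0.0.x numbers) are made up for illustration and are not our real configuration.

```python
def default_routes(route_table: str) -> list[dict]:
    """Return one dict per default route found in `ip route show`-style text."""
    routes = []
    for line in route_table.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue
        route = {"via": None, "dev": None}
        # After "default" come keyword/value pairs such as "via 192.0.2.1".
        for key, value in zip(fields[1:], fields[2:]):
            if key in route:
                route[key] = value
        routes.append(route)
    return routes


# Hypothetical routing table resembling the broken state described above.
SAMPLE = """\
default via 192.0.2.1 dev wan0
default via 10.0.0.1 dev lan0
192.0.2.0/24 dev wan0 proto kernel scope link src 192.0.2.2
10.0.0.0/24 dev lan0 proto kernel scope link src 10.0.0.1
"""

if __name__ == "__main__":
    found = default_routes(SAMPLE)
    for route in found:
        print(f"default route via {route['via']} on {route['dev']}")
    if len(found) > 1:
        print("warning: more than one default route")
```

In this sample the second default route points at the machine’s own LAN address, which is the kind of entry that caused the trouble here.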

But once I got home, I got another telephone call: someone was not able to
receive e-mail from Gmail. At first I had the problem of not having a
workstation to use. I totally forgot about my laptop, probably from sleep
deprivation.

But I did remember I had an antique Dell loaded with Linux, so I fired it up
and SSH’d into the mail servers. Both were operational, but both had a
bunch of jobs stopped and no logged errors indicating why. I rebooted them
and they came up and ran fine, except that the load went up to around 200 for
about half an hour and then settled down to a normal load below 1.

At this point I’m speculating that without a network, queue runs got stuck
until they exhausted memory, and then things died.

At this point I returned to my workstation. I had restored a corrupted
root partition from backups, but after doing so it would not boot. I finally
chased this down to the fact that I did a reformat prior to reloading from
backups, and this changed the file system UUID so that it no longer matched the
fstab file. I fixed that up and now it’s running properly again.
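
The fstab mismatch can be spotted before a reboot. Below is a rough sketch, assuming you pass in the fstab text and the set of UUIDs that actually exist on the machine (for example, as reported by `blkid`). The UUIDs in the sample are invented.

```python
def stale_fstab_uuids(fstab_text: str, present_uuids: set[str]) -> list[tuple[str, str]]:
    """Return (uuid, mount_point) pairs from fstab whose UUID is not present."""
    stale = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith("UUID="):
            continue  # only entries that identify the device by UUID
        uuid = fields[0][len("UUID="):].strip('"')
        if uuid not in present_uuids:
            stale.append((uuid, fields[1]))
    return stale


# Hypothetical fstab: the root entry still carries the pre-reformat UUID.
SAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=11111111-2222-3333-4444-555555555555 /     ext4 errors=remount-ro 0 1
UUID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee /home ext4 defaults          0 2
"""

# Hypothetical UUIDs found on disk after the reformat.
SAMPLE_PRESENT = {
    "99999999-8888-7777-6666-555555555555",  # new root file system
    "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",  # /home, untouched
}

if __name__ == "__main__":
    for uuid, mount_point in stale_fstab_uuids(SAMPLE_FSTAB, SAMPLE_PRESENT):
        print(f"{mount_point}: fstab UUID {uuid} not found on any device")
```

The fix is then either to edit fstab to use the new UUID or to set the old UUID back on the reformatted file system.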

I expect to receive the new router sometime between this Friday and next
Monday. I don’t know how long it will take for me to learn how to use it well
enough to put it into service, but it is very thoroughly documented. It has eight
manuals, and the administrative manual is 3061 pages. There are also some free
online courses, and if you want to get into it deeply you can spend as much as
$6000 on non-free training. It supports damned near every communication
protocol known to man.

Outage

Sorry for the down time.  Our second router to die in two weeks died yesterday afternoon.  At first I thought it was something I did, but after I restored from a backup taken when it was working, it still didn’t work.  The vendor had changed the software in such a way that firewall rules from the user interface were no longer being translated correctly into iptables rules internally, resulting in a broken firewall that can’t be disabled.

Since I had no further spares to drop back to, I configured one of the Linux servers to act as a router until we can get one.

While we were not able to add a Diaspora at this time, We DID Add a Mastodon Instance

     You can view our latest social media instance either by using Web Apps on our main web site https://www.eskimo.com/ and selecting Mastodon, or by going there directly at https://mastodon.eskimo.com/.

     Mastodon is a Twitter/X-like interface, but unlike Twitter, Mastodon is part of the Fediverse, like Friendica and Hubzilla, which consists of tens of thousands of sites with no one owner of the entire network.  Each site is responsible for moderating its own servers, if it chooses to moderate at all.

Diaspora

Someone asked us to set up a Diaspora node here. Unfortunately, because of OpenSSL and Ruby versions, we cannot do this at present.  In theory this will be possible after the next Diaspora release, which will be ported to Ruby 3 and OpenSSL 3.x.

Ubuntu 22.04 ships with OpenSSL version 3, with no option of using OpenSSL 1 instead. The currently available diaspora* releases depend on Ruby 2.7, and are not yet compatible with Ruby 3. Unfortunately, Ruby 2.7 has no official support for OpenSSL 3, so setting up diaspora* on Ubuntu Jammy requires a lot of extra workarounds that the project team currently cannot offer help or support for.

We expect the next major release to support Ruby 3, and thus by extension Ubuntu Jammy. For now, please set up your pod on Ubuntu 20.04, and upgrade the distribution when diaspora* is ready.

Interruption Tomorrow

Sometime tomorrow there will be a brief interruption of Eskimo North’s services, including https://friendica.eskimo.com/, https://hubzilla.eskimo.com/, https://nextcloud.eskimo.com/, shell servers, e-mail, and web hosting, lasting from one to ten minutes while our Internet backbone connection is moved from a 100 Mb/s port to a 1 Gb/s port.
I do not know what time this will occur and I do not know at what time an Isomedia tech will arrive at the facility to switch it over.

Router Speed+

     I forgot to mention that we’ll also be supporting jumbo frames (4500-byte frames) if you have a 1 Gbps or better connection and your provider supports them.  This will help with performance for things like transfer of large files or streaming.
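
To give a feel for why larger frames help, here is a back-of-the-envelope sketch. It assumes the 4500 bytes is the MTU, that the traffic is TCP over IPv4 with no header options, and the standard Ethernet per-frame overhead. Real numbers will vary with VLAN tags, TCP options, and so on.

```python
# Per-frame overhead in bytes (assumptions: plain Ethernet, IPv4, TCP, no options).
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12  # preamble/SFD, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20             # IPv4 header + TCP header


def payload_efficiency(mtu: int) -> float:
    """Fraction of the bytes on the wire that are application payload."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETH_WIRE_OVERHEAD
    return payload / on_wire


def frames_needed(total_bytes: int, mtu: int) -> int:
    """Number of full-size frames needed to carry total_bytes of payload."""
    payload = mtu - IP_TCP_HEADERS
    return -(-total_bytes // payload)  # ceiling division


if __name__ == "__main__":
    for mtu in (1500, 4500):
        print(
            f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload, "
            f"{frames_needed(10**9, mtu)} frames per 10^9 bytes"
        )
```

Under these assumptions the gain in raw efficiency is only a few percent (roughly 95% to 98%), but the number of frames, and with it the per-packet work for routers and hosts, drops to about a third.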

Mail / DNS Issues

We are having DNS issues all over the place. Even my home machine, which uses non-Eskimo servers, is getting DNS errors, and none of the updates queued for Ubuntu seem to be relevant. It is these DNS issues that are causing problems with e-mail.  I am working on resolving them. I have one incoming mail server working by using external servers that do seem to “kind of” work, as in they get an error with nslookup but still return the data.  So it is a big mystery; if anyone knows anything that might have happened with DNS system-wide, please let me know.
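
If you want to check from your own machine whether names are resolving, here is a small sketch using Python’s standard `socket.getaddrinfo`. Note that it goes through whatever resolver your system is configured to use, so it tests your DNS path, not ours specifically. The host names in the example run are just two of our own sites.

```python
import socket


def check_names(names, resolve=socket.getaddrinfo):
    """Try to resolve each name; map it to a list of addresses or an error string."""
    results = {}
    for name in names:
        try:
            infos = resolve(name, None)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address as text.
            results[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            results[name] = f"error: {exc}"
    return results


if __name__ == "__main__":
    hosts = ["www.eskimo.com", "mastodon.eskimo.com"]
    for name, outcome in check_names(hosts).items():
        print(f"{name}: {outcome}")
```

The `resolve` parameter exists only so the function can be exercised without a network; in normal use leave it at the default.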

Unplanned Disruption

I apologize for the disruption today.  Our gateway router crapped out entirely, as in not crashed but dead as a doornail.  I had already purchased a new unit with a lot more CPU and memory to replace it, but owing to my unfamiliarity with it, it took some time to get it configured and operational.  This unit is hands down way more powerful than the old one, so we shouldn’t see the lag during heavy traffic or be nearly as easily packet flooded as with the old unit.

Kernel Upgrade 11PM Sept 1st

     Going to reboot all or most machines at 11pm to upgrade kernels.

     This will affect all services, paid and unpaid.  With the exception of YaCy, which will take about 45 minutes, other services should not be down longer than ten minutes.