Hardware Work Done

     I’ve replaced the Intel NICs with TP-Link NICs.  The first machine took close to two hours because at first I could not get the new card to work.  I finally traced it to a lack of patience on my part: these cards take approximately one minute to initialize.

     I was not able to use the old cards in a private network as I had planned because as soon as I configured one, localhost became bound to it and broke the other connection.  I’m sure this is operator malfunction but I will need to do further research.

 

Hardware Maintenance Late Sept 17th to Early Sept 18th

     I will be taking various machines down tonight for about fifteen minutes each to install new NICs with non-Intel chipsets.  From 4.15.0 forward, the Linux kernel has had a bug in the Intel E-1000 drivers that causes the cards to lock up when hardware offloading is used.  Usually these lock-ups are transient, resulting in 2-3 second delays in data, but occasionally the cards will lock hard and require a drive to the co-lo facility to physically reset the machine.

     Because the servers most affected are those carrying heavy traffic, the NFS server providing the home directories in particular, I will be replacing the NICs on all the NFS servers.  This will affect virtually all of our services but will prevent long downtimes like the one we suffered Sunday morning from recurring.

     I filed a bug report on this problem in April of this year.  Canonical has offered me various kernels to try, but many of them either did not boot at all or were extremely unstable.  At this point I feel it’s more cost effective and less service affecting just to replace the hardware.
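
     Since the lock-ups are tied to hardware offloading, a common stopgap for this class of problem is to switch the offload features off with ethtool rather than swap the card.  That is not what was done here.  The sketch below only builds the command line; the interface name "eth0" and the choice of features (tso, gso, gro) are assumptions for illustration, not settings taken from these servers.

```python
import shlex

# Offload features to switch off. These are standard `ethtool -K` short
# names; which ones actually trigger the bug is an assumption here.
OFFLOAD_FEATURES = ("tso", "gso", "gro")


def build_offload_cmd(interface, features=OFFLOAD_FEATURES):
    """Return the argv list that disables the given offload features."""
    cmd = ["ethtool", "-K", interface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


if __name__ == "__main__":
    # "eth0" is a placeholder interface name.
    print(shlex.join(build_offload_cmd("eth0")))
```

     Turning offloading off pushes that work back onto the CPU, which matters most on exactly the heavily loaded servers described above.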

Outage Sept 16th 2018 05:15 – 12:46

     The ethernet controller on the server that provides /home pages wedged today.  Most services depend upon being able to access /home and were unavailable as a result.

     As near as I can tell looking at the logs, the ethernet wedged shortly after 5AM but I did not receive any telephone calls until around 11AM.  I was unable to fix it from here so I had to drive to the co-location facility.

     Everything was back in service at 12:46 (afternoon).

Investigating New Location for User Meetings

     I am investigating a new location for our user meetings.  This would be Amante Pizza on 123rd and Roosevelt Northeast in Seattle.

     The pizza is excellent, they have spirits for those who care to imbibe, and they have a big screen TV that in theory can be connected to a computer and used for presentations.  Many people seem to want a more structured meeting but trying to do presentations on paper doesn’t work so well.  Being able to fire up a computer with a big screen live would be a huge plus.

     They do not know what inputs it has so I have to stop by and determine how to connect.

     I had been to Amante before when they were on 196th and just off of 44th in Lynnwood.  That restaurant had good food, great decor, but piss poor service.  This restaurant has excellent food, excellent service, but marginal decor.  However, I think the room there is much more suited to our needs: it is completely walled off from the rest of the restaurant with glass walls, so our noise won’t interfere with other diners and vice versa.

New Server – JuLinux.Yellow-Snow.Net

     We have a new server available for your use but this one is in the yellow-snow.net domain.  The full server name is julinux.yellow-snow.net.  If you use this server, e-mail you send will by default be from username@yellow-snow.net.  E-mail to this address will also come to your INBOX.

     This server runs a new Linux distribution called JULinux, but it is only barely a distribution as it is essentially the Mate spin of Ubuntu configured to look like Windows with very nice artwork.  The software is all 100% Ubuntu so it has the same stability, security, and currency as Ubuntu.

     The KDE implementation is broken inasmuch as the logout does not work, so I would ask that you avoid using KDE with x2go on this server until I can figure out how to get it fixed.  All other x2go compatible window managers are working.

     If you do not have an existing Mate configuration and connect to this machine with x2go using mate, it will look like Windows.  If you do have an existing configuration then it will look the same as all the other Debian / Ubuntu derivatives.

     We do not yet have a corresponding yellow-snow.net web appearance but this is in the works.

Slow Service Early Wednesday Morning (1-4AM)

     Slow service early this morning and the temporary unavailability of mail.eskimo.com were the result of a denial of service attack in which our own name servers were used as amplifiers against us.  I had to lower the external view rate limit because of this; hopefully it is still adequate to service legitimate requests.
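
     To illustrate the idea behind a response rate limit, here is a minimal sketch of a per-client, per-second limiter.  It is not the name server's own implementation, and the class name, method name, and limit value are all hypothetical; it only shows why a lower limit blunts amplification while still letting a modest number of legitimate replies through.

```python
from collections import defaultdict


class ResponseRateLimiter:
    """Allow at most `limit` responses per client in each one-second window."""

    def __init__(self, limit):
        self.limit = limit
        self.window = None              # the second currently being counted
        self.counts = defaultdict(int)  # responses sent per client this second

    def allow(self, client, now):
        """Return True if a response to `client` at time `now` is permitted."""
        window = int(now)
        if window != self.window:
            # A new second has started: forget the previous counts.
            self.window = window
            self.counts.clear()
        self.counts[client] += 1
        return self.counts[client] <= self.limit


if __name__ == "__main__":
    limiter = ResponseRateLimiter(limit=2)  # hypothetical limit
    for i in range(4):
        print(limiter.allow("204.122.16.248", now=0.1 * i))
```

     With a limit of 2, the third and fourth requests inside the same second are refused, and the count starts over in the next second.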

     There are aspects of this attack that I do not understand.  The attackers forged a source address of 204.122.16.248 from outside (UDP packets, so no three-way handshake) and directed requests at 204.122.16.8.  Our name servers would then attempt to reply to 204.122.16.248, but there was no host at that IP address, so our router didn’t know what to do with the replies and became overloaded logging what it considered “Martian” packets.

     The puzzling aspect of this is I have a firewall rule that SHOULD block all traffic arriving on an external interface with an internal source address.  I was able to mitigate the attack by blackholing 204.122.16.248 at the name servers and rate limiting responses.
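
     The check that such a firewall rule is meant to perform can be sketched as follows.  This is only an illustration of the logic, not the actual rule; the /24 prefix is an assumption, since the real netmask is not given here, and the function name is hypothetical.

```python
import ipaddress

# Assumed internal prefix; the real netmask is not stated in this post.
INTERNAL_NET = ipaddress.ip_network("204.122.16.0/24")


def is_spoofed(source, arrived_on_external, internal_net=INTERNAL_NET):
    """True if a packet that came in from outside claims an internal source."""
    return bool(arrived_on_external
                and ipaddress.ip_address(source) in internal_net)


if __name__ == "__main__":
    # The forged address from the attack, arriving on the external interface.
    print(is_spoofed("204.122.16.248", arrived_on_external=True))
    # The same address seen on an internal interface would be legitimate.
    print(is_spoofed("204.122.16.248", arrived_on_external=False))
```

     Under these assumptions the forged packets would be flagged, which is why it is puzzling that they got through.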