Hardware Maintenance Sept 17-18th 2018

Posted: Mon Sep 17, 2018 4:48 pm
by Nanook
Hardware Maintenance Late Sept 17th to Early Sept 18th
Posted on September 17, 2018

I will be taking various machines down tonight for about fifteen minutes each to install new NICs with non-Intel chipsets. From 4.15.0 forward, the Linux kernel has had a bug in the Intel E-1000 drivers that causes the cards to lock up when hardware offloading is used. Usually these lock-ups are transient, resulting in 2-3 second delays in data, but occasionally the cards will lock hard and require a drive to the co-lo facility to physically reset the machine.
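
For anyone hitting similar lock-ups who cannot swap hardware, a commonly suggested workaround is to turn the offload features off with ethtool. The sketch below only builds and prints the command; the interface name "eth0" and the choice of features (tso, gso, gro) are assumptions for illustration, not something I have confirmed cures this particular bug.

```python
import shlex

# Offload features that are often switched off as a workaround.
# This list is an assumption, not a confirmed fix for the bug above.
OFFLOAD_FEATURES = ("tso", "gso", "gro")


def offload_off_command(iface, features=OFFLOAD_FEATURES):
    """Return the ethtool argument list that disables the given offloads."""
    argv = ["ethtool", "-K", iface]
    for feature in features:
        argv += [feature, "off"]
    return argv


if __name__ == "__main__":
    # "eth0" is a placeholder interface name.
    print(shlex.join(offload_off_command("eth0")))
```

The printed command has to be run as root, and the setting does not survive a reboot unless it is also added to the network configuration.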

Because the servers most affected are those carrying heavy traffic, the NFS server providing the home directories in particular, I will be replacing the NICs on all the NFS servers. This will affect virtually all of our services, but it will prevent long outages like the one we suffered Sunday morning from recurring.

I filed a bug report on this problem in April of this year. Canonical has offered me various kernels to try, but many of them either did not boot at all or were extremely unstable. At this point I feel it’s more cost-effective and less service-affecting just to replace the hardware.

Re: Hardware Maintenance Sept 17-18th 2018

Posted: Tue Sep 18, 2018 4:42 am
by Nanook
Hardware Work Done
Posted on September 18, 2018

I’ve replaced the Intel NICs with TP-Link NICs. The first machine took close to two hours because at first I was not able to get it to work. I finally chased it down to a lack of patience on my part: these cards take approximately one minute to initialize.
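
A slow-initializing card like this is easier to live with if a script waits for the link instead of assuming it is there. A minimal sketch, assuming a Linux host where the kernel exposes link state in /sys/class/net/<iface>/operstate; the function names and the 120-second timeout and 2-second poll interval are hypothetical values picked to comfortably cover a one-minute initialization.

```python
import time
from pathlib import Path


def read_operstate(iface):
    """Read the kernel's operational state for iface from sysfs (Linux only)."""
    return Path(f"/sys/class/net/{iface}/operstate").read_text().strip()


def wait_for_link(iface, timeout=120.0, interval=2.0,
                  read_state=read_operstate, sleep=time.sleep):
    """Poll until iface reports 'up'; return False once timeout has elapsed."""
    waited = 0.0
    while True:
        try:
            if read_state(iface) == "up":
                return True
        except OSError:
            pass  # interface has not appeared in sysfs yet
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Calling wait_for_link("eth0") before starting services that depend on the network (with "eth0" replaced by the real interface name) avoids mistaking a card that is still initializing for one that is dead.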

I was not able to use the old cards in a private network as I had planned because as soon as I configured one, localhost became bound to it and broke the other connection. I’m sure this is operator malfunction but I will need to do further research.