I will be taking the client mail server down for approximately 1/2 hour in order to image the virtual machine and then to expand the image to accommodate a larger root partition. The machine ran out of space about 3:30AM last night owing to logs growing too large. This is to provide additional capacity for logs and temp files.
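For anyone wanting to catch this sort of thing before a partition fills, here is a minimal sketch of a disk-usage check. It is not the monitoring actually used on this server; the function names, the default path, and the 90% threshold are all assumptions chosen for illustration.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def needs_attention(path: str = "/", threshold: float = 90.0) -> bool:
    """True when the filesystem holding `path` is at or above `threshold` percent full.

    The 90% default is an arbitrary, hypothetical threshold.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return percent_used(usage.total, usage.used) >= threshold


if __name__ == "__main__":
    # Check the root partition, the one that filled up in the post above.
    if needs_attention("/"):
        print("Root partition is nearly full; check logs and temp files.")
    else:
        print("Root partition has room to spare.")
```

Run from cron every few minutes, something like this would give a warning while there is still time to rotate or trim logs.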
Crash
Our server that houses home directories crashed hard today at 17:30 Pacific Time (5:30PM). It was running an experimental 4.14.0 kernel in order to try to determine at what point the bug that causes the E1000 Ethernet controller chip to hang was introduced.
It locked up so tight that not even the magic SysRq key worked. That was the mainline 4.14.0 final release. I had to power cycle the machine to get it to respond again.
After it did, I booted 4.14.0-rc0, the first release candidate of 4.14, and it immediately crashed and burned.
I told Canonical that I cannot continue testing on production machines. I put 4.15.0-21 back in place and disabled hardware offloading on that interface. I’ve got a broken machine that I will try to repair and use for testing.
Maintenance Completed
Maintenance is completed for the evening.
Now we know the E1000 drivers broke in the 4.14.0 kernel, and that they are broken in the mainline kernel too, so the fault is not a bad patch from Canonical but some bad juju in the mainline kernel.
Really hate it when they fart with a device driver for a device that has been out for at least 13 years, and break it. Argh!
Mx2 is back in service
Mx2 is back in service on Ubuntu Artful 17.10. I was unable to determine what the upgrade to 18.04 broke, but I was able to document the error and open a ticket on Launchpad, so Canonical is working on it.
Mx2 isn’t needed from a capacity standpoint; it is just there to provide redundancy, so that mail can still be received in the event Mx1 is down.
Maintenance Boot Tonight
Tonight, I will be installing a test kernel onto the machine which hosts home directories and some virtual machines, and re-enabling hardware offloading.
This is in order to determine exactly which kernel broke the E1000 drivers, so that the developers can look between the last working kernel and the first broken one and figure out which change caused the problem.
This means we will be back in a state where there may be pauses and NFS problems again, but only temporarily, so that they can determine which kernel it was, fix it, and get us back to where things work with the performance of hardware offloading.
Because almost everything depends upon access to home directories, this will break pretty much all services for a hopefully brief period.
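The search described above, narrowing down between the last working kernel and the first broken one, is essentially a bisection. Here is a rough sketch of the idea, not the actual procedure Canonical uses; the build labels and the `is_good` test are hypothetical stand-ins for "boot this kernel and see whether the Ethernet hangs".

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first build that fails, using binary search.

    Assumes `builds` is ordered oldest to newest, that builds[0] is known
    good, builds[-1] is known bad, and that once a build is bad every
    later build is bad as well.
    """
    if len(builds) < 2:
        raise ValueError("need at least one good and one bad build")
    lo, hi = 0, len(builds) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(builds[mid]):
            lo = mid  # breakage happened after mid
        else:
            hi = mid  # breakage happened at or before mid
    return builds[hi]


if __name__ == "__main__":
    # Hypothetical labels; in real life each test is a reboot and a wait.
    builds = ["build-%02d" % n for n in range(1, 9)]
    broken_from = "build-06"  # made-up answer for the demonstration
    print(first_bad(builds, lambda b: b < broken_from))
```

Each test halves the remaining range, which matters when every test costs a reboot of a production machine.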
Mx2 Maintenance
Mx2 is down for all practical purposes. I have the SMTP ports firewalled off so that I can upgrade this server to 18.04 LTS, compare it to Mx1 on 17.10, and figure out what is different that is breaking lists. 17.10 is only supported until July, so I can’t just wait and hope that someone else fixes it.
Ethernet Hang
I was able to stop the Ethernet hang by disabling some of the hardware offload functions of the Ethernet chip, so things are stable again with current kernels. This comes at the expense of some performance, but not a lot, as I was able to leave CRC32, which consumes more CPU than just moving the data, enabled.
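For the curious, offload features on Linux are normally toggled with `ethtool -K`. The sketch below only illustrates the shape of such a command. The post does not say which features were turned off, so the list here (`tso`, `gso`, `gro`) and the interface name `eth0` are assumptions; checksum offload (ethtool's `rx` and `tx` features) is deliberately left alone, assuming that is what the post means by leaving CRC32 enabled.

```python
import subprocess
from typing import List, Sequence

# Hypothetical choice of features to switch off; the post does not list them.
# tso/gso/gro are segmentation and receive offloads. Checksum offload
# (rx/tx) is intentionally not in this list.
OFFLOADS_TO_DISABLE = ("tso", "gso", "gro")


def build_ethtool_command(interface: str,
                          features: Sequence[str] = OFFLOADS_TO_DISABLE) -> List[str]:
    """Build an `ethtool -K <interface> <feature> off ...` command line."""
    cmd = ["ethtool", "-K", interface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def disable_offloads(interface: str, dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or actually run it; running needs root."""
    cmd = build_ethtool_command(interface)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # "eth0" is a placeholder interface name.
    disable_offloads("eth0", dry_run=True)
```

Settings made this way do not survive a reboot, so they would also need to go into whatever network configuration the distribution uses.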
Maintenance Completed
Things went south in a big way. The new kernel they provided caused so many other problems that I could not let it run long enough to see if it fixed the original one: it broke NFS, broke the console, and broke the mouse. Getting the system back up on the old kernel was a challenge.
There was another experimental kernel in the developers' repository. I am running on it now. It does not fix the problem, but it provides much more detailed diagnostics that will hopefully help the developers find a fix.
It is 6:30AM now, so I will not be available until around 3PM this afternoon. Please feel free to leave a message or send mail and I will get back to you as soon as possible.
Maintenance Work Early Wednesday Morning
Maintenance will commence shortly after midnight and last anywhere from 15 minutes to several hours, depending upon how smoothly things go. There is a problem with the 4.15.0 kernel drivers for the E1000 Ethernet chips, which my machines happen to use, that causes them to periodically hang for about ten seconds. Most people will experience this as a pause in I/O, but some non-Linux versions of ssh will time out and disconnect. I have been working with Canonical to come up with a fix. The first kernel they provided did improve the situation, so that it happened a few times a day instead of a few times an hour. They have another kernel for me to try.
So tonight, I will be installing that kernel and rebooting to make it active. I am going to attempt this remotely, but in the past the network has sometimes failed to come up after a reboot, and if that happens I will need to drive down to the co-lo facility and it will take longer. This will interrupt all services, as it affects both the machines hosting the virtual machines and the NFS servers that hold your home directory, mail directory, and a few other common file systems.
I plan to start this work around 12:30AM. If things go well, I should be finished by 1AM; if not, it may be as late as 3AM Pacific Time.
Ubuntu Studio
I have installed Ubuntu Studio on ubuntu.eskimo.com. The sections for Audio Production, Graphic Design, and Video Production are all part of Ubuntu Studio. Each is a separate menu of tools dedicated to that function. For instance, the tool I used to cut out and reduce a 1.2 MB menu image was gimp. It is included in the Graphic Design section, although I had previously installed it stand-alone in Graphics.
Anyway, take a look and explore these new menus and see if they don’t contain some items you find useful. Because these are all graphical elements you will need x2go installed on your machine. You can get x2goclient from http://x2go.org.
The video production facilities will require a high speed connection to the Internet; I would recommend download speeds of 20 Mbit/s or more. (Lower speeds will result in choppy playback, but this will not affect the final video output file.)