Maintenance Done

     The maintenance is complete, but just about everything that could go wrong did.

     I brought an old 4:3 monitor because I had intended to move just Sparc equipment, which has an 1152×900 resolution.  That is close enough to 1200×900 that the old ViewSonic 4:3 monitors will sync to it.

     In the process I managed to snag the power cords to both existing hosts / file servers and basically take everything down.

     The old monitor will not sync to the 1680×1050 resolution of the modern equipment, except that one of the modern servers will step down to 1200×900 resolution.

     After accidentally powering everything down that was my only window to bring things back up.  The file server with all the user information failed to start rpc.mountd, so none of the other machines could mount file systems from it.  After fixing that I had to reboot every other machine.
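     One way to confirm that mountd is registered before rebooting all the clients is to look at what rpcbind reports.  The following is only a sketch in Python, not what was run at the time.  It parses the text printed by rpcinfo -p; the sample output, the port numbers, and the helper names are made up for illustration.

```python
import subprocess


def registered_services(rpcinfo_text: str) -> set:
    """Return the service names listed in `rpcinfo -p` output."""
    services = set()
    for line in rpcinfo_text.splitlines():
        fields = line.split()
        # Data rows look like: program vers proto port service.
        # The header row starts with the word "program", so it is skipped.
        if len(fields) >= 5 and fields[0].isdigit():
            services.add(fields[4])
    return services


def query_rpcinfo(host: str) -> str:
    """Run `rpcinfo -p host`; needs the rpcinfo command and a reachable host."""
    result = subprocess.run(
        ["rpcinfo", "-p", host], capture_output=True, text=True, check=True
    )
    return result.stdout


# Illustrative output only (not captured from our file server).
SAMPLE = """\
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100003    3   tcp   2049  nfs
    100005    3   tcp  20048  mountd
"""

missing = {"nfs", "mountd"} - registered_services(SAMPLE)
print("missing RPC services:", sorted(missing))
```

     If mountd shows up as missing, clients will not be able to mount anything from that server, and there is no point rebooting them yet.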

     I got the rack re-arranged so it can accommodate another couple of monster cases, which in turn accommodate more drives for RAID10 file systems and Hyper 212 Evo coolers to replace the Intel stock coolers.

     With the stock Intel coolers, the CPUs were overheating and throttling, with just an ordinary work load, resulting in slower performance.  Now they can run full tilt all four cores doing prime95 or the Linux equivalent, mprime, all day and never exceed 65°C.  They throttle at 74°C so that basically means they can run in turbo mode indefinitely now even with the most demanding work loads.
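     The margin is simple arithmetic: 74°C minus 65°C leaves 9°C of headroom under full load.  A minimal sketch of checking that on a live machine follows, assuming the kernel exposes temperatures in millidegrees Celsius under /sys/class/thermal (which zones exist, and whether any of them is the CPU, varies by machine).  The function names are hypothetical.

```python
import glob

THROTTLE_C = 74.0  # throttle point quoted in the post


def parse_millidegrees(raw: str) -> float:
    """Convert a sysfs reading such as '65000' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def headroom(temp_c: float, throttle_c: float = THROTTLE_C) -> float:
    """Degrees Celsius left before the CPU starts throttling."""
    return throttle_c - temp_c


def read_zone_temps(pattern: str = "/sys/class/thermal/thermal_zone*/temp") -> list:
    """Read every thermal zone that can be read; returns [] if none exist."""
    temps = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as fh:
                temps.append(parse_millidegrees(fh.read()))
        except (OSError, ValueError):
            continue  # zone unreadable or not a number; skip it
    return temps


for temp in read_zone_temps():
    print(f"{temp:.1f} C, headroom {headroom(temp):.1f} C")
```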

     I also have all the power and Ethernet cables routed nicely and strapped down now instead of running all over, so I am less likely to accidentally unplug a server in the future.

     But I wasn’t able to get one server back in service yet because I didn’t have a monitor that would display its output so I could configure it, so I will have to make another trip down there tomorrow to finish it.  However, since this server currently has no services on it and all the cabling is now neatly strapped, this should not be service impacting.

Shellx Back In Service – Maintenance This Afternoon

     Shellx is back in service.  After restoring from a backup image, re-installing 170 updates, and rebooting, mail continued to function.

     We use postfix for the mail transfer agent.  For some reason, instead of expanding what is in /etc/mailname, it was using the path literally.

     Re-installing postfix entirely did not fix it, so the culprit was something postfix depends upon that was restored from the backup, but I do not know what.

     Late this afternoon I will be turning down the old shell server as well as our radius servers to physically move them in the rack, to make way for a larger case for another of our servers to address overheating issues.

     I will also be rebooting Igloo, our newest file server, to get rid of a stuck process.  This will momentarily affect all services.  This will probably happen around 4:30-5:00pm.


Shellx – Progress

     Restoring shellx from a backup image required fixing some things that have changed since, like the IP address of NIS servers and the file server.
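     Fixing stale addresses after restoring an image mostly comes down to swapping old IPs for new ones in a handful of config files.  Here is a rough sketch of that substitution in Python.  The addresses come from the documentation ranges and the file contents are invented; they are not our real configuration, and replace_ips is a hypothetical helper.

```python
import re


def replace_ips(text: str, mapping: dict) -> str:
    """Replace whole IPv4 addresses in text according to mapping (old -> new)."""
    if not mapping:
        return text
    # Longest first, so a longer address wins over one that is its prefix.
    alternatives = "|".join(
        re.escape(ip) for ip in sorted(mapping, key=len, reverse=True)
    )
    # The lookarounds stop 192.0.2.1 from matching inside 192.0.2.10
    # or inside 1.192.0.2.1.
    pattern = re.compile(r"(?<![\d.])(" + alternatives + r")(?!\d|\.\d)")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


# Invented example content standing in for a restored config file.
stale = "ypserver 192.0.2.1\nfileserver 192.0.2.10\n"
print(replace_ips(stale, {"192.0.2.1": "198.51.100.5"}), end="")
```

     Doing all substitutions in a single pass avoids one replacement accidentally feeding into another when several addresses have changed.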

     After that, shellx worked and mail functioned properly again.

     However, I will need to re-apply updates.  I am making a new disk image now in case it was one of the updates that broke things.  There are two likely candidates: a bad update, or corruption resulting from improper shutdowns that happened several times in the process of putting a new server in place.

     The plan is to make a backup of the server now that it is in a known good state, then bulk apply all the updates and re-test.  If everything still works, I’ll know it wasn’t an update that caused the problem.  If not, I’ll restore from the backup now being made and re-apply updates one at a time to find out which one breaks things, then restore again and re-apply all except the troublesome update.
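     The one-at-a-time search in that plan can be sketched as follows.  This is a simulation in Python, not a script that touches the package manager: the update names are invented, and works_with stands in for "apply these updates on top of the known good image and test mail".

```python
from typing import Callable, List, Optional, Sequence


def find_breaking_update(
    updates: Sequence[str],
    works_with: Callable[[List[str]], bool],
) -> Optional[str]:
    """Apply updates one at a time; return the first one after which the
    test fails, or None if everything still works at the end."""
    applied: List[str] = []
    for update in updates:
        applied.append(update)
        if not works_with(applied):
            return update
    return None


def safe_update_set(updates: Sequence[str], culprit: Optional[str]) -> List[str]:
    """Everything except the troublesome update, in the original order."""
    return [u for u in updates if u != culprit]


# Invented example: pretend the package "libexample" is what breaks mail.
updates = ["tzdata", "libexample", "openssl", "bash"]


def mail_works(applied: List[str]) -> bool:
    return "libexample" not in applied


culprit = find_breaking_update(updates, mail_works)
print("culprit:", culprit)
print("re-apply:", safe_update_set(updates, culprit))
```

     With a long list of updates, testing halves of the list instead of single updates would need far fewer restore-and-test cycles, at the cost of a slightly more complicated procedure.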


Shellx Corrupted

     Something in shellx has gotten corrupted, breaking the local mail relay.  A complete re-install of postfix did not fix it.  The logs are giving me a rather unhelpful “unknown mail transport error” diagnostic.

     I am restoring the machine from a backup image.  This will take approximately 1/2 hour.  In the meantime all other shell servers are available for your use.

Eskimo North Yahoo Groups Page

     I will neither respond nor post status updates on the Yahoo Groups Eskimo North page.  This is why:


Your message to the EskimoNorthUsers group was not approved.
The owner of the group controls the content posted to it and has the
right to approve or reject messages accordingly.

In this case, your message was automatically rejected because the
moderator didn’t approve it within 14 days. We do this to provide a
high quality of service for our users.

A complete copy of your message has been attached for your

Thank you for choosing Yahoo Group

     If you want real information, come to our website here or go to our Facebook or Twitter pages.

     If Eskimo North’s page is down whether it be for scheduled maintenance or an unplanned outage, information will be provided at the Facebook and Twitter pages.

     The Yahoo Group is not something I created, nor is it a place to seek remedies.  It’s just a group apparently created to bitch.


Out of Office 3pm – ? Monday / Maintenance

     I’ll be out of the office from 3pm until some time in the late afternoon or, more probably, evening.  I have to stop by my accountant’s office and then bring a server to the co-location facility.

     At the co-location facility I need to re-arrange the data cabinet a bit to accommodate another machine, and this will involve moving a shelf that the old shell server and the radius servers are on.  So between about 4:30 and possibly 5:30pm there will be a point where dial-up and DSL accounts cannot authenticate and the old shell server will be unavailable.

     Existing dial-up and DSL sessions will remain up, but new ones cannot be established during this time frame.  All sessions on the old shell server will be torn down, as the machine will need to be powered down and removed temporarily.

quota -v fixed

     The issue with the quota -v command is fixed.  The problem was that the version of rpc.rquotad that ships with Ubuntu 15.04 is broken.  I removed it and installed the version from 14.10, and it worked, except that 14.10 isn’t systemd based and so the start-up scripts didn’t work.  So I saved the binary from the 14.10 package, removed that package, re-installed the 15.04 package, and then substituted the rpc.rquotad binary from 14.10 while keeping the start-up scripts from 15.04.  Now it’s all working again.

     Sorry these upgrades have been so disruptive.  This was not my intent, but it is a learning curve going from CentOS 6 to Ubuntu 15.04.  It’s worth the effort, I think, in terms of performance and capabilities, but it is a pain in the rear.

System Trouble October 10 2015 15:30

     Around 3:30pm, our new file server decided to unexport one of the necessary file systems (/misc) and refused to export it when the command exportfs -a was given.  A reboot was necessary to restore service.
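     A dropped export like this can be caught from any client by comparing what the server advertises against what should be there.  A sketch in Python, assuming the showmount command is installed; the host name and the sample output are invented, and only /misc comes from the incident above.

```python
import subprocess

REQUIRED = {"/misc"}  # the file system that went missing in this incident


def exported_paths(showmount_text: str) -> set:
    """Return the paths listed in `showmount -e` output."""
    paths = set()
    for line in showmount_text.splitlines():
        fields = line.split()
        # Skip the "Export list for <host>:" header; export rows start with a path.
        if fields and fields[0].startswith("/"):
            paths.add(fields[0])
    return paths


def query_exports(host: str) -> str:
    """Run `showmount -e host`; needs showmount and a reachable NFS server."""
    result = subprocess.run(
        ["showmount", "-e", host], capture_output=True, text=True, check=True
    )
    return result.stdout


# Invented sample in which /misc is no longer exported.
SAMPLE = """\
Export list for fileserver.example:
/home 192.0.2.0/24
"""

missing = REQUIRED - exported_paths(SAMPLE)
print("missing exports:", sorted(missing))
```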

     I had been running the latest 4.2.3 kernel rather than the Ubuntu 3.19.0-31 kernel, but opted at that point to revert back to the officially supported kernel.

     Still can’t get rpc.rquotad to function but at least now I know it’s not kernel related.

Debian Back

     Debian is re-installed as Jessie 8.2.  Things that didn’t work after the upgrade, such as commands run from /etc/rc.local, now work.

     I could have restored it from backups, but then we’d be stuck with the old version and the whole problem of upgrading again, so I might as well get it done.

     I am still reinstalling applications and tweaking various things but it is available.

     One thing that doesn’t work and will gradually become unavailable on all of our servers is the NX protocol.  You will need to install and use X2Go instead.  The reason for this is that the freenx-server software has been removed from all of the repositories by the developers.

     This is a pain since X2Go does not support any version of Linux that is past its end-of-support date.  Hence if you’re running an older version of Linux there isn’t any way to use X2Go, and now no NX either.

     Seems like the Microsoft mentality of forced updates has come to the Linux community in an unfriendly manner.  Well at least Linux updates actually fix things.

Reboot of Virtual

     I’m in the process of rebooting virtual.  This is one of the physical hosts that hosts a number of virtual servers, mostly shell servers. The servers affected will pause, but if you are patient they will return to life where they left off in a few minutes.

     This is necessary because something went ape shit during the re-installation of Debian.  All seemed to go well until the reboot to bring it online, then the Virtual Manager stopped responding.  In addition, I couldn’t even bring up a terminal, although I could still log in via ssh.

     At this point I don’t know what went wrong and I’m just trying to restore basic functionality.