Inuvik Too Hot

    Inuvik is running too hot.  Before I brought it over to the co-lo, this machine was running a small-FFT torture test at 4.8 GHz on 36 threads (two threads per core), but now it is exceeding 96°C, though only on a couple of cores.

     When a couple of cores on a multi-core CPU run hot while the rest are normal, this usually indicates an air bubble between the CPU and the cooler, so part of the heat spreader is not being cooled.  This is more pronounced with the i9-109x0 series of CPUs because the heat spreader is soldered to the die.  On most microprocessors there is thermal compound between the die and the heat spreader, which diffuses the heat somewhat.  That diffusion does not occur when the die is soldered to the heat spreader, so any air bubbles are more critical.
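     One way to confirm the pattern described above (a few cores hot, the rest normal) is to compare each core against the median core temperature.  The following is a minimal Python sketch, assuming a Linux host whose coretemp driver exposes per-core readings as temp*_label / temp*_input files under /sys/class/hwmon (as on recent kernels).  The function names and the 10°C threshold are hypothetical choices for illustration, not part of Inuvik's actual monitoring.

```python
import glob
import os
import statistics


def read_core_temps(hwmon_root="/sys/class/hwmon"):
    """Return {label: degrees C} for every 'Core N' sensor found (coretemp driver)."""
    temps = {}
    for label_path in glob.glob(os.path.join(hwmon_root, "hwmon*", "temp*_label")):
        with open(label_path) as f:
            label = f.read().strip()
        if not label.startswith("Core"):
            continue  # skip 'Package id 0' and sensors from other drivers
        input_path = label_path.replace("_label", "_input")
        with open(input_path) as f:
            temps[label] = int(f.read().strip()) / 1000.0  # millidegrees -> degrees
    return temps


def hot_outliers(temps, threshold_c=10.0):
    """Return labels of cores running more than threshold_c above the median core."""
    if not temps:
        return []
    median = statistics.median(temps.values())
    return sorted(label for label, t in temps.items() if t - median > threshold_c)


if __name__ == "__main__":
    readings = read_core_temps()
    print("median:", statistics.median(readings.values()) if readings else "n/a")
    print("hot cores:", hot_outliers(readings))
```

     Comparing against the median rather than the mean keeps one or two very hot cores from dragging the baseline up and hiding themselves.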

     I’ve ordered some more Kryonaut Extreme, which should arrive between October 1st and 3rd.  At that point we will pull the machine from the co-lo for a few hours to clean the CPU and heat sink and re-paste it.  I will perhaps be just a smidgen more generous with the paste this time.  I am stingy with it not because of cost but because, no matter how conductive a thermal paste is, it is less conductive than the metals you are trying to transfer heat between, so you want as thin a layer as you can get away with.  However, the worst thermal paste is better than the best air, so a little too much is less bad than not quite enough, which appears to be the case presently.
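     The trade-off above follows from the one-dimensional conduction formula for a flat layer, R'' = t / k (thickness over conductivity, per unit area): doubling the layer doubles its resistance, but any spot filled with air instead of paste has a conductivity hundreds of times lower.  Here is a minimal Python sketch of that arithmetic.  The conductivity figures are rough assumptions for illustration, not measurements of Kryonaut Extreme or any other product, and the model ignores contact resistance and surface flatness.

```python
def specific_resistance(thickness_um, conductivity_w_mk):
    """Area-normalised thermal resistance of a flat layer, R'' = t / k, in K*cm^2/W."""
    thickness_m = thickness_um * 1e-6
    return thickness_m / conductivity_w_mk * 1e4  # m^2*K/W -> cm^2*K/W


# Assumed bulk conductivities in W/(m*K); the paste values are illustrative only.
MATERIALS = {
    "good paste": 14.0,
    "cheap paste": 1.0,
    "air": 0.026,
}

if __name__ == "__main__":
    for name, k in MATERIALS.items():
        for t_um in (25, 100):
            r = specific_resistance(t_um, k)
            print(f"{name:12s} {t_um:4d} um  R'' = {r:.4f} K*cm^2/W")
```

     Under these assumptions even a thick (100 µm) layer of cheap paste has roughly a tenth of the resistance of a thin (25 µm) air gap, which is the "a little too much is less bad than not quite enough" argument in numbers.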

     Until then, I’ve reduced the speed of the machine from 4.8 GHz to 4.4 GHz and the CPU voltage from 1.37 V to 1.2 V to reduce heat generation.  This will reduce performance by slightly less than 10%, but given that the CPUs are idle around 97% of the time, this should not be a problem, and it’s only temporary.
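     The "slightly less than 10%" figure is just the clock ratio; the heat saving is larger because dynamic power in CMOS logic scales roughly with voltage squared times frequency.  A back-of-the-envelope Python sketch, assuming throughput scales linearly with clock speed and ignoring leakage (static) power, which also falls at lower voltage and temperature:

```python
def relative_performance(f_new_ghz, f_old_ghz):
    """Fraction of throughput retained, assuming it scales linearly with clock."""
    return f_new_ghz / f_old_ghz


def relative_dynamic_power(f_new_ghz, f_old_ghz, v_new, v_old):
    """Fraction of dynamic power retained, using the CMOS approximation P ~ C * V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new_ghz / f_old_ghz)


if __name__ == "__main__":
    perf = relative_performance(4.4, 4.8)
    power = relative_dynamic_power(4.4, 4.8, 1.2, 1.37)
    print(f"performance lost: {1 - perf:.1%}")      # roughly 8%
    print(f"dynamic power saved: {1 - power:.1%}")  # roughly 30%
```

     So, to a first approximation, giving up about 8% of peak throughput buys roughly a 30% cut in dynamic power, which is why dropping the voltage matters as much as dropping the clock.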

     Right now this is more of an issue than it otherwise would be because of a current kernel bug affecting these particular CPUs: the bug is in the code that writes to the model-specific register (MSR) to change the CPU speed in response to excess temperature.  Without this bug, the machine would simply have downclocked automatically.
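     The sketch below does not touch the write path the bug is in; it only shows how the per-core temperature and throttle status can be read back from Intel's architectural thermal MSRs, which is handy for checking whether a core is actually hitting its limit.  It is a minimal Python sketch, assuming Linux with the msr module loaded (modprobe msr) and root access.  The register addresses and bit fields are Intel's IA32_THERM_STATUS (0x19C) and IA32_TEMPERATURE_TARGET (0x1A2); the function names are hypothetical.

```python
import os
import struct

IA32_THERM_STATUS = 0x19C        # per-core thermal status MSR
IA32_TEMPERATURE_TARGET = 0x1A2  # holds TjMax in bits 23:16


def read_msr(cpu, register):
    """Read one 64-bit MSR via the Linux msr driver (needs root and 'modprobe msr')."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver maps the file offset to the register address.
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)


def decode_core_temp(therm_status, temp_target):
    """Return (degrees C, throttling_now) from raw THERM_STATUS / TEMPERATURE_TARGET values."""
    if not (therm_status >> 31) & 1:
        raise ValueError("digital readout not valid")  # bit 31: reading valid
    tjmax = (temp_target >> 16) & 0xFF                 # bits 23:16: TjMax in degrees C
    below_tjmax = (therm_status >> 16) & 0x7F          # bits 22:16: degrees below TjMax
    throttling = bool(therm_status & 1)                # bit 0: thermal status currently active
    return tjmax - below_tjmax, throttling
```

     For example, as root, decode_core_temp(read_msr(2, IA32_THERM_STATUS), read_msr(2, IA32_TEMPERATURE_TARGET)) reports logical CPU 2.  Note that /dev/cpu/N numbers logical CPUs, so with two threads per core each physical core appears twice.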

Imapd / POP3

     Further research suggests that updating Dovecot isn’t going to fix it.  I’m going to update Dovecot anyway just to bring it current, but will probably do this later this evening instead of at 5 pm.

     It appears that this problem stems from the recently disclosed POODLE vulnerability, which Red Hat “solved” by disabling SSLv3.

     The only fixes at this end would be either to compile OpenSSL from source and then recompile a whole bunch of other software not to use the system version (because Red Hat isn’t going to fix it properly), or to build a new server based on a non-broken operating system.  The latter is problematic because Red Hat’s EL6, upon which CentOS 6 is based, has a broken implementation of NFS version 4, which mail really needs to work properly, owing to the lack of mandatory locking in earlier versions of NFS.

     In the long term, I am going to work towards moving our infrastructure away from Red Hat and towards Ubuntu.  Although the Ubuntu people occasionally screw things up, they almost always fix them quickly.  Red Hat is becoming impossible to maintain and to get working properly with other operating systems, much like Microsoft twenty years ago.

     Since there is no good short term fix on this end, those affected will either need to upgrade their software to something capable of TLS or use a mailer that doesn’t override their encryption selections, such as Thunderbird.
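     For anyone unsure whether their setup can talk TLS to the server, one quick check is to open a TLS connection to the mail ports and see which protocol version is negotiated.  This is a minimal Python sketch using the standard ssl module.  The hostname in the usage note is a placeholder, not our actual server; 993 and 995 are the standard IMAPS and POP3S ports; and the TLS 1.2 floor is an assumed modern default, not a statement of what our server requires.

```python
import socket
import ssl

IMAPS_PORT = 993  # IMAP over implicit TLS
POP3S_PORT = 995  # POP3 over implicit TLS


def make_client_context():
    """Client context that verifies certificates and refuses SSLv3 and early TLS."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # assumption: a modern floor
    return ctx


def negotiated_version(host, port, timeout=10.0):
    """Connect with implicit TLS and return the protocol actually negotiated."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()  # e.g. 'TLSv1.2' or 'TLSv1.3'
```

     Usage: negotiated_version("mail.example.com", IMAPS_PORT) returns the version string on success and raises OSError (which includes ssl.SSLError) if the handshake or connection fails.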