Mail Reception Issues

First, some background.  Sender Policy Framework, or SPF, lets a domain specify which mail servers may legitimately send mail for it.  The allowed servers are published in a DNS TXT record.

Systemd is a super daemon that replaces init and xinetd, and it also takes over part of the DNS service.  Like most Poettering projects, it kind of works some of the time.  Its resolver fails on some TXT records, which breaks SPF checks.

Here is an example of retrieving a TXT record using nslookup through the systemd resolver (systemd-resolved):

> set type=txt
> wholefoods.com
;; Warning: Message parser reports malformed message packet.
;; Truncated, retrying in TCP mode.
;; Connection to 127.0.0.53#53(127.0.0.53) for wholefoods.com failed: connection refused.

To fix this problem, I have disabled the systemd resolver and gone back to using bind, the standard DNS server, as a caching resolver.  Here are the results now:

> set type=txt
> wholefoods.com
;; Truncated, retrying in TCP mode.
Server: 127.0.0.1
Address: 127.0.0.1#53

Non-authoritative answer:
wholefoods.com text = "globalsign-domain-verification=MT3LmRzGYPgORWLlSBkPpAUpBDH9kl8xxYmB6FjtjY"
wholefoods.com text = "MS=ms90241053"
wholefoods.com text = "v=spf1 mx ip4:67.199.115.110 ip4:64.132.0.4 ip4:67.199.120.97 ip4:63.241.240.25 include:amazonses.com include:spf.protection.outlook.com include:_spf.q4press.com -all"
wholefoods.com text = "GxIV1cqmXdB1Jl1Qd1LgJyBAd8k4QEnQL4LZpSZS+yu/noX6ra5XpJepHvcohGGfvfnrn9N3bukOSw71brafNA=="
wholefoods.com text = "globalsign-domain-verification=pyR6ci6IB7uVAxLPZN5Z7_imdnvGJLhXCcmfs8v5RP"
wholefoods.com text = "adobe-idp-site-verification=ffdbe896-53c0-4f83-ad01-0ec20ef0833d"

     This should correct the problem of mail being rejected with an SPF failure even though it arrived from a server that the sending domain's SPF record allows.
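
     The "Truncated, retrying in TCP mode" lines above are what you would expect if the full answer does not fit in a classic 512-byte UDP reply.  Here is a rough sketch of that arithmetic using the six TXT strings from the output above.  It assumes no EDNS, a two-byte compressed name pointer on each answer, and no additional sections, so treat the result as an estimate and not an exact packet size.  The function name is my own.

```python
# Rough size estimate for the wholefoods.com TXT answer shown above.
# Assumptions: classic 512-byte UDP limit (no EDNS), compressed owner names,
# and no authority or additional records in the reply.
TXT_STRINGS = [
    "globalsign-domain-verification=MT3LmRzGYPgORWLlSBkPpAUpBDH9kl8xxYmB6FjtjY",
    "MS=ms90241053",
    "v=spf1 mx ip4:67.199.115.110 ip4:64.132.0.4 ip4:67.199.120.97 "
    "ip4:63.241.240.25 include:amazonses.com include:spf.protection.outlook.com "
    "include:_spf.q4press.com -all",
    "GxIV1cqmXdB1Jl1Qd1LgJyBAd8k4QEnQL4LZpSZS+yu/noX6ra5XpJepHvcohGGfvfnrn9N3bukOSw71brafNA==",
    "globalsign-domain-verification=pyR6ci6IB7uVAxLPZN5Z7_imdnvGJLhXCcmfs8v5RP",
    "adobe-idp-site-verification=ffdbe896-53c0-4f83-ad01-0ec20ef0833d",
]


def estimate_response_size(qname, txt_strings):
    """Estimate the size in bytes of a DNS reply carrying these TXT strings."""
    header = 12
    # Question: each label costs its length plus one length byte, then a
    # terminating zero byte, then 2 bytes of type and 2 bytes of class.
    encoded_name = sum(len(label) + 1 for label in qname.split(".")) + 1
    question = encoded_name + 4
    # Each answer: name pointer (2), type (2), class (2), TTL (4),
    # RDLENGTH (2), one string length byte, then the string itself.
    # This holds only for strings of at most 255 bytes, as these are.
    answers = sum(2 + 2 + 2 + 4 + 2 + 1 + len(s.encode("ascii")) for s in txt_strings)
    return header + question + answers


size = estimate_response_size("wholefoods.com", TXT_STRINGS)
print(size, "bytes estimated; over the 512-byte UDP limit:", size > 512)
```

     If the estimate is over 512, the client has to retry over TCP, and that TCP retry is the step that was refused on 127.0.0.53 in the first transcript.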

Web Server Slowness

     Today I received a complaint about slow response from our web server, not something we frequently have an issue with.  It took me a while to find what was going on because the hardware was not that busy.

     First, we had been hit with a couple of SYN floods today but they were brief and we have SYN cookies enabled so that should not have caused a problem.

     I could not find any backlog in the network.  Ping times from our web server to Google's name server were under 2ms, so the network was not the problem.

     I looked at the server statistics and it was doing around 20 hits/second and the worker threads were maxed out.  I increased the worker threads in Apache from 1000 to 5000, and the traffic jumped to 46 hits/second with no more lag.

     So the issue turned out to be a software bottleneck in Apache’s configuration.  We haven’t had that much traffic since Milla Jovovich’s website was here many years ago, so it had not been tested at that heavy a load.

     Still, there were plenty of hardware resources.  With that traffic and those settings it used about 10GB on a 128GB RAM machine and did not even saturate one core of a six-core processor.  Assuming traffic handling scales reasonably linearly with worker threads, we should be good for about 100 hits/second now, and there is still plenty of hardware headroom to increase further if need be.
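
     The 100 hits/second figure is just the linear extrapolation from the numbers above: 1000 workers were maxed out at about 20 hits/second, so 5000 workers should handle five times that.  A minimal sketch of the arithmetic follows.  The function name is my own, and the linear scaling is an assumption, not something measured.

```python
def projected_capacity(workers, saturated_hits_per_sec, saturated_workers):
    """Linear estimate of hits/second a given worker count can sustain.

    Assumes each worker handles the same share of traffic that it did when
    the smaller pool was saturated.
    """
    return workers * saturated_hits_per_sec / saturated_workers


# Figures from this post: 1000 workers saturated at about 20 hits/second.
capacity = projected_capacity(5000, 20, 1000)
print("estimated capacity:", capacity, "hits/second")

# Observed load after the change was 46 hits/second.
print("share of estimated capacity in use:", 46 / capacity)
```

     By this estimate the 46 hits/second we saw after the change is a little under half of what the new setting should handle.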

Linux Does What Windows Don’t

     Mint wouldn’t be my first choice.  I’ve had nothing but trouble with it not properly mounting NFS shares and with systemd scripts not working properly.  I’ve had much better results with Ubuntu.

     He mentions Fedora as being a free version of Red Hat.  Two other options are available, CentOS and Scientific Linux.  These tend to be less current than Fedora but more stable.

     The author also states that the Linux updater updates everything on your system.  The Linux updater only updates programs installed with the distribution's package tools, such as apt on Debian-based systems, yum on Red Hat systems, zypper on SUSE, or dnf on Fedora.  Programs which you install from source or download from a website are not updated by the system updater.  Also, modern Linux can require reboots for some non-kernel updates, particularly libc updates.

Centos7 Down

     In the process of fixing boot problems with Centos7, I broke bind, the name server.  It fails to start, but does not log any errors to give me a clue as to why.  So I am reverting to the older Centos7 image, which was a pain to boot but had a working name server.  Because the backup is compressed, it will take several hours to restore.

System Wide Maintenance

     I will be taking most everything down tonight, though not all at the same time, to do some system tuning to improve performance.  Outages of any one service should not last more than about 1/2 hour.  This should start around 2 AM and conclude by 6 AM Pacific time.

Centos7 Maintenance

     Centos7 will be down for approximately 1/2 hour to image.  I made some changes to fix issues with slow and unreliable boots.  I want to get this backed up in this state so that if recovery becomes necessary I don’t have to re-apply all the fixes.

SPF Policy Failures Affecting Incoming Mail

     I had a customer complain about being unable to receive e-mail from a particular site.  Upon examining the logs, I saw errors from our SPF policy checker.  This is a Perl script that looks up a domain and checks its SPF records to be sure that the server attempting to deliver e-mail for that domain is a legitimate server for that domain.  It was failing even though the records looked legitimate and the server was in the allowed list.
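
     To illustrate the kind of check involved, here is a minimal sketch in Python.  It is not the Perl checker we run.  It takes an SPF record as a string, using the wholefoods.com record quoted earlier in these notes, and tests a sender address against the ip4 entries only.  A real checker also does the DNS lookup and evaluates the mx, include and other mechanisms and their qualifiers, which this sketch skips.  The function names are my own.

```python
import ipaddress


def ip4_networks(spf_record):
    """Collect the ip4: mechanisms of an SPF record as network objects."""
    networks = []
    for term in spf_record.split():
        if term.lower().startswith("ip4:"):
            # A bare address such as 64.132.0.4 is treated as a /32 network.
            networks.append(ipaddress.ip_network(term[4:], strict=False))
    return networks


def sender_in_ip4_list(sender_ip, spf_record):
    """True if sender_ip matches one of the record's ip4 mechanisms."""
    address = ipaddress.ip_address(sender_ip)
    return any(address in network for network in ip4_networks(spf_record))


RECORD = (
    "v=spf1 mx ip4:67.199.115.110 ip4:64.132.0.4 ip4:67.199.120.97 "
    "ip4:63.241.240.25 include:amazonses.com "
    "include:spf.protection.outlook.com include:_spf.q4press.com -all"
)

print(sender_in_ip4_list("67.199.115.110", RECORD))  # listed in the record
print(sender_in_ip4_list("192.0.2.1", RECORD))       # documentation address, not listed
```

     The trailing -all in the record tells receivers to reject mail from any server that none of the mechanisms match, which is why a broken TXT lookup turns into rejected mail.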

     I Googled the error and found a note from the author saying it was caused by a change in the systemd resolver, but that he had fixed it in the SPF script and the fixed version was available from CPAN.  I installed the version from CPAN, and so far the only SPF rejections I’ve seen are legitimate, so I believe this problem is fixed.

Web Server Maintenance Complete

     The web server maintenance has been completed.  I apologize for taking longer than anticipated.  The virtual disk was close to full and I had to resize it to make more space available for web applications and database data.

     I ran into some problems doing this.  Resizing the image went fine, but the version of gparted shipped with Ubuntu 17.10 is broken and refused to resize a partition on a virtual disk.  I got around this by downloading gparted from its own site instead of using Ubuntu’s package, but it took me a while to arrive at that fix.

     The partition is now resized to take advantage of the larger image and all is well.