Eskimo North



          Mail & Web


          • To: outages-list@eskimo.com
          • Subject: Mail & Web
          • From: Robert Dinse <nanook@eskimo.com>
          • Date: Sat, 29 Aug 1998 22:01:37 -0700 (PDT)
          • Resent-Date: Sat, 29 Aug 1998 22:00:39 -0700
          • Resent-From: outages-list@eskimo.com
          • Resent-Message-ID: <"XYWck.0.gY.rnDwr"@mx1>
          • Resent-Sender: outages-list-request@eskimo.com

          
               I apologize for not posting this sooner, but I have been up to my
          eyeballs today, to put it mildly.
          
               We had a user who wrote some procmail rules intended to bounce spam
          to the postmaster of the originating domain.
          
               About 99% of the time, spammers forge the headers, so the domains or
          reply addresses aren't valid.  So the bounced message was itself bounced
          back to him, where his procmail rules again decided it was spam and
          bounced it.  The end result was a mail loop that consumed all of the
          mail spool space during the night.
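
          A minimal sketch of the kind of guard that keeps an auto-bounce rule
          from looping, assuming the rule can tag its own bounces and can
          recognize delivery-failure notices.  The X-Loop header is a common
          procmail convention; the token value and the function name
          should_bounce are hypothetical, and this is not the user's actual
          rule set.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical marker: the bounce rule would add this header to every
# message it sends, so it can recognize its own output coming back.
LOOP_HEADER = "X-Loop"
LOOP_TOKEN = "spam-bounce@example.invalid"

# Senders that normally indicate an automated delivery report.
AUTOMATED_LOCAL_PARTS = {"mailer-daemon", "postmaster"}


def should_bounce(raw_message: str) -> bool:
    """Return True only if bouncing this message cannot feed a loop."""
    msg = message_from_string(raw_message)

    # 1. We already handled this message once: never bounce it again.
    if any(LOOP_TOKEN in value for value in (msg.get_all(LOOP_HEADER) or [])):
        return False

    # 2. A null return path ("<>") marks a delivery-status notification.
    return_path = msg.get("Return-Path")
    if return_path is not None and return_path.strip() == "<>":
        return False

    # 3. Mail from MAILER-DAEMON or postmaster is itself a bounce or report.
    _, address = parseaddr(msg.get("From", ""))
    if address.split("@")[0].lower() in AUTOMATED_LOCAL_PARTS:
        return False

    return True
```

          With a check like this, a bounce that comes back from a forged or
          invalid address is dropped instead of being bounced a second time.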
          
               As with the last time a mail loop ran the spool out of space, the end
          result was that some mailboxes were corrupted, with about 540 MB of nulls
          being added to them.  This caused massive problems when someone tried to
          access their mail, both on the mail server and on eskimo.  When someone
          accesses their mail on the mail server, the POP server has to copy
          their mailbox (including the 540 megabytes of nulls), and this pretty
          much ties up that machine.  If someone accesses their mail from eskimo,
          it tries to read that 540 MB mailbox across the network, and despite
          both machines having FDDI interfaces, this quickly ties up both
          machines.
          
               A bug exists in SparcLinux NFS where, sometimes, when an NFS request
          times out, NFS just stops talking and won't talk again until the NFS
          daemon is restarted or, sometimes, until the machine is rebooted.  This
          is what killed the main web server earlier today.
          
               Nobody called about the web server for a number of hours, and I
          didn't notice it because I was busy fixing the mail problems.  No mailbox
          files were lost this time, though some are still being processed to
          remove the nulls.
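
          The post does not say how the nulls are being removed.  One possible
          approach, sketched here under that caveat, is to stream each damaged
          mailbox through a filter that drops NUL bytes, reading in chunks so
          a 540 MB file never has to fit in memory.  The function name
          strip_nuls and its parameters are hypothetical.

```python
def strip_nuls(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> int:
    """Copy src_path to dst_path without NUL bytes; return how many were dropped.

    The cleaned copy goes to a separate file, so the destination should sit
    on a filesystem that still has free space.
    """
    removed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of file
                break
            cleaned = chunk.replace(b"\x00", b"")
            removed += len(chunk) - len(cleaned)
            dst.write(cleaned)
    return removed
```

          The mailbox would need to be locked against delivery and POP access
          while this runs, which the sketch leaves out.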
          
               I had to take POP3 down earlier in order to fix this, which is why it
          was refusing connections for a while.
          
               A long-term solution is still in the works that will put the mail
          spool, user files, and FTP files on one file server so that quotas can be
          enforced on the mail spool, which will keep one user's errant process
          from filling up the entire spool.
          
               Jimmie and I are also working on an alarm scheme for various services
          here so that we will become aware of failures sooner. 
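
          As an illustration only, since the alarm scheme itself is not
          described, a very simple monitor could periodically try to open a
          TCP connection to each service and report the ones that do not
          answer.  The function names and the services mapping below are
          hypothetical.

```python
import socket


def service_is_up(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def failed_services(services: dict, timeout: float = 5.0) -> list:
    """Return the names of services whose port did not accept a connection.

    `services` maps a display name to a (host, port) pair.
    """
    return [
        name
        for name, (host, port) in services.items()
        if not service_is_up(host, port, timeout)
    ]
```

          A connection test alone would catch a refused port such as the POP3
          outage, but a web server hung on a stalled NFS mount might still
          accept connections, so a fuller check would also request a page and
          wait for the reply.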
          
          
          
          
          
