Eskimo North



          Re: Help!


          • To: Eskimo North Support <support@eskimo.com>
          • Subject: Re: Help!
          • From: Robert Dinse <nanook@eskimo.com>
          • Date: Thu, 21 Dec 2000 01:37:26 -0800 (PST)
          • cc: outages-list@eskimo.com
          • In-Reply-To: <Pine.SUN.3.96.1001220174500.2952D-100000@eskimo.com>
          • Resent-Date: Thu, 21 Dec 2000 01:38:05 -0800
          • Resent-From: outages-list@eskimo.com
          • Resent-Message-ID: <"2lepa2.0.Ku3.yzSGw"@mx1>
          • Resent-Sender: outages-list-request@eskimo.com

          On Wed, 20 Dec 2000, Eskimo North Support wrote:
          > 
          > On Wed, 20 Dec 2000, Tim Freeman wrote:
          > 
          > > Our email was down all morning today and I called and brought it to your
          > > attention.  At around 11:20 AM, the email started working again.
          > > 
          > > However, I have reports from people across the whole company of business
          > > emails lost this morning.   I also have some important emails that I had
          > > sent myself from home this morning (5:00 AM) that I have never received.
          > > 
          > > As far as I can tell,  the email went down sometime between 2 & 5 AM and
          > > anything sent between that time and 11:20 has been lost.
          > 
          > Checked up in the meantime after our phone conversation earlier on queueing
          > times, and the two MX servers (mx1, mx2) retry the queues every five
          > minutes (-q5m in the way it's started), and 'mail' (smtp, etc.) retries
          > every half hour via cron.
          > 
          > I only see a couple 'fjmartin' messages in the queue at the moment on one
          > of the servers, to "WilG@fjmartin.com" at 16:54:24 and 17:06:45, that are
          > trying to send off to its forwarding address, "ericmontenegro@uswest.net". 
          > 
          > I'll bring this up with Robert as soon as I can, though, as I do see he had
          > rebooted one of the servers (mx1) at about 11am today.  I just don't know
          > the specifics currently.
          > 
          > ~ Eric
          
               Non-technical explanation:
          
               The mail outage was the result of an operator error on my part.  The mail
          isn't lost but was written to the wrong disk.  I have written a script to
          append these messages onto the proper spool files so that they will be
          recovered and appear in people's INBOXes sometime early this morning.
          
          
               Technical explanation:
          
               Recently, we moved the mail spool from mx1, which has just an IDE drive
          and is the ONE PC that we have, to an UltraSparc machine with a striped
          partition across two 10,000 RPM SCSI drives, to address a disk I/O
          bottleneck that was slowing mail.
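Striping (RAID 0) is what lets two drives relieve an I/O bottleneck: consecutive chunks of the partition alternate between the drives, so large or concurrent reads and writes keep both drives busy. The following is a minimal sketch of the block mapping. It assumes a hypothetical chunk size of 16 blocks, because the actual stripe size on the UltraSparc is not stated.

```python
from collections import Counter

STRIPE_BLOCKS = 16  # hypothetical chunk size, in blocks, written to one disk at a time
NUM_DISKS = 2       # the two 10,000 RPM SCSI drives


def locate(logical_block: int, num_disks: int = NUM_DISKS,
           chunk: int = STRIPE_BLOCKS) -> tuple[int, int]:
    """Map a logical block of the striped partition to (disk, block on that disk)."""
    chunk_index, offset_in_chunk = divmod(logical_block, chunk)
    disk = chunk_index % num_disks           # chunks are dealt round-robin
    chunk_on_disk = chunk_index // num_disks  # how many chunks precede it on that disk
    return disk, chunk_on_disk * chunk + offset_in_chunk


if __name__ == "__main__":
    # Count how a sequential 64-block run is spread over the drives.
    print(Counter(locate(b)[0] for b in range(64)))
```

In this sketch, a sequential run of 64 blocks places 32 blocks on each drive, so each drive handles roughly half the transfer.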
          
               On mx1, I manually unmounted the physical disk partition and mounted the
          NFS partition from the new server.
          
               I forgot to edit the /etc/fstab file so that this mount would happen when
          the machine was next booted.
          
               This morning, I upgraded the kernel on mx1 to resolve an NFS compatibility
          issue, which required a reboot to load the new kernel.  Because /etc/fstab
          still listed the old IDE disk partition, the reboot mounted that partition
          instead of NFS-mounting the spool directory from the new server, and I did
          not realize it at the time.
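The failure here was a spool directory silently mounted from the wrong filesystem. One possible safeguard is to check the filesystem type of the spool before the mail daemon starts. This is a sketch under several assumptions: the spool path /var/spool/mail, the helper names, and the sample mount tables are all hypothetical, and the input is assumed to be in the Linux /proc/mounts format.

```python
import os

# Hypothetical mount tables in /proc/mounts format:
# device mountpoint fstype options dump pass
BAD = "/dev/hda1 / ext2 rw 0 0\n/dev/hda3 /var/spool/mail ext2 rw 0 0\n"
GOOD = "/dev/hda1 / ext2 rw 0 0\nnewserver:/export/mail /var/spool/mail nfs rw 0 0\n"


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount that contains `path`."""
    target = os.path.normpath(path)
    best_mount, best_type = "", ""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        # A mount contains `path` if it is the path itself or one of its ancestors.
        inside = target == mountpoint or target.startswith(mountpoint.rstrip("/") + "/")
        # Prefer the longest (most specific) mountpoint; on ties, later entries
        # are mounted on top of earlier ones, so they win.
        if inside and len(mountpoint) >= len(best_mount):
            best_mount, best_type = mountpoint, fstype
    return best_type


def spool_is_nfs(path: str, mounts_text: str) -> bool:
    """True if `path` lives on an NFS mount (nfs, nfs4, ...)."""
    return fs_type_for(path, mounts_text).startswith("nfs")
```

On a live Linux system, the text would come from reading /proc/mounts. A startup script could then refuse to start the mail daemon when `spool_is_nfs("/var/spool/mail", ...)` returns False.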
          
               As a consequence, mail that came in on mx1 during that time frame went to
          the old disk instead of to the NFS-mounted partition.  Mail that came in via
          mx2 was delivered normally, but mx2 is an overflow server: mail goes to mx1
          first unless mx1 is too busy.
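MX preference is what makes mx1 the primary and mx2 the overflow. Sending servers look up the domain's MX records and try the hosts in ascending preference order. They fall back to the next host when a connection fails or is refused. RFC 5321 also says that hosts sharing a preference should be tried in random order. The following is a minimal sketch of that ordering. The preference values 10 and 20 and the fully qualified host names are assumptions, not Eskimo North's actual DNS records.

```python
import random


def delivery_order(mx_records: list[tuple[int, str]]) -> list[str]:
    """Order MX hosts as a sending MTA would try them.

    Lowest preference first; hosts with equal preference are shuffled.
    """
    by_pref: dict[int, list[str]] = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host)
    order: list[str] = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        random.shuffle(hosts)  # spread load among equal-preference hosts
        order.extend(hosts)
    return order


# Hypothetical records: mx1 is primary (10), mx2 is the overflow (20).
RECORDS = [(20, "mx2.eskimo.com"), (10, "mx1.eskimo.com")]
```

With these records, `delivery_order(RECORDS)` always tries mx1 first and only reaches mx2 if mx1 cannot take the message.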
          
               The disk it went to still held old spool files.  If I just cat'd them onto
          the end of users' spool files, people would get a lot of old messages they
          had previously received returned to their INBOX, which is obviously
          undesirable.
          
               So I've written a script that edits out all the old messages in those
          files (on the wrong drive), leaving just the ones that arrived today.  It
          is running now.  When it finishes, I will cat the remaining messages onto
          the end of each user's spool file, and they will then be available to the
          user.
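The recovery script itself is not shown in the message. The following rough sketch combines the filter and append steps into one pass using Python's standard mailbox module. It assumes that the spool files are in traditional mbox format and that the date on each message's "From " separator line reflects when the message arrived. The function names and the `day` argument are hypothetical.

```python
import mailbox
import time

# Format of the date at the end of an mbox "From " separator line.
FROM_LINE_DATE = "%a %b %d %H:%M:%S %Y"


def received_on(msg: mailbox.mboxMessage) -> tuple[int, int, int]:
    """Return (year, month, day) taken from the message's "From " line."""
    # get_from() returns the separator without the leading "From ", e.g.
    # "user@example.com Wed Dec 20 05:00:00 2000"; the date is the last 5 words.
    date_text = " ".join(msg.get_from().split()[-5:])
    t = time.strptime(date_text, FROM_LINE_DATE)
    return (t.tm_year, t.tm_mon, t.tm_mday)


def recover(stray_path: str, spool_path: str, day: tuple[int, int, int]) -> int:
    """Append the messages in stray_path that arrived on `day` to spool_path.

    Older messages left over on the old disk are skipped.
    Returns the number of messages appended.
    """
    stray = mailbox.mbox(stray_path, create=False)
    spool = mailbox.mbox(spool_path)  # created if it does not exist yet
    # Advisory lock: only effective if the delivery agent uses the same locking.
    spool.lock()
    try:
        count = 0
        for msg in stray:
            if received_on(msg) == day:
                spool.add(msg)  # written at end of file, "From " line preserved
                count += 1
        spool.flush()
    finally:
        spool.unlock()
        spool.close()
        stray.close()
    return count
```

Locking the destination spool is meant to keep a new delivery from interleaving with the appended messages. `mailbox.mbox.add()` writes each message at the end of the existing file rather than rewriting it.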
          
          
          
