Re: Help!
- To: Eskimo North Support <support@eskimo.com>
- Subject: Re: Help!
- From: Robert Dinse <nanook@eskimo.com>
- Date: Thu, 21 Dec 2000 01:37:26 -0800 (PST)
- cc: outages-list@eskimo.com
- In-Reply-To: <Pine.SUN.3.96.1001220174500.2952D-100000@eskimo.com>
- Resent-Date: Thu, 21 Dec 2000 01:38:05 -0800
- Resent-From: outages-list@eskimo.com
- Resent-Message-ID: <"2lepa2.0.Ku3.yzSGw"@mx1>
- Resent-Sender: outages-list-request@eskimo.com
On Wed, 20 Dec 2000, Eskimo North Support wrote:
>
> On Wed, 20 Dec 2000, Tim Freeman wrote:
>
> > Our email was down all morning today and I called and brought it to your
> > attention. At around 11:20 AM, the email started working again.
> >
> > However, I have reports from people across the whole company of business
> > emails lost this morning. I also have some important emails that I had
> > sent myself from home this morning (5:00 AM) that I have never received.
> >
> > As far as I can tell, the email went down sometime between 2 & 5 AM and
> > anything sent between that time and 11:20 has been lost.
>
> Checked up in the meantime after our phone conversation earlier on queueing
> times, and the two MX servers (mx1, mx2) retry the queues every five
> minutes (-q5m in the way it's started), and 'mail' (smtp, etc.) retries
> every half hour via cron.
>
> I only see a couple 'fjmartin' messages in the queue at the moment on one
> of the servers, to "WilG@fjmartin.com" at 16:54:24 and 17:06:45, that are
> trying to send off to its forwarding address, "ericmontenegro@uswest.net".
>
> I'll bring this up with Robert as soon as I can, though, as I do see he had
> rebooted one of the servers (mx1) at about 11am today. I just don't know
> the specifics currently.
>
> ~ Eric
Non-technical explanation:
The mail outage was the result of an operator error on my part. The mail
isn't lost but was written to the wrong disk. I have written a script to
append these messages onto the proper spool files so that the messages will be
recovered and appear in people's INBOXes sometime early this morning.
Technical explanation:
Recently, we moved the mail spool from mx1, which has just an IDE drive and
is the ONE PC that we have, to an UltraSparc machine with a striped partition
across two 10,000 RPM SCSI drives, to address a disk I/O bottleneck that was
slowing mail.
On mx1, I manually unmounted the physical disk partition and mounted the
NFS partition from the new server.
I forgot to edit the /etc/fstab file so that this mount would happen when
the machine was next booted.
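A check along the following lines would have caught the missing entry before
the reboot. This is only a minimal sketch in Python, not something that was run
on mx1: the function name, the host name "spoolhost", and the paths in the
sample are all assumptions made up for illustration. It reads fstab-style text
and reports which filesystem type is listed for a given mount point.

```python
def fstab_fs_type(fstab_text, mount_point):
    """Return the filesystem type fstab lists for mount_point, or None."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point:
            return fields[2]
    return None


# Hypothetical entries: the host name, devices, and paths are assumptions.
SAMPLE_FSTAB = """\
# device             mount point      type  options   dump pass
/dev/hda1            /                ext2  defaults  1 1
spoolhost:/var/mail  /var/spool/mail  nfs   rw,hard   0 0
"""

if __name__ == "__main__":
    print(fstab_fs_type(SAMPLE_FSTAB, "/var/spool/mail"))
```

If the result for the spool directory is anything other than "nfs", the next
boot will not bring the NFS mount back.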
This morning, I upgraded the kernel on mx1 to resolve an NFS compatibility
issue. This involved a reboot to load the new kernel. I did not realize at
the time that it had mounted the old IDE disk partition instead of NFS mounting
the spool directory off the new server.
As a consequence, mail that came in on that server during that time frame
went to the old disk instead of to the NFS mounted partition. Mail that came
in via mx2 was delivered normally, but it is an overflow server; mail goes to
mx1 first unless it is too busy.
The disk it went to had old spool files. If I just cat'd them onto the end
of users' spool files, they'd have a lot of old messages that they had
previously received returned to their INBOX, obviously an undesirable situation.
So I've written a script which edits out all the old messages, leaving just
those that arrived today in the file (on the wrong drive). This is presently
running. After this concludes, I will cat the messages onto the end of the
spool files for each user and those messages will be available to the user.
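For anyone curious what that filtering step looks like, here is a rough sketch
in Python. It is not the script that is actually running; the function name
and the sample addresses are made up, and it assumes the spool files are in
mbox format with separator lines of the form
"From sender Wed Dec 20 05:00:00 2000" (no time zone field). It keeps only the
messages whose separator line carries the requested date.

```python
from datetime import date, datetime


def messages_from_day(mbox_text, day):
    """Return only the messages whose "From " separator line is dated `day`."""
    kept = []
    keep = False
    for line in mbox_text.splitlines(keepends=True):
        if line.startswith("From "):
            # Assumed separator: "From sender Wed Dec 20 05:00:00 2000"
            stamp = " ".join(line.split()[-5:])
            try:
                arrived = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
                keep = arrived.date() == day
            except ValueError:
                # Not a parseable separator: treat it as a body line so no
                # part of a message is silently dropped.
                pass
        if keep:
            kept.append(line)
    return "".join(kept)


# Hypothetical spool contents for illustration only.
SAMPLE_SPOOL = (
    "From alice@example.com Tue Dec 19 09:00:00 2000\n"
    "Subject: old\n\nalready delivered\n\n"
    "From bob@example.com Wed Dec 20 05:00:00 2000\n"
    "Subject: new\n\nFrom the look of it, this one was stranded\n\n"
)

if __name__ == "__main__":
    print(messages_from_day(SAMPLE_SPOOL, date(2000, 12, 20)), end="")
```

The filtered text is what would then be appended to the user's real spool
file. A separator in some other format is treated as body text here, so a
stray old message could slip through rather than a new one being lost.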