On Wed, 20 Dec 2000, Eskimo North Support wrote:

> On Wed, 20 Dec 2000, Tim Freeman wrote:
>
> > Our email was down all morning today and I called and brought it to your
> > attention. At around 11:20 AM, the email started working again.
> >
> > However, I have reports from people across the whole company of business
> > emails lost this morning. I also have some important emails that I had
> > sent myself from home this morning (5:00 AM) that I have never received.
> >
> > As far as I can tell, the email went down sometime between 2 & 5 AM and
> > anything sent between that time and 11:20 has been lost.
>
> Checked up in the meantime after our phone conversation earlier on queueing
> times, and the two MX servers (mx1, mx2) retry the queues every five
> minutes (-q5m in the way it's started), and 'mail' (smtp, etc.) retries
> every half hour via cron.
>
> I only see a couple 'fjmartin' messages in the queue at the moment on one
> of the servers, to "WilG@fjmartin.com" at 16:54:24 and 17:06:45, that are
> trying to send off to its forwarding address, "firstname.lastname@example.org".
>
> I'll bring this up with Robert as soon as I can, though, as I do see he had
> rebooted one of the servers (mx1) at about 11am today. I just don't know
> the specifics currently.
>
> ~ Eric

Non-technical explanation:

The mail outage was the result of an operator error on my part. The mail isn't lost but was written to the wrong disk. I have written a script to append these messages onto the proper spool files so that the messages will be recovered and appear in people's INBOXes sometime early this morning.

Technical explanation:

Recently, we moved the mail spool from mx1, which has just an IDE drive and is the ONE PC that we have, to an UltraSparc machine with a striped partition across two 10,000 RPM SCSI drives, to address a disk I/O bottleneck that was slowing mail. On mx1, I manually unmounted the physical disk partition and mounted the NFS partition from the new server.
I forgot to edit the /etc/fstab file so that this mount would happen when the machine was next booted.

This morning, I upgraded the kernel on mx1 to resolve an NFS compatibility issue. This involved a reboot to load the new kernel. I did not realize at the time that it had mounted the old IDE disk partition instead of NFS mounting the spool directory off the new server.

As a consequence, mail that came in on that server during that time frame went to the old disk instead of to the NFS mounted partition. Mail that came in via mx2 was delivered normally, but it is an overflow server; mail goes to mx1 first unless it is too busy.

The disk it went to had old spool files. If I just cat'd them onto the end of users' spool files, they'd have a lot of old messages that they had previously received returned to their INBOX, obviously an undesirable situation. So I've written a script which edits out all the old messages, leaving just those that arrived today in the file (on the wrong drive). This is presently running. After this concludes, I will cat the messages onto the end of the spool files for each user and those messages will be available to the user.
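
For anyone curious what the filtering step looks like, here is a rough sketch in Python. It is not the script that is actually running; it only illustrates the idea of keeping messages delivered on or after a cutoff. It assumes the spool files are in traditional mbox format, where each message begins with a "From sender date" separator line, and the function names (split_mbox, arrival_time, keep_since) are made up for this example.

```python
import re
from datetime import datetime

# Assumed mbox separator line, e.g. "From sender@host Wed Dec 20 05:00:00 2000"
FROM_LINE = re.compile(r"^From \S+ +(\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4})")


def split_mbox(text):
    """Split mbox text into messages, each starting with its 'From ' line."""
    messages, current = [], []
    for line in text.splitlines(keepends=True):
        # A new separator line closes the message collected so far.
        if FROM_LINE.match(line) and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


def arrival_time(message):
    """Return the timestamp from the 'From ' line, or None if it is missing."""
    match = FROM_LINE.match(message)
    if not match:
        return None
    # Collapse the double space used for single-digit days before parsing.
    stamp = " ".join(match.group(1).split())
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")


def keep_since(text, cutoff):
    """Return only the messages delivered at or after the cutoff."""
    kept = []
    for message in split_mbox(text):
        when = arrival_time(message)
        if when is not None and when >= cutoff:
            kept.append(message)
    return "".join(kept)
```

Calling keep_since(old_spool_text, datetime(2000, 12, 20)) would return just the messages stamped on or after midnight on the 20th, which could then be appended to the user's real spool file. Any real run would also need to lock the spool file while appending so that it doesn't collide with a delivery in progress.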