I apologize for not posting this sooner, but I have been up to my eyeballs today, to put it mildly.

We had a user who wrote some procmail rules intended to bounce spam to the postmaster of the originating domain. About 99% of the time, spammers forge the headers, so the domains or reply addresses aren't valid. The bounced message was therefore bounced back, where his procmail rules again decided it was spam and bounced it once more. The end result was a mail loop that consumed all of the mail spool space during the night.

As happened the last time a mail loop ran the spool out of space, some mailboxes were corrupted, with about 540 MB of NULLs added to them. This caused massive problems whenever someone tried to access their mail on either the mail server or eskimo. When someone accesses their mail on the mail server, the POP server has to copy their mailbox (including the 540 megabytes of NULLs), and this pretty much ties up that machine. When someone accesses their mail from eskimo, eskimo tries to read that 540 MB mailbox across the network, and despite both machines having FDDI interfaces, this quickly ties up both machines.

A bug exists in SparcLinux NFS where, sometimes when an NFS request times out, NFS just stops talking and won't talk again until the NFS daemon is restarted, or sometimes until the machine is rebooted. This is what killed the main web server earlier today. Nobody called about the web server for a number of hours, and I didn't notice it because I was busy fixing the mail problems.

No mailbox files were lost this time, though some are still being processed to remove the NULLs. I had to take POP3 down earlier in order to fix this, which is why it was refusing connections for a while.

A long-term solution is still in the works: it will put the mail spool, user files, and FTP files on one file server so that quotas can be enforced on the mail spool, which will keep one user's errant process from filling up the entire spool.
Jimmie and I are also working on an alarm scheme for various services here so that we will become aware of failures sooner.