Host Problems
- To: outages-list@eskimo.com
- Subject: Host Problems
- From: Robert Dinse <nanook@eskimo.com>
- Date: Tue, 16 Nov 1999 14:59:47 -0800 (PST)
- Resent-Date: Tue, 16 Nov 1999 14:59:51 -0800
- Resent-From: outages-list@eskimo.com
- Resent-Message-ID: <"ECvB-3.0.4X1.c7UCu"@mx1>
- Resent-Sender: outages-list-request@eskimo.com
Problems this afternoon, starting at about noon, were all related to human
error.
Aaron was ill today; Catherine came in to cover for him. I managed to get
about an hour of sleep between 9am and 10am this morning, so I really was not,
and am not, in a state to cover.
She saw "tombstones" on the perfmeter for eskimo. This indicates that
rpc.statd isn't responding, which may mean a dead machine but may also just be
a load spike.
She rebooted eskimo; eskimo failed to come up because it couldn't mount
file systems from mx1. She rebooted again, this time without a graceful halt.
This resulted in file system corruption, which had to be fixed manually.
After that was fixed, eskimo would still not mount file systems from mx1,
which has the mail spool. That happens to be the first NFS-mounted file system
eskimo tries to mount, so it basically blocked all the others.
Web servers and other machines depend on eskimo for web files; mail
depends on mx1 to access the spool. So this really broke a lot of things.
Aaron had been working on getting knfsd operational on mx1. We need it
because it supports file locking and the regular nfsd doesn't, which causes
problems with some mailers.
He wasn't successful, and when he put the old daemons back, he reinstalled
some non-functional daemons instead of the ones that had been working. The
functional daemons appear to have been deleted.
I had to restore these from tape, which is what took so long to get that
machine operational again.
Everything appears to be back to normal. Eric is here, and I'm going to
attempt to get some sleep.