Eskimo North

Eskimo / WWW

     This morning around 2:51 AM Eskimo stopped talking to the other machines
for reasons unknown at this point.  Shortly thereafter it rebooted.

     I got a page earlier about WWW being down, but I checked all the machines
and everything appeared to be up and running.  What I did not realize at the
time was that although the machine was up and Apache was running, it was not
serving web pages because NFS (Network File System) did not recover properly
after eskimo's reboot.

     I got a second page around 7:45, and this time I actually tried the web
server and discovered the problem.  It's all working now.
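
     The failure above, where the processes are up but the NFS-backed files
are unreachable, is the kind of thing a process check misses.  As a rough
sketch only (not what is actually run here), a monitor could probe a path on
the mounted file system and treat a hang as a failure.  The function name
`path_responds` and the example path in the usage note are hypothetical.

```python
import os
import threading


def path_responds(path, timeout=5.0):
    """Return True if os.stat(path) succeeds within `timeout` seconds.

    The stat runs in a helper thread because a stat on a stale NFS mount
    can block for a long time; the caller only waits `timeout` seconds.
    """
    result = {"ok": False}

    def probe():
        try:
            os.stat(path)
            result["ok"] = True
        except OSError:
            # Missing path, stale handle, permission problem, and so on.
            pass

    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    # A worker that is still alive means the stat has not returned yet.
    return result["ok"] and not worker.is_alive()
```

     A monitor might call something like path_responds("/path/on/nfs/mount")
on the web server and page someone when it returns False.  One caveat: if
the stat is truly stuck in the kernel, the helper thread stays stuck too, so
this only reports the hang and does not clear it.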

     The Linux userland nfsd sometimes does not seem to recover when the file
system on another server temporarily goes away.  This was really bad in 2.0.x
kernels and is better in 2.2.x, but it still happens; the kernel nfsd seems to
be better still in this regard.  Even though NFS is supposed to be a standard,
there seem to be some discrepancies between the different implementations.

     We are working on moving the big web server (and a larger portion of the
virtual domains) to an Ultra system (64-bit), primarily because of stability
problems with the 32-bit machines.  This will also involve a change to knfsd
(the kernel nfsd), so that should help with this problem as well.