Virtual Domains, Shell Services, etc...
- To: outages-list@eskimo.com
- Subject: Virtual Domains, Shell Services, etc...
- From: Robert Dinse <nanook@eskimo.com>
- Date: Tue, 5 Sep 2000 21:24:05 -0700 (PDT)
- Newsgroups: announcements, lobby
- Reply-To: Robert Dinse <nanook@eskimo.com>
- Resent-Date: Tue, 5 Sep 2000 21:24:13 -0700
- Resent-From: outages-list@eskimo.com
- Resent-Message-ID: <"Ro48e2.0.qC7.iRSjv"@mx1>
- Resent-Sender: outages-list-request@eskimo.com
This morning, just before 9am, Eskimo crashed as the result of a
long-standing bug in the FDDI card drivers. Eskimo contains all the user files,
including the web pages, authenticates the low-speed connections, and handles
shell services.
Someone called to let us know their virtual domain wasn't responding.
Catherine rebooted www2, which serves most of the virtual domains, before
Eskimo had finished booting. Eskimo doesn't boot very fast because it has 26
file systems to potentially check and mount.
Linux NFS will not keep trying if a server is unavailable, so the file
systems containing the web pages did not get mounted on www2.
NFS is the Network File System; it is the way files on one machine are made
available on multiple machines.
Around 10:30, she called and woke me up and we got it all straightened
out.
Linux NFS has been a major thorn for some time and has prevented me from
doing some things that would really reduce the load and improve performance.
If anybody knows of any third-party NFS solutions that have these problems
fixed, I'd be interested in knowing about them.
There are two versions of Linux NFS: the original "userland" NFS daemon,
which runs as an ordinary user process, and the "kernel" NFS daemon, which runs
in kernel space. Both have problems.
The userland nfsd does not support file locking, which is essential in a
multi-user, multi-tasking environment to prevent two or more processes
(executing programs) from modifying the same file at the same time and
corrupting it.
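To illustrate what file locking buys you, here is a minimal sketch in Python of
an append done under an exclusive advisory lock. The function name
append_locked is made up for this example, and it assumes a POSIX system where
the file system (and, over NFS, the server) actually honors fcntl-style locks,
which is exactly what the userland nfsd does not provide.

```python
import fcntl
import os


def append_locked(path, text):
    """Append text to path while holding an exclusive advisory lock.

    Hypothetical helper for illustration only. Other cooperating
    processes that call lockf() on the same file will block until
    the lock is released.
    """
    with open(path, "a") as f:
        # Blocks until no other process holds a conflicting lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the data out before unlocking
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

Note that these locks are advisory: they only protect the file if every
program that writes to it takes the lock first.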
The kernel nfsd has a permissions bug: if you have write permission to a
file but not to the directory containing it, you cannot write to that file.
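For anyone who wants to see the expected behavior, the following Python sketch
sets up that situation on a local temporary directory: a writable file inside a
directory with no write permission. The function name is made up for this
example. On a normal local Unix file system the write should succeed; the bug
described above would correspond to it failing on a kernel-nfsd mount.

```python
import os
import stat
import tempfile


def can_write_in_readonly_dir():
    """Return True if an existing writable file can be modified even
    though its parent directory is not writable (hypothetical check)."""
    base = tempfile.mkdtemp()
    path = os.path.join(base, "data.txt")
    with open(path, "w") as f:
        f.write("first\n")
    # Directory becomes r-x for the owner: no write permission.
    os.chmod(base, stat.S_IRUSR | stat.S_IXUSR)
    try:
        # Modifying file contents needs write permission on the file,
        # not on the directory that contains it.
        with open(path, "a") as f:
            f.write("second\n")
        with open(path) as f:
            return f.read() == "first\nsecond\n"
    except PermissionError:
        return False
    finally:
        # Restore permissions so the temporary files can be removed.
        os.chmod(base, stat.S_IRWXU)
        os.remove(path)
        os.rmdir(base)
```

Directory write permission only matters for creating, deleting, or renaming
entries, which is why the check appends to a file that already exists.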
Neither version recovers properly when an NFS server goes away and then
later comes back. SunOS, by contrast, will keep retrying until the server
recovers if you "hard" mount the file system, and the SunOS mount command will
likewise keep trying until the server it is trying to mount from becomes
available. So the kind of NFS snafus that messed up web service this morning
don't happen with SunOS alone.
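The "keep trying until the server is available" behavior can be approximated
from the client side. What follows is only a rough sketch in Python, not
anything we actually run: retry_until and try_mount are made-up names, the
30 second delay is arbitrary, and try_mount assumes the mount point already has
an entry in /etc/fstab so that "mount <directory>" is enough.

```python
import os
import subprocess
import time


def retry_until(action, delay_seconds=30.0, max_attempts=None, sleep=time.sleep):
    """Call action() until it returns True; return the attempt count.

    With max_attempts=None this retries forever, which is the
    behavior described for SunOS above.
    """
    attempt = 0
    while True:
        attempt += 1
        if action():
            return attempt
        if max_attempts is not None and attempt >= max_attempts:
            raise TimeoutError("gave up after %d attempts" % attempt)
        sleep(delay_seconds)


def try_mount(mount_point):
    """Attempt one mount; True if the file system is mounted afterwards.

    Assumes an /etc/fstab entry exists for mount_point (an assumption
    for this sketch) and that the caller is allowed to mount it.
    """
    if os.path.ismount(mount_point):
        return True
    result = subprocess.run(["mount", mount_point], capture_output=True)
    return result.returncode == 0
```

A boot script could then call something like
retry_until(lambda: try_mount("/some/mount/point")) for each NFS file system,
with the path being a placeholder, instead of giving up after the first
failure.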
All in all, Linux is a good OS, but this NFS problem is really a major
stumbling block.