- To: email@example.com
- Subject: Outage...
- From: Nanook <firstname.lastname@example.org>
- Date: Wed, 30 Jun 2004 04:00:25 -0700 (PDT)
- List-help: <mailto:email@example.com?subject=help>
- List-post: <mailto:firstname.lastname@example.org>
- List-subscribe: <mailto:email@example.com?subject=subscribe>
- List-unsubscribe: <mailto:firstname.lastname@example.org?subject=unsubscribe>
- Newsgroups: lobby,announcements,linux.redhat.sparc
- Resent-date: Wed, 30 Jun 2004 04:00:29 -0700
- Resent-from: email@example.com
- Resent-message-id: <m0yjp.A.I-C.N1p4AB@ultra1.eskimo.com>
- Resent-sender: firstname.lastname@example.org
Until recently, Linux on an Ultra2 had been a very stable platform for us. Several months ago, I added a second CPU to the main file server and it became unstable, so before we moved the servers to the co-location facility I removed the second CPU and went back to running it as a single-CPU box. We then put together an entirely new file server, also based on an Ultra2 Creator 3D machine and pretty much identical except for faster, larger drives. Today it locked up with no errors printed.

This file server holds the user directories and the mail spool files, and many services depend on it to function properly. For many of our services we can achieve redundancy by setting up multiple servers, but I haven't found an easy way to make this function totally redundant, and it is central to almost everything else. There is a network file system called Coda that looks like it could do this by letting the file system use several redundant servers. At present, though, it isn't usable until we get rid of the remaining SunOS boxes, and even then I don't know whether special files (named pipes or FIFOs, sockets, etc.) are supported, whether Unix permission semantics are supported, and so on. So I'm interested in other solutions people may know of.

Also, in the interest of having the existing machine recover from a crash gracefully, I've already made some changes to the initialization scripts that should help, but I'd like to know about watchdog timers. The kernel configuration menu has a software watchdog timer as an option, but the documentation doesn't seem to be very helpful. The PROM configuration for the box also has an item, "Watchdog-Reboot?", which implies it has a hardware watchdog timer. Does anybody have advice on using either the software watchdog timer or, presumably, a hardware one?
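For reference, the usual pattern with a watchdog device is a small userspace keepalive: opening the device arms the timer, and each write resets it. If the writer stops (process killed, userspace wedged), the timer expires and the kernel reboots the box. Below is a minimal sketch, assuming the device is at /dev/watchdog and a roughly 60-second timeout (softdog's usual default margin; check your kernel's setting). The function name `keepalive`, the 10-second interval, and the injectable `sleep` parameter are hypothetical choices for illustration. Because softdog itself runs inside the kernel, this does not address the hard-lockup case raised below.

```python
import os
import time

# Assumptions: device path and petting interval are illustrative.
# softdog's default margin is typically 60 s, so 10 s leaves slack.
WATCHDOG_DEVICE = "/dev/watchdog"
PET_INTERVAL_S = 10


def keepalive(path=WATCHDOG_DEVICE, interval=PET_INTERVAL_S,
              iterations=None, sleep=time.sleep):
    """Open the watchdog device and write to it periodically.

    Opening the device arms the timer; every write resets it.
    iterations=None loops forever (the normal daemon case).
    Returns the number of keepalive writes performed.
    """
    fd = os.open(path, os.O_WRONLY)
    writes = 0
    try:
        while iterations is None or writes < iterations:
            os.write(fd, b"\0")  # any write counts as a keepalive
            writes += 1
            sleep(interval)
    finally:
        # Whether closing the device disarms the timer depends on the
        # kernel build (e.g. CONFIG_WATCHDOG_NOWAYOUT); check its docs.
        os.close(fd)
    return writes
```

Run as root from an init script, `keepalive()` never returns. Passing `iterations` and a no-op `sleep` lets the loop be exercised against an ordinary file instead of the real device.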
The documentation in the kernel source tree for the software watchdog shows how to use /dev/watchdog, but the device does not exist, the MAKEDEV script supplied with RedHat 6.2 doesn't know about it, and I don't know where to find the major and minor device numbers. I also can't help wondering whether a hang of the machine would "hang" the software watchdog timer as well. I'd prefer a hardware solution that will work even if the CPU is locked up tight.

Lastly, I'm looking for a cost-effective box that will let us remotely power-cycle machines and gain console access via serial ports, around 8 ports. I've found such items, but they've been donate-a-limb priced.
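On the missing /dev/watchdog node: the kernel's Documentation/devices.txt lists it as a "misc" character device, major 10, minor 130, so it can be created by hand with `mknod /dev/watchdog c 10 130`. A minimal sketch of the same step in Python, assuming root privileges; the function name `create_watchdog_node`, the 0600 mode, and the injectable `mknod` parameter are hypothetical choices for illustration.

```python
import os
import stat

# Per the kernel's Documentation/devices.txt: char device 10,130.
WATCHDOG_MAJOR = 10
WATCHDOG_MINOR = 130


def create_watchdog_node(path="/dev/watchdog", perms=0o600, mknod=os.mknod):
    """Create the watchdog character device node if it's missing.

    Roughly `mknod /dev/watchdog c 10 130`; the final mode is still
    subject to the process umask. Must run as root.
    Returns True if a node was created, False if the path exists.
    """
    if os.path.exists(path):
        return False
    dev = os.makedev(WATCHDOG_MAJOR, WATCHDOG_MINOR)
    mknod(path, stat.S_IFCHR | perms, dev)
    return True
```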