Eskimo North



Partial Restoration




     Ultra1, which is an NFS server for several other machines and also
provides the main web service, has been extremely problematic over the last
couple of weeks.  Last night it scribbled the disk badly, requiring a full
tape restore.

     After the restore, it crashed again today, this time with illegal
instructions in the swapper (which is a kernel process).

     There are numerous factors causing problems with this machine.  The
bottom line is that it really needs a replacement, and I'm looking for one.

     We have been under heavy attack for the last couple of weeks.  I don't
really know why, but SYN floods have been a daily occurrence.  Last night,
someone working on equipment in the co-location facility managed to short out
power and pull down a breaker.  While that didn't take our power out, it may
have created a glitch in the power that caused the machine to malfunction and
scribble the disk.

     There are really four problems with this machine that make it vulnerable
and are the reason it needs to be replaced.

     First, the power supply really isn't adequate.  The machine is completely
populated in terms of memory, and it has high-RPM drives that are also higher
capacity than it was designed for.  This draws more power and creates more
heat.  With the power supply that tapped out, there is no reserve to handle any
momentary line sag.

     Second, web traffic has been steadily increasing and now averages about
2.5 million hits per day, occasionally exceeding 4 million.  In addition, web
traffic has evolved from largely small static pages to a much higher percentage
of larger dynamic pages.  The size isn't so much an issue with this machine,
because it has robust I/O capabilities, but the CPU required for dynamic
processing is becoming a bottleneck.
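
     For a rough sense of scale, the daily hit counts above can be turned into
average per-second rates.  This is only back-of-the-envelope arithmetic: it
assumes hits are spread evenly across the day, which real traffic is not, so
peaks will run well above these averages.  The function name is made up for
illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_hits_per_second(hits_per_day):
    """Average request rate, assuming hits are spread evenly over one day."""
    return hits_per_day / SECONDS_PER_DAY


# Daily figures quoted in the post: a typical day and an occasional busy day.
typical = average_hits_per_second(2_500_000)
busy = average_hits_per_second(4_000_000)

print(f"typical day: {typical:.1f} hits/sec on average")
print(f"busy day:    {busy:.1f} hits/sec on average")
```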

     The third issue, in addition to CPU and really directly related to it, is
memory.  The machine can accommodate a maximum of 2GB of RAM, and it is fully
populated.  This is not enough for today's traffic mix.

     With that memory and CPU, a high-volume SYN flood can't be processed in
real time because the CPU can't keep up, so it backs up in memory, and then bad
things happen.  Linux on Sparc never really did handle swap well.
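
     The backing-up effect can be sketched with a toy queue model: whatever
arrives faster than the CPU can handle stays pending in memory, and the pile
grows for as long as the flood lasts.  This is not how the Linux TCP stack
actually accounts for half-open connections, and every number below (arrival
rate, handling rate, bytes per entry) is a hypothetical value chosen only to
show the shape of the problem.

```python
def backlog_over_time(arrivals_per_sec, handled_per_sec, seconds, bytes_per_entry):
    """Return a list of (pending_entries, bytes_used), one tuple per second.

    Anything not handled within a second stays queued in memory.
    """
    pending = 0
    history = []
    for _ in range(seconds):
        # The backlog grows by the shortfall and can never go below zero.
        pending = max(0, pending + arrivals_per_sec - handled_per_sec)
        history.append((pending, pending * bytes_per_entry))
    return history


# Hypothetical flood: 20,000 packets/sec arriving, 15,000/sec handled,
# 512 bytes of state held per pending entry, lasting one minute.
flood = backlog_over_time(
    arrivals_per_sec=20_000,
    handled_per_sec=15_000,
    seconds=60,
    bytes_per_entry=512,
)
pending, used = flood[-1]
print(f"after 60s: {pending} pending entries, {used / 2**20:.1f} MiB held")
```

     The point of the model is only that a modest shortfall in handling rate
adds up linearly, so a long enough flood will eat into whatever memory is left
over from normal service.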

     So inadequate power, inadequate cooling even if it had adequate power,
insufficient memory, and insufficient CPU are all reasons this box needs to be
replaced with something more capable.  I haven't decided what just yet; input
is appreciated.

     I'd stick with Sun, except that the cost per unit of CPU is high on the
Sparc boxes.  The thing that justified Sun in the past was the tremendous I/O
capacity of their server hardware.  But now that you can get front-side bus
speeds exceeding 1GHz on a PCI bus machine, the advantages of a 584-bit memory
path and crossbar bus are becoming less important and the expense hard to
justify.

     I've heard good things about the IBM servers, including from people
running them, and since IBM has strongly endorsed Linux and contributed to its
development, odds are good Linux will run well on IBM hardware.  But there
again, name brands are always pricey.

     So anyway, I'm researching what to replace this machine with.  There is
no doubt in my mind that it needs replacement.  We can't afford these kinds of
outages.

-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
 Eskimo North Linux Friendly Internet Access, Shell Accounts, and Hosting.
   Knowledgeable human assistance, not telephone trees or script readers.
 See our web site: http://www.eskimo.com/ (206) 812-0051 or (800) 246-6874.