Eskimo North


Web Server Outage / Databases / Files

     On Friday at approximately 9:45 PM the web server stopped responding.
Attempting to execute a command resulted in "can not execute binary".

     I attempted a reboot; the machine went down entirely and would not come
back up.

     Upon arriving at the co-location facility, I saw a couple of other people
working on equipment there and part of the room was dark.  They had either
overloaded or shorted power in their cabinet causing the breaker for a portion
of the area to trip.

     Our cabinet still had power, but the cabinets share a common UPS, so the
trip might have glitched our power and caused the malfunction, or not; there
are at least two other possibilities.

     I replaced phpBB2 with phpBB3 on the Eskimo North site last weekend.
Spammers who had been saturating it with graphical porn, casino, and drug spam
were no longer able to use the method they had been using to get around
security and were apparently very angered by this.

     We have been under a DoS attack for most of the week following; in fact,
it is still occurring tonight.  A DoS attack during a heavy traffic time can
crash the machine by exhausting memory.  The machine has 2GB of RAM and needs
more, but that is all the motherboard will accommodate.

     Because web traffic has increased from around a million hits/day when this
machine was first placed into service, to between 2.5 and 4 million hits/day
currently, and because the mixture of traffic has gone from mostly small static
pages to a more significant mix of dynamic and larger pages, the machine was
already approaching the point where it needed replacement for capacity reasons.
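
     To put those daily figures in per-second terms, here is a rough sketch
that converts hits/day into an average rate.  It only uses the numbers quoted
above; the function name is made up for illustration, and real peak rates will
be well above these averages since traffic is not spread evenly over the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def hits_per_second(hits_per_day):
    """Average request rate implied by a daily hit count."""
    return hits_per_day / SECONDS_PER_DAY


# Daily hit counts taken from the paragraph above.
for label, hits in [
    ("when first placed in service", 1_000_000),
    ("current, low end", 2_500_000),
    ("current, high end", 4_000_000),
]:
    print(f"{label}: about {hits_per_second(hits):.1f} hits/second on average")
```

     That works out to roughly 12 hits/second originally versus roughly 29 to
46 hits/second now, before accounting for the heavier dynamic pages.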

     The system also has a quirky disk controller chip that has a propensity
for wild disk writes if it ever receives a command it can't handle.  For
example, if it's told to use too big of a buffer, it scribbles the disk, so
when the machine is unstable this is a really bad thing.

     Lastly, when I restored this machine from tape, the Linux kernel was
corrupted.  I got no tape read errors or disk write errors during the restore,
which strongly suggests that the kernel was already corrupted when the July
backup was written, so that may have been the cause of the crash.

     This brings me to another point: when the machine crashes, klog never
successfully logs anything, and the screen saver has almost always blanked the
screen, so I usually can't tell what caused the crash.  If someone knows how to
disable the screen saver on a Linux system when nobody is logged in, that
information would be helpful.

     So I am looking for a replacement for this machine, one that can
accommodate at least 8GB on the mainboard, has healthy I/O and CPU capability,
and can run stably.  I don't know yet if I'm going to go with Sun or Intel
architecture; there are advantages and disadvantages to both.  High end Sun
equipment is expensive relative to PC architecture, but would maintain binary
compatibility, allowing a changeover without recompiling everything.  We'd get
a lot more bang for the buck, though, if we go with Intel architecture.

     Most of the databases were recoverable.  As far as I know, the only data
I was unable to recover were the tables for my Coppermine gallery and my son
Carl's Coppermine gallery.  Coppermine has no built-in backup tools and
apparently uses the database in such a way that bad things tend to happen when
the system is halted unexpectedly.

     To counter this I've written a script that does a mysqldump of all of the
databases to disk on a different machine every morning at 5AM.
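
     For anyone wanting to do something similar, here is a minimal sketch of
that kind of job.  It is not the actual script; the host name, directory, file
naming, and function names are all placeholders, and it assumes mysqldump can
find its credentials (for example in ~/.my.cnf) and that ssh keys are set up
for the backup host.

```python
import datetime
import shlex
import subprocess


def build_backup_command(remote_host, remote_dir, today=None):
    """Build a shell pipeline that dumps all databases to a dated file on another host.

    remote_host and remote_dir are placeholders, not the real setup.
    """
    today = today or datetime.date.today()
    remote_path = f"{remote_dir}/all-databases-{today.isoformat()}.sql.gz"
    # --all-databases dumps every database on the server.
    # --single-transaction gives a consistent snapshot for InnoDB tables only.
    dump = "mysqldump --all-databases --single-transaction"
    # The remote side just writes whatever arrives on stdin to the dated file.
    remote_cmd = "cat > " + shlex.quote(remote_path)
    return f"{dump} | gzip | ssh {shlex.quote(remote_host)} {shlex.quote(remote_cmd)}"


def run_backup(remote_host, remote_dir):
    """Run the pipeline; pipefail makes a mysqldump failure fail the whole job."""
    cmd = build_backup_command(remote_host, remote_dir)
    subprocess.run(["bash", "-o", "pipefail", "-c", cmd], check=True)


# Hypothetical crontab entry to run a script containing run_backup() at 5 AM daily:
#   0 5 * * * /usr/local/bin/db_backup.py
```

     Keeping one dated file per day means a bad dump does not overwrite the
last good one, at the cost of having to prune old files on the backup machine.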

     To the three individuals whose home directories had been moved to this
server: I apologize, but your directories have again been restored from the
July 1-ish tape.

 Eskimo North Linux Friendly Internet Access, Shell Accounts, and Hosting.
   Knowledgeable human assistance, not telephone trees or script readers.
 See our web site: (206) 812-0051 or (800) 246-6874.