Eskimo North



          WWW Crash


          • To: outages-list@eskimo.com, linux-kernel@vger.rutgers.edu
          • Subject: WWW Crash
          • From: Robert Dinse <nanook@eskimo.com>
          • Date: Tue, 6 Jun 2000 07:35:18 -0700 (PDT)
          • Resent-Date: Tue, 6 Jun 2000 07:35:24 -0700
          • Resent-From: outages-list@eskimo.com
          • Resent-Message-ID: <"Y2zM41.0.5G4.hmGFv"@mx1>
          • Resent-Sender: outages-list-request@eskimo.com

          
               I experienced another of the mysterious "lockups" of our web server
          today: a 4-CPU Ross RTK-625 Hypersparc SS-10 with 384MB RAM. 
          
               The machine went into a state where it quit responding to web
          requests, ssh requests, etc., but you could still change virtual consoles and
          it would echo whatever you typed at the console.  If you entered a login at
          the login prompt, it would not give a password prompt, but switching consoles
          would give another login prompt. 
          
               This started happening after "upgrading" from 2.2.14 to 2.2.15.  2.2.14
          would frequently crash due to a spin_lock deadlock.  That seems to be fixed,
          but in 2.2.15 it seems to have been replaced by this bug and by another in
          which the machine hangs hard and doesn't respond to anything except a power
          cycle.
          
               Where the spin_lock deadlock only happened on single-CPU boxes, these two
          forms of machine death occur on both multi-CPU SS-10s and single-CPU LXs.
          There seems to be a pattern to the times these things happen: while it may
          happen at any time of day, it is extremely common between 4am and 8am,
          suggesting that either something launched out of cron or perhaps a malicious
          human somewhere is involved.
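
One way to check the time-of-day pattern described above is to tally the recorded incident times by hour. The following is only a minimal sketch: the function names `tally_by_hour` and `share_in_window` are hypothetical, and it assumes the incident times have been written down by hand as ISO-8601 strings, since the machines themselves log nothing at the moment of failure.

```python
from collections import Counter
from datetime import datetime


def tally_by_hour(timestamps):
    """Count incidents per hour of day (0-23).

    `timestamps` is an iterable of ISO-8601 strings such as
    "2000-06-06T07:35:18" (an assumed, hand-recorded format).
    """
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    # Return a dense mapping so hours with no incidents show up as 0.
    return {h: hours.get(h, 0) for h in range(24)}


def share_in_window(counts, start_hour=4, end_hour=8):
    """Fraction of incidents whose hour falls in [start_hour, end_hour)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    inside = sum(counts[h] for h in range(start_hour, end_hour))
    return inside / total
```

If incidents were spread evenly over the day, a four-hour window would hold about one sixth of them, so a share well above that would support the cron-or-human suspicion without distinguishing between the two.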
          
               For what it's worth, it also seems to happen only on the machines running
          the Apache web server (version 1.3.3p3) and not on machines running IRC
          servers (which you would think would be favorite targets of malicious humans)
          or news or mail.
          
               Both the zombie machine syndrome and the hard lockups are frustrating
          because they produce no errors on the console or in the logs that give any
          clue as to the nature of the failure, and they make the machine sufficiently
          inoperable that it is impossible to dig around and examine the state of things. 
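
Since nothing can be examined once the machine is in either state, one possible workaround is to record a little state on disk at regular intervals, so that the last entry before a hang survives the power cycle. The sketch below is an assumption-laden illustration, not a known fix: the helper names, the log path, and the 60-second interval are all hypothetical, and it reads only `/proc/loadavg`. It will not capture the failure itself, only how the system looked shortly before it.

```python
import os
import time


def format_snapshot(epoch_seconds, loadavg_text):
    """Build one log line: local timestamp plus the raw load-average text."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_seconds))
    return f"{stamp} loadavg={loadavg_text.strip()}\n"


def append_snapshot(path, line):
    """Append a line and force it to disk so it survives a hard hang."""
    with open(path, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())


def run(path="/var/tmp/state-snapshots.log", interval=60):
    """Loop forever, logging /proc/loadavg every `interval` seconds.

    The path and interval are illustrative defaults, not recommendations.
    """
    while True:
        with open("/proc/loadavg") as f:
            load = f.read()
        append_snapshot(path, format_snapshot(time.time(), load))
        time.sleep(interval)
```

The `os.fsync` call is the important design choice here: without it, the final lines may still be sitting in the page cache when the machine locks up.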
          
          
          
          
