Virtual Domain Problems
- To: outages-list@eskimo.com
- Subject: Virtual Domain Problems
- From: Robert Dinse <nanook@eskimo.com>
- Date: Wed, 17 Nov 1999 09:23:54 -0800 (PST)
- cc: earthcng@eskimo.com
- Resent-Date: Wed, 17 Nov 1999 10:03:32 -0800 (PST)
- Resent-From: outages-list@eskimo.com
- Resent-Message-ID: <"fuKcv3.0.vd3.ftkCu"@mx2>
- Resent-Sender: outages-list-request@eskimo.com
I had a call at 7am regarding virtual domains not responding. I found
that the server was at 100% CPU, with a bunch of stuck 'swish' processes running.
One of the consequences of allowing users to put their own CGI scripts
on-line without review is that occasionally they will do stupid things that
exhaust system resources.
In the case of swish, users who run close to their quota limits may
schedule cron jobs to build the swish database nightly. If they exceed their
quota during the swish database build, the result is a truncated swish database
file.
When this happens and someone does a search, the swish process that is
invoked attempts to seek past the end of the file, which fails. It retries
indefinitely until manually killed. If the user is impatient and keeps hitting
the search button, these processes stack up and eventually bog the server down
to the point of non-functionality.
To prevent further occurrences of this particular problem, I have added the
following to suexec, the program used to launch CGI scripts:
    /*
     * Place limit on CPU consumption.
     */
    rl.rlim_cur = 30;
    rl.rlim_max = 30;
    setrlimit(RLIMIT_CPU, &rl);
What this does is limit any CGI script to 30 seconds of CPU time. So, for
example, when someone tries a swish search with a broken database file, instead
of looping indefinitely, it will abort after 30 seconds of CPU time, preventing
processes from stacking up and killing the server.
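For anyone who wants to see the effect in isolation, here is a minimal sketch,
not the actual suexec change. The helper names limit_cpu_seconds and
run_spinning_child are made up for illustration, and the child's busy loop
merely stands in for a stuck CGI process. Which signal the kernel uses when the
soft and hard limits are equal varies by platform (SIGXCPU on some, SIGKILL on
others), so the sketch only reports whichever one ended the child.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: cap the calling process at `seconds` of CPU time. */
static int limit_cpu_seconds(rlim_t seconds)
{
    struct rlimit rl;

    rl.rlim_cur = seconds;  /* soft limit: the kernel starts signalling here */
    rl.rlim_max = seconds;  /* hard limit: ceiling for the soft limit */
    return setrlimit(RLIMIT_CPU, &rl);
}

/*
 * Hypothetical demo: fork a child that limits itself and then spins.
 * Returns the signal that ended the child, 0 if it exited normally,
 * or -1 on error.
 */
static int run_spinning_child(rlim_t seconds)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        volatile unsigned long n = 0;

        if (limit_cpu_seconds(seconds) != 0)
            _exit(1);
        for (;;)
            n++;            /* burn CPU until the kernel steps in */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling run_spinning_child(1) should return after roughly one second of CPU
time with a nonzero signal number. Note that the limit counts CPU time, not
wall-clock time, so a process that is sleeping or blocked on I/O does not use
any of it up.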
On rare occasions I've also seen Apache get stuck, so I've added similar
code to Apache, in httpd_main, in the function child_sub_main:
    /*
     * Set limit on CPU utilization by child process
     */
    rl.rlim_cur = 60;
    rl.rlim_max = 60;
    setrlimit(RLIMIT_CPU, &rl);
This limits each child to 60 seconds of CPU time. Given the setting of
MaxRequestsPerChild, that should be enough that the limit is not exceeded under
normal circumstances, but it will prevent a runaway process from bogging the
server down for long periods of time.
If a child does exceed this threshold, it will die and the request it was
serving will be aborted, but the parent process will respawn child processes as
required to service requests.
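The respawn behaviour can be sketched roughly as follows. This is a simplified
single-child illustration, not Apache's code: Apache manages a whole pool of
children, and the names supervise, worker_fn and flaky_worker are invented for
this example. The demo worker raises SIGXCPU on itself to simulate running into
the CPU limit.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

typedef void (*worker_fn)(int generation);

/*
 * Hypothetical supervisor: run `work` in a child and start a replacement
 * whenever the child is killed by a signal, up to `max_respawns` times.
 * Returns the number of respawns performed, or -1 on error.
 */
static int supervise(worker_fn work, int max_respawns)
{
    int respawns = 0;

    for (;;) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0) {
            work(respawns);
            _exit(0);               /* worker finished normally */
        }
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (!WIFSIGNALED(status))
            return respawns;        /* clean exit: nothing to replace */
        if (respawns == max_respawns)
            return respawns;        /* give up instead of looping forever */
        respawns++;                 /* child was killed: start another */
    }
}

/* Hypothetical worker: the first two generations "hit the CPU limit". */
static void flaky_worker(int generation)
{
    if (generation < 2)
        raise(SIGXCPU);
}
```

With these definitions, supervise(flaky_worker, 5) replaces the child twice and
then returns 2 once the third child exits cleanly. Only the request being
handled by the killed child is lost; the parent itself is never subject to the
child's limit.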