The non-technical version:

Eskinews has flaws on one of its disks, and the utilities Linux provides for dealing with them don't work. The lengthy downtime was the result of these errors and of my various attempts to lock out the bad portions of the disk with those utilities. Because it is not possible to lock out the bad parts of the disk, it is going to be necessary to replace the disk, and we will lose the current news on the server as a result.

The more technical version:

One of the spool drives on Eskinews has about 16 media flaws. If this had been running SunOS it wouldn't have been a major problem: fire up format, lock out the bad blocks, and life goes on. But SunOS supports a maximum partition size of 2 GB, which is insufficient for News, and it does not support disk striping, which is necessary to provide adequate disk I/O capacity for Usenet News processing at current volumes.

Linux, rather than using the drive's bad block list, assigns an inode for bad blocks and basically creates an invisible file made up of the bad blocks. Where this scheme fails miserably is when those blocks happen to be inode blocks instead of file storage blocks. In that case, rather than relocating the inodes or simply marking them as unusable, the utility used to map out bad blocks and fix the file system (e2fsck) finds that it no longer has room for the inodes, freaks, and starts over from the beginning. It will go around this loop indefinitely.

So the bottom line: with SCSI-II disks, there is no way to lock out a bad block under Linux. A SCSI-III disk maintains a bad block list, reserves alternate tracks, and handles bad block reassignments internally, but the disks we presently have for spool drives do not.

I'm going to have to replace this drive on Eskinews. When we added a spool drive, just restoring 3 GB of news spool files took three days.
At the current rate of more than 2 GB of news per day, restoring 5 days' worth of news would take well over a week, so basically there is no point in doing it. The spool will have to be zeroed in the process of replacing the drive.

It took me as long as it did to conclude that I couldn't lock out these blocks because each iteration of fsck on the spool partition takes about an hour, owing to the size of that partition (17 GB) and the number of directories and files (more than a million).

So at this point I am going to pick up a Western Digital replacement, which is SCSI-III and maintains its own bad block list, and swap out the existing drive as soon as I can get it.

There is some additional instability involving INND spontaneously dying. I think that is related to the recent kernel upgrade, because it started right after we upgraded the kernel. I have put the machine back on the old drive to test the theory.
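
For anyone who wants to check the "well over a week" restore estimate given earlier, here is a rough back-of-the-envelope sketch. It assumes restore time scales linearly with the amount of data, which is only an approximation, and it treats "more than 2 GB/day" as exactly 2 GB/day, so the result is a lower bound. The names used are made up for illustration.

```python
# Figures taken from the notice; the linear-scaling model is an assumption.
RESTORED_GB = 3.0        # size of the earlier spool restore
RESTORE_DAYS = 3.0       # how long that earlier restore took
FEED_GB_PER_DAY = 2.0    # "more than 2 GB of news/day", used as a lower bound
BACKLOG_DAYS = 5         # days of news that would have to be restored


def restore_days(backlog_gb, restored_gb=RESTORED_GB, took_days=RESTORE_DAYS):
    """Estimate restore time, assuming it grows linearly with data size."""
    rate_gb_per_day = restored_gb / took_days
    return backlog_gb / rate_gb_per_day


backlog_gb = FEED_GB_PER_DAY * BACKLOG_DAYS
estimate = restore_days(backlog_gb)
print(f"{backlog_gb:.0f} GB backlog -> about {estimate:.0f} days to restore")
```

Since the real feed is above 2 GB/day, the actual figure would be higher still, and the restored news would be stale long before the restore finished.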