Optimizing Linux File Systems
I use the ext4 file system here at Eskimo. ReiserFS has some advantages when dealing with very small files and XFS has some advantages with very large files, but overall ext4 is efficient, mature, and stable.
For large systems I prefer multiple partitions, each optimized for its function. Even with a journaling file system, a traditional fsck will occasionally be necessary, and how long it takes depends on the number of inodes on the file system. An inode is an on-disk data structure, one per file or directory, that holds the file's metadata and points to the blocks containing its data.
Not having enough inodes can leave you with free disk space but no way to allocate it to a new file, because there are no available inodes. There is no way to add inodes to an existing file system, so you have to be sure you have enough from the start. That said, the number mke2fs chooses by default is excessive for nearly any application.
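Since inodes cannot be added later, it is worth watching inode usage on servers that are already running; df -i reports it per file system. Below is a minimal sketch of such a check. The function name inode_alert and its default 80% threshold are my own illustrative choices, not part of any standard tool. It reads df -iP output on standard input, so it can also be fed canned text.

```sh
# Report file systems whose inode usage (IUse%) is at or above a threshold.
# Input: `df -iP` output on stdin. Argument: threshold percent (default 80).
inode_alert() {
    threshold=${1:-80}
    # Skip the header; column 4 is IFree, 5 is IUse% (e.g. "97%"), 6 the mount point.
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)
        # Some file systems (vfat, for example) report "-" instead of counts; skip them.
        if (pct ~ /^[0-9]+$/ && pct + 0 >= t)
            printf "%s %s%% inodes used (%s free)\n", $6, pct, $4
    }'
}
```

For example, `df -iP | inode_alert 90` lists every mount point with 90% or more of its inodes in use.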
I use a rescue disk, another Linux box, or another drive with Linux already on it to create the partitions before I start the install, rather than letting Anaconda use its defaults. Some other Linux distributions, such as OpenSUSE, give you more control during the install process; this is one area where CentOS 6 falls short.
I create partitions for different things according to their needs and create file systems on them before running the install. Then I use the "create custom layout" option in Anaconda, assign the already created partitions to the desired mount points, and choose NOT to format them again. Generally I will create partitions something like this:
- I make this the first partition, usually about 200MB, though on a large server I will make it larger, say 1GB. I also force it to be a primary partition.
- I make the swap partition the second partition. Modern drives have a variable number of sectors per track, with fewer sectors on the inner tracks and more on the outer tracks. This means that on every revolution, more sectors pass under the heads on the outer tracks. As a result, the transfer rate is much better there, and fewer seeks are required to access the same amount of data. Since drives number their sectors starting from the outermost tracks, partitions near the start of the disk land on these fast outer tracks, so that is where I put swap. I force swap to be a primary partition as well.
- I put the root partition next, generally around 5GB, and force it to be a primary partition as well.
- I make this partition anywhere from 5GB to 30GB depending on the machine's function. If it is a busy web or mail server, I opt for a larger /var partition because those services generate a lot of log data in a day; if the machine generates little logging, I opt for the smaller partition size. Note that CentOS 6 by default places the storage for virtual guests in this partition. I don't, because I do not want my guests to be corrupted, and the constant logging makes that a strong possibility in the event of a crash or power outage.
- This is where I put my guests' storage. It is also the default space for web pages under CentOS. Size this partition accordingly: on Iglulik, where I have six virtual machines, some of them fairly sizable, this is a 500GB partition.
- I make this between about 5GB and 30GB, depending entirely on how much software is going to be installed on the machine. Since binaries and libraries mostly go here, a lot of applications means you'll need more space.
- Here I have a single /home partition, NFS-mounted across all of the servers, that holds all of the users' files. This is where your users' home directories will be, so size it accordingly. To prevent certain types of abuse, I do not allow device files or suid executables on this partition (the nodev and nosuid mount options).
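Because the partitions are created by hand before the install, it can help to write the layout down as commands first. Below is a minimal dry-run sketch that only prints GNU parted commands and never runs them. It assumes an msdos (MBR) partition table, which allows only four primary partitions; that is presumably why only the first three are forced primary here. The first three sizes become primary partitions, and everything after them becomes a logical partition inside an extended partition. The function name plan_partitions, the 1MiB starting offset, and the 1MiB gap left in front of each logical partition for its extended boot record are my assumptions, not the author's procedure. Sizes are in MiB, and the last one may be the word rest to use the remaining space.

```sh
# Print (but do not run) parted commands for an msdos-labelled disk.
# Usage: plan_partitions DEVICE SIZE_MIB... (the last size may be "rest")
plan_partitions() {
    dev=$1
    shift
    start=1   # begin at 1MiB so partitions stay aligned on 4K-sector drives
    n=0
    echo "parted -s $dev mklabel msdos"
    for size in "$@"; do
        n=$((n + 1))
        if [ "$n" -eq 4 ]; then
            # Partition 4 becomes the extended container; logicals are numbered from 5.
            echo "parted -s $dev mkpart extended ${start}MiB 100%"
            start=$((start + 1))   # 1MiB gap for the first logical partition's EBR
        fi
        if [ "$n" -le 3 ]; then kind=primary; else kind=logical; fi
        if [ "$size" = rest ]; then
            echo "parted -s $dev mkpart $kind ${start}MiB 100%"
            return 0
        fi
        echo "parted -s $dev mkpart $kind ${start}MiB $((start + size))MiB"
        start=$((start + size))
        if [ "$n" -ge 4 ]; then
            start=$((start + 1))   # gap for the next logical partition's EBR
        fi
    done
}
```

For example, `plan_partitions /dev/sdX 200 4096 5120 20480 500000 20480 rest` sketches the first partition, swap, root, /var, guest storage, the binaries partition, and /home in the order listed above. The device name and the exact numbers are illustrative, since the layout above gives ranges and no swap size.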
Creating File Systems
I use mke2fs to create file systems on the partitions I have created. By doing it by hand in advance of the install, I can choose the parameters I prefer.
As I mentioned previously, the default number of inodes that CentOS chooses is overkill for almost any application. If you have existing servers running, you can use df and df -i to get the number of blocks and inodes in use on a given partition. For example:
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/sda3       50395844 702612  47133232   2% /

Filesystem      Inodes IUsed   IFree IUse% Mounted on
/dev/sda3      3203072 15080 3187992    1% /
702,612 1K blocks = 719,474,688 bytes in use. Dividing that by the number of inodes in use, 15,080, gives an average of about 47,710 bytes per inode in the root partition. That is nearly three times the mke2fs default of one inode for every 16,384 bytes (the df -i output above reflects that default: about 3.2 million inodes on a roughly 50GB partition). We want to allow some margin, of course, so a value of 32,768 bytes per inode would be reasonable in this case. Also note that at one time mke2fs would not allow more than 65,536 bytes per inode. However, this is no longer the case and hasn't been for eons; nobody has gotten around to updating the documentation. On the /opt partition, where I have very large files used for guests, a much larger number is appropriate.
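The arithmetic above is easy to script for each partition of an existing server. Below is a minimal sketch; the function names are hypothetical. The rounding rule is my own way of building in margin, chosen because it reproduces the 32,768 picked for the root partition: take the largest power of two that does not exceed the observed average, and never go below the 4096-byte block size. It uses df's POSIX output format (-P) so each file system stays on one line.

```sh
# Average bytes per inode actually in use: df's Used column (1K blocks)
# times 1024, divided by df -i's IUsed column. Integer division rounds down.
avg_bytes_per_inode() {
    used_kb=$1
    used_inodes=$2
    if [ "$used_inodes" -le 0 ]; then
        echo "avg_bytes_per_inode: no inodes in use" >&2
        return 1
    fi
    echo $((used_kb * 1024 / used_inodes))
}

# Suggest a value for mke2fs -i: the largest power of two that does not
# exceed the observed average, but never less than the 4096-byte block size.
suggest_inode_ratio() {
    avg=$1
    ratio=4096
    while [ $((ratio * 2)) -le "$avg" ]; do
        ratio=$((ratio * 2))
    done
    echo "$ratio"
}

# Measure a mounted file system directly, e.g. inode_ratio_for /
inode_ratio_for() {
    mnt=$1
    used_kb=$(df -Pk "$mnt" | awk 'NR == 2 { print $3 }')
    used_inodes=$(df -iP "$mnt" | awk 'NR == 2 { print $3 }')
    avg=$(avg_bytes_per_inode "$used_kb" "$used_inodes") || return 1
    echo "$mnt: $avg bytes per inode in use; suggested -i $(suggest_inode_ratio "$avg")"
}
```

With the figures from the df example, `avg_bytes_per_inode 702612 15080` gives 47710 and `suggest_inode_ratio 47710` gives 32768.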
I use 4K blocks because on multi-terabyte drives it makes sense to optimize for performance, and 4K is the largest block size Linux supports. Now that Advanced Format (4K-sector) drives are becoming common, this limit is something Linux should address.
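The 4K ceiling comes from the kernel: on the kernels of this era, an ext4 file system can only be mounted if its block size is no larger than the memory page size, which is 4096 bytes on x86. Below is a quick sketch for checking both that limit and whether a drive uses 4K physical sectors. It only reads getconf and the sector sizes the kernel exposes under /sys/block; the function name show_block_limits is hypothetical.

```sh
# Print the kernel page size (the upper bound on ext4 block size here) and
# each disk's logical and physical sector size as reported by sysfs.
show_block_limits() {
    echo "page size: $(getconf PAGESIZE) bytes"
    for q in /sys/block/*/queue; do
        # Skip if the glob matched nothing or the attribute is missing.
        [ -r "$q/physical_block_size" ] || continue
        disk=${q%/queue}
        disk=${disk##*/}
        echo "$disk: logical $(cat "$q/logical_block_size") bytes," \
             "physical $(cat "$q/physical_block_size") bytes"
    done
}
```

On an Advanced Format drive you would expect a physical size of 4096, with a logical size of either 512 (512e drives) or 4096 (4Kn drives).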
I use a directory index because the speedup in large directories is worth the minor increase in space the index consumes.
The man page says that using extents rather than indirect block mapping is more efficient, particularly for large files, so I use them. I've never benchmarked it both ways, so I'm taking the authors' word on this one.
Journaling is good for data integrity, as well as for faster boots after an improper shutdown, power failure, or crash, so I include a journal.
Sparse superblocks reduce the number of backup superblock copies kept on the file system, which saves space and improves file system efficiency.
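You can confirm which of these features an existing file system has: both dumpe2fs -h and tune2fs -l print a "Filesystem features:" line. The helper below is a minimal sketch that assumes you only care about the five features discussed here. It reads that output on standard input and reports anything missing. The function name check_features is hypothetical.

```sh
# Check the "Filesystem features:" line from dumpe2fs -h / tune2fs -l output
# (read on stdin) for the five features discussed above.
check_features() {
    features=$(sed -n 's/^Filesystem features:[[:space:]]*//p')
    missing=""
    for f in dir_index extent filetype has_journal sparse_super; do
        # Pad with spaces so only whole feature names match.
        case " $features " in
            *" $f "*) ;;
            *) missing="$missing $f" ;;
        esac
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all present"
}
```

For example, `dumpe2fs -h /dev/sdXN 2>/dev/null | check_features`, where /dev/sdXN is a placeholder for a real partition.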
When I create a file system, the mke2fs command usually looks something like this:
mke2fs -t ext4 -b 4096 -i 32768 \
    -O dir_index,extent,filetype,has_journal,sparse_super -v /dev/sdXN
(replace /dev/sdXN with the partition you are formatting)
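If you format several partitions this way, each with its own -i value (a larger one for the partition holding guest images, for instance), a small wrapper keeps the other options consistent. Below is a minimal sketch; the name make_fs, the DRY_RUN variable, and the /dev/sdXN device are placeholders of mine. By default it only prints the command; setting DRY_RUN=0 actually runs it, and mke2fs itself refuses to format a mounted device. On a stock mke2fs.conf, -t ext4 already enables all five features in the -O list, so listing them mostly documents intent.

```sh
# Wrap the mke2fs invocation above. Arguments: DEVICE [BYTES_PER_INODE].
# Prints the command unless DRY_RUN=0 is set in the environment.
make_fs() {
    dev=$1
    ratio=${2:-32768}
    case $ratio in
        ''|*[!0-9]*)
            echo "make_fs: bytes-per-inode must be a number" >&2
            return 1 ;;
    esac
    # Build the argument list once so the dry run shows exactly what would run.
    set -- mke2fs -t ext4 -b 4096 -i "$ratio" \
        -O dir_index,extent,filetype,has_journal,sparse_super -v "$dev"
    if [ "${DRY_RUN:-1}" != 0 ]; then
        echo "$*"
        return 0
    fi
    if [ ! -b "$dev" ]; then
        echo "make_fs: $dev is not a block device" >&2
        return 1
    fi
    "$@"
}
```

For example, `make_fs /dev/sdXN 131072` prints the command with one inode per 128KB, the kind of larger ratio suggested above for a partition full of guest images. The specific value is illustrative only.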
I hate the current trend to "Windowize" Linux, so there's my two cents' worth on how to bypass the Windowizations and choose file system parameters that are optimal for your particular application and circumstances.