Automating frequent tasks

Shortening data files

There are two good reasons for using short files (less than 10,000 bytes, if possible; certainly less than a quarter of a megabyte). First, traditional UNIX filesystems access short files faster than long ones. Reading from or writing to a file incurs a significant overhead once the file is more than 10KB long, a further overhead once it is more than 256KB long, and, in the extreme case, another once it is more than 64MB long. Each successive threshold makes reading from or writing to the file slower, so short files are preferred.
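The three sizes quoted above are consistent with the classic inode layout, in which a file's first few blocks are addressed directly and later blocks are reached through one, two, or three levels of indirect blocks. The following sketch reproduces the figures under assumptions that the text does not state: 1KB blocks, 10 direct block pointers per inode, and 4-byte block addresses. The names used are illustrative only, and real filesystems differ in all three values.

```python
# Assumed layout (NOT stated in the text): 1 KB blocks, 10 direct block
# pointers per inode, and 4-byte block addresses.
BLOCK_SIZE = 1024
DIRECT_POINTERS = 10
POINTERS_PER_BLOCK = BLOCK_SIZE // 4  # 256 addresses fit in one indirect block


def indirection_thresholds():
    """Return the largest file size, in bytes, reachable at each level."""
    direct = DIRECT_POINTERS * BLOCK_SIZE
    single = direct + POINTERS_PER_BLOCK * BLOCK_SIZE
    double = single + POINTERS_PER_BLOCK ** 2 * BLOCK_SIZE
    return {"direct": direct, "single_indirect": single, "double_indirect": double}


def levels_needed(size_in_bytes):
    """Return how many levels of indirection a file of this size requires."""
    limits = indirection_thresholds()
    ordered = [limits["direct"], limits["single_indirect"], limits["double_indirect"]]
    for level, limit in enumerate(ordered):
        if size_in_bytes <= limit:
            return level
    return 3  # beyond the double-indirect limit: triple indirection


if __name__ == "__main__":
    for name, limit in indirection_thresholds().items():
        print(f"{name}: up to {limit} bytes")
```

Under these assumptions the limits come out at 10KB, 266KB, and a little over 64MB, so the 256KB and 64MB figures in the text are best read as round approximations.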

Second, the performance of some programs degrades significantly as their input files grow. A complex sorting or comparison operation (using sort or diff) usually takes significantly longer on a single large file than on two smaller files containing the same amount of information. This degradation is an unavoidable consequence of the nature of the problem these programs solve and can rarely be worked around, but it is not significant for short files.
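One way to keep files short is to split a long file into pieces at line boundaries. The sketch below is a minimal illustration, not a method given in the text: the function names, the 10,000-byte default, and the numbered suffixes (.000, .001, and so on) are all assumptions chosen for the example.

```python
from pathlib import Path


def chunk_lines(lines, limit=10_000):
    """Group lines (bytes objects) into chunks of fewer than `limit` bytes.

    Lines are never broken, so a single line of `limit` bytes or more
    still ends up in a chunk of its own, which then exceeds the limit.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        # Start a new chunk if adding this line would reach the limit.
        if current and size + len(line) >= limit:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append(current)
    return chunks


def split_file(path, limit=10_000):
    """Write the pieces next to `path` and return their paths.

    The piece names (hypothetical convention) are path.000, path.001, ...
    The original file is left untouched.
    """
    source = Path(path)
    with source.open("rb") as handle:
        chunks = chunk_lines(handle, limit)
    pieces = []
    for index, chunk in enumerate(chunks):
        piece = source.with_name(f"{source.name}.{index:03d}")
        piece.write_bytes(b"".join(chunk))
        pieces.append(piece)
    return pieces
```

Calling split_file("data.txt") on a hypothetical 25,000-byte file of 100-byte lines would produce three pieces, each shorter than 10,000 bytes. If the pieces are later sorted individually, the standard sort -m option can merge the already sorted results.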