Some suggestions for how to organize shared library code to improve performance are presented here. They apply to paging systems.
The non-shared C library contains several diverse groups of functions. Many processes use different combinations of these groups, making the paging behavior of any shared C library difficult to predict. A shared library should offer greater benefits for more homogeneous collections of code. For example, a database library probably could be organized to reduce system paging substantially, if its static and dynamic calling dependencies were more predictable.
First, profile the code that might go into the shared library (see prof(CP)).
Based on the profiling information, decide what to include in the shared library. a.out file size is a static property, and paging is a dynamic property. These static and dynamic characteristics may conflict, so you have to decide whether the performance lost is worth the disk space gained. See ``Choosing library members'' for more information.
Try to improve locality of reference by grouping dynamically related functions. If every call of funcA generates calls to funcB and funcC, try to put them in the same page. cflow(CP) (documented in the Programmer's Reference) generates this static dependency information. Combine it with profiling to see what things actually are called, as opposed to what things might be called.
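As a rough illustration of combining the two kinds of information, the sketch below filters a list of static call edges by an observed call count. The struct edge layout, the hot_edges name, and the idea of attaching the callee's profiled call count to each edge are assumptions made for this example; neither cflow nor prof produces data in this form directly.

```c
#include <stddef.h>

/* Hypothetical record: one static call edge (as a cflow-style listing
 * might suggest) annotated with the callee's observed call count
 * (as a profile might report). */
struct edge {
    const char *caller;
    const char *callee;
    unsigned long calls;    /* 0 means the callee was never seen running */
};

/* Copy the edges whose observed count is at least min_calls into out[],
 * preserving their order. out[] must have room for n entries.
 * Returns the number of edges copied. */
size_t hot_edges(const struct edge *in, size_t n,
                 unsigned long min_calls, struct edge *out)
{
    size_t i, kept = 0;

    for (i = 0; i < n; i++) {
        if (in[i].calls >= min_calls)
            out[kept++] = in[i];
    }
    return kept;
}
```

The edges that survive the filter are candidates for placement on the same page; edges with a zero count describe calls that might happen but did not during the profiled run.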
Arrange the shared library target's object files
so that frequently used functions do not unnecessarily cross page boundaries.
When arranging object files within the target library,
be sure to keep the text and data files separate.
You can reorder text object files without breaking compatibility;
the same is not true for object files that define global data.
Use name lists and disassemblies of the
shared library target file
to determine where the page boundaries fall.
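Once addresses and sizes have been read off a name list, checking for boundary crossings is simple arithmetic. The following is a minimal sketch, assuming a 4096-byte page and a hypothetical struct sym that holds one function's start address and size; the real page size and the way symbol sizes are obtained depend on the target system.

```c
#include <stddef.h>

/* Assumed page size for illustration only. */
#define PAGE_SIZE 4096UL

/* Hypothetical name-list entry: a function's start address and size. */
struct sym {
    const char *name;
    unsigned long addr;
    unsigned long size;
};

/* Return the number of the page that contains addr. */
unsigned long page_of(unsigned long addr)
{
    return addr / PAGE_SIZE;
}

/* Return 1 if the function's first and last bytes lie on different
 * pages, 0 otherwise. A zero-sized symbol never crosses. */
int crosses_page(const struct sym *s)
{
    if (s->size == 0)
        return 0;
    return page_of(s->addr) != page_of(s->addr + s->size - 1);
}

/* Count how many of the n symbols cross a page boundary. */
size_t count_crossings(const struct sym *syms, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if (crosses_page(&syms[i]))
            count++;
    }
    return count;
}
```

Running count_crossings before and after reordering the object files gives a quick measure of whether an arrangement has improved.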
After grouping related functions, break them into page-sized chunks. Although some object files and functions are larger than a single page, most of them are smaller. Use the infrequently called functions as glue between the chunks. Because the glue between pages is referenced less frequently than the page contents, the probability of a page fault decreases.
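One way to picture the chunking step is sketched below. It places frequently used functions in order and, whenever the next one would straddle a page boundary, starts it on the next page instead; the bytes skipped are the room left over for infrequently called glue. The layout_chunks name and its parameters are hypothetical, and the greedy in-order placement is an assumption of this sketch, not a prescribed algorithm.

```c
#include <stddef.h>

/* Assign a start offset to each of n functions, in the order given.
 * sizes[i] is the size of function i in bytes; offsets[i] receives its
 * start offset from the beginning of the text. A function that would
 * straddle a page boundary is moved to the start of the next page,
 * unless it is larger than a page and must straddle anyway.
 * Returns the total number of skipped bytes available for glue. */
unsigned long layout_chunks(const unsigned long *sizes,
                            unsigned long *offsets,
                            size_t n, unsigned long page_size)
{
    unsigned long pos = 0, glue = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned long used = pos % page_size;

        if (used != 0 && sizes[i] <= page_size &&
            used + sizes[i] > page_size) {
            unsigned long gap = page_size - used;

            glue += gap;    /* space to fill with rarely called code */
            pos += gap;
        }
        offsets[i] = pos;
        pos += sizes[i];
    }
    return glue;
}
```

The returned total indicates how much infrequently called code is needed to fill the gaps without wasting address space.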
After determining the branch table, arrange the library's object files without breaking compatibility. Put frequently used, unrelated functions together because they probably will be called randomly enough to keep the pages in memory. System calls go into another page as a group, and so on. The following example shows how to change the order of the C library's object files:
Before          After
#objects        #objects
...             ...
printf.o        strcmp.o
fopen.o         malloc.o
malloc.o        printf.o
strcmp.o        fopen.o
...             ...
Improve performance by arranging the typical process
to avoid cache entry conflicts.
If a heavily used library had both its text and its data
segment mapped to the same cache entry, the
performance penalty would be particularly severe.
Every library instruction would bring the
text segment information into the cache.
Instructions that referenced data would flush the
entry to load the data segment.
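The text does not describe the cache hardware, so the following sketch assumes the simplest case: a direct-mapped cache whose entry is selected by low-order address bits. The line size, entry count, and function names are all hypothetical values chosen for illustration; a real system's cache may be indexed quite differently.

```c
/* Assumed geometry: 512 entries of 16 bytes each (8192 bytes total). */
#define CACHE_LINE    16UL
#define CACHE_ENTRIES 512UL

/* Return the cache entry an address maps to under the assumed geometry. */
unsigned long cache_entry(unsigned long addr)
{
    return (addr / CACHE_LINE) % CACHE_ENTRIES;
}

/* Return 1 if a text address and a data address compete for the same
 * cache entry, 0 otherwise. */
int segments_conflict(unsigned long text_addr, unsigned long data_addr)
{
    return cache_entry(text_addr) == cache_entry(data_addr);
}
```

Under these assumptions, two addresses that differ by a multiple of 8192 bytes always collide, so shifting one segment by a single cache line is enough to separate them.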