========================================
Notes for Lecture 23 - April 19, 2007
========================================

* Storage management

** Byte alignment

*** Motivation

Some processors require, either for efficiency or to work at all,
that multi-byte operands start at memory addresses that are divisible
by some alignment constant, generally a small power of 2 like 2, 4,
8, or 16.  For example, it might be required that a 4-byte int start
at an address that is a multiple of 4.

*** Byte alignment effects in C

The C compiler generally arranges to have byte alignment restrictions
satisfied automatically.  This manifests itself in several places.

**** malloc()

malloc() always returns a pointer that satisfies all of the alignment
conditions since it doesn't know what the block will later be used
for.  On my x86_64 architecture AMD processor, malloc() always
allocates storage in multiples of 16 bytes, with the smallest possible
storage block being 32.  The empirical formula for the amount of
storage consumed by malloc( n ) is

    max( 32, 16 * ceiling( (n+8)/16 ) )

Thus, if n=24, ceiling( (n+8)/16 ) = 2, so 32 bytes are consumed.
If n=25, ceiling( (n+8)/16 ) = 3, so 48 bytes are consumed.
The reason for the extra 8 bytes is apparently overhead to manage the
blocks.  I do not know why the minimum block size seems to be 32
instead of 16.

**** struct padding

Padding is added between fields of structures and after the last
field to ensure that each field has proper byte alignment, both for a
single instance of a structure and for an array of such structures.
For example,

    struct foo1 {
      char c;
      int x;
    };

has size 8.  c begins at the first byte of the structure (offset 0),
and x begins at offset 4.  There are 3 padding bytes between c and x.
This gives x a 4-byte alignment (i.e., its address is a multiple
of 4).

    struct foo2 {
      char c;
      int x;
      char d;
    };

has size 12.  3 bytes of padding follow each of c and d.  The padding
after d ensures that in an array of struct foo2, the int field x of
every element is 4-byte aligned.
Rearranging the fields yields a more compact structure:

    struct foo3 {
      char c;
      char d;
      int x;
    };

This has size 8.  There are two padding bytes following d.

** String store

A string store gives a more efficient way of storing strings of
variable length, such as one might want when reading a dictionary into
memory.  The general idea is to use malloc() to get large blocks of
storage and then to subdivide each block into small pieces as needed.

*** Method

malloc() storage in large blocks called "pools".
Allocate variable-length blocks of bytes on demand from the pool.
When the pool fills, malloc() another.
Note: Can't use realloc() to expand the pool because string pointers
point into the old pool, and realloc() might move the data.

*** Properties

Good for allocating many small strings of varying size.
Easily freed as a unit but not individually.
Less time and space overhead than malloc().

** Fixed-size block allocator

*** Method

Allocates fixed-size blocks from a pool.
Keeps freed blocks on a free list.
Acquires a new pool when needed.

*** Properties

Efficient.
A new pool is only allocated when the existing pools are 100% full.

*** Restrictions

Only works with one block size (although several allocators for
different block sizes could be used together).

** Buddy system storage allocator

One method for storage allocation is the buddy system.  The idea here
is to allocate blocks in sizes that are powers of two.  If the user
requests another block size, it is rounded up to the next power of
two.  The unused space within that block is called internal
fragmentation.  It is unavailable for use until the block is
eventually freed.

A free list is maintained for each power-of-2 block size.  For
example, if the minimum block size is 2^3 and the maximum size is
2^32, then there would be 30 free lists in all (one for each exponent
from 3 through 32).

If a block of size 2^k is requested, the request is satisfied by
allocating the first block on free list k.  If free list k is empty, a
block of size 2^(k+1) is obtained and split into two size 2^k blocks.
One of these is put on free list k and the other is allocated to the
user.  This process is applied recursively to obtain the block of
size 2^(k+1), which comes either from free list k+1 or from obtaining
a block of size 2^(k+2) and splitting it.

When a block of size 2^k is freed, instead of just putting it on free
list k, an attempt is made to reunite it with its "buddy", the other
half of the block of size 2^(k+1) from which it was originally
obtained.  First the allocator must locate the buddy.  Then it must
determine whether the buddy is free or not.

Here's one way of doing this.  Imagine the blocks are arranged in a
binary tree.  At the root is a big block of the maximum block size
2^m.  Below it are the two buddies of size 2^(m-1) that would result
if it were split in half.  Below each of those blocks are two blocks
of size 2^(m-2), which would be the buddies resulting from splitting
them.  We thus get a complete binary tree, where the nodes at level j
are all possible blocks of size 2^(m-j).

Each block in this tree can be named by the sequence of 0's and 1's
that leads to it.  The root node is named by the empty sequence.  Its
sons are named 0 and 1, respectively.  Their sons are 00, 01, 10, and
11, etc.  With this naming scheme, two nodes are buddies if and only
if their names are x0 and x1 for some sequence x.  If we can find the
name of a block, then we can find the name of its buddy by
complementing the last bit of its name.

Assume the blocks are allocated from a byte array M of size 2^m.
I'll now describe how to find the name of a block given its subscript
j in M and its block size exponent k.

The root block has exponent m and begins at M[0].  Its two sons each
have size 2^(m-1) and begin at M[0] and M[2^(m-1)], respectively.
The nodes at level 2 in the tree have size 2^(m-2) and begin at M[0],
M[2^(m-2)], M[2*2^(m-2)], and M[3*2^(m-2)], respectively.

Suppose we are given a block with size exponent k that begins at M[j].
Then j must be a multiple of 2^k, so if we write j in binary, the
last k bits are all 0.  For example, if k=3, then blocks of size 8
begin at M[0], M[8], M[16], M[24], M[32], M[40], etc.  Written in
binary, these numbers are:

     0    000 000
     8    001 000
    16    010 000
    24    011 000
    32    100 000
    40    101 000

(I've put a space separating groups of 3 bits for readability.)

Notice now that the name of a node, as described above, is just its
subscript shifted right by k places and written as an (m-k)-bit
number.

If a block of size 2^k begins at M[j], then the subscript of its
buddy is obtained by complementing bit number k from the right end of
j, where the rightmost bit is numbered 0.  Hence, the subscript j' of
its buddy is given by the C bit expression (j^(1<