========================================
Notes for Lecture 17 - March 27, 2008
========================================

* Linear data structures

** Edit buffer ADT (text, chapter 9)
    NewBuffer()
    FreeBuffer()
    MoveCursorForward()
    MoveCursorBackward()
    MoveCursorToStart()
    MoveCursorToEnd()
    InsertCharacter()
    DeleteCharacter()
    DisplayBuffer()

** Array implementation
    struct bufferCDT {
        char text[MaxBuffer];
        int length;
        int cursor;
    };

*** Advantages
Cursor motion functions simple and constant time

*** Disadvantages
Insert and delete both Theta(n) operations
Has predetermined max size. (Could overcome with datapack techniques.)

** Two stack implementation
Consists of two stacks -- before and after.

*** Advantages
CursorForward, CursorBackward, InsertCharacter, DeleteCharacter constant time

*** Disadvantages
CursorToStart, CursorToEnd Theta(n)

** Linked list with dummy header implementation
    struct bufferCDT {
        cellT *start;
        cellT *cursor;
    };

*** Dummy header cell
Convenient to have start point to a dummy cell, and dummy's next (link) pointer points to the chain of cells comprising the list.
Cursor points to the cell containing the character immediately to left of cursor.

*** Advantages
CursorForward, InsertCharacter, DeleteCharacter constant time

*** Disadvantages
CursorToEnd and CursorBackward both expensive.

** Circular doubly linked list with dummy header implementation
Each cell has a next and a prev pointer.
Last element points to dummy. Dummy's prev points to last element.

*** Advantages
All operations constant time (except freeing a list)

*** Disadvantages
More memory, more time for simple operations.

* Symbol table (a.k.a. dictionary)
Set of (key, value) pairs.

** See demo-17/1-symtab and demo-17/2-symtab-generic

** Basic ADT
    Symtab newSymtab( void )
    void freeSymtab( Symtab )
    void insertSymtab( Symtab table, const char* key, const char* value )
    void* lookupSymtab( Symtab table, const char* key )

** Hash table implementation

*** Constants
    #define NBuckets 101

*** Types
    typedef struct cell {
        char* key;
        void* value;
        struct cell* next;
    }* Cell;

    struct symtab {
        Cell bucket[ NBuckets ];
    };

*** Hash function
    #define Multiplier -1664117991L

    static int hashString( const char* s, int nBuckets )
    {
        unsigned long hashcode = 0;
        for ( int i=0; s[i] != '\0'; i++ ) {
            hashcode = hashcode * Multiplier + s[i];
        }
        return (hashcode % nBuckets);
    }

*** Result of hash value as defined by C99 standard
Let n be the length of unsigned long (in bits). Then Multiplier is converted to the unsigned long to which it is congruent mod 2^n, and the resulting unsigned multiplication is performed mod 2^n.

The addition of the character s[i] depends on whether char is signed or unsigned. If it is signed, it is first promoted to int and then converted to unsigned long, so a negative char becomes a very big unsigned value. If char is unsigned, it converts to an unsigned long of the same (non-negative) value.

** Issues

*** Storage ownership

**** In 1-symtab example
Key and value are both copied. Symtab owns the copies; user owns the originals.

**** In 2-symtab-generic example
Ownership of the user-supplied object passes to the symtab.
User supplies a "freeElement" function that knows how to free the user object.
User must make copy for symtab if ownership of original is to be retained.

*** Special return from lookupSymtab() to indicate not found

**** In 1-symtab example
NULL is returned.

**** In 2-symtab-generic example
NOT_FOUND is returned. NOT_FOUND is a pointer constant whose value is different from NULL and different from any pointer to a user object. It is defined in symtab.c by the following lines:

    static char dummy;
    const void* NOT_FOUND = &dummy;
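
The point of a NOT_FOUND sentinel distinct from NULL is that the table can then hold NULL as a legitimate value. The following is a minimal sketch of that idea, not the demo-17 code. It reuses the types, hash function and NOT_FOUND definition from these notes, but several details are assumptions made for illustration: the helper findCell is hypothetical, insertSymtab takes a void* value, lookupSymtab returns const void* so that it can return NOT_FOUND without a cast, and the table copies keys but does not take ownership of values (so there is no freeElement here).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBuckets 101
#define Multiplier -1664117991L

/* Sentinel: the address of dummy differs from NULL and from any user object. */
static char dummy;
const void* NOT_FOUND = &dummy;

typedef struct cell {
    char* key;              /* private copy owned by the table */
    void* value;            /* user object; may legitimately be NULL */
    struct cell* next;
}* Cell;

struct symtab {
    Cell bucket[ NBuckets ];
};
typedef struct symtab* Symtab;

static int hashString( const char* s, int nBuckets )
{
    unsigned long hashcode = 0;
    for ( int i = 0; s[i] != '\0'; i++ ) {
        hashcode = hashcode * Multiplier + s[i];
    }
    return (int)( hashcode % nBuckets );
}

Symtab newSymtab( void )
{
    Symtab table = malloc( sizeof *table );
    assert( table != NULL );
    for ( int i = 0; i < NBuckets; i++ ) table->bucket[i] = NULL;
    return table;
}

/* Hypothetical helper: the cell holding key, or NULL if key is absent. */
static Cell findCell( Symtab table, const char* key )
{
    Cell c = table->bucket[ hashString( key, NBuckets ) ];
    while ( c != NULL && strcmp( c->key, key ) != 0 ) c = c->next;
    return c;
}

/* Insert key, or replace its value if key is already present. */
void insertSymtab( Symtab table, const char* key, void* value )
{
    Cell c = findCell( table, key );
    if ( c == NULL ) {
        int h = hashString( key, NBuckets );
        c = malloc( sizeof *c );
        assert( c != NULL );
        c->key = malloc( strlen( key ) + 1 );
        assert( c->key != NULL );
        strcpy( c->key, key );
        c->next = table->bucket[h];     /* push on front of bucket chain */
        table->bucket[h] = c;
    }
    c->value = value;
}

/* Returns the stored value (possibly NULL), or NOT_FOUND if key is absent. */
const void* lookupSymtab( Symtab table, const char* key )
{
    Cell c = findCell( table, key );
    return ( c == NULL ) ? NOT_FOUND : c->value;
}

/* Frees cells and key copies; values are not owned in this sketch. */
void freeSymtab( Symtab table )
{
    for ( int i = 0; i < NBuckets; i++ ) {
        Cell c = table->bucket[i];
        while ( c != NULL ) {
            Cell next = c->next;
            free( c->key );
            free( c );
            c = next;
        }
    }
    free( table );
}
```

With this sketch, a key inserted with the value NULL looks up as NULL, while a key that was never inserted looks up as NOT_FOUND, so the caller can tell the two cases apart by comparing the result against NOT_FOUND.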