P R E L I M I N A R Y S P E C I F I C A T I O N Due 2:00 AM, Friday, 7 October 2016 CS-223 Homework #3 Call Me - mnemonic phone numbers REMINDER: Do not under any circumstances copy another student's code or give a copy of your code to another student. After discussing the assignment with another student, you may not take any written or electronic record away. Moreover, you must engage in a full hour of mind-numbing activity before you work on it again. Such discussions must be noted in your log file. There are three things that happen as you get older. The first is that your memory goes. I forget the other two. (I may have said this in class, but I can't be sure.) Businesses often cater to the frailty of our memories by providing mnemonics for their telephone numbers. For example, you can call United Parcel Services at 800-PICK-UPS - which actually has two reinforcing meanings. In the old days, you would often call the operator to get phone information: https://www.youtube.com/watch?v=gcVq_NXwzJY In this assignment you will implement a program to find a word that corresponds to a given positive integer, as well as convert a word to its corresponding phone number. (40) Write a program "Callme" that processes command line arguments specifying the number to encode as a word or the word to convert to a number. Write a program Callme (digits | letters) [-debug]? that processes the command line arguments as follows: number: Finds one (actually, zero) or more words that match this number. word: Converts word into phone number with the same number of digits. -debug: optional debug flag that will print out hashing information All error output (usage and "Fatal Error" messages below) should be printed to standard error. For example, fprintf(stderr, "usage ..."); All other output should be printed to standard output. (Use normal printf.) Examples: % Callme usage: Callme (digits | letters) [-debug]? % Callme pickups alphabetic: pickups => 7425877 % Callme 7425877 numeric: 7425877 => pickups % Callme 7245677 numeric: 7245677 => sailors schloss % Callme 7245677 -debug Loading dictionary Growing to size: 2048. n: 1024. Used buckets: 790. Occupancy rate: 0.77 Growing to size: 4096. n: 2048. Used buckets: 1557. Occupancy rate: 0.76 Growing to size: 8192. n: 4096. Used buckets: 3117. Occupancy rate: 0.76 Growing to size: 16384. n: 8192. Used buckets: 6205. Occupancy rate: 0.76 Growing to size: 32768. n: 16384. Used buckets: 12495. Occupancy rate: 0.76 Growing to size: 65536. n: 32768. Used buckets: 25011. Occupancy rate: 0.76 Word Count: 53944 numeric: 7245677 => sailors schloss % Callme sailors -debug alphabetic: sailors => 7245677 % Callme 4444444 numeric: 4444444 => ** no matches ** % Callme givemeheaven alphabetic: givemeheaven => 448363432836 Callme should: * Use the online dictionary: /usr/share/dict/words * Implement a hash table, based on the header file: hash.h, which is provided in /c/cs223/hw3/hash.h You will write hash.c which implements the functions declared in hash.h. Your hash table should used linked lists to handle collisions. Note that the table expands as needed. Your hash table does not need to handle deletions. The file hash.h includes the following: #define INITIAL_SIZE (1024) #define GROWTH_FACTOR (2) #define MAX_LOAD_FACTOR (1) The initial size of the hash table should be 1024 buckets. The hash table keeps track of how many elements it contains, say n. Once n * MAX_LOAD_FACTOR == SIZE, then the table expands. The size of the new table will be SIZE * GROWTH_FACTOR. The table also keeps track of how many buckets are used - that is, do not contain the NULL pointer. If the -debug flag is on, the following message from hash.c will be printed to standard output everytime the table expands: Growing to size: 2048. n: 1024. Used buckets: 790. Occupancy rate: 0.77 where size (2048) is the size of the new expanded table, n (1024) is the number of elements in the old table, used buckets (799) indicates the number of buckets that do not contain NULL, occupancy rate (0.77) is simply buckets / n, which reflects the quality of your hash function. A low occupancy rate (< .5) indicates a bad hash function. You can lose points for a bad hash function. The -debug flag also triggers the following output from Callme.c Loading dictionary // the program is reading the online dictionary. Word Count: 53944 // how many words got copied to the hashtable. Note that Callme does NOT load the entire dictionary -- only words of the same length as the input number. Also, if the command line argument is a word instead of a number, -debug has no effect. The dictionary should not be loaded and the hashtable is not created, much less expanded. * There is a separate program, /c/cs223/hw3/hashtest.c, which uses this hash table implementation. Your hash.c code should work with hashtest.c. We may also use other programs to test your implementation of hash.c. Note that hashtest.c calls HashDisplay() which is likely never called in Callme.c. * There is also bighashtest.c which populates the hash table with a 1000 or so random words and then displays the hash table. This gives you an idea of the kind of even distribution you want. * Have no memory leaks. You will need to use dynamic memory allocation, e.g., malloc(). You want to make sure that you free up memory before termination. Use valgrind to detect any memory problems. It will also detect other kinds of memory lapses, such as reading or writing to unauthorized parts of memory. * Fail "gracefully" (i.e., neither go into an infinite loop nor cause a memory dump) if any of the assumptions above is violated. A hash table is a type of mapping function. (MAP: M as in Mnemonic, A as in Aisle, P as in Pneumatic.) Reading: You should read the Van Wyk chapters 5 (Linked List) and 8 (Hashing). In addition, review the relevant Aspnes sections on linked lists, hashing, valgrind, and malloc. Use the submit command (see below) to turn in the source file(s) for Callme, a Makefile, and your log file (see below). YOU MUST SUBMIT YOUR FILES (INCLUDING THE LOG FILE) AT THE END OF ANY SESSION WHERE YOU SPEND AT LEAST ONE-HALF HOUR WRITING OR DEBUGGING CODE, AND AT LEAST ONCE EVERY HOUR DURING LONGER SESSIONS. (All submissions are retained.) Notes ===== 1. When available, the public grading script will be /c/cs223/hw3/Tests/test.Callme (and my solution will be /c/cs223/hw3/Callmex). To run it, type % /c/cs223/hw3/Tests/test.Callme (here % is the shell prompt). The script uses make to create Callme. To run each test it redirects the test file (e.g., /c/cs223/hw3/Tests/t01.c for Test #01) to the standard input of Callme and redirects the standard output to a temporary file. Then it compares this file with the expected output for that input (e.g., /c/cs223/hw3/Tests/t01.cs for Test #01). Your program passes the test only if the two files are identical. To run your program on the file for Test #01, type % ./Callme < /c/cs223/hw3/Tests/t01.c To compare the output from your program with the expected output, type % ./Callme < /c/cs223/hw3/Tests/t01.c | cmp - /c/cs223/hw3/Tests/t01.cs (cmp outputs the first character where the files differ) or % ./Callme < /c/cs223/hw3/Tests/t01.c | diff - /c/cs223/hw3/Tests/t01.cs (diff outputs the lines where they differ but uses a looser definition for "identical") or % /c/cs223/hw3/Tests/test.Callme 01 (you may specify more than one test here). If your output looks the same as what is expected, but your program still fails the test, there are probably some invisible characters in your output. To make all characters visible (except blanks), type % ./Callme < /c/cs223/hw3/Tests/t01.c | cat -vet or % ./Callme < /c/cs223/hw3/Tests/t01.c | od -bc 2. Keep track of how you spend your time in completing this assignment. Your log file should be of the general form (that below is fictitious): ESTIMATE of time to complete assignment: 10 hours Time Time Date Started Spent Work completed ---- ------- ---- -------------- 1/13 10:15pm 0:45 Read assignment and relevant material in K&R 1/16 4:45pm 1:15 Sketched solution using a finite-state machine with one-character look-ahead 1/19 9:00am 2:20 Wrote the program and eliminated compile-time errors; code passes eight tests 1/20 7:05pm 2:00 Discovered and corrected two logical errors; code now passes eleven tests 1/23 11:00am 1:35 Finished debugging; program passes all public tests ---- 7:55 TOTAL time spent I discussed my solution with: Peter Salovey, Ben Polak, Tamar Gendler, and Jonathan Holloway (and watched four episodes of The Simpsons). but MUST contain * your estimate of the time required (made prior to writing any code), * the total time you actually spent on the assignment, * the names of all others (but not members of the teaching staff) with whom you discussed the assignment for more than 10 minutes, and * a brief discussion (100 words MINIMUM) of the major conceptual and coding difficulties that you encountered in developing and debugging the program (and there will always be some). This log will generally be worth 5-10% of the total grade. N.B. To facilitate analysis, the log file MUST be the only file submitted whose name contains the string "log" and the estimate / total MUST be on the only line in that file that contains the string "ESTIMATE" / "TOTAL". 3. The submit program can be invoked in eight different ways: % /c/cs223/bin/submit 3 Makefile Callme.c hash.c hash.h time.log submits the named source files as your solution to Homework #3; Note that even though you will submit hash.h, you are not allowed to change the definitions. You may add comments, such as your name. You should assume that your hash table implementation may be tested independently of Callme. % /c/cs223/bin/check 3 lists the files that you submitted for Homework #3; % /c/cs223/bin/unsubmit 3 error.submit bogus.solution deletes the named files that you submitted previously for Homework #3 (which is useful if you rename a file or accidentally submit the wrong one); % /c/cs223/bin/makeit 4 Callme runs "make" on the files that you submitted previously for Homework #4; % /c/cs223/bin/testit 5 Callme runs the public test script for Callme using the files that you submitted previously for Homework #5; % /c/cs223/bin/protect 6 Callme.c time.log protects the named files that you submitted previously for Homework #6 (so they cannot be deleted accidentally); % /c/cs223/bin/unprotect 7 util.c time.log unprotects the named files that you submitted previously for Homework #7 (so they can be deleted); and % /c/cs223/bin/retrieve 8 common.c time.log and % /c/cs223/bin/retrieve 8 -d"2016/01/21 20:00" hash.c retrieve copies of the named files that you submitted previously for Homework #8 (in case you accidentally delete your own copies). The day and hour are optional and request the latest submission prior to that time (see the -d flag under "man co" for how to specify times). 4. When assignments are style graded, EVERY source file found in the submit directory will be reviewed. Thus prudence suggests using unsubmit to remove a file from the directory when you change its name or it ceases to be part of your solution. See http://zoo.cs.yale.edu/classes/cs223/doc/Style In your spare time, you might think about how to automate the tests in the online style sheet. That would be a pretty good homework assignment. 5. Prudence (and a 5-point penalty for code that does not make) suggests that you run makeit ("makeit 2 Callme") after you have submitted the final version of your source files. Better yet, run testit ("testit 2 Callme"). CS-223-09/20/16