Due 1:30 PM, Wednesday, 14 February 2018 CPSC 101b Homework #3 Redundancy: Data Compression Note: The assignment is due IN CLASS on the date above. REMINDER: When discussing an assignment with other students, you may write on a board or a piece of paper, but you may not retain any written or electronic record of the discussion. Moreover, you must engage in a full hour of mind- numbing activity (e.g., watching back-to-back episodes of Gilligan's Island) before you work on the assignment again. This ensures that you can reconstruct what you learned from the discussion, by yourself, using your own brain. The same rule applies to nonstudents and on-line sources. 1. (6 points) Construct a Huffman code for the letters A, C, E, F, I, O, S, T, and U, whose relative frequencies are as follows: A C E F I O S T U 82 28 127 22 70 75 63 91 28 Give your Huffman code for each of the letters (which will depend on how you break ties and assign 0 and 1 to edges in the tree) and for the word FACETIOUS. What sequence of letters does your Huffman code produce when given the sequences (you only need to give the fully decoded letters): a. 1 0 0 1 1 1 0 0 1 0 0 1 1 1 0 0 1 0 0 1 ... b. 0 0 0 1 1 1 0 0 1 0 0 1 1 1 0 0 1 0 0 1 ... c. 1 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 ... d. 0 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 ... 2. (6 points) Morse code represents each letter of the alphabet as a sequence of one or more dots and dashes. During transmission, * A dot takes one time unit. * A dash takes three time units. * There is a one-time-unit gap between successive dots and dashes. * There is a three-time unit gap between successive letters. Thus for the twelve most common letters in English text in order of decreasing frequency, we have Letter E T A O I N S H R D L U Morse Code . - .- --- .. -. ... .... .-. -.. .-.. ..- Time to Send 1 3 5 11 3 5 5 7 7 7 9 7 Since the goal is to send messages as fast as possible, one would expect that the time to send a letter should be correlated with its frequency of occurrence. That is, if one letter occurs more frequently than another, then it should not take longer to send. However, this is not true here (compare A vs. I; or O vs. N or S). Construct a new Morse-like code for the letters E, T, A, O, I, N, S, H, R, D, L, and U that allows you to send messages containing only these letters as fast as possible (i.e., that satisfies the correlation property above), where each letter is represented by a sequence of no more than FOUR dots or dashes. Which sequences of up to THREE dots and dashes were unused? Explain how you found the codes as well as what they are. Hint: What sequences of up to 4 dots or dashes are possible? How long does each take to send? 3. (1 point) How many hours did it take you complete the first two problems? Supplementary Reading: MacCormick, Chapter 5 (error-correcting codes) CPSC-101-02/07/18