YALE UNIVERSITY
DEPARTMENT OF COMPUTER SCIENCE
CPSC 467b: Cryptography and Computer Security | Handout #3 | |
Professor M. J. Fischer | January 23, 2013 | |
Problem Set 2
Due: Thursday February 14, 2013
The goal of this problem is to explore the method of brute force attack on a cryptosystem to get a better feeling for what it can and cannot do.
Our friend, Happy Hacker, has ignored my advice and decided to build his own cryptosystem, which he calls SnakeOil. He started with the botan implementation of AES-128 in CBC mode, but he decided to add his own padding and key management routines.
Happy was pleased with his scheme and told his friends about all of its features.
Unfortunately for Happy, he forgot the correct pair of key indices and so was unable to decrypt a message he received from Alice. Clever Charlie told him not to worry. His system was easily compromised by anyone with access to the key share file. A program could be written to brute force the possible key pairs, and the computer could be decrypting Alice’s message while they went out for pizza.
Sure enough, when they returned, Alice’s message was displayed on the screen.
For Happy’s problem, a brute force attack decrypts Alice’s ciphertext c for every possible key index pair (ki,kj) (0 ≤ i < j < 100) in Happy’s key file until the correct pair is found. The principal difficulty is recognizing the correct decryption once it has been found. How can one distinguish the correct decryption from the wrong one?
In general one cannot, but if some messages are more likely than others, then some decryptions of c will “look more like a valid message” than others. The attack chooses the decryption m′ of c that looks most like a valid message and guesses that it is the real message m. It also guesses that the key index pair k′ = (ki,kj) that produced it is the real pair.
The guess m′ and k′ will not always be correct. If they are, we say the attack succeeds in breaking the cryptosystem. Otherwise, the attack fails. We are interested in exploring how well this attack can be made to work in practice.
Clever’s method for choosing m′ relies on the letter frequencies of English text. Assume you have a large corpus of representative English text, so that Alice’s message is likely to have a similar letter frequency distribution. Assume further that a decryption of c using a randomly chosen incorrect index pair k′ yields a random-looking string m′ with more or less uniform distribution of the letters. If c is sufficiently long, then very likely the correct decryption m is the only one whose letter frequency distribution closely resembles that of the corpus.
For each octet (8-bit byte) b, let f(b) be the frequency of occurrence of b in a corpus of English text, and let r = ∑ bf(b) be the total size of the corpus. Then the normalized frequency p(b) = f(b)∕r is a close estimate of the probability that an arbitrary byte in a random piece of text is equal to b. Similarly, let q(b) be the normalized frequency of b in the message.
We measure the (dis)similarity of p and q by the sum of the squares of the difference at each byte of their normalized frequencies. That is, we define
We apply this measure by computing the divergence between each possible decryption of c and the corpus and choosing the decryption (and corresponding key) that minimizes the divergence.
Please solve the three problems below.
You will find a C++ implementation of SnakeOil on the Zoo in directory /c/cs467/assignments/ps2/src/. You will also find a partially-written brute force key analyzer, complete with routines for reading and normalizing frequency tables. However, it is missing two key functions: guessKey() and divergence().
All work should be submitted electronically using the script /c/cs467/bin/submit on the Zoo. The written solutions should be in the form of a PDF file. The code and data files should be archived as a .tar.gz or a .zip file and submitted. Please be aware that the submit script can only handle files, not whole directory trees.
Problem 1: Security Analysis
Write a critique of SnakeOil from both a usability and a security standpoint. What is good and bad about the cryptosystem, and what is good and bad about this particular implementation? What is the effective key length of SnakeOil? Evaluate each of Happy’s claims above for the superiority of SnakeOil over AES-128.
Do not limit yourself to answering just these particular questions. Rather, feel free to comment on all aspects of a cryptosystem that impact its usability and security in a particular environment.
Your answer should reflect what you have learned in the course so far about security in general and what it means for a cryptosystem to be secure.
Problem 2: Coding
You should write the two missing functions in the brute force key analyzer. Compile your program and test it on the file sample.enc.
This should not take more than around 50 lines of C++ code, but of course it has to work in the context of the rest of the program, so you will need to spend some time reading the rest of my code in order to see how one invokes the AES primitives in order to decrypt a file with a particular master key.
I realize that some of you might not be very familiar with C++, so please feel free to ask the TA’s or myself for help.
Problem 3: Experiments
You will find several files in the Zoo directory /c/cs467/assignments/ps2/data/. Files ending in .dat are frequency tables. Files ending in .enc are ciphertexts. The file keyshares is the key share file that was used in all of the encryptions. All ciphertexts are valid encryptions of English-language text files, but not all are easy to decipher.
You should run your brute force key finder with every frequency table and every ciphertext file. For each, report the key indices that the program found, the decryption produced, and whether or not the program succeeded in finding the correct key.
Next, you should analyze your results and try to draw conclusions about the effects of the frequency table and the length of the ciphertext on the effectiveness of the attack.
Based on the insights gained, construct a 40 character “message” that cannot be cracked using any of the furnished frequency tables. (Your message does not have to be real English text, but it must consist of printable ASCII characters.)