A symmetric cryptosystem consists of a set M of
plaintext messages, a set C of ciphertexts, a set
K of keys, and two functions, an encryption function E: K ×M → C and a decryption function
D: K ×C → M. We often write Ek(m) = E(k,m) and Dk(c) = D(k, c) and may talk about a cryptosystem
consisting of a family of encryption and decryption functions indexed
by k, but this should not obscure the fact that in order to be
useful, it must be feasible to evaluate the functions given an
arbitrary key k.
We require that Dk be the (one-sided) inverse
of Ek, that is, for all keys k ∈ K and all plaintext
messages m ∈ M, we have Dk(Ek(m)) = m.
To be useful, it should certainly be difficult to find k given only
c = Ek(m). But we generally want much more. The goal is to hide
m, not k, so a procedure that found m given c would compromise
the cryptosystem even if k remained well hidden. Thus, we want the
system to resist a ciphertext only attack: the attacker knows c
and tries to find m.
In practice, this also might not be enough. The attacker might know a
number of pairs plaintext-ciphertext pairs (m1, c1), …, (mr,cr) where ci = Ek(mi) for i = 1, …, r, in addition to
knowing c. We want it to be difficult for the attacker to find m
in this case also. Thus, we want the system to resist a known
plaintext attack.
In many contexts, one needs even stronger forms of attack resistance,
where the attacker not only knows a number of plaintext-ciphertext
pairs but also gets to choose the plaintext or the cipher text, or
both. Where the attacker gets to experiment with the encryption
system, it should still be difficult to find the plaintext
corresponding to any ciphertext c that differs from those in the
known plaintext-ciphertext pairs. These so-called chosen
plaintext, chosen ciphertext, and chosen text attacks
are much stronger than the simple ciphertext only attack and much
harder to defend against. For example, in a chosen ciphertext attack,
the attacker would be allowed to obtain the decryption of c+1 when
trying to decrypt c.
The Caeser cipher is said to go back to Roman times. Let the message
space M and ciphertext space C consist of strings of
uppercase letters from the Roman alphabet and spaces. Number the
letters of the alphabet starting at 0, so A = 0, B = 1, ..., Z = 25. The key space K = {0, …, 25}. To encrypt, Ek
replaces each letter m by the letter (m+k) mod 26.1 To decrypt,
Dk replaces each letter c by the letter (c−k) mod 26.
Consider the problem of breaking the Caeser cipher. Suppose you
intercept the ciphertext JXQ. You quickly discover that
E3(GUN) = JXQ. But can you conclude that k=3 and
GUN is the corresponding plaintext? Upon further investigation, you
discover that E23(MAT) = JXQ. So now you are in a
quandary. You have two plausible decryptions and no way to tell for
sure which is correct. Have you broken the system or haven't you?
According to the "definition" above, you haven't since you haven't
found the plaintext with certainty. But you've certainly inflicted
serious damage. You've reduced the possible set of 3-letter words
that the message might have been down to a small set of possibilities.
In many contexts, this is nearly as good as having completely broken
the system. This suggests that a formal definition of security must
take into account the possibility of "partially breaking" a system,
as we did here.
The longer the message, the more likely that only one key will lead to
a sensible message. For example, if the ciphertext were "EXB JXQ",
you would know that k=3, since D3(EXB JXQ) = BUY
GUN, whereas D23(EXB JXQ) = HAE MAT is nonsense.
At the opposite extreme, consider the Caeser cipher with a one-letter
message. If I tell you that the ciphertext is V, you learn absolutely
nothing about the plaintext. You reason as follows: If k=0, then
m=V. If k=1, then m=U. If k=2, then
m=T. And so on through the alphabet. For every possible
plaintext message m, there is exactly one key for which m encrypts
to V.2 So
knowing that c = V doesn't give the attacker any information
about the plaintext m.
We say such a system is information-theoretically secure. This
means that there is no statistical correlation between the plaintext
and the ciphertext. To make this definition precise, one must assume
that the key k is chosen uniformly at random and is unknown to the
attacker. What the attacker sees is a random variable c = Ek(m),
where m is a constant and k a uniformly distributed random
variable. c is identically distributed for each choice of m;
hence, the distribution of c does not depend on m, so knowing c
gives no information about m.
Thus, we see that the same Caeser cipher can be
information-theoretically secure, or partially secure, or completely
breakable, depending on the length of the message to which it is
applied.
A code can be hard for Eve to break for two different reasons: (i) She
may not have enough information to uniquely determine the plaintext,
or (ii) she may have enough information, but it takes her too much
time to find the plaintext. In the first case, we say the code is
information-theoretically secure. In the second, it is
computationally secure. The first notion is based on statistical
inference; the second on computational complexity.
Statistics is about guessing underlying parameter values from samples
drawn from a random distribution. For example, given a biased coin
with probability p of heads, one wants to determine p by observing
the outcomes of several coin flips. Of course, one can never
determine p exactly, so one tries instead to guess a value p′ that
is "likely" to be "close" to p.
Eve's job in attempting to break a cryptosystem is analogous to that
of the statistician. The analog of a coin flip is for Alice to choose
a random key k and compute c = Ek(m). Eve tries to guess a value
m′ that is "likely" to be "close" to m. Unlike the coin-flip
example, Eve generally will only get one sample for a given message m.
Neverless, the same notions apply, and the closer Eve can get to finding
the true message m′, and the more likely she is to be correct, the
weaker our cryptosystem is.
The strongest case of security is where Eve can get no information at
all about m. This case obtains when c is completely independent
of m, that is, when the distribution of c does not depend on m.
Recall that a (discrete) probability distribution is a function that
assigns a probability p(x) to each point x in the sample space.
In our case, each value of m results in a potentially different
probability distribution pm(c), where pm(c) is the probability
of choosing a key for which Ek(m) = c. To say that the
distribution of c does not depend on m is the same as saying that
all of the functions pm() are the same.3
For the Caeser cipher with one-letter messages, we have pm(c) = 1/26 for each possible ciphertext letter c. This is because there
are 26 equiprobable keys, and each key maps m to a different letter.
In this case, c is independent of m, and therefore knowledge of
c gives no information about m. We say the system is
information-theoretically secure.
For the Caeser cipher with two-letter messages, we lose this
independence. For example, let c = VV. Then pAA(VV) = 1/26, but pBC(VV) = 0. As functions, pAA() ≠ pBC(), and Eve can get information about m from
observing c. In particular, Eve knows that m is one of the
messages in the set {AA, BB, …ZZ}.
A one-time pad is a simple information-theoretically secure
cryptosystem based on the bitwise exclusive-or operator (XOR),
which we write as ⊕. Recall for Boolean values x and y
that x ⊕y is true when exactly one of x and y is true, but
is false if x and y are either both false or both true. Exclusive-or
is therefore equivalent to sum modulo two, when true is represented by 1
and false by 0. In symbols,
x ⊕y = (x + y) mod 2.
With a one-time pad, the plaintext, ciphertext, and key are all
assumed to be binary strings of the same length. The encryption
function is Ek(m) = k ⊕m, where ⊕ is applied
componentwise to corresponding bits of k and m. It's a basic fact
of mod 2 addition that addition and subtraction are the same.
This implies that Ek is its own inverse; hence Dk(c) = Ek(c) is
the decryption function.
Dk(Ek(m)) = k ⊕( k ⊕m) = (k ⊕k) ⊕m = 0 ⊕m = m.
Like the one-letter Caeser cipher, the simple XOR cryptosystem has the
property that for every ciphertext string c and every plaintext m,
there is exactly one key k such that Ek(m) = c (namely, m ⊕c). Hence, every ciphertext is equally likely no matter what m is,
so the one-time pad is information-theoretically secure.
The simple XOR cryptosystem would seem to be the perfect cryptosystem.
It works for messages of any length (by choosing a key of the same
length). It is simple, easy to encrypt and decrypt, and
information-theoretically secure. In fact, it is sometimes used for
highly sensitive data. However, it suffers from two major weaknesses
that make it not so useful in practice:
The key k must be as long as the message to be encrypted. Key
management and key distribution are both difficult problems which we
will be discussing later in the course. The longer the keys, the
more difficult these problems become.
The same key must never be used more than once. (Hence the term
"one-time".) The reason is that the simple XOR cryptosystem
immediately succumbs to a known plaintext attack. If Eve knows just
one plaintext-ciphertext pair (m1, c1), then she can easily
compute k = m1 ⊕c1, totally breaking the system and
allowing her to decrypt all future messages sent with that key.
Even in a ciphertext-only situation, if Eve has two ciphertexts
c1 and c2 encrypted by the same key k, she can gain
significant partial information about the corresponding plaintexts
m1 and m2. In particular, she can compute m1 ⊕m2,
since
m1 ⊕m2 = (c1 ⊕k) ⊕(c2 ⊕k) = c1 ⊕c2.
That information, together with other information she might have about
the likely content of the messages, may be enough for her to seriously
compromise the secrecy of the data.
For any fixed message m, the number of possible ciphertexts is at
most the size of the key space. (It will be even less if two or more
keys map m to the same ciphertext, which is perfectly possible.)
Often the message space and the ciphertext space are the same, in
which case Ek(m) is a bijection for each fixed value of k. From
these two facts, it follows that for each fixed ciphertext c, the
number of possible plaintext messages is at most |K|. Hence,
knowing the ciphertext c immediately reduces the uncertainty about
m from being a member of a set of size |M| to a member of a
set of size |K|. The key spaces of most codes used in practice
are much smaller than the space of possible messages and ciphertext.
Hence, information-theoretically, Eve gains a lot of information about
m just knowing c. Moreover, if the set of meaningful messages is
sparse in M and not correlated with the structure of Ek (the
usual case), then it highly likely that the only meaningful message that
maps to c is Alice's original message m.
A brute force attack is one that tries all possible keys k.
For each k, Eve computes mk = Dk(c) and tests if mk is
meaningful. If so, then mk is a plausible decryption of c. If
exactly one mk is found that is meaningful, then Eve knows that
mk = m.
Given long enough messages, the Caeser cipher is easily broken by
brute force-one simply tries all 26 possible keys and sees which
leads to a sensible plaintext. The Caeser cipher succumbs because
the key space is so small.
With modern computers, it is quite feasible for an attacker to try
millions ( ∼ 220) or billions ( ∼ 230) of keys. Of
course, the attacker also needs some automated test to determine when
she has a likely candidate for the real key, but such tests are often
easy to produce given a little knowledge of the probable message
space.
How big is big enough? The DES (Data Encryption Standard)
cryptosystem (which we will talk about shortly) has only 56-bit keys
for a key space of size 256. A special DES Key Search Machine
was built as a collaborative project by Cryptography Research,
Advanced Wireless Technologies, and EFF. (See
http://www.cryptography.com/resources/whitepapers/DES.html for
details.) This machine was capable of searching 90 billion
keys/second and discovered the RSA DES Challenge key on July 15, 1998,
after searching for 56 hours. The entire project cost was under
$250,000.
Today, 80-bit keys are probably still safe, but only barely so.
280 is only about 16 million times as large as the DES key space,
and tens of thousand of machine on the internet could conceivably be
deployed in breaking a code. If not quite feasible using today's PC's
(and I'm not saying it isn't), it's not that far-fetched to imagine
searching an 80-bit key space in the foreseeable future. Much better
are systems like triple DES (with 112-bit keys) and AES (with 128-bit
keys), which will probably always be safe from brute-force attacks (but
not necessarily from other kinds of attacks).
Footnotes:
1x mod 26 is the remainder of x divided by 26.
2Why does it matter that there is exactly one?
3One often assumes that
the parameters themselves are also drawn from some distribution. This enables
one to talk, for example, about the probability that the ciphertext = "QBA".
But it seems somewhat counterintuitive to assume that Alice merely sends
random message according to some fixed (but unknown) probability distribution.
Therefore, we prefer to avoid such assumptions whenever possible.
File translated from
TEX
by
TTH,
version 3.66. On 18 Sep 2005, 17:46.