Talk:Cryptanalysis/Draft: Difference between revisions

From Citizendium
Jump to navigation Jump to search
imported>Howard C. Berkowitz
imported>Sandy Harris
Line 164: Line 164:


::Warning: possible tangential thought: if I were going to pick an area for a tutorial, this would be done. At least in my society, there is a great attraction to cryptology, beginning in puberty--Freud wrote about it, as I recall. I do have a few books, hopefully not in storage, directed at the beginner, such as Gaines and Sinkov. Are there good "seekrit code" articles on the web, or would our developing one improve our hit count?  Don't worry, I wouldn't discuss how a few easy technical measures could make life much more difficult for drug traffickers and terrorists, although it might be fun to recount some of Dorothy Friedman's testimony against the run-runners. [[User:Howard C. Berkowitz|Howard C. Berkowitz]] 16:35, 8 February 2010 (UTC)
::Warning: possible tangential thought: if I were going to pick an area for a tutorial, this would be done. At least in my society, there is a great attraction to cryptology, beginning in puberty--Freud wrote about it, as I recall. I do have a few books, hopefully not in storage, directed at the beginner, such as Gaines and Sinkov. Are there good "seekrit code" articles on the web, or would our developing one improve our hit count?  Don't worry, I wouldn't discuss how a few easy technical measures could make life much more difficult for drug traffickers and terrorists, although it might be fun to recount some of Dorothy Friedman's testimony against the run-runners. [[User:Howard C. Berkowitz|Howard C. Berkowitz]] 16:35, 8 February 2010 (UTC)
::: "work factor" is a measure of the effort required to break a cipher by various methods. e.g. For brute force against a block cipher with n-bit key, the average work factor is 2<sup>n-1</sup>. I think the term is from Shannon and he proves some things about it. [[User:Sandy Harris|Sandy Harris]] 17:29, 8 February 2010 (UTC)


==References==
==References==
{{reflist|2}}
{{reflist|2}}

Revision as of 12:29, 8 February 2010

This article has a Citable Version.
Main Article
Discussion
Related Articles  [?]
Bibliography  [?]
External Links  [?]
Citable Version  [?]
 
To learn how to update the categories for this article, see here. To update categories, edit the metadata template.
 Definition The sub-field of cryptology which deals with breaking into existing codes and ciphers. [d] [e]
Checklist and Archives
 Workgroup categories Military, Computers and Mathematics [Editors asked to check categories]
 Subgroup category:  Security
 Talk Archive none  English language variant American English

Potential reorganization

I am thinking of a re-organisation here, along the lines:

  • Attacks on the system
    • Practical cryptanalysis
    • Traffic analysis
    • Side channel attacks
    • Bypassing authentication
    • Guessing secrets
      • Dictionary attacks on passwords
      • Random number weaknesses
      • Small keys
  • Attacks on the ciphers

Then the topics we currently have under "Mathematical cryptanalysis".

Things like man-in-the-middle would then turn up in two places, first under "Bypassing authentication" because if you can do that then you don't have to break the actual encryption, and second under "Attacks on the ciphers" for details of attacks on different authentication mechanisms since those details are much the same as other attacks on RSA, block ciphers or whatever. Sandy Harris 01:51, 17 October 2008 (UTC)

Should social engineering be under guessing, or its own category? For that matter, where does one put the people who write their keys on their desk calendar?
"Social engineering" and "shoulder surfing" would be categories, perhaps subheads under practical cryptanalysis. Sandy Harris 07:18, 17 October 2008 (UTC)
Side channel, I assume. covers TEMPEST/HIJACK/TEAPOT/NONSTOP, timing analysis on plaintext, acoustic cryptanalysis, Operation RAFTER (specific case of getting the received text off the intermediate frequency)
Could you define "attacks on the ciphers"?
I think this is going somewhere interesting, but not sure where it is yet.
If you would, see if we can agree on some of the more specific (e.g.,) authentication attacks in communications security. I am also open to a better name for that article. Howard C. Berkowitz 05:51, 17 October 2008 (UTC)
By "attacks on the ciphers" (chosen mainly to contrast with "attacks on the system") I meant what is now called "mathematical cryptanalysis" and might be called "cryptanalysis proper". The article introduction refers to it as "classic cryptanalyis". Not sure what the best title would be.
Somewhere up in the opening/overview part I'd want to say that, while this is a real threat, it may not be the main threat in many cases. Quote Anderson [1] about banking sytems "the threat model commonly used by cryptosystem designers was wrong: most frauds were not caused by cryptanalysis or other technical attacks, but by implementation errors and management failures." or Schneier's intro to Secrets and Lies where he says in some ways writing "Applied Cryptography" was a mistake; too much technology, not enough attention to other issues.
Side channel certainly covers Tempest and RAFTER (new to me). I'm not sure if differential fault analysis and timing attacks go there or under "attacks on the ciphers"; I lean toward the latter since they aim at finding the keys rather than just reading material. Sandy Harris 07:18, 17 October 2008 (UTC)

Orientation

Given that many readers won't have much background, I think we need a section that considers both early manual methods before they were pure guesswork (and maybe a little there), and then materials, some declassified, from the 1920s and 1930s.

Let me offer what is mostly outline, which would go before specific attacks. ...said Howard C. Berkowitz (talk) 12:04, 17 October 2008

Preparatory

While it is not necessary to be fluent in a language to cryptanalyze it, languages have different statistical properties and it helps to know the language. Of course, if there is a change of guessing probable words, knowledge of the language is more important, and knowledge of the area of application (e.g., smuggling liquor or describing radar) can reveal even more. In a real-world rather than puzzle cryptanalysis, knowledge of the circumstances in which the ciphertext was collected can be informative. William Friedman proposed four basic steps:[1]

  1. The determination of the language employed in the plain-text version.
  2. The determination of the general system of cryptography employed.
  3. The reconstruction of the specific key in the case of a cipher system, or the reconstruction, partial or complete, of the code book, in the case of a code system; or both, in the case of an enciphered code system
  4. The reconstruction or establishment of the plain text

These are usually done in the order in which they are given above and in which they usually are performed in the solution of cryptograms, although occasionally the second step may precede the first.

Basic methods

The very earliest cryptanalysis seems to have been inspired guessing about the plaintext, or perhaps attacks against a known and fairly weak manual systems. In 20th century, mathematical and statistical techniques.

The most basic is frequency analysis.[2]. When the frequency of letters, graphed by frequency against count, do not follow the curve characteristic to the language, [3] which becomes increasingly effective when keys become more complex, even in simply polyalphabetic substitution on a monographic cipher, such as the Vigenere method. Frequency analysis is almost useless against pure transposition systems, other than than confirming the system is simple transposition if the frequency of letter curve matches, and the counts of letters in cleartest match the letters in the natural language.

Monoalphabetic substitution

The most basic form, such as the Caesar cipher, uses a single key alphabet; the same cleartext letter was always represented by the same ciphertext letter. Against this, basic frequency analysis is possible. The monoalphabetic keys of increasing complexity involve:

  • Constant shift monoalphabetic solution, in which each plaintext letter is shifted a fixed number of letter positions to the right
  • Keyword-based mixed, where the unique letters of a keyword (e.g., CRYPTO) form the first letters of a key alphabet, with the rest following in alphabetical order: CRYPTOABDEFGHIJKLMQSUVW
  • Completely random: QPWOEIROTIYUSAGFHGKJLMZNXBCV
Increasing the complexity of monoalphabetics

Encryption, even of fairly simple ciphers, became more difficult when two things happened:

  • encrypted text did not follow the spacing of plaintext, so that the insight of knowing that a particular cipher symbol was at the beginning, in the middle, or of the word did not help.[4]
  • padding began being used, so that X's or perhaps meaningless digraphs were inserted, so the message was always an integral multiple of the number of cipher alphabets

Basic polyalphabetic substitutions

The simplest polyalphabetic substitution used multiple constant shift keys. For a period of 4, the number of different encryptions before the sequence of keys repeats, an example would be:

Key 1: BCDEFGHIJKLMNOPQRSTUVWXYZA
Key 2: CDEFGHIJKLMNOPQRSTUVWXYZAB
Key 3: DEFGHIJKLMNOPQRSTUVWXYZABC
Key 4: EFGHIJKLMNOPQRSTUVWXYZABCD

Even with a method that was a short-period polyalphabetic solution, cryptographers might use different techniques for putting letters into the encryption system, and removing them. Assume there are four alphabets, 1-4, all Caesar variants (+1, +2, +3, +4), not shown for simplicity in the drawing, and the rows of ciphertext fall under the appropriate alphabet.

Contrast the example below, of ATTACK AT TWO TODAY, with spaces suppresed. The initial encryption works by

Writing out the cleartext,

       Cleartext
       1 2 3 4
Row A  A T T A
Row B  C K A T
Row C  T W O T
Row D  O D A Y
       Result of polyalphabetic substitution (four Caesar based  used)
       1 2 3 4
Row A  B V W D
Row B  D M D X
Row C  U X R D
Row D  P F C B
Mixing in basic transposition

Even with a method that was fundamentally polyalphabetic solution, cryptographers might use different techniques for putting letters into the encryption system, and removing them. Assume there are four alphabets, 1-4, not shown for simplicity in the drawing, and the rows of ciphertext fall under the appropriate alphabet.

In the example below, taking off the letters in groups of four, from left to right, would give

      BVWD DMDX UXRD PFCB

Only a slight modification of sending all the odd rows first and then the even would give:

      BVWD UXRD DMDX PFCB

While this is a trivial example, simple frequency analysis is no longer enough to decrypt; even given the key alphabets, simple decryption yields nonsense. Still, a system simple enough to respond to hand analysis often was still a challenge, especially when the encryptor used mixed alphabets and changed them frequently

Nonperiodic key alphabets

Ciphers were not always "geometric" or strictly periodic. An aperiodic cipher might only have 4 cipheralphabets in the key, but, if the order of their use is 1-2-4-3 on the first cycle but 2-3-4-1 on the next, reconstructing the key becomes more difficult. Using a nonperiodic key alphabets, ATTACK AT TWO TODAY, becomes, with spaces suppressed:

       Result of polyalphabetic substitution
       4 3 1 2
Row A  E W U F
Row B  G O B V
Row C  W Z P V
Row D  S H B A
Adding the simple transposition

Using the row-by-row model, the ciphertext of the first would be:

EWUF GOBV WZPV SHBA

but, with even-first,

GOBV SHBA EWUF WZPV

Mathematical cryptanalysis

Mathematical cryptanalysis emerged roughly in the 1920s, principally from William Friedman and his colleagues. These still used fairly basic mathematical techniques. A significant increase in cryptanalytic power came when techniques such as group theory were applied against the Enigma machine and other early machine ciphers; these are discussed later.

Basic approaches

The first two tests mentioned both address the problem of determining the number of key alphabets used in a polyalphabetic cipher. Both were designed at a time when polyalphabetic substitution was primarily a manual process, so they work best when there are both a relatively small number of key alphabets. an algorithm in which the key alphabets are used in a straightforward repeating sequence (e.g., 12341234, not 42231431) and a relatively large amount of text encrypted in them.

All depend on the reality that there is a high redundancy in written human languages. In English, the trigraph "the" is most common, and the cryptanalyst can safely assume that the most frequent trigraph may well be those three plaintext letters, which have rotated under the same key alphabets in a suitably long volume of text.

The purpose of these methods is to calculate what is variously called the "keyword length", the "cryptoperiod", or the number of key alphabets.

Kasiski test

Best known from the work of Friedrich Kasiski in 1863, subsequent research shows it may have been independently invented by Charles Babbage in 1844, although not publicized. Formally, it is a known ciphertext attack, but supplemented by linguistic knowledge of the cleartext language.

The analyst searches through the text and finds all repetitions of at least 3. Assume that JWR repeats 15 letters after its first appearance, then 20 letters later, then 15, then 45. This does not immediately give insight.

The next step, however, is mathematical: the cryptanalyst factors the separations, and discovers the all have 5 as a prime factor. This suggests that 5 key alphabets are in use, and that THE has been encrypted by the same three in each cryptoperiod. As yet, all that can be said is that the three are consecutive: they could be, within the period, 123, 234, 345, 451, or 352. Now, there is a choice. Should the Kasiski test be repeated looking for a different repeated string, or should the analyst start reconstructing the three potential key alphabets?

Actually, it may be more appropriate to check the assumption that this is a polyalphabettic substitution with a period of 5. If this is being done manually, start at the first letter of the ciphertext, and write it all out, in columns of 5. If the assumption is correct, the frequency distribution in each column should correspond to the frequency distribution of letters in plaintext. The ciphertext, suspected to be the "E" at the end of the trigraph, should be the most frequent letter in the column.

Index of coincidence

Kappa test

New heading

I've moved in some stuff from block cipher. I think we need a general heading whose subsections are Practical cryptanalysis, side channel attacks, traffic analysis, and attacks via host security weaknesses (key loggers, etc.). Not sure what to call it, though. "Non-mathematical cryptanalysis"? Sandy Harris 04:11, 16 April 2009 (UTC)

What next?

I've done some re-organisation and added some text, but I'm now blocked. I'm not at all sure the structure I have so far is right and I am sure it is incomplete.

The material above under "Orientation: and "Mathematical cryptanalysis" is good, though it is also incomplete. Should it go into this article? Or History of cryptography? See also my comment at Talk:Cryptology#What_next.3F. Sandy Harris 15:13, 1 February 2010 (UTC)

What about things like "unicity distance", "work factor", ... There is a lot of quite solid theory that this article should either cover or link to. I do not know enough to write that well and am not sure how the structure of either this article or the whole group of related articles ought to work. But those & the stuff above definitely need to go in somewhere. Sandy Harris 16:12, 8 February 2010 (UTC)
Interesting; I know about unicity distance, but learned it in the study of error correction and only later found its cryptographic applications. In the cryptologic context, I have no idea what a "work factor" may be but would be happy to learn.
Unicity distance probably should be in its own article, as it's significant in areas such as Redundant Arrays of Inexpensive Disks and other topics such as error-correcting codes. I do have Shannon's book somewhere and it's discussed in some of my graduate texts in discrete mathematics, so I could probably take a practitioner's draft. It would need to be edited and extended by a mathematician. Actually, I should be reviewing this area myself for some hopefully paid work regarding fault tolerance in cloud computing.
Do you know if the index of coincidence has been put on a more solid foundation that William Friedman's intuition?
Warning: possible tangential thought: if I were going to pick an area for a tutorial, this would be done. At least in my society, there is a great attraction to cryptology, beginning in puberty--Freud wrote about it, as I recall. I do have a few books, hopefully not in storage, directed at the beginner, such as Gaines and Sinkov. Are there good "seekrit code" articles on the web, or would our developing one improve our hit count? Don't worry, I wouldn't discuss how a few easy technical measures could make life much more difficult for drug traffickers and terrorists, although it might be fun to recount some of Dorothy Friedman's testimony against the run-runners. Howard C. Berkowitz 16:35, 8 February 2010 (UTC)
"work factor" is a measure of the effort required to break a cipher by various methods. e.g. For brute force against a block cipher with n-bit key, the average work factor is 2n-1. I think the term is from Shannon and he proves some things about it. Sandy Harris 17:29, 8 February 2010 (UTC)

References

  1. Friedman, William F. (1938), Military Cryptanalysis, vol. I: Monoalphabetic Substitution Systems, Signal Intelligence Section, Plans and Training Division, U.S. War Department., p. 7-11
  2. Friedman-I, p. 11-17
  3. Friedman-I, pp.18-26
  4. Gaines, Helen (1939), Cryptanalysis,pp. 69-72