Autokey cipher


An autokey cipher is a cipher that incorporates the message into the key. The key is generated from the message in some automated fashion, sometimes by selecting certain letters from the text or, more commonly, by adding a short primer key to the front of the message.
There are two forms of autokey cipher: key-autokey and text-autokey ciphers. A key-autokey cipher uses previous members of the keystream to determine the next element in the keystream. A text-autokey uses the previous message text to determine the next element in the keystream.
In modern cryptography, self-synchronising stream ciphers are autokey ciphers.

History

This cipher was invented in 1586 by Blaise de Vigenère with a reciprocal table of ten alphabets. Vigenère's version used an agreed-upon letter of the alphabet as a primer, making the key by writing down that letter and then the rest of the message.
More popular autokey ciphers use a tabula recta, a square with 26 copies of the alphabet, the first line starting with 'A', the next line starting with 'B' etc. Instead of a single letter, a short agreed-upon keyword is used as the primer, and the key is generated by writing down the primer and then the rest of the message, as in Vigenère's version. To encrypt a plaintext, the row with the first letter of the message and the column with the first letter of the key are located. The letter at the intersection of that row and column is the ciphertext letter. The same is then done for each subsequent pair of message and key letters.
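The table lookup described above can be sketched in Python. This is only an illustration: the names TABULA_RECTA and encrypt_letter are hypothetical, and the sketch assumes uppercase letters A to Z only.

```python
import string

ALPHABET = string.ascii_uppercase

# Row r of the tabula recta is the alphabet rotated left by r places,
# so row 0 starts with 'A', row 1 with 'B', and so on.
TABULA_RECTA = [ALPHABET[r:] + ALPHABET[:r] for r in range(26)]


def encrypt_letter(plain: str, key: str) -> str:
    """Return the cell where the plaintext row meets the key column."""
    row = ALPHABET.index(plain)
    col = ALPHABET.index(key)
    return TABULA_RECTA[row][col]


print(TABULA_RECTA[1][:5])       # BCDEF
print(encrypt_letter("A", "Q"))  # Q
print(encrypt_letter("T", "U"))  # N
```

Because each row is a rotation of the alphabet, the lookup is equivalent to adding the two letter positions modulo 26, which is the form the later sketches use.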

Method

The autokey cipher, as used by members of the American Cryptogram Association, starts with a relatively short keyword, the primer, and appends the message to it. If, for example, the keyword is "QUEENLY" and the message is "ATTACK AT DAWN", the key would be "QUEENLYATTACKATDAWN". Only as many key letters as there are message letters are actually used, here the first twelve.
Plaintext:  ATTACK AT DAWN...
Key:        QUEENL YA TTACK AT DAWN....
Ciphertext: QNXEPV YT WTWP...
The ciphertext message would thus be "QNXEPVYTWTWP".
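As a minimal sketch of the encryption step, assuming the tabula recta lookup is expressed as addition modulo 26 and that the primer contains only letters, the example can be reproduced in Python. The function name autokey_encrypt is hypothetical.

```python
A = ord("A")


def autokey_encrypt(plaintext: str, primer: str) -> str:
    """Encrypt with a text-autokey: key = primer followed by the message."""
    # Keep only the letters A-Z; spaces and punctuation are dropped.
    letters = [c for c in plaintext.upper() if "A" <= c <= "Z"]
    key = list(primer.upper()) + letters
    # zip() stops at the end of the message, so surplus key letters are ignored.
    return "".join(
        chr((ord(p) - A + ord(k) - A) % 26 + A) for p, k in zip(letters, key)
    )


print(autokey_encrypt("ATTACK AT DAWN", "QUEENLY"))  # QNXEPVYTWTWP
```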
To decrypt the message, the recipient starts by writing down the agreed-upon primer:
QUEENLY
The recipient takes the first letter of the key, Q, and finds that row in a tabula recta. The row is scanned for the first letter of the ciphertext, also Q in this case, and the letter at the top of that column, A, is the first plaintext letter. That letter is then added to the end of the key:
QUEENLYA
Then, since the next letter in the key is U and the next letter in the ciphertext is N, the U row is scanned for N, and the letter at the top of that column, T, is added to the key:
QUEENLYAT
That continues until the entire key is reconstructed. Removing the primer from the start of the key then leaves the plaintext.
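The decryption procedure above can be sketched as follows, again treating the table lookup as subtraction modulo 26. The name autokey_decrypt is hypothetical, and the sketch assumes a non-empty primer made up of letters only.

```python
A = ord("A")


def autokey_decrypt(ciphertext: str, primer: str) -> str:
    """Recover the plaintext, extending the key with each recovered letter."""
    key = list(primer.upper())
    plaintext = []
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    for i, c in enumerate(letters):
        # The key always has a letter at index i: it starts with the primer
        # and grows by one letter for every ciphertext letter processed.
        p = chr((ord(c) - ord(key[i])) % 26 + A)
        plaintext.append(p)
        key.append(p)
    return "".join(plaintext)


print(autokey_decrypt("QNXEPVYTWTWP", "QUEENLY"))  # ATTACKATDAWN
```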

Cryptanalysis

Autokey ciphers are somewhat more secure than polyalphabetic ciphers that use fixed keys since the key does not repeat within a single message. Therefore, methods like the Kasiski examination or index of coincidence analysis will not work on the ciphertext, unlike for similar ciphers that use a single repeated key.
A key weakness of the system, however, is that the plaintext is part of the key. That means that the key will likely contain common words at various points. The key can be attacked by using a dictionary of common words, bigrams, trigrams etc. and by attempting the decryption of the message by moving each candidate word through the key until potentially readable text appears.
Consider an example message "MEET AT THE FOUNTAIN" encrypted with the primer keyword "KILT". To start, the autokey is constructed by placing the primer at the front of the message:
plaintext: MEETATTHEFOUNTAIN
primer: KILT
autokey: KILTMEETATTHEFOUN
The message is then encrypted by using the key and the substitution alphabets, here a tabula recta:
plaintext: MEETATTHEFOUNTAIN
key: KILTMEETATTHEFOUN
ciphertext: WMPMMXXAEYHBRYOCA
The attacker receives only the ciphertext and can attack it by selecting a word that is likely to appear in the plaintext. In this example, the attacker selects the word "THE" as a potential part of the original message. Since any word of the plaintext also appears somewhere in the key, the attacker tries THE as a piece of the key at every possible position and decrypts the ciphertext letters beneath it each time:
ciphertext: WMP MMX XAE YHB RYO CA
key: THE THE THE THE THE..
plaintext: DFL TFT ETA FAX YRK..
ciphertext: W MPM MXX AEY HBR YOC A
key: . THE THE THE THE THE.
plaintext: . TII TQT HXU OUN FHY.
ciphertext: WM PMM XXA EYH BRY OCA
key: .. THE THE THE THE THE
plaintext: .. WFI EQW LRD IKU VVW
In each case, the resulting plaintext appears almost random because the key is not aligned for most of the ciphertext. However, examining the results can suggest locations of the key being properly aligned. In those cases, the resulting decrypted text is potentially part of a word. In this example, it is highly unlikely that "DFL" is part of the original plaintext, and so it is equally unlikely that the first three letters of the key are THE. Examining the results, a number of fragments that are possibly words can be seen and others can be eliminated. Then, the plaintext fragments can be sorted in their order of likelihood:
unlikely <------------------> promising
EQW DFL TFT............ ETA OUN FAX
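The trial decryptions above can be generated mechanically by sliding the guessed word along the ciphertext. The following is a sketch under the same modulo 26 assumption. The name drag_crib is hypothetical, and judging which fragments look promising is still left to the analyst.

```python
A = ord("A")


def drag_crib(ciphertext: str, crib: str) -> dict:
    """Treat `crib` as key material at every offset and return the
    plaintext fragment each placement would imply."""
    fragments = {}
    for start in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[start:start + len(crib)]
        fragments[start] = "".join(
            chr((ord(c) - ord(k)) % 26 + A) for c, k in zip(window, crib)
        )
    return fragments


fragments = drag_crib("WMPMMXXAEYHBRYOCA", "THE")
print(fragments[0], fragments[6], fragments[10])  # DFL ETA OUN
```

Offsets 0, 3, 6, ... correspond to the first trial row, offsets 1, 4, 7, ... to the second, and offsets 2, 5, 8, ... to the third.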
A correct plaintext fragment is also going to appear in the key, shifted right by the length of the keyword. Similarly, the guessed key fragment also appears in the plaintext, shifted left by the same amount. Thus, by guessing keyword lengths, more plaintext and key can be revealed.
Trying that with "OUN", possibly after wasting some time with the others, results in the following:
shift by 4:
ciphertext: WMPMMXXAEYHBRYOCA
key: ......ETA.THE.OUN
plaintext: ......THE.OUN.AIN
shift by 5:
ciphertext: WMPMMXXAEYHBRYOCA
key: .....EQW..THE..OU
plaintext: .....THE..OUN..OG
shift by 6:
ciphertext: WMPMMXXAEYHBRYOCA
key: ....TQT...THE...O
plaintext: ....THE...OUN...M
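The three shift trials above can be reproduced with a short sketch. It assumes the guessed key fragment and its position are already known from the previous step and that the shift is at least as long as the fragment. The names try_shift and sub are hypothetical.

```python
A = ord("A")


def sub(c: str, k: str) -> str:
    """Subtract letter k from letter c modulo 26."""
    return chr((ord(c) - ord(k)) % 26 + A)


def try_shift(ciphertext: str, key_fragment: str, position: int, shift: int):
    """Place key_fragment at `position` in the key and apply one step of the
    autokey relation in each direction for the guessed keyword length."""
    n = len(ciphertext)
    key, plain = ["."] * n, ["."] * n
    size = len(key_fragment)
    for offset in range(size):
        i = position + offset
        key[i] = key_fragment[offset]
        plain[i] = sub(ciphertext[i], key[i])
        # The revealed plaintext letter reappears in the key `shift` places right.
        j = i + shift
        if j < n:
            key[j] = plain[i]
            plain[j] = sub(ciphertext[j], key[j])
        # The guessed key letter is the plaintext `shift` places to the left.
        j = i - shift
        if j >= 0:
            plain[j] = key_fragment[offset]
            key[j] = sub(ciphertext[j], plain[j])
    return "".join(key), "".join(plain)


for shift in (4, 5, 6):
    print(shift, *try_shift("WMPMMXXAEYHBRYOCA", "THE", 10, shift))
# 4 ......ETA.THE.OUN ......THE.OUN.AIN
# 5 .....EQW..THE..OU .....THE..OUN..OG
# 6 ....TQT...THE...O ....THE...OUN...M
```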
A shift of 4 looks promising, since every fragment it produces could be part of a word, and so the revealed "ETA" can be shifted back by 4 into the plaintext:
ciphertext: WMPMMXXAEYHBRYOCA
key: ..LTM.ETA.THE.OUN
plaintext: ..ETA.THE.OUN.AIN
There is now a lot to work with. The keyword is probably 4 characters long, and some of the message is visible:
M.ETA.THE.OUN.AIN
Because each plaintext guess also determines key letters 4 characters away, every guess immediately produces further text that confirms or refutes it. The gaps can quickly be filled in:
MEETATTHEFOUNTAIN
The ease of cryptanalysis is caused by the feedback from the relationship between plaintext and key. A three-character guess reveals six more characters, which then reveal further characters, creating a cascade effect. That allows incorrect guesses to be ruled out quickly.
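The cascade can be illustrated with a sketch that repeatedly applies the relation between plaintext and key for a guessed keyword length. It assumes the starting fragment and length are correct and does not check for contradictions. The names cascade and sub are hypothetical.

```python
A = ord("A")


def sub(c: str, k: str) -> str:
    """Subtract letter k from letter c modulo 26."""
    return chr((ord(c) - ord(k)) % 26 + A)


def cascade(ciphertext: str, fragment: str, position: int, length: int):
    """Spread a known plaintext fragment through the message, assuming
    key[i] == plaintext[i - length] for every i >= length."""
    n = len(ciphertext)
    key, plain = ["."] * n, ["."] * n
    pending = []
    for offset, letter in enumerate(fragment):
        plain[position + offset] = letter
        pending.append(position + offset)
    while pending:
        i = pending.pop()
        key[i] = sub(ciphertext[i], plain[i])
        left, right = i - length, i + length
        if left >= 0 and plain[left] == ".":
            plain[left] = key[i]       # key letter is earlier plaintext
            pending.append(left)
        if right < n and plain[right] == ".":
            key[right] = plain[i]      # plaintext letter is later key
            plain[right] = sub(ciphertext[right], key[right])
            pending.append(right)
    return "".join(key), "".join(plain)


key, plain = cascade("WMPMMXXAEYHBRYOCA", "OUN", 10, 4)
print(key)    # K.LTM.ETA.THE.OUN
print(plain)  # M.ETA.THE.OUN.AIN
```

Under these assumptions the three letters of "OUN" expose thirteen of the seventeen plaintext letters, along with three of the four primer letters.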