I have always loved programming - its like Lego without gravity.

Basic on my ZX81 graduating to assembler and Turbo Pascal during my teens.

Developed phone OS software - engineer, architect, product manager - but got made irrelevant by the iPhone and redundant by Android.

These days I mostly work with data, big data and fitting big data onto small boxes.

My attack on the Enigma cipher machine

imagesee also:

My favourite book was Fermat’s Last Theorem by Simon Singh.  So when in 1999 I saw his new book - The Code Book - in shops I just had to buy it, despite it being incredibly expensive (for a very poor student like myself).  I then took my new prize procession with me expectantly on the summer holiday.  And when I reached the end of the book that I discovered it contained a Cipher Challenge!

I set about cracking the ciphers with gusto.  By chance I happened to have my 286 computer with me, and my hobby language of choice - Turbo Pascal 7. 

I think I skipped the ADFGVX cipher, so excited was I at the prospect of cracking the Enigma stage 8 of the contest.  Unfortunately, when I reached the Enigma, I was stumped.  I couldn’t make any headway with simulating the Enigma.  I wrongly assumed that the message would begin with the message key encrypted with the day key twice.  But my biggest mistake was not understanding how rings affected things.  I didn’t understand how rings affected things because The Code Book didn’t actually explain them!  The book instructed the intrepid explorer to search the Internet for more details and I didn’t take any heed of this because a) I couldn’t imagine the book wouldn’t actually give me everything I needed to crack the challenge, and b) I didn’t have any Internet.  The Enigma I was trying to crack wasn’t even an accurate Enigma.

So 17 years later I find myself on a whim tacking the Enigma once again.  And my computer is a bit faster too.  But I want to attack it my way, putting myself as much as possible in the shoes of my old self 17 years ago, only this time with Internet so I can double-check the correctness and completeness of my implementation of the Engima cipher machine.

This is my attack on the Enigma.  I will describe my attack on an M3 Enigma (used by the army and air force) and then I will generalize it to attack the M4 (used by the U-boats; what Alan Turing is famous for cracking).

imageThe best way to understand how the Enigma works is to make one from paper, following instructions you find on the web.  Each time the user presses a key the key goes (via a plugboard) into the rightmost rotor.  Here is is translated to another letter and goes into the middle rotor where it gets translated again.  In then goes to the leftmost rotor where, after being translated again, it goes into a reflector.  The reflector sends it back into the leftmost rotor at another point, which translates it to another letter and sends it to the middle wheel and eventually back through the rightmost rotor and back through the plugboard to the lamp panel:

The rotors advance before each keypress, like the dials in odometer.  The rightmost rotor advances each keypress and notches on it advance the middle rotor once per full revolution.  The middle rotor also has notches on it and can knock the other wheels once each full revolution.

The Enigma has a lot of possible settings.

First, there are the combinations of 3 rotors and 1 reflector.  The M3 started with 3 rotors and two reflectors; then the Navy added 2 more, and the army followed suit, and then the navy upped it to 8.  There are 6 ways to order 3 rotors, 60 ways to pick from 5 rotors and 336 ways to pick from 8.  There were two reflectors to choose from.  So worst case 672 combinations of rotor and reflector.

Then each rotor has 26 possible starting positions (A through Z).  26 x 26 x 26 is 17,576.

Then the middle and right rotors have 26 possible ring positions, which affects when they ‘knock’ the other rotors into rotating.  (The leftmost rotor has a ring too, but has no impact on encryption when its the leftmost ring.)  There are therefore 26 x 26 = 676 ring combinations.  However, as a slight detail, not all messages are long enough to cause the middle ring to knock and therefore the impact of the rings is a function of the length of the message.  To simplify things I am going to ignore this, meaning my attack is going to blindly iterate over equivalent rotor settings on short messages.

Finally, we have the plugboard.  Early on 6 pairs of letters were swapped and later this was changed to 10 pairs.  There are 100,391,791,500 ways to order 6 pairs and 150,738,274,937,250 ways to order 10.

So the early M3 Enigmas had:

2 x 6 x 17,576 x 676 x 100,391,791,500 = 14,313,511,465,501,248,000 keys

And the later M3s had:

2 x 336 x 17,576 x 676 x 150,738,274,937,250 = 1,203,537,298,065,206,936,832,000 keys

These are very very big numbers!  The earliest M3 is providing 64-bits key and the later M3s 80-bits.  (As usual, I find slightly different numbers quoted in various texts where e.g. the number of reflectors is disregarded.  Corrections welcome!)

The World War 2 attacks on the Enigma were very very very clever and, in hindsight, the Germans didn’t understand the weaknesses nor make things as difficult for the allies as they could have.  The Germans considered the number of combinations as astronomically more than those massive numbers above because they didn’t realise that the rotor wirings were known to the allies.  The Poles correctly determined and guessed the wirings just by analyzing a manual a spy gave them.  Later, as more wheels were added, the allies set out to capture them by boarding captured boats and such.  This was despite the fact that they themselves captured the equivalent British TypeX cipher machine at Dunkirk.  Let this be a message to those relying on the obscurity of the algorithm itself! (When Admiral Dönitz commissioned the M4 Engima for his U-boat fleet, was he thinking there was a spy with access to the surface fleet day keys?)

The allies took advantage of the fact that messages were predictable and that the Enigma never ever encrypted a letter as itself.  This provided them with ‘cribs’ that dramatically cut down the search space.

However, my attack is thoroughly modern and uncunning.  I’m going to try and use computing brute force which, unavailable in World War 2, is now at my fingertips in my oldish slowish laptop!  What takes world-beating cleverness in the wartime can now be done by dumb repetition by a schoolboy-level-programmer in 2016! :)

So, if we get a computer to iterate over all the possible keys, how does it know which key is the correct one?  A human would have to know German and read each decrypt and see if it made sense.  A computer can look the decrypt up in a German dictionary and German grammar too.  Or we could short cut that and say that, if we have the key wrong, then the outcome should be seemingly random.  If the outcome isn’t random, i.e. there’s a very uneven distribution of letter frequencies, then its a good candidate decrypt for human intervention.

Having show you how big the numbers are, you can see how daunting a brute-force attack would be.  However, I’m going to disregard the plugboard.  If we disregard the plugboard then suddenly the numbers become much much smaller.

The thing about the plugboard is that it doesn’t swap all the letters, and letters it doesn’t swap have the usual frequency in the decrypt.  The swapped letters get garbled in the decrypt, but if sufficient letters go through ungarbled then the frequency counts of the letters should still be uneven enough to trigger it being treated as a candidate solution.  The job then becomes to discover a plugboard that makes sense of the garbled parts of the message.

In the Cipher Challenge, the Enigma stage 8 has the three rotors and the reflector specified.  These are actually the I, II and III rings and the B reflector.  We just don’t know the order of the rotors nor the starting positions nor the rings.

1 x 6 x 17,576 x 676 = 71,288,256

71M is a very small number for a modern computer.  My computer can, on a single core, iterate through all 71M possible rotor start positions and decrypt the message in under 10 minutes, using just one core.  I have four cores, so wall-clock time could be under 3 minutes, and this is a 2014 laptop.  (It would have been feasible - given days to weeks - to tackle this on a 286 too.  I would have optimised harder.)

To judge the randomness of the decrypt I just work out the frequency of the most common letter.  Here is the distribution for the Cipher Challenge text:

42 = 13
41 = 27
40 = 18
39 = 56
38 = 92
37 = 277
36 = 536
35 = 1132
34 = 3228
33 = 8697
32 = 21143
31 = 49767
30 = 115703
29 = 256947
28 = 550083
27 = 1124947
26 = 2194175
25 = 4036371
24 = 6904511
23 = 10651956
22 = 14097448
21 = 14880108
20 = 10951117
19 = 4615207
18 = 792354
17 = 32251
16 = 92

There are 367 characters in the message and there are 26 characters in the alphabet so the average would be 14.  Most of the decrypts are rather noisy.  Only a handful are not noisy and worthy of further consideration.   (In fact, most of these are actually duplicate results caused by the middle ring not rotating.  If we just greedily decrypt the highest candidates we happen hit the true solution immediately.)

This is the decrypt of the Cipher Challenge text, picking the rotor settings with highest maximum frequency in the decrypt, ignoring the plugboard:

VRAxKTZYIJgAwYKNJRAOIpKCOTxxAOMfQxDIMINZPOhSIStMCEJIxUEOVIEKMJgPd
ETxFSOxYIAxLnMLTdEIrOOEsDKxWchRKJbULdaHHKEJWsstehIJdIxbHVIxNIAEAc
hKVIYAIKAxIJtJIcLOxxIPxEFORIEHAxIHJAxHTrTIIZJTxzIrTPWNrGYIBJUZIEJ
AZIEJXxMEThMUXNgPSUEGTrIIVEIsxMJWNenOdIcLRITdPNOZdSFxRTDOxIRDMVgM
rxNIGJEIQxXWOYdBUxzWgrRAVPKXNgMJNIJxAchKMIAAIRKJJOLTAZIXZTwUKZxal
XxrYDDDOSOMdEIxAchPEfOMIEWJUMLNJOOGxKrIMbO

The 91 lowercase letters are actually correct and the uppercase letters garbled, its just that at this point we don’t know that.  We have to find a plugboard that turns this into German (where, as was the convention, X means space and XX means full stop).

Here are the frequencies in this decrypt:

A = 20, B = 5, C = 8, D = 16, E = 21, F = 5, G = 10
H = 13, I = 41, J = 22, K = 14, L = 8, M = 16, N = 13
O = 21, P = 9, Q = 2, R = 18, S = 10, T = 17, U = 8
V = 7, W = 9, X = 36, Y = 7, Z = 11

Now for some detective work.  Lets attack this manually by guessing.  We know that in German E is the most common letter (16%) followed by D, C and A (5%, 3%, 2% respectively).  However, this being Enigma plaintext, we expect X to be really common too.

We see in our decrypt that we have several X, several XX and no XXX, so we can guess that X is unplugged! If we replace the X with spaces, here’s what we get:

VRA KTZYIJgAwYKNJRAOIpKCOT  AOMfQ DIMINZPOhSIStMCEJI UEOVIEKMJgPd
ET FSO YIA LnMLTdEIrOOEsDK WchRKJbULdaHHKEJWsstehIJdI bHVI NIAEAc
hKVIYAIKA IJtJIcLO  IP EFORIEHA IHJA HTrTIIZJT zIrTPWNrGYIBJUZIEJ
AZIEJ  METhMU NgPSUEGTrIIVEIMJWNenOdIcLRITdPNOZdSF RTDO IRDMVgM
NIGJEIQ  WOYdBU zWgrRAVPK NgMJNIJ AchKMIAAIRKJJOLTAZI ZTwUKZ al
  rYDDDOSOMdEI AchPEfOMIEWJUMLNJOOG KrIMbO

The most common letter in our decrypt is I, so lets assume this is E.  If we set up the plugboard to swap E and I, lets see what we get:

VRA KTZAEJgAwYKN RAOEpKCOT  AOMfQ JeME ZPOhSeStMCiJUiOVeiKMJgPd
iT FSO YeA LnMLTdierOOisD  WchRKSbU daHHKiJWsstIheJde bHVNeAIAc
hKVeYAeKA eJtJecLO  eP iFOReiHA eHJA HTrTEeZJT zerTPWNrG eBJAZeiJ
A ei MiThMU TgrSUIGTrEeVIes MJWNenOdecLReTdPAAZdSF RTDO ERDMVgM
NeGJIeA UWO dBU zWgrMJVPK NgMJNeJ AchKMeAAeR JJOLTAZe ZTwUKZ al
A rYDDKOSOMdie AchrifOMeiWhUMLNJOOG KrEMbO

The 150 lowercase letters are now correct, but we still don’t know that.  We just think we are getting somewhere because we look at the frequencies E is now the most common:

A = 25, B = 5, C = 8, D = 14, E = 42, F = 5, G = 10
H = 14, I = 20, J = 21, K = 13, L = 7, M = 17, N = 10
O = 20, P = 7, Q = 1, R = 19, S = 11, T = 18, U = 8
V = 7, W = 9, X = 42, Y = 4, Z = 10

However, D is not looking so good.  We search now to see what D should be swapped with…

I searched in vain.  D isn’t swapped with anything, its just that we are attacking the noise at this point.

My next idea was that SS ought occur in the output.  So far it occurs once.  The only other double letters that occur more often are two AAs and two OOs.  I speculated that A was swapped with S, meaning to try again with O swapped with S if I ended up getting stuck later.  Here is what A swapped with S looks like:

VRKTesMJgswYrO RsOEpKCOT  sOMfQ JeME ZPOhaeAtMCiJUiOVeiKMJgPd
iT FAO Yes LnMLTdierOOiAD  ichRKabe daH KiJWAsOeheJde bHONesIsc
hKMeYseKs eJtJecLO  eP iFOReiHs eHJHerTEeZJT zerTPWerG eBJs eiJ
s ei MichMU TgraUIGTrEIes MJWNenOdecLReTdPssZdaF wTDO ERDMVge
r NeGJIes UWO deU zMgrMJVeK NgeJNeJ schKMesseR JJOLTSZerO wiKZ SK
s rYDDKOaO die schrifOzeiWhUM NJOOJ KrEibO

You can clearly see words popping out; my own eye is drawn to the “die” on the bottom line.  Other repeated words such as “EIJ” are shouting out for a J and N swap.

And so I attacked the plugboard manually, despite not knowing German. I could have researched bigram frequencies too.  I could have automated this, but I didn’t need to.  The final text fell away before my eyes:

DAS LOESUNGSWORT IST PLUTO. STUFE NEUN ENTHAELT EINE MITTEILUNG DIE MIT DES ENTKODIERT IST. ICH HABE DAS LINKSSTEHENDE BYTE DES
SCHLUESSELS ENTDECKT. ES IST EINS EINS ZERO EINS ZERO ZERO EINS EINS EINS. ICH PROGRAMMIERTE DES UND ENTDECKTE DASS DAS WORT DEB
UGGER WENN ES MIT DEM ZUGRUNDELIEGENDEN SCHLUESSEL ENTKODIERT WIRD ALS RESULTAT DIE SCHRIFTZEICHEN UNTEN ERGIBT

The rotor order was III, II then I; the wheel positions were A, F and L and the rings were 26 and 1 respectively. The plugboard was EI AS NJ KL TO and UM.

Google translate tells me that:

THE SOLUTION WORD IS PLUTO. STAGE NINE CONTAINS A MESSAGE WITH THE ENTKODIERT IS. I HAVE THE LEFT STANDING BYTE OF THE KEY DISCOVERED. IT IS ONE ONE ZERO ONE ZERO ZERO ONE ONE ONE. I PROGRAM OF AND DISCOVERED THAT THE WORD DEBUGGER IF IT WITH THE UNDERLYING KEY ENTKODIERT IS AS A RESULT THE CHARACTERS BELOW SHOWS

Now to extend the attack against the M4 used by the U-boats:

The M4 had a two ‘greek’ wheels which could be combined in one of 26 positions with two ‘thin’ reflectors to make a composite reflector.  The M4 had the full 8 normal rotors to choose between too, so the total number of combinations without the 10 pair plugboard is:

2 x 26 x 26 x 336 x 17,576 x 676 = 5,397,376,438,272

That’s actually a bit beyond where I want to be with my laptop in the summer!

However, lets look at how random messages are before they have entered the reflector.  Here are the frequencies for the Cipher Challenge text again, but counted before they enter the reflector:

36 = 156
35 = 1482
34 = 4706
33 = 11024
32 = 23400
31 = 49218
30 = 117182
29 = 252122
28 = 526396
27 = 1107028
26 = 2156674
25 = 3930160
24 = 6800664
23 = 10674482
22 = 14215630
21 = 15105272
20 = 10923458
19 = 4577118
18 = 783770
17 = 28210
16 = 104

Again a few combinations are far less random-seeming than the rest!  (In fact, the real solution is again top of the pile.)  The computer can focus on just the top candidates and crank through the 2 x 26 x 26 possible composite reflectors for each.

jump to ↓



performance
Faster searches with non-prefix fields in composite indices
Compressing MySQL databases
What highscalability.com says about Scaling my Server
Scaling my Server: follow-up
old classics
The kid's computer
Making the History of Worlds Religions map
If you defend those involved in the OpenGL ES specification, you are an idiot
Stackoverflow unwinding?
general
Why Swift?
Python annotations and type checking
pycon 2014 Sweden: the bad bits
Table-based Template Translation in C++
recreation
games programming
Perlin Noise
Perlin Noise
Drawing RTS maps fast
WillCity update
ludum-dare
Ludum Dare #35 Mosaic
LudumDare 33 wallpapers
SSIM vs MSE for Mosaics
Ludum Dare 30 results are in!