
Simple Substitution Ciphers

In a simple substitution cipher, each letter of the alphabet is always replaced by the same substitute letter. Here is a simple example:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
R Z B U Q K F C P Y E V L S N G W O X D J I A H T M

To encode some text, find each character of the text in the first line and replace it with the character below it. For example, using the table above, if you encode the word ``BIRDBRAIN'', you get ``ZPOUZORPS''. To decode, reverse the process: for the first character in ``ZPOUZORPS'', find ``Z'' in the lower line and look above it to get ``B'', the first letter of ``BIRDBRAIN'', and so on.
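The encoding and decoding steps just described can be sketched in a few lines of Python. This is only an illustration: the names PLAIN, CIPHER, encode, and decode are made up for this example, and the key string is simply the lower row of the example table.

```python
import string

PLAIN = string.ascii_uppercase            # top row of the table: A..Z
CIPHER = "RZBUQKFCPYEVLSNGWOXDJIAHTM"     # bottom row of the example table

# Translation tables: one maps plain -> cipher, the other cipher -> plain.
ENCODE_TABLE = str.maketrans(PLAIN, CIPHER)
DECODE_TABLE = str.maketrans(CIPHER, PLAIN)


def encode(text):
    """Replace each letter by the letter below it in the table."""
    return text.upper().translate(ENCODE_TABLE)


def decode(text):
    """Find each letter in the lower row and replace it by the one above."""
    return text.upper().translate(DECODE_TABLE)


print(encode("BIRDBRAIN"))   # ZPOUZORPS
print(decode("ZPOUZORPS"))   # BIRDBRAIN
```

Characters that are not in the table (spaces, punctuation) pass through unchanged, which matches how the longer example later in this section is written.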

If you have to decode a lot, it is easier to invert the substitution table, which gives the table below. With this table decoding is much easier, since the letters of the encoded text now appear in alphabetical order in the top line.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
W C H T K G P X V U F M Z O R I E A N Y D L Q S J B
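Inverting a table by hand is error-prone, so here is a small Python sketch of the inversion. The function name invert is an assumption made for this example; the idea is just to pair each cipher letter with its plain letter and re-sort the pairs by cipher letter.

```python
import string

PLAIN = string.ascii_uppercase
CIPHER = "RZBUQKFCPYEVLSNGWOXDJIAHTM"   # bottom row of the encoding table


def invert(key):
    """Return the bottom row of the decoding table for an encoding key."""
    # Pair (cipher letter, plain letter), then sort so the cipher letters
    # run A..Z; the plain letters then form the new bottom row.
    pairs = sorted(zip(key, PLAIN))
    return "".join(plain for _, plain in pairs)


print(" ".join(PLAIN))
print(" ".join(invert(CIPHER)))
```

Inverting twice gives back the original key, which is a quick way to check a table that was built by hand.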

These simple substitution ciphers are fairly easy to ``crack''. The problem is that in English (or any language), certain letters appear far more often than others. In English, for example, the letter ``E'' is far more likely to appear than the letter ``Z''. The list below gives the letters used in English arranged approximately in order of usage (``E'' is the most used letter; ``Z'' is the least). The approximate percentages for the first few letters in the list are E: 12.7%, T: 9.1%, A: 8.2%, O: 7.5%, and for the last few, J: 0.2%, Q: 0.1%, Z: 0.1%.

E T A O I N S H R D L U C M W F G Y P B V K X J Q Z

Following is a short passage encoded with a simple substitution cipher:

  UJEJVZR QFEYGE, SV SO OFSU, JWIG FEESTGU FV VZG
UJJE JC FW FQFEVLGWV SW PZSIZ F NASVVGESWN QFEVR
PFO VFYSWN QAFIG.  FV QEGISOGAR VZG OFLG LJLGWV,
F XGFKVSCKA XKV TFIKJKO OZJPNSEA FEESTGU FV VZG
UJJE.
  CJE F LJLGWV, VZGEG PFO ZGOSVFVSJW JW XJVZ OSUGO,
FWU VZGW VZG OZJPNSEA OVGQQGU XFIY VJ LFYG PFR,
OFRSWN, "FNG XGCJEG XGFKVR."
  "WJV FV FAA!" OFSU UJEJVZR QFEYGE, OFSASWN
VZEJKNZ. "QGFEAO XGCJEG OPSWG!"

To try to crack this cipher, begin by counting the number of occurrences of each letter. For the passage above, the counts are:

 A  B  C  D  E  F  G  H  I  J  K  L  M
10  0  5  0 24 35 34  0  6 24  7  7  0
 N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 9 18  7  9  7 22  3 11 31 16  7  5 16
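Counting by hand is tedious, so the tally can also be sketched in Python with collections.Counter. The names CIPHERTEXT and letter_counts are illustrative; the passage is copied from the text above.

```python
from collections import Counter

# The encoded passage, copied from the text above.
CIPHERTEXT = '''
  UJEJVZR QFEYGE, SV SO OFSU, JWIG FEESTGU FV VZG
UJJE JC FW FQFEVLGWV SW PZSIZ F NASVVGESWN QFEVR
PFO VFYSWN QAFIG.  FV QEGISOGAR VZG OFLG LJLGWV,
F XGFKVSCKA XKV TFIKJKO OZJPNSEA FEESTGU FV VZG
UJJE.
  CJE F LJLGWV, VZGEG PFO ZGOSVFVSJW JW XJVZ OSUGO,
FWU VZGW VZG OZJPNSEA OVGQQGU XFIY VJ LFYG PFR,
OFRSWN, "FNG XGCJEG XGFKVR."
  "WJV FV FAA!" OFSU UJEJVZR QFEYGE, OFSASWN
VZEJKNZ. "QGFEAO XGCJEG OPSWG!"
'''


def letter_counts(text):
    """Count only the letters A-Z, ignoring spaces and punctuation."""
    return Counter(ch for ch in text if "A" <= ch <= "Z")


counts = letter_counts(CIPHERTEXT)

# Show the letters from most to least frequent.
for letter, n in counts.most_common():
    print(letter, n)
```

The three most frequent letters come out as ``F'' (35), ``G'' (34), and ``V'' (31), which are the candidates considered next.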

Since the text sample was relatively small, we can't be certain that the most common letter (``F'' in the sample above) stands for the letter ``E'', but it's a pretty good bet that you'll find ``E'' among the letters ``F'', ``G'', and ``V''.

The structure of English gives plenty of other clues as well. For example, the one-letter word ``F'' appears three times in the text, so ``F'' must stand for ``I'' or ``A''. Since ``F'' is very common in the sample above, it is more likely to stand for ``A'', since ``A'' is more common in English than ``I''. So the first guess you might make is that ``F'' stands for ``A''. Now the word ``FV'' appears in the text four times, and ``V'' is also very common. ``AT'' is a word in English, so perhaps ``V'' stands for ``T'' in the cipher. These are just guesses, but they are not bad guesses.
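Short words are worth listing explicitly, since a one-letter word can only be ``A'' or ``I''. The following Python sketch tallies the one- and two-letter words of the passage; the names CIPHERTEXT and short_words are assumptions made for this example.

```python
import re
from collections import Counter

# The encoded passage, copied from the text above.
CIPHERTEXT = '''
  UJEJVZR QFEYGE, SV SO OFSU, JWIG FEESTGU FV VZG
UJJE JC FW FQFEVLGWV SW PZSIZ F NASVVGESWN QFEVR
PFO VFYSWN QAFIG.  FV QEGISOGAR VZG OFLG LJLGWV,
F XGFKVSCKA XKV TFIKJKO OZJPNSEA FEESTGU FV VZG
UJJE.
  CJE F LJLGWV, VZGEG PFO ZGOSVFVSJW JW XJVZ OSUGO,
FWU VZGW VZG OZJPNSEA OVGQQGU XFIY VJ LFYG PFR,
OFRSWN, "FNG XGCJEG XGFKVR."
  "WJV FV FAA!" OFSU UJEJVZR QFEYGE, OFSASWN
VZEJKNZ. "QGFEAO XGCJEG OPSWG!"
'''

# Split the passage into words (runs of capital letters).
words = re.findall(r"[A-Z]+", CIPHERTEXT)

# Keep only the one- and two-letter words and count them.
short_words = Counter(w for w in words if len(w) <= 2)

for word, n in short_words.most_common():
    print(word, n)
```

The only one-letter word is ``F'', and the most frequent two-letter word is ``FV'', which is consistent with the guesses that ``F'' stands for ``A'' and ``V'' stands for ``T''.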

Making those substitutions gives us the following:

      T    A       T     A         A       AT T
  UJEJVZR QFEYGE, SV SO OFSU, JWIG FEESTGU FV VZG
        A  A A T   T          A    TT       A T
UJJE JC FW FQFEVLGWV SW PZSIZ F NASVVGESWN QFEVR
 A  TA       A     AT           T    A        T
PFO VFYSWN QAFIG.  FV QEGISOGAR VZG OFLG LJLGWV,
A   A T       T  A               A       AT T
F XGFKVSCKA XKV TFIKJKO OZJPNSEA FEESTGU FV VZG

UJJE.
      A      T  T      A      TAT         T
  CJE F LJLGWV, VZGEG PFO ZGOSVFVSJW JW XJVZ OSUGO,
A   T    T             T       A   T   A    A
FWU VZGW VZG OZJPNSEA OVGQQGU XFIY VJ LFYG PFR,
 A       A              T
OFRSWN, "FNG XGCJEG XGFKVR."
     T AT A      A       T    A       A
  "WJV FV FAA!" OFSU UJEJVZR QFEYGE, OFSASWN
T           A
VZEJKNZ. "QGFEAO XGCJEG OPSWG!"
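A worksheet like the one above can be produced mechanically. The Python sketch below prints the letters guessed so far over each line of ciphertext and leaves every other position blank; apply_guesses and GUESSES are hypothetical names used only for this illustration.

```python
# Guesses made so far: cipher letter -> plain letter.
GUESSES = {"F": "A", "V": "T"}


def apply_guesses(line, guesses):
    """Return the partially decoded line, blank where nothing is known."""
    # Unknown letters, spaces, and punctuation all become spaces so that
    # each guessed letter sits directly above its cipher letter.
    return "".join(guesses.get(ch, " ") for ch in line).rstrip()


LINE = "  UJEJVZR QFEYGE, SV SO OFSU, JWIG FEESTGU FV VZG"

print(apply_guesses(LINE, GUESSES))
print(LINE)
```

Adding a new guess to the dictionary and printing again gives the next worksheet, so a wrong guess is cheap to undo.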

Looking at the text above, there are a lot more clues. For one thing, the first line contains the word ``SV'', where the ``V'' may stand for ``T''. The only common two-letter English words ending in ``T'' are ``AT'' and ``IT'', and we've already guessed that ``F'' stands for ``A'', so ``S'' is probably ``I''. Also, since we think we know which letters stand for ``T'' and ``A'', the other extremely common letter, ``G'', probably stands for ``E''. Finally, the next-to-last line contains the word ``FAA'': a three-letter word beginning with ``A'' and ending in a doubled letter. In English the doubled letter is almost certainly ``L'', ``D'', or ``S'', but ``at all'' makes much more sense than ``at add'' or ``at ass'', so the cipher letter ``A'' is probably ``L'':

      T    A  E   IT I   AI      E A  I E  AT T E
  UJEJVZR QFEYGE, SV SO OFSU, JWIG FEESTGU FV VZG
        A  A A T E T I    I   A  LITTE I    A T
UJJE JC FW FQFEVLGWV SW PZSIZ F NASVVGESWN QFEVR
 A  TA I    LA E   AT   E I EL  T E  A E    E T
PFO VFYSWN QAFIG.  FV QEGISOGAR VZG OFLG LJLGWV,
A  EA TI  L   T  A           I L A  I E  AT T E
F XGFKVSCKA XKV TFIKJKO OZJPNSEA FEESTGU FV VZG

UJJE.
      A    E T  T E E  A   E ITATI        T   I E
  CJE F LJLGWV, VZGEG PFO ZGOSVFVSJW JW XJVZ OSUGO,
A   T E  T E      I L  TE  E   A   T   A E  A
FWU VZGW VZG OZJPNSEA OVGQQGU XFIY VJ LFYG PFR,
 A I     A E  E   E  EA T
OFRSWN, "FNG XGCJEG XGFKVR."
     T AT ALL    AI      T    A  E    AILI
  "WJV FV FAA!" OFSU UJEJVZR QFEYGE, OFSASWN
T          EA L   E   E   I E
VZEJKNZ. "QGFEAO XGCJEG OPSWG!"

From here, it's easy to make progress. In the next-to-last line, ``WJV FV FAA'' is almost certainly ``NOT AT ALL'', so ``W'' is ``N'' and ``J'' is ``O''. Similarly, the word ``VZG'' is almost certainly ``THE'', so ``Z'' is ``H''. Thus we obtain:

   O OTH   A  E   IT I   AI   ON E A  I E  AT THE
  UJEJVZR QFEYGE, SV SO OFSU, JWIG FEESTGU FV VZG
 OO  O  AN A A T ENT IN  HI H A  LITTE IN   A T
UJJE JC FW FQFEVLGWV SW PZSIZ F NASVVGESWN QFEVR
 A  TA IN   LA E   AT   E I EL  THE  A E  O ENT
PFO VFYSWN QAFIG.  FV QEGISOGAR VZG OFLG LJLGWV,
A  EA TI  L   T  A  O    HO  I L A  I E  AT THE
F XGFKVSCKA XKV TFIKJKO OZJPNSEA FEESTGU FV VZG
 OO
UJJE.
   O  A  O ENT  THE E  A  HE ITATION ON  OTH  I E
  CJE F LJLGWV, VZGEG PFO ZGOSVFVSJW JW XJVZ OSUGO,
AN  THEN THE  HO  I L  TE  E   A   TO  A E  A
FWU VZGW VZG OZJPNSEA OVGQQGU XFIY VJ LFYG PFR,
 A IN    A E  E O E  EA T
OFRSWN, "FNG XGCJEG XGFKVR."
   NOT AT ALL    AI   O OTH   A  E    AILIN
  "WJV FV FAA!" OFSU UJEJVZR QFEYGE, OFSASWN
TH O  H    EA L   E O E   INE
VZEJKNZ. "QGFEAO XGCJEG OPSWG!"

From what we have above, ``FWU'' is clearly ``AND'', so ``U'' codes for ``D''. ``ZGOSVFVSJW'' is ``HESITATION'', so ``O'' codes for ``S''. ``VZGEG'' is either ``THESE'' or ``THERE'', but ``S'' is already taken by the cipher letter ``O'', so ``E'' codes for ``R'':

  DOROTH   AR ER  IT IS SAID  ON E ARRI ED AT THE
  UJEJVZR QFEYGE, SV SO OFSU, JWIG FEESTGU FV VZG
DOOR O  AN A ART ENT IN  HI H A  LITTERIN   ART
UJJE JC FW FQFEVLGWV SW PZSIZ F NASVVGESWN QFEVR
 AS TA IN   LA E   AT  RE ISEL  THE SA E  O ENT
PFO VFYSWN QAFIG.  FV QEGISOGAR VZG OFLG LJLGWV,
A  EA TI  L   T  A  O S SHO  IRL ARRI ED AT THE
F XGFKVSCKA XKV TFIKJKO OZJPNSEA FEESTGU FV VZG
DOOR
UJJE.
   OR A  O ENT  THERE  AS HESITATION ON  OTH SIDES
  CJE F LJLGWV, VZGEG PFO ZGOSVFVSJW JW XJVZ OSUGO,
AND THEN THE SHO  IRL STE  ED  A   TO  A E  A
FWU VZGW VZG OZJPNSEA OVGQQGU XFIY VJ LFYG PFR,
SA IN    A E  E ORE  EA T
OFRSWN, "FNG XGCJEG XGFKVR."
   NOT AT ALL   SAID DOROTH   AR ER  SAILIN
  "WJV FV FAA!" OFSU UJEJVZR QFEYGE, OFSASWN
THRO  H    EARLS  E ORE S INE
VZEJKNZ. "QGFEAO XGCJEG OPSWG!"

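From here, it's easy. Fill in the obvious letters for a couple of passes (for example, ``DOROTH '' must be ``DOROTHY'', so ``R'' codes for ``Y'', and ``ARRI ED'' must be ``ARRIVED'', so ``T'' codes for ``V'') to obtain the final decryption: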

  DOROTHY PARKER, IT IS SAID, ONCE ARRIVED AT THE
  UJEJVZR QFEYGE, SV SO OFSU, JWIG FEESTGU FV VZG
DOOR OF AN APARTMENT IN WHICH A GLITTERING PARTY
UJJE JC FW FQFEVLGWV SW PZSIZ F NASVVGESWN QFEVR
WAS TAKING PLACE.  AT PRECISELY THE SAME MOMENT,
PFO VFYSWN QAFIG.  FV QEGISOGAR VZG OFLG LJLGWV,
A BEAUTIFUL BUT VACUOUS SHOWGIRL ARRIVED AT THE
F XGFKVSCKA XKV TFIKJKO OZJPNSEA FEESTGU FV VZG
DOOR.
UJJE.
  FOR A MOMENT, THERE WAS HESITATION ON BOTH SIDES,
  CJE F LJLGWV, VZGEG PFO ZGOSVFVSJW JW XJVZ OSUGO,
AND THEN THE SHOWGIRL STEPPED BACK TO MAKE WAY,
FWU VZGW VZG OZJPNSEA OVGQQGU XFIY VJ LFYG PFR,
SAYING, "AGE BEFORE BEAUTY."
OFRSWN, "FNG XGCJEG XGFKVR."
  "NOT AT ALL!" SAID DOROTHY PARKER, SAILING
  "WJV FV FAA!" OFSU UJEJVZR QFEYGE, OFSASWN
THROUGH. "PEARLS BEFORE SWINE!"
VZEJKNZ. "QGFEAO XGCJEG OPSWG!"
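As a rough check on a finished decryption, the key can be read back off the aligned lines and tested for consistency: each cipher letter must always decode to the same plain letter. The Python sketch below does this for the first four line pairs and then decodes the last line of the passage. The names PAIRS, derive_key, and decode are assumptions made for this example.

```python
# Aligned (plaintext, ciphertext) line pairs copied from the decryption above.
PAIRS = [
    ("DOROTHY PARKER, IT IS SAID, ONCE ARRIVED AT THE",
     "UJEJVZR QFEYGE, SV SO OFSU, JWIG FEESTGU FV VZG"),
    ("DOOR OF AN APARTMENT IN WHICH A GLITTERING PARTY",
     "UJJE JC FW FQFEVLGWV SW PZSIZ F NASVVGESWN QFEVR"),
    ("WAS TAKING PLACE.  AT PRECISELY THE SAME MOMENT,",
     "PFO VFYSWN QAFIG.  FV QEGISOGAR VZG OFLG LJLGWV,"),
    ("A BEAUTIFUL BUT VACUOUS SHOWGIRL ARRIVED AT THE",
     "F XGFKVSCKA XKV TFIKJKO OZJPNSEA FEESTGU FV VZG"),
]


def derive_key(pairs):
    """Build a cipher -> plain mapping, failing on any contradiction."""
    key = {}
    for plain, cipher in pairs:
        if len(plain) != len(cipher):
            raise ValueError("lines are not aligned")
        for p, c in zip(plain, cipher):
            if not c.isalpha():
                continue  # spaces and punctuation are not enciphered
            # setdefault stores p the first time c is seen and returns
            # the stored letter afterwards, so a mismatch is a conflict.
            if key.setdefault(c, p) != p:
                raise ValueError(f"{c} decodes to both {key[c]} and {p}")
    return key


def decode(text, key):
    """Decode text with a possibly partial key; unknown letters become '?'."""
    return "".join(key.get(ch, "?") if ch.isalpha() else ch for ch in text)


KEY = derive_key(PAIRS)
print(decode('VZEJKNZ. "QGFEAO XGCJEG OPSWG!"', KEY))
```

Four cipher letters (``B'', ``D'', ``H'', and ``M'') never occur in the passage, so their plain equivalents cannot be recovered from this text alone.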



Zvezdelina Stankova-Frenkel 2000-12-17