Vigenère cipher
The Vigenère cipher is a method of encrypting alphabetic text by using a simple form of polyalphabetic substitution. It employs a keyword, where each letter of the keyword corresponds to a shift in the alphabet for the plaintext letter. To encrypt a message, the plaintext is aligned with the repeated keyword, and each letter is shifted according to the position of the corresponding keyword letter in the alphabet (similar to the Caesar cipher but with a different key for each character). For example, if the keyword letter is "B", the plaintext letter is shifted by one position; if it's "C", the shift is by two positions, and so on. The resulting ciphertext is a combination of these shifted letters.
| Plaintext letter | Keyword letter | Ciphertext letter |
| I (8) | B (1) | J (9) |
| L (11) | E (4) | P (15) |
| O (14) | A (0) | O (14) |
| V (21) | R (17) | M (12) |
| E (4) | B (1) | F (5) |
| C (2) | E (4) | G (6) |
| O (14) | A (0) | O (14) |
| D (3) | R (17) | U (20) |
| I (8) | B (1) | J (9) |
| N (13) | E (4) | R (17) |
| G (6) | A (0) | G (6) |
The total number of possible keys in the Vigenère cipher is calculated as 26 raised to the power of the keyword length (n), resulting in 26n possible combinations. For a 12-letter random keyword, the total number of possible combinations is 2612, which equals approximately 95 quadrillion combinations. In contrast, if the keyword is a real word, the number of combinations is limited to the number of valid words in the language, typically around 1 million, significantly reducing its security. This vulnerability is amplified by the risk of dictionary attacks, where attackers can systematically test common words or phrases, making it easier to guess the keyword and compromise the encryption.
CHARACTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def translate_message(key, message, mode):
translated = [] # list used to collect processed characters before forming the final string
key = key.upper()
key_length = len(key)
key_index = 0 # tracking which character of the key should be applied to the current symbol
for symbol in message:
if symbol.upper() in CHARACTERS:
num = CHARACTERS.find(symbol.upper()) # converting the message character to its alphabet index
shift = CHARACTERS.find(key[key_index]) # determining how far to shift based on the current key character
if mode == "encrypt":
num = (num + shift) % len(CHARACTERS) # forward shift with wrap-around to stay inside alphabet bounds
elif mode == "decrypt":
num = (num - shift) % len(CHARACTERS) # reverse shift using modular arithmetic to avoid negative indices
translated_symbol = CHARACTERS[num] # converting the shifted index back into a character
if symbol.islower():
translated_symbol = translated_symbol.lower() # restoring original case so output matches input formatting
translated.append(translated_symbol)
key_index = (key_index + 1) % key_length # advancing through the key and loop back to the start when needed
else:
translated.append(symbol) # characters outside the alphabet (spaces, punctuation) are copied unchanged
return "".join(translated)
message = """In cryptography, the Vigenère cipher is a method of encrypting alphabetic text by using a simple form of polyalphabetic substitution.
A Vigenère cipher uses a keyword, where each letter of the keyword refers to a shift in the alphabet.
This method is more secure than a simple Caesar cipher, as it uses multiple shifts based on the keyword."""
translated = translate_message("KEYWORD", message, "encrypt")
print(translated)
Cracking the Vigenère Cipher
Frequency analysis
Frequency analysis is a powerful technique used to break the Vigenère cipher by leveraging predictable patterns in letter usage. In English, certain letters—such as E, T, A, O, I, N - appear more frequently, while others like Q, Z, X are less common. The cracking process begins by isolating segments of the ciphertext based on assumed key lengths. Each segment is treated as if it were encrypted using a Caesar cipher, and its letter frequency is compared against the standard English frequency order (ETAOINSHRDLCUMWFGYPBVKJXQZ). By scoring how closely each segment matches expected English patterns, the algorithm identifies the most likely key characters. These candidates are then combined and tested to find a key that produces readable English text. This method dramatically reduces the number of possible keys and helps reveal the original message.
Kasiski examination
Kasiski examination is a classical cryptanalysis method used to estimate the key length of a Vigenère cipher. It works by identifying repeated sequences of letters in the ciphertext and measuring the spacing between their occurrences. These spacings often share common factors that correspond to the actual key length. The algorithm extracts repeated sequences of 3 to 5 letters and calculates the distances between their appearances. It then analyzes the factors of these distances to determine which key lengths are most probable. This narrows down the search space and improves the efficiency of frequency analysis.
Brute-force fallback
If frequency analysis and Kasiski examination fail to produce a valid decryption, the algorithm falls back to brute-force testing. It systematically tries all key lengths up to a defined maximum and evaluates each possible key combination using English language detection. Though computationally intensive, this ensures that even obscure or irregular keys can be cracked.
import itertools, re
from vigenère_cipher import translate_message
from is_english import *
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
ENGLISH_FREQ_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ" # ranked by typical English frequency
MAX_KEY_LENGTH = 16
TOP_CANDIDATES = 4
NON_LETTER_REGEX = re.compile(r"[^A-Z]") # used to strip punctuation/spaces for analysis
### Frequency analysis ###
def count_letters(text):
counts = {char: 0 for char in ALPHABET} # initializing frequency table
for char in text.upper():
if char in ALPHABET:
counts[char] += 1 # counting only A–Z characters
return counts
def sort_by_frequency(text):
freq_map = count_letters(text)
freq_to_letters = {} # grouping letters by how often they appear
for letter, freq in freq_map.items():
freq_to_letters.setdefault(freq, []).append(letter)
for freq in freq_to_letters:
freq_to_letters[freq].sort(key = ENGLISH_FREQ_ORDER.find, reverse = True) # breaking ties using English frequency
sorted_freqs = sorted(freq_to_letters.items(), key = lambda x: x[0], reverse = True) # highest frequency first
return "".join("".join(pair[1]) for pair in sorted_freqs) # flattening into a ranked string
def match_english_profile(text):
freq_order = sort_by_frequency(text) # getting ranked letters from ciphertext segment
score = sum(1 for c in ENGLISH_FREQ_ORDER[:6] if c in freq_order[:6]) # comparing most common letters
score += sum(1 for c in ENGLISH_FREQ_ORDER[-6:] if c in freq_order[-6:]) # comparing rare letters
return score # higher score = more English-like
### Kasiski examination ###
def find_repeated_sequences(text):
cleaned = NON_LETTER_REGEX.sub("", text.upper()) # stripping non‑letters so repeated patterns align cleanly
spacings = {}
for length in range(3, 6): # typical repeated substring lengths for Kasiski
for i in range(len(cleaned) - length):
seq = cleaned[i:i + length] # candidate repeated sequence
for j in range(i + length, len(cleaned) - length):
if cleaned[j:j + length] == seq:
spacings.setdefault(seq, []).append(j - i) # record distance between occurrences
return spacings
def get_factors(n):
return [i for i in range(2, MAX_KEY_LENGTH + 1) if n % i == 0] + \
[n // i for i in range(2, MAX_KEY_LENGTH + 1)
if n % i == 0 and n // i <= MAX_KEY_LENGTH and n // i != 1] # including complementary divisors that fit within key length
def tally_factors(spacing_dict):
factor_counts = {} # counting how often each factor appears across all spacings
for spacings in spacing_dict.values():
for spacing in spacings:
for factor in get_factors(spacing):
factor_counts[factor] = factor_counts.get(factor, 0) + 1 # incrementing count for each valid factor
return sorted(factor_counts.items(), key = lambda x: x[1], reverse = True) # most frequent factors first
def kasiski_analysis(ciphertext):
repeats = find_repeated_sequences(ciphertext) # locating repeated substrings and their spacings
factor_data = tally_factors(repeats) # determining which key lengths appear most plausible
return [factor for factor, _ in factor_data] # returning ranked list of likely key lengths
### Decryption attempt ###
def extract_nth_letters(n, key_len, text):
cleaned = NON_LETTER_REGEX.sub("", text.upper()) # removing punctuation so indexing aligns with key positions
return "".join(cleaned[i] for i in range(n - 1, len(cleaned), key_len)) # taking every n-th letter for frequency testing
# Trying to decrypt the ciphertext using a given key length and frequency analysis
def try_key_length(ciphertext, key_len):
candidates = [] # storing top-scoring shift guesses for each key position
for i in range(1, key_len + 1):
segment = extract_nth_letters(i, key_len, ciphertext) # letters encrypted using the same key character
scores = sorted(
[(char, match_english_profile(translate_message(char, segment, "decrypt")))
for char in ALPHABET], # testing all 26 possible shifts
key = lambda x: x[1],
reverse = True # highest English-likeness first
)
candidates.append(scores[:TOP_CANDIDATES]) # keeping only the best few shifts
print(f"Top candidates for key position {i}: {' '.join(c[0] for c in scores[:TOP_CANDIDATES])}")
for combo in itertools.product(range(TOP_CANDIDATES), repeat = key_len): # brute-force combinations of top candidates
key = "".join(candidates[i][combo[i]][0] for i in range(key_len)) # building a candidate key
decrypted = translate_message(key, ciphertext.upper(), "decrypt") # attempting full decryption
if is_english(decrypted): # checking if output resembles English text
restored = "".join(
decrypted[i].upper() if ciphertext[i].isupper() else decrypted[i].lower() # restoring original capitalization pattern
for i in range(len(decrypted))
)
print(f"\nPossible key found: {key}")
print(restored)
if input("Enter Q to quit, anything else to continue: ").strip().upper().startswith("Q"):
return restored
return None
def crack_vigenère(ciphertext):
likely_lengths = kasiski_analysis(ciphertext) # starting with statistically likely key lengths
for length in likely_lengths:
result = try_key_length(ciphertext, length)
if result:
return result
print("No match found with likely lengths. Trying all possibilities...")
for length in range(1, MAX_KEY_LENGTH + 1):
if length not in likely_lengths: # avoiding re-testing lengths already tried
print(f"Trying key length {length} ({TOP_CANDIDATES ** length} combinations)...")
result = try_key_length(ciphertext, length)
if result:
return result
return None
ciphertext = """Sr anmgwykpwdyb, dlc Rwxhxèvc ywgkov go o dhdlmz cw hxgpudklxk yhdydlireq khhx zu ijlxk y owdsvi dkfd rp tmhmrozlyxsklm wsxgkldyrece.
D Fmeabèih mmndsi xciq w yvbgspz, kyhbi cwqy ooxraf fi dlc gspzyvb nswhbw rk o jksjr eb kko ejlvreox.
Rdwj poxfkr zv wspa gvfevc pvrq k wgidch Mecooi fstfaf, rv sx sosj pepredch clgbhj ekwcz ce wri iamnrbh."""
result = crack_vigenère(ciphertext)
print("\nFinal result:" if result else "\nFailed to crack the cipher.")
if result:
print(result)