Solving the Caesar Shift cipher

Caesar Shift is one of the oldest means of encrypted communication. It is a type of substitution cipher. The technique is as follows:

Given a message: CAESAR and a key: 12, we shift each letter of the message by key mod 26 (key%26). Key values should be between 0 – 25 (0 meaning no shift). However, we use mod 26 to accommodate key values greater than 26, so that the mod function maps it back to the range 0 – 25. The shift can be positive or negative, meaning for each letter in the message, we can replace it with a letter that comes ‘key = 12’ places after it or before it respectively.

Positive Caesar shift on message: CAESAR with key: 12:

C → O

A → M

E → Q

S → E

A → M

R → D

So the encoded message is: OMQEMD

Notice that while shifting letters, whenever we reach Z, we circle around from A, as in the case of S and R above.

Similarly, negative Caesar shift would give:

C → Q

A → O

E → S

S → G

A → O

R → F

So the encoded message is: QOSGOF

Notice that a positive encryption key ‘k’ corresponds to a negative encryption key of 26 – k.

In our example above, using a positive key of 12 and a negative key of (26 – 12) = 14 would result in the same encoding: OMQEMD.

There are many variations to the Caesar cipher: In one of them, the keys vary with each letter. The variation of the keys may be represented by a function, like f(key) = (key + 1), where the key increases with each letter.

Here are a few sources on the Caesar cipher:

http://en.wikipedia.org/wiki/Caesar_cipher

http://www.simonsingh.net/The_Black_Chamber/caesar.html

http://www.cs.trincoll.edu/~crypto/historical/caesar.html

The Idea:

We have an encoded message and we do not know the key. We can find either the positive key or the negative key, as the two are related. We also need a dictionary source (a text file consisting of all words in the English language) to help in the decoding process.

The program works as follows:

→ Try all keys from 1 to 25

→ After decoding with a particular key ‘k’ check the dictionary file for the decoded word.

If it exists, it is a legitimate word. Output the word and key ‘k’ and continue.

Else continue without any output.

For large words and sentences, the procedure can be quite time-consuming. Most of the time is used by the program in dictionary lookup.

A faster solution would be to remove the dictionary lookup altogether and let the program output the decoded word for each key k from 1 – 25. The output words can then be checked by the user for legitimacy.

Source Code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

FILE *WORDLIST;
int flag = 0;

int main()
{

	int len;
	int key, i;
	char word[50];

	clock_t start, end;

	WORDLIST = fopen("dictionary.txt", "r");

	system("clear");
	printf("Enter word (in small letters):\n>> ");
	scanf("%s", word);
	len=strlen(word);
	len++;

	char *buffer = malloc(sizeof(100));

	for(key=1; key<26; key++)
	{
		unsigned char temp[len]; 	/* Unsigned is necessary else program fails when value of temp reaches 128 */
		strcpy(temp, word);

		for(i=0; i<len-1; i++)
		{
			temp[i]=word[i]+key;
			if(temp[i]>=123) 	/* If word[i] crosses the character 'z' */
			{
				temp[i]=97+temp[i]-123; /* then replace it with corresponding character from beginning of alphabet */
			}
		}

		while(fscanf(WORDLIST, "%s", buffer) != EOF)
		{
			if(strcmp(buffer, temp) == 0)
			{
				flag = 1;
				printf("Match: %s\n", buffer);
			}
		}

		rewind(WORDLIST);

	}

if(flag == 0) printf("\nNo matches found.\n");

fclose(WORDLIST);

return 0;

}

Ab Initio – A coin problem

This is a problem courtesy of CodeChef.com. The original problem can be found at http://www.codechef.com/problems/COINS. Here is the problem, put simply:

A coin of value ‘n’ can be exchanged for three coins: one of value n/2, one of value n/3 and one of value n/4. All fractions are rounded down (floor). By doing these exchanges, what is the maximum value that can be obtained?

There are many ways to solve this problem:

  • Naive Recursive/Divide and Conquer strategy:
Here are a few sources on the Recursion and the Divide and Conquer paradigm:

The Idea:

Consider n=42. Exchanging n, we get 21, 14, 10, which sum up to 45. But we can do better. A run of the source code below with n = 42, gives the maximum obtainable value as 46. This is when we breakdown 42 as follows:

42

(21, 14, 10)

( (10, 7, 5), 14, 10)

We break down 21 because 10+7+5 = 22 > 21. But we do not break down 14 and 10 because they do not yield any better total value.

Source Code:

#include <stdio.h>

int maxVal(int x);

/* This function returns the maximum total coin value obtainable from a single coin of value 'x' Say we have a coin of value 12. Then according to the problem statement, we can exchange the coin for three coins: 6, 4, 3 which have a sum of 13. So maximum value obtainable from a coin of value 12 is max{12, 13} = 13 */

int main()
{
	int n;
	printf("Enter n:\n");
	scanf("%d", &n);
	printf("Maximum value: %d\n", maxVal(n));

return 0;
}

int maxVal(int x)
{
	if(x==0)
		return 0;
	else if(x==1)
		return 1;
	else
	{
		int temp;
		temp = maxVal(x/2) + maxVal(x/3) + maxVal(x/4);		/* temp holds the total value of the coins we get in exchange for x namely x/2, x/3, x/4. This includes any higher or better amount that we might get by exchanging these coins (x/2, x/3, x/4) themselves*/

		if(x > temp) // Check for max(x, temp) and accordingly return a value
			return x;
		else
			return temp;
	}
}

Problems with this approach:

Can be extremely time-consuming for large values of n. Can take time between minutes to hours for n ~ 10^10 and above.
This is because recursion is an usually an expensive approach in problem solving when the problem size is large and when they have a large number of recurring sub-problems. This problem is even more serious in cases where the same sub-problem may need to be solved repeatedly to be able to solve larger sub-problems. The following website explains recursion in detail and also talks about when and how recursion can be expensive:

http://pages.cs.wisc.edu/~vernon/cs367/notes/6.RECURSION.html

This is where dynamic progamming comes up.

  • Dynamic Programming Approach:

The problem can be solved in much lesser time if we use Dynamic Programming. In short, Dynamic Programming(DP) is a widely used programming paradigm in which a problem is first broken down into subproblems, such that the larger problems can be related to its subproblems and solutions to these subproblems ultimately end up solving the larger problems. This is quite similar to the Divide and Conquer approach. But in DP, we use additional techniques to optimize the time complexity of our algorithms. Here are a few websites that deal with Dynamic Programming:

The Idea:

Source Code:

#include <stdio.h>

int main()
{
	int n;
	scanf("%d", &n);
	long long int i;
	long long int max_val[n+1];

	max_val[0] = 0;
	max_val[1] = 1;

	for(i=2; i<=n; i++)
	{
		if(i > max_val[i/2] + max_val[i/3] + max_val[i/4])
			max_val[i] = i;
		else
			max_val[i] = max_val[i/2] + max_val[i/3] + max_val[i/4];
	}

printf("Maximum value obtainable: %lld\n", max_val[n]);

return 0;
}