Basics I – Searching

Search Algorithms:

Given a database, the first thing you would probably want to do is search through its entries to see if a particular entry exists. This is why searching is one of the most basic and widely used operations, and search algorithms are among the first algorithms taught in any class.

Linear Search:

The Facebook user database has nearly 800,000,000 entries. To search for a user, a linear search algorithm would check each entry from the beginning to the end to see if there is a match. In the worst case, if the entry we are searching for is at the end of the list, we might have to go through all 800,000,000 entries. Even if checking each entry takes only .000001 secs on average, the total time we would need to search for the entry in the worst case is: 800,000,000 * .000001 = 800 secs, or 13.33 mins. Imagine waiting for 13 minutes just to check if your friend is on Facebook!

The advantage of linear search, apart from its simplicity, is that it can be used irrespective of whether the database is sorted or not.

The disadvantage is obviously that it is extremely time-consuming when we are working with large databases.
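
For reference, here is a minimal C sketch of linear search (the function name and the sample array are my own; the same numbers reappear in the Binary Search example below):

#include <stdio.h>

/* Returns the index of 'x' in A[0..n-1], or -1 if 'x' is not present.
   The entries are checked one by one, so the array need not be sorted. */
int linearSearch(const int A[], int n, int x)
{
	int i;
	for (i = 0; i < n; i++)
		if (A[i] == x)
			return i;	/* found a match at index i */
	return -1;			/* checked every entry, no match */
}

int main(void)
{
	int A[] = {12, 43, 9, 32, 89, 2, 6, 22, 11};
	int n = sizeof(A) / sizeof(A[0]);

	printf("Index of 22: %d\n", linearSearch(A, n, 22));	/* prints 7 */
	printf("Index of 50: %d\n", linearSearch(A, n, 50));	/* prints -1 */

	return 0;
}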

Binary Search:

Binary Search drastically reduces the worst-case time required to search through a database. However, it can be used only when the database is sorted, i.e., when the entries are arranged in either ascending or descending order.

The Binary Search algorithm:

bool BinarySearch(A, left, right, x):

    // A is the array (database) of numbers, sorted in ascending order
    // x is the number to search for
    // left is the leftmost index (starting index) of the working array
    // right is the rightmost index (ending index) of the working array

    if (left >= right):
        if (A[left] == x):
            return TRUE;
        else:
            return FALSE;
    else:
        int mid = floor((left + right) / 2);
        if (A[mid] == x):
            return TRUE;
        else if (A[mid] < x):
            return BinarySearch(A, mid+1, right, x);
        else:   // A[mid] > x
            return BinarySearch(A, left, mid-1, x);

The key to understanding the algorithm is that, since the array is sorted in ascending order, if the middle value is less than x, then x (if it exists) must be among the values to the right of the middle value in the array. And if the middle value is greater than x, then x (if it exists) must be among the values to its left. Accordingly, the algorithm recurses on the left half of the working array, A[left…mid-1], or the right half, A[mid+1…right].

The if (left >= right) part checks whether the search has narrowed down to a single remaining candidate (the >= also covers the corner case where the left half shrinks to nothing and A[left] is the last element checked). In that case, if that element equals ‘x’ the algorithm returns TRUE; otherwise we can be sure that ‘x’ does not exist in the array (database) and the algorithm returns FALSE.

Example:

Consider the small database of numbers:

12 43 9 32 89 2 6 22 11 (no. of elements, n = 9 )

Sorting the database using an appropriate sorting algorithm (see <insert link> for Sorting Algorithms), we get:

2 6 9 11 12 22 32 43 89 ( n = 9 )

The algorithm then works as shown in the image below:

[Figure: Binary Search Example]
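
The same search can also be written out in C. This is a small sketch of my own that mirrors the pseudocode above (the function name is not from the original post), run against the sorted example database:

#include <stdio.h>
#include <stdbool.h>

/* Recursive binary search over the sorted slice A[left..right],
   mirroring the pseudocode above. Returns true if 'x' is present. */
bool binarySearch(const int A[], int left, int right, int x)
{
	if (left >= right)
		return A[left] == x;			/* at most one candidate remains */

	int mid = (left + right) / 2;			/* integer division rounds down */
	if (A[mid] == x)
		return true;
	else if (A[mid] < x)
		return binarySearch(A, mid + 1, right, x);	/* x can only lie in the right half */
	else
		return binarySearch(A, left, mid - 1, x);	/* x can only lie in the left half */
}

int main(void)
{
	int A[] = {2, 6, 9, 11, 12, 22, 32, 43, 89};	/* the sorted database from the example */
	int n = sizeof(A) / sizeof(A[0]);

	/* Searching for 22 inspects A[4]=12, A[6]=32 and finally A[5]=22:
	   three elements examined instead of up to nine with linear search. */
	printf("22 found: %d\n", binarySearch(A, 0, n - 1, 22));	/* prints 1 */
	printf("50 found: %d\n", binarySearch(A, 0, n - 1, 50));	/* prints 0 */

	return 0;
}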

Time Complexity:

As mentioned previously, Binary Search works a lot faster than conventional linear search, because every comparison halves the range of entries still to be examined. For the Facebook user database example discussed previously, Binary Search would therefore need on the order of only log₂(800,000,000) ≈ 30 comparisons. So if checking each entry takes .000001 secs as before, the time required using Binary Search would be about 30 * .000001 = .00003 secs.
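
As a quick sanity check on that figure, here is a throwaway C sketch (not part of the original discussion) that counts how many times a range of 800,000,000 entries must be halved before a single entry remains:

#include <stdio.h>

int main(void)
{
	long long n = 800000000LL;	/* size of the hypothetical user database */
	int halvings = 0;

	while (n > 1)
	{
		n = (n + 1) / 2;	/* halve the remaining range, rounding up */
		halvings++;
	}

	printf("Halvings needed: %d\n", halvings);	/* prints 30 */
	return 0;
}

Each comparison in Binary Search roughly halves the remaining range, which is where the figure of about 30 comparisons comes from.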


Ab Initio – A coin problem

This is a problem courtesy of CodeChef.com. The original problem can be found at http://www.codechef.com/problems/COINS. Here is the problem, put simply:

A coin of value ‘n’ can be exchanged for three coins: one of value n/2, one of value n/3 and one of value n/4, with all fractions rounded down (floor). The new coins can themselves be exchanged again in the same way. What is the maximum total value that can be obtained by making such exchanges?

There are many ways to solve this problem:

  • Naive Recursive/Divide and Conquer strategy:

The Idea:

Consider n = 42. Exchanging n, we get 21, 14 and 10, which sum up to 45. But we can do better. A run of the source code below with n = 42 gives the maximum obtainable value as 46. This is when we break down 42 as follows:

42

(21, 14, 10)

( (10, 7, 5), 14, 10)

We break down 21 because 10 + 7 + 5 = 22 > 21. But we do not break down 14 and 10 because doing so does not yield any better total value. The final total is 10 + 7 + 5 + 14 + 10 = 46.

Source Code:

#include <stdio.h>

int maxVal(int x);

/* maxVal(x) returns the maximum total coin value obtainable from a single
   coin of value 'x'. Say we have a coin of value 12. According to the
   problem statement, we can exchange it for three coins of values 6, 4
   and 3, which sum to 13. So the maximum value obtainable from a coin of
   value 12 is max{12, 13} = 13. */

int main()
{
	int n;
	printf("Enter n:\n");
	scanf("%d", &n);
	printf("Maximum value: %d\n", maxVal(n));

	return 0;
}

int maxVal(int x)
{
	if(x==0)
		return 0;
	else if(x==1)
		return 1;
	else
	{
		int temp;
		/* temp holds the total value of the three coins we get in exchange
		   for x, namely x/2, x/3 and x/4, including any better amount we
		   might get by exchanging those coins themselves (the recursive
		   calls take care of that). */
		temp = maxVal(x/2) + maxVal(x/3) + maxVal(x/4);

		if(x > temp) // Check for max(x, temp) and accordingly return a value
			return x;
		else
			return temp;
	}
}

Problems with this approach:

This approach can be extremely time-consuming for large values of n; it can take anywhere from minutes to hours for n around 10^10 and above.
This is because recursion is usually an expensive approach when the problem size is large and the problem has a large number of recurring sub-problems. The cost is even more serious when the same sub-problem has to be solved repeatedly in order to solve the larger sub-problems. The following page explains recursion in detail and also discusses when and how recursion can be expensive:

http://pages.cs.wisc.edu/~vernon/cs367/notes/6.RECURSION.html
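
To see the cost concretely, here is a rough sketch of my own (not part of the original solution): the same recursion with a counter added, so you can watch the number of recursive calls grow with n even though the number of distinct sub-problems stays comparatively small.

#include <stdio.h>

static long long calls = 0;	/* incremented on every invocation of maxVal */

long long maxVal(long long x)
{
	calls++;
	if (x < 2)
		return x;	/* coins of value 0 and 1 cannot be improved */

	long long temp = maxVal(x / 2) + maxVal(x / 3) + maxVal(x / 4);
	return (x > temp) ? x : temp;	/* keep the coin or exchange it, whichever is larger */
}

int main(void)
{
	long long n;
	printf("Enter n:\n");
	scanf("%lld", &n);
	printf("Maximum value: %lld (recursive calls made: %lld)\n", maxVal(n), calls);
	return 0;
}

Because the same values of x keep reappearing in different branches of the recursion tree, the call count grows much faster than the number of distinct sub-problems actually being solved.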

This is where dynamic programming comes in.

  • Dynamic Programming Approach:

The problem can be solved in much less time if we use Dynamic Programming. In short, Dynamic Programming (DP) is a widely used programming paradigm in which a problem is first broken down into sub-problems, the larger problem is expressed in terms of those sub-problems, and the solutions to the sub-problems combine to solve the larger problem. This is quite similar to the Divide and Conquer approach, but in DP we additionally store the solution of each sub-problem so that it is computed only once, which is what brings the time complexity down.

The Idea:

Instead of recomputing the best value for the same coin value over and over, we build the answers from the bottom up. We keep an array max_val[], where max_val[i] is the maximum total value obtainable from a single coin of value i. Since max_val[i] depends only on max_val[i/2], max_val[i/3] and max_val[i/4], all of which are smaller than i, we can fill the array from i = 0 up to n and read off the answer as max_val[n]. Each sub-problem is solved exactly once.

Source Code:

#include <stdio.h>

int main()
{
	int n;
	scanf("%d", &n);		/* assumes n >= 1 */

	long long int i;
	long long int max_val[n+1];	/* max_val[i] = maximum total value obtainable from a
					   single coin of value i. Note that this array lives on
					   the stack, so very large n will not fit. */

	max_val[0] = 0;		/* a coin of value 0 is worth nothing */
	max_val[1] = 1;		/* exchanging 1 gives 0 + 0 + 0, so we keep the coin */

	/* Build the table bottom-up: max_val[i/2], max_val[i/3] and max_val[i/4]
	   are already known by the time we reach i. */
	for(i=2; i<=n; i++)
	{
		if(i > max_val[i/2] + max_val[i/3] + max_val[i/4])
			max_val[i] = i;		/* keeping the coin is better */
		else
			max_val[i] = max_val[i/2] + max_val[i/3] + max_val[i/4];	/* exchanging is better */
	}

	printf("Maximum value obtainable: %lld\n", max_val[n]);

	return 0;
}
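
If n is too large for the table above to fit comfortably in memory, one common variation (sketched here with an arbitrary cut-off of my own choosing, not taken from the original post) is to cache only the smaller values in a fixed-size array and recompute the comparatively few large values by plain recursion:

#include <stdio.h>

#define LIMIT 1000000	/* cache values below this bound only; the bound is an arbitrary choice */

static long long cache[LIMIT];	/* cache[x] = best value for coin x; 0 means "not yet computed" */

long long maxVal(long long x)
{
	if (x < 2)
		return x;		/* coins of value 0 and 1 cannot be improved */
	if (x < LIMIT && cache[x] != 0)
		return cache[x];	/* this sub-problem has already been solved */

	long long best = maxVal(x / 2) + maxVal(x / 3) + maxVal(x / 4);
	if (x > best)
		best = x;		/* keeping the coin beats exchanging it */

	if (x < LIMIT)
		cache[x] = best;
	return best;
}

int main(void)
{
	long long n;
	while (scanf("%lld", &n) == 1)	/* read values until end of input */
		printf("%lld\n", maxVal(n));
	return 0;
}

For an input of 42 this still prints 46, matching the run of the naive version above; the cache simply ensures that each small coin value is worked out only once.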