Basics I – Searching

Search Algorithms:

Given a database, the first thing you would want to do is probably search through its entries to see if a particular entry exists. This is why search algorithms are one of the most basic and widely used algorithms, and one of the first algorithms taught in any class.

Linear Search:

The Facebook user database has nearly 800,000,000 entries. To search for an user, a linear search algorithm would check each entry from the beginning to the end to see if there is a match. In the worst case, if the entry we are searching for is at the end of the list, we might have to go through all 800,000,000 entries. Even if checking each entry takes only .000001 secs on an average, total time we would need to search for the entry in the worst case is: 800,000,000 * .000001 = 800 secs or 13.33 mins. Imagine waiting for 13 minutes just to check if your friend is on Facebook!

The advantage of linear search apart from its simplicity is that it can be used irrespective of whether the database is sorted or not.

The disadvantage is obviously that it is extremely time-consuming when we are working with large databases.

Binary Search:

Binary Search drastically reduces the worst-case time required to search through a database. However, it can be used only when the database is sorted i.e. when the entries are arranged either in ascending or descending order.

The Binary Search algorithm:

bool BinarySearch(A, left, right, x):

// A is the array(database) of numbers

// x is the number to search for

// left is the leftmost index(starting index) of the working array

// right is the rightmost index(ending index) of the working array

if (left == right):

if (A[left] == x):

return TRUE;

else:

return FALSE;

else:

int mid = floor[(left+right)/2];

if (A[mid] == x):

return TRUE;

else if (A[mid] < x):

BinarySearch(A, mid+1, n, x);

else if (A[mid] > x):

BinarySearch(A, 1, mid-1, x);

The key to understanding the algorithm is that, since the array is sorted in ascending order, if the middle value is less than x, then x(if it exists) will obviously be among the values to the right of the middle value in the array. And if the middle value is greater than x, then x(if it exists) will be among the values to the left of the middle value. Accordingly, the algorithm recurses on the left half of the array A[1…mid-1] or the right half A[mid+1…n].

The if(left == right)  part checks to see if we have boiled down to only one element. In that case, if that element equals ‘x’ then the algorithm returns TRUE, else we can be sure that ‘x’ does not exist in the array(database) and the algorithm returns FALSE.

Example:

Consider the small database of numbers:

12 43 9 32 89 2 6 22 11 (no. of elements, n = 9 )

Sorting the database using an appropriate sorting algorithm (see <insert link> for Sorting Algorithms),

2 6 9 11 12 22 32 43 89 ( n = 9 )

The algorithm then works as shown in the image below:


Binary Search Example

Time Complexity:

As mentioned previously, Binary Search works a lot faster than the conventional linear search. In fact if we look at the Facebook user database example as discussed previously, Binary search on such a database will require comparisons of the order of only log(800,000,000) ~ 30. So if checking each entry takes .000001 secs as before, time required using Binary Search would be 30 * .000001 = .00003 secs.

Recommended Links: