Thursday, June 30, 2011

Random Shuffling - intuitive Fisher Yates algorithm.

Today we will discuss an algorithm for random shuffling.

Random shuffling of a ordered set of numbers would end up in a different order. Mathematically, result is a different combination. Also, Random shuffle needs the algorithm output all possible combinations with equal probability. Say 1, 2, 3 is our input, then probability of getting the result 3, 2, 1 should be tantamount of getting 2, 3, 1. So, getting a specific output for the given input of cardinality 'n' should have the probability 1 / (n!).

The Fisher-Yates algorithm for the above problem is very elegant. Idea is to select the first number out of "n" possibilities and next out of "n-1" possibilities and so on. So, we will get an output which has the probability of occurring, 1 / n!.

Please find the code-snippet, below for the same.

for( int i = Length - 1; i > 0; i--)
{
int currentRandomIndex = GetNextRandomNumber(0, i);
Swap( inArray[i], inArray[currentRandomIndex];
}

If you write the algorithm for this on your own, there are possibilities to make subtle mistakes like the one in this link, http://www.codinghorror.com/blog/2007/12/the-danger-of-naivete.html

Another interesting variant is collecting "n" random samples out of huge set of cardinality "N". I have discussed the same in the blog,




Friday, June 24, 2011

Divide and Conquer - Intuitive understanding of Master theorem

Today we will discuss about Master Theorem, in an intuitive way.

Divide and Conquer is a way to solve the problem by dividing the problem space recursively and adding up the solutions to get the complete solution. Merge sort uses this approach to solve the problem of Sorting. Given a problem, how can we make sure divide and conquer would help to solve that. For instance, finding a minimum in a set of numbers of cardinality n, can be solved in O(n) using naive approach. Can we use divide and conquer approach to solve this problem? Ok, that won't be efficient, in fact make your program run slow rather than naive approach in practical. Which makes divide and conquer not usable for some problems. Answer lies in Master theorem which makes asymptotic analysis of certain recurrence relations easy. Let us formally analyse the complexity of Merge sort and Finding minimum.

Merge Sort:
T(n) = 2 T(n / 2) + Ө(n), Ө(n) is the complexity of merging results from divided problem spaces, in this case problem space is divided by two recursively.

Let us try to expand T(n),

T(n) = 2 T(n/2) + n, Ө has been eliminated for ease of understanding.
= 4 T(n/4) +2 * n / 2 + n
= 8 T(n/8) + 4 * n/4 + 2 * n/2 + n
....
= n log 2 2 T(n/ n log 2 2) + ~nlog 2 n
= n + nlog 2 n, T(1) is assumed as 1 here.
Finally, T(n) is Ө(n log 2 n), since n * log 2 n dominates.

In case of our contrived example of finding minimum using divide and conquer would give
T(n) = n + 1 * log 2 n
So, it would be Ө(n) since n dominates.

Let us define the general recurrence relation, T(n) = a T(n / b) + f(n)
Intuitively, T(n) would be the dominating one among n log b a and f(n) as per the discussion above.

Master theorem formulates the above general problem, in three cases as given below.

Case 1:

If f(n) = O( n log b a-e) for some constant e > 0, then T (n) = Θ( n log b a).
Intuitively n log b a dominates f(n), hence T(n).

Case 2:

If f(n) = Θ( n log b a log k n) with k ≥ 0, then T (n) = Θ(n log b a log k+1 n),

You could correlate the results from above merge sort analysis with this.

Case 3:

If f(n) = Ω( n log b a+e) for some constant e > 0, then T (n) = Θ( f(n) ). It needs a condition that, a * f(n/b) <= c f(n).

This is again, an inverse condition of Case 1. Important thing to note is the condition. If the condition is not satisfied, f(n) might be f(n) log b n.

So, the bottom line is Complexity analysis of Divide and Conquer can be easily done with Master theorem. As an exercise, you could try analysing Binary search algorithm using Master theorem method.

One important thing to note is Master theorem applies, if and only if f(n) is polynomial.

Thanks for reading. Suggestions and Questions are welcome.

Wednesday, June 22, 2011

Segment Tree.

Today we will discuss about a data-structure, segment tree which again deals with Range Query and Update as Binary Indexed Tree.

Range Minimum Query(RMQ) is a classic computer science problem, which asks for minimum value, repeatedly in the given ranges. Let us assume an array { 2, 4, 6, 7, 5, 3, 1, 8 }. And we would be getting queries like, minimum in the range 4 - 5, 0 - 3, etc. Here the minimum operation can be anything like maximum, sum, etc.

One easier way to solve this problem(RMQ) is to pre-process information. In case of minimum, pre-processed info can be an array such that value at an index corresponds to minimum value encountered until that index in input array.

ProcessedArray[i] = min{ inputArray[0], inputArray[1] ... inputArray[i] }. Better dynamic programming approach would do pre-processing in O(n). But the problem is updating a value in inputArray would make processed array, completely invalid in the worst case. So, updates would be very costly.

Naive approach is to check all the elements in the range for minimum value and complexity in this case would be O(n) in the worst case, which is un-affordable in case of huge number of queries.

Segment trees would be a better candidate for this problem as it stores minimum values of intermediate ranges in tree nodes and original values in tree leaf nodes. This may lead to a space complexity of twice the number of elements in the original input approximately. But complexity of query reduces to log N which is promising. Segment tree is balanced and complete binary tree. It could be considered as a heap as it has characteristics of completeness and parent node's value is a function of child nodes, in RMQ it is minimum function. So, it could be stored in Array, as we do in Binary Heap.



Each node stores the minimum value of the given range. Range is depicted inside the circle. For instance [0, 9] means, it stores the minimum value of the range 0 - 9. Now query for a range could easily be done in log N times. Let us say, if we need to Query the range 0 - 6, it could be divided into 0 - 4 and 5 - 6, which could be got from tree nodes easily. I am adding up source code written in C++ for the same below. Questions are welcome.

Code below, is for Demo Purpose, not of production quality.

Thanks for Reading.

double Log2(double in)
{
return log(in)/ log(double(2));
}

int NearestPowerOfTwo(int in)
{
return pow(2, ceil(Log2(in)));
}

class SegmentTree
{
public:
SegmentTree()
{
treeArray = NULL;
}
~SegmentTree()
{
delete [] treeArray; treeArray = NULL;
}
void CreateTree(int *inArray, int length);
void InitializeTree(int internalIndex, int *inArray, int start, int end);
int QueryTree(int internalIndex, int start, int end, int rangeStart, int rangeEnd)
{
// Pre-condition check.
if( rangeStart > end || rangeEnd < start)
return -1;

// Check the present range overlap
if( start >= rangeStart && end <= rangeEnd)
return treeArray[internalIndex];
int left = QueryTree(2 * internalIndex + 1, start, (start + end) / 2, rangeStart, rangeEnd);
int right = QueryTree(2 * internalIndex + 2, (start + end) / 2 + 1, end, rangeStart, rangeEnd);

if( left == -1)
return right;
else if( right == -1)
return left;
else
return (left < right) ? left : right;
}

void UpdateTree(int index, int value);

private:
int *treeArray;
int treeLength;
};

void SegmentTree::CreateTree(int *inArray, int length)
{
// Create Tree here...

treeLength = 2 * NearestPowerOfTwo(length) - 1;
treeArray = new int(treeLength);

InitializeTree(0, inArray, 0, length - 1);
}

void SegmentTree::InitializeTree(int internalIndex, int *inArray, int start, int end)
{
// Base condition.
if( start >= end)
{
treeArray[internalIndex] = inArray[start];
return;
}
InitializeTree(internalIndex * 2 + 1, inArray, start, (start + end) / 2);
InitializeTree(internalIndex * 2 + 2, inArray, (start + end) / 2 + 1, end);

if( treeArray[internalIndex * 2 + 1] < treeArray[internalIndex * 2 + 2])
treeArray[internalIndex] = treeArray[internalIndex * 2 + 1];
else
treeArray[internalIndex] = treeArray[internalIndex * 2 + 2];
}

void SegmentTree::UpdateTree(int index, int value)
{
int indexToModify = treeLength / 2 + index;
treeArray[indexToModify] = value;

int indexToCorrect = (indexToModify - 1) / 2;

while(indexToCorrect >= 0)
{
if( treeArray[indexToCorrect * 2 + 1] < treeArray[indexToCorrect * 2 + 2])
treeArray[indexToCorrect] = treeArray[indexToCorrect * 2 + 1];
else
treeArray[indexToCorrect] = treeArray[indexToCorrect * 2 + 2];
if( indexToCorrect == 0) break;
indexToCorrect = (indexToCorrect - 1) / 2;
}
}

Handy function for Log Base 2 and nearest power of two in C++

Since there is no direct way to find Logarithm Base 2 of a number in C++, the given function in C++, would come handy in several situations.

// Essential Header file - math.h

// Log2 Function

double Log2(double in)
{
return log(in)/ log(double(2));
}

Let us use the above function for calculating nearest power of two easily, as given below.

int NearestPowerOfTwo(int in)
{
return pow(2, ceil(Log2(in)));
}