Monday, October 24, 2011

Visitor Pattern in C++ - Intuitive understanding.

Today we will discuss the Visitor pattern in an intuitive way, since on the face of it, it can seem pretty intricate. First off, we need to understand the scenarios in which Visitor is of great help. Let me start with basic examples and move on to the finer details of the Visitor pattern.

When we design any container, our focus should be on the data structure which holds the data, not on the algorithms which work on the contained data. For instance, STL containers have been designed in such a way that algorithms can be developed independently of the data containers. STL achieves this through the concept of iterators and through the uniformity of the type of contained elements. Every container that supports iterators can be used seamlessly with the algorithms.

What if the contained elements are of disparate types? Iterators wouldn't help in this situation, as the types vary. More importantly, if the contained elements need to be treated differently for different types, even deriving them from the same base class wouldn't be of much help. Consider the example of Shape objects like Rectangle and Ellipse. Even though both can be described by bounding rectangles, the calculation of area differs. So, a common algorithm for area is not possible, even though both are Shape objects. This is one of the scenarios in which Visitor can help.

In a nutshell, the Visitor pattern can be used to add a new algorithm that works on a container without modifying the container itself. Now, let us translate the above line into OOP terms.

Visitor is a way to add a new behavior to the existing class without modifying the class itself.

Perhaps this is one more way to achieve the Open-Closed principle. Let us explore.

Let us implement some trivial examples to proceed further. The implementations are not of production quality; proper memory management is absent. This is just to explain the concepts.

Let us define an interface, IShape from which Rectangle and Ellipse are derived as given below.

struct Bounds
{
Bounds() : left(0), right(0), top(0), bottom(0)
{

}
Bounds(int ileft, int itop, int iright, int ibottom) : left(ileft), top(itop), right(iright), bottom(ibottom) { }
int left;
int right;
int top;
int bottom;
};

class IShape
{
public:
virtual Bounds GetBounds() = 0;
virtual void Draw() = 0;
};

class Rectangle : public IShape
{
public:
Rectangle(Bounds bounds) : currentBounds(bounds)
{

}
Bounds GetBounds()
{
return currentBounds;
}
void Draw()
{
// Draw Rectangle here.
}
private:
Bounds currentBounds;
};

class Ellipse : public IShape
{
public:
Ellipse(Bounds bounds) : currentBounds(bounds)
{

}
Bounds GetBounds()
{
return currentBounds;
}
void Draw()
{
// Draw Ellipse here.
}
private:
Bounds currentBounds;
};

Let us define a Graphics Designer, a composite class for Shapes. At Run-time several shape objects could be added to this toy Graphics Designer and Draw can be invoked to draw all the elements.

class GraphicsDesigner : public IShape
{
public:
void AddRectangle(Bounds inBounds)
{
currentElements.push_back(new Rectangle(inBounds));
}

void AddEllipse(Bounds inBounds)
{
currentElements.push_back(new Ellipse(inBounds));
}

void Draw()
{
// Enumerate through the currentElements and Draw each.
}

Bounds GetBounds() { return Bounds(); }

private:
vector<IShape*> currentElements;
};

Given a class like GraphicsDesigner, how do we add a new algorithm that works on the contained elements? For instance, suppose we need to find the minimal rectangle which covers all the elements. In order to achieve this, we would need to add a method to the container which enumerates the elements and executes the algorithm. This violates the Open-Closed principle. And every time we need to implement a new algorithm, we would end up adding new methods to the container class.

Algorithms are getting bound to the data container.

We will analyse further to understand the problem well.

Adding methods to the container is not a good idea, as the algorithms depend only on the contained elements and not on the state of the container.

The needed algorithms can't all be defined beforehand; just think of the possible algorithms on a container of integers: practically unbounded, right?

Given that, we need a way to dynamically add behaviors to the containers. Without further ado, we will define the Visitor pattern and see how it resolves this.

For every algorithm, we derive a class from a Visitor interface. The Visitor interface contains a collection of overloaded Visit methods, one for each type of visit-able class. The visit-able classes in our example are Rectangle and Ellipse (and the general IShape as well). Every visit-able class should define a method, "Accept".

Let me define a simple PrintVisitor to clarify the needs of all these interfaces.

Before that, given below are the example classes after the changes made for the Visit and Accept interfaces.

class Rectangle;
class Ellipse;

class IShapesVisitor
{
public:
virtual void Visit(Rectangle*) = 0;
virtual void Visit(Ellipse*) = 0;
};

class IShape
{
public:
virtual Bounds GetBounds() = 0;
virtual void Draw() = 0;
virtual void Accept(IShapesVisitor* visitor) = 0;
};

class Rectangle : public IShape
{
public:
Rectangle(Bounds bounds) : currentBounds(bounds)
{

}
Bounds GetBounds()
{
return currentBounds;
}
void Draw()
{
// Draw Rectangle here.
}
void Accept(IShapesVisitor* visitor)
{
visitor->Visit(this);
}
private:
Bounds currentBounds;
};

class Ellipse : public IShape
{
public:
Ellipse(Bounds bounds) : currentBounds(bounds)
{

}
Bounds GetBounds()
{
return currentBounds;
}
void Draw()
{
// Draw Ellipse here.
}
void Accept(IShapesVisitor* visitor)
{
visitor->Visit(this);
}
private:
Bounds currentBounds;
};

class GraphicsDesigner : public IShape
{
public:
void AddRectangle(Bounds inBounds)
{
currentElements.push_back(new Rectangle(inBounds));
}

void AddEllipse(Bounds inBounds)
{
currentElements.push_back(new Ellipse(inBounds));
}

void Draw()
{
// Enumerate through the currentElements and Draw each.
}

Bounds GetBounds() { return Bounds(); }

void Accept(IShapesVisitor* visitor)
{
vector<IShape*>::iterator shapeItr = currentElements.begin();
for(; shapeItr != currentElements.end(); shapeItr++)
{
IShape *val = (*shapeItr);
val->Accept(visitor);
}
}

private:
vector<IShape*> currentElements;
};

Now, we will define a simple PrintVisitor to show the power of Visitor pattern.

class PrintVisitor : public IShapesVisitor
{
public:
void Visit(Rectangle* inShape)
{
cout << "This is a Rectangle" << endl;
}

void Visit(Ellipse* inShape)
{
cout << "This is an Ellipse" << endl;
}
};

Of course, "Print" is so trivial to be considered as an algorithm which works on the data. But it clearly avoided the need for having "Print" virtual function in Shape derived classes. Now let us think of a decent algorithm which could work on the data. Let us assume, we need to find the total area occupied by all shapes in GraphicsDesigner. Without Visitor pattern, we would have to add up a new member function in GraphicsDesigner class. But with Visitor pattern, things become very easy that we need to add a new class in Visitor hierarchy as given below.

class TotalAreaVisitor : public IShapesVisitor
{
public:
TotalAreaVisitor() : TotalArea(0.0)
{
}
void Visit(Rectangle* inShape)
{
Bounds bounds = inShape->GetBounds();
int width = bounds.right - bounds.left;
int height = bounds.bottom - bounds.top;
int currentArea = width * height;
TotalArea += currentArea;
}

void Visit(Ellipse* inShape)
{
Bounds bounds = inShape->GetBounds();
int width = bounds.right - bounds.left;
int height = bounds.bottom - bounds.top;
double currentArea = width * height * 3.14 / 4;
TotalArea += currentArea;
}
double GetTotalArea()
{
return TotalArea;
}
private:
double TotalArea;
};
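To tie the pieces together, here is a hedged usage sketch (the main function and the specific bounds values are illustrative and not from the original post; the usual headers and using namespace std are assumed):

int main()
{
    GraphicsDesigner designer;
    designer.AddRectangle(Bounds(0, 0, 10, 5));   // 10 x 5 rectangle, area 50
    designer.AddEllipse(Bounds(0, 0, 4, 4));      // ellipse in a 4 x 4 box, area ~12.56

    PrintVisitor printVisitor;
    designer.Accept(&printVisitor);               // prints the type of each contained shape

    TotalAreaVisitor areaVisitor;
    designer.Accept(&areaVisitor);                // accumulates the area of each shape
    cout << "Total area: " << areaVisitor.GetTotalArea() << endl;   // ~62.56
    return 0;
}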

I hope the need for the Visitor pattern is now very clear. For every new algorithm, instead of adding a new method to the container class, we can add a new visitor. This helps us achieve one of the SOLID principles, the Open-Closed principle. There are disadvantages too: a new Shape-derived class can't be added easily, as it requires modifying all the existing visitors.

So, the Visitor pattern trades the ease of adding new behaviours to the visit-able hierarchy against the difficulty of adding a new derived class to that hierarchy.

Thanks for Reading.

Friday, October 21, 2011

Bloom filter implementation in C++ - NikBloomFilter

A Bloom filter can be considered a succinct representation of a set of elements. The elements themselves might be stored in files or in database tables, etc., and the set may support operations like insertion, deletion and retrieval of elements.

For this discussion, let us assume a dictionary of passwords: a huge set of strings stored in several files according to their starting character, like an A-list, a B-list, etc. We assume that we don't keep all this information in main memory. So, in order to find whether a string is present in the dictionary, the program has to fetch the right file according to the starting character and load it into main memory for a binary search. Generally, we don't require frequent insertions and deletions on this set, but rather queries for the presence of an element. The only way to make the queries fast is to cache the complete set of elements in main memory. But there are scenarios in which such caching is unaffordable.

A Bloom filter, as I said earlier, is a succinct data structure which registers the presence of elements in a set. When we insert an element into the set, the Bloom filter must be updated. A Bloom filter is basically a bit array; a small subset of bits in that array corresponds to an element of the set. So, when an element is inserted, the corresponding bits are set. A query is then simply a check of those bits' state: if all the bits corresponding to the element are set, the query returns true. But this true may not be a real true. Since every element maps to a subset of bits in the bit array, overlaps may occur, which can cause a false positive. Before going further, we will analyse this structure mathematically.

A Bloom filter is a probabilistic data structure. When an element is inserted, it has to set certain bits in the bit array. In order to find out which bits to set, we hash the content of the element; the hashed value can then be used to construct the indices of the bits to be set. For instance, the MD5 of a byte array results in 16 bytes, which could be used to construct 4 indices. We will get into implementation details later in this discussion. As we discussed earlier, this data structure may give false positives. Luckily, we can control this by trading off space.
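As a rough sketch of that idea (the function and its signature are mine, not the actual NikBloomFilter code), a 16-byte digest such as an MD5 output can be split into four 32-bit words, each reduced modulo the bit-array size M:

// Sketch: derive 4 bit positions from a 16-byte digest (e.g., MD5 output).
// 'digest' and 'M' are assumed inputs; the real implementation may differ.
void DeriveIndices(const unsigned char digest[16], unsigned int M, unsigned int indices[4])
{
    for (int i = 0; i < 4; i++)
    {
        unsigned int word = 0;
        for (int j = 0; j < 4; j++)
            word = (word << 8) | digest[i * 4 + j];   // pack 4 bytes into one 32-bit word
        indices[i] = word % M;                        // map the word into the bit array
    }
}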

Let us get into the analysis. Let us assume the size of the bloom filter, "M" bits and "N" elements had been inserted into the set, hence in Bloom filter also.

Now, the probability of a bit B[i] being unset (zero) is (1 - 1/M) ** kN. Here ** represents the "power of" operator, and k is the number of bits set per inserted element. The formula can be derived as follows: since N elements have been inserted, bits have been set kN times in total. The probability of one particular bit not being set in one such step is (M-1)/M; for two consecutive steps it is ((M-1)/M) * ((M-1)/M), and so on. The underlying assumption is that the bit-setting events are statistically independent. Asymptotically this is true, so we will assume so in this discussion.

P{ B[i] == 0 } is (1 - 1/M) ** kN, approximately e ** (-kN/M).

A false positive is a condition in which the Bloom filter reports a string as present in the set when it is not actually present. The cause is that the bits corresponding to the string have been set by other, independent strings, through the cumulative effect described earlier. The probability of a false positive is the probability that all the bits corresponding to the string are set, which is

P{ False Positive } = (1 - e ** (-kN / M)) ** k

Since the false positive probability depends on the factors k, N and M, we minimize the function (by taking the derivative and equating it to zero) with respect to k. The analysis shows the function is minimized when k = (M / N) * ln 2, where ln 2 is the natural logarithm of 2.

If we substitute k = (M/N) * ln 2 into the false positive probability function, we get

P{ False Positive } = (1 - e ** (-ln 2)) ** k = (1/2) ** k, or (0.6185) ** (M/N).

The above result is the lynchpin of the Bloom filter implementation, as it gives us a way to control false positives. For instance, if we wish to have a false positive rate of about 0.02, then M/N must be around 8, where N is the number of elements in the set. Given N, the required bit-array size M of the Bloom filter can be calculated easily.
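To make the sizing concrete, here is a small sketch (the function name and interface are illustrative, not from the library) that derives M and k from the expected element count N and a desired false-positive rate p, using M/N = ln(p) / ln(0.6185) and k = (M/N) * ln 2:

#include <cmath>

// Sketch: size a Bloom filter for N expected elements and false-positive rate p.
void SizeBloomFilter(double N, double p, double &M, double &k)
{
    double bitsPerElement = log(p) / log(0.6185);  // this is M / N
    M = ceil(bitsPerElement * N);                  // total bits in the filter
    k = ceil(bitsPerElement * log(2.0));           // number of hash-derived indices per element
}

// For example, N = 1,000,000 and p = 0.02 gives roughly M = 8.1 million bits
// (about 1 MB) and k = 6.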

An implementation of Bloom filter in C++ has been done and checked in to the Git-hub.

This implementation doesn't yet support deletion or merging with another Bloom filter, as it is the initial release. Any comments on this are welcome.

Thanks for Reading.

Saturday, October 15, 2011

Prototype Pattern in C++, Dynamic instantiation.

Before starting the discussion of Prototype pattern, We will understand the problem of Dynamic instantiation.

Given a type name (a class name) in C++ as a string, how can an object of that type be created?

First off, there is no language-level support for dynamic instantiation in C++, unlike Java and C#. Since in C++ there is no common base class for all classes, generic dynamic-instantiation support is not possible. MFC, a C++ framework, provides support for dynamic instantiation because all its classes are derived from CObject. Let us try along the same lines, making use of a common base class and the Prototype pattern to achieve dynamic instantiation.

The Prototype pattern requires a class to have a method, Clone, whose sole purpose is to generate a new object of the same type. With a clone-able object, we can generate a new object of the same type. In order to achieve this, all classes must be derived from a common base class, and in that common base class we can have a pure virtual member function (interface), Clone. So, all classes derived from it would be clone-able.

Let us call that common base class "Object" as in Java. My minimalistic implementation of the Object is as given here,

class Object
{
public:
virtual ~Object() {}
virtual Object* Clone() const = 0;
static Object *MakeObject(std::string type);
static void AddNewClassInfo(std::string type, Object* in);
static std::map<std::string, Object*> objectsTable;
};

Clone is the key method of this abstract class. Implementation of the same could be like the one given below.

class Derived : public Object
{
Object* Clone() const { return new Derived(); }
};

So, all the classes derived from Object can be cloned using this method call. In order to "Clone" an object, we should have one initial object, a prototypical instance, and we should be able to get that prototypical instance from the type name. If we can do this, then making an object from the type name alone is a cakewalk. The "objectsTable" map in the Object class holds this mapping: a map from type name to prototypical instance.

Adding a prototypical instance can be done through the method, "AddNewClassInfo", whose implementation can be like this.

// Populate map with type-name and its corresponding prototype.
void Object::AddNewClassInfo(std::string type, Object *in)
{
objectsTable[type] = in;
}

Now the implementation of "MakeObject" must be easily understandable.

Object* Object::MakeObject(std::string type)
{
std::map<std::string, Object*>::iterator itr;
itr = objectsTable.find(type);
if( itr != objectsTable.end())
{
return itr->second->Clone(); // Clone the prototypical instance.
}
else
return NULL;
}

All the essential implementation of "Object" is done. Since it has a static member, that member must be defined in a CPP file as given here:

std::map<std::string, Object*> Object::objectsTable;

OK. Now let us create a derived class from Object and test this implementation.

#include "Object.h"

class DynamicInstantiable : public Object
{
public:
DynamicInstantiable() {}
void SayHello()
{
std::cout << "Hello! " << std::endl;
}

Object* Clone() const
{
return new DynamicInstantiable();
}
};

The above class implements the Clone method and is derived from Object. Now, this type must be registered in the static map of the "Object" class using Object::AddNewClassInfo. We have several options here, but I would like to keep things simple. So, I have added a new header file which has a global function, "InitializeDynamicObjects", and this function must be called at the very beginning of the "main()" function.

A sample implementation is given below.

void InitializeDynamicObjects()
{
Object::AddNewClassInfo("DynamicInstantiable", new DynamicInstantiable());
}

Here is the sample Main function, I have written to test the code.

#include "DynamicInstantiable.h"
#include "ObjectInitializer.h"
using namespace std;

int main()
{
InitializeDynamicObjects();

DynamicInstantiable* newInst = dynamic_cast<DynamicInstantiable*>(Object::MakeObject("DynamicInstantiable"));

if( newInst != NULL)
{
newInst->SayHello();
}

cout << "Hello World" << endl;
}


I hope this post explains Prototype clearly. Comments and Queries are welcome.

Thanks for Reading.


Friday, October 14, 2011

Curiously recurring template Pattern, CRTP - Static Polymorphism in C++

Let us start this discussion with some minimalistic C++ classes, which don't convey any meaning but serve the purpose of discussion.

class Base
{
public:
void PrintMe()
{
std::cout << "Print: Base" << std::endl;
}
};

class Derived : public Base
{
public:
void PrintMe()
{
std::cout << "Print: Derived" << std::endl;
}
};

void TestPolymorphism()
{
Base *ptrBase = new Derived();
ptrBase->PrintMe(); // This would print "Print: Base"
}

In the above code, ptrBase is initialized with an object of Derived. Ideally, PrintMe should have printed "Print: Derived", but it prints "Print: Base". Any C++ programmer could identify the issue: early binding according to the static type. It means that when we write ptrBase->PrintMe(), the compiler checks whether the function called is virtual or not. If it is not virtual, it binds the call to the address of the function defined in the calling type; in this case the calling type is Base.

Dynamic binding can be achieved using the "virtual" specifier in the method declaration. When we mark a function virtual, the compiler doesn't do early binding; it resolves the call using the v-table. Every class which has a virtual method has a table of function pointers, and a hidden pointer to that table is inserted into the class and initialized during construction of objects of the class. So, during a virtual method invocation on an object, two things happen: getting the right function address from the v-table and calling the function it points to. Even though this extra indirection doesn't cause much performance degradation, in several scenarios it can easily be avoided. Please find the snippet below, after adding the virtual specifiers.

class Base
{
public:
virtual void PrintMe()
{
std::cout << "Print: Base" << std::endl;
}
};

class Derived : public Base
{
public:
virtual void PrintMe()
{
std::cout << "Print: Derived" << std::endl;
}
};

void TestPolymorphism()
{
Base *ptrBase = new Derived();
ptrBase->PrintMe(); // This would print "Print: Derived"
}

As we discussed earlier, even though dynamic polymorphism doesn't cause much performance overhead, it makes a function not inline-able. Sometimes, inlining a simple function improves performance, especially if it is called many times. Since a normal function call breaks the code execution flow, CPU-level optimizations such as caching are harder to apply. With static polymorphism, we can retain the inlining capability. Now let us get into the topic of static polymorphism.

Let us try to make Base class function a bit intelligent.

In the Base class, if we could cast the "this" pointer to the right Derived type, we could solve the problem without dynamic binding, meaning we could avoid the virtual specifier.

The below code snippet won't compile, but it gives the idea of what to do.

class Base
{
public:
void PrintMe()
{
static_cast<Derived*>(this)->PrintMe(); // This would call "Print: Derived"
}
};

We had successfully removed Virtual Keyword; So, V-Table wouldn't be created and function can be in-lined. There are some problems; First off, it wouldn't compile, as compiler doesn't know the Derived class yet. Secondly, this is specialized for only one derived class. This specialization can be removed with templates. Yes. that is the whole idea.

template<typename Derived>
class Base
{
public:
void PrintMe()
{
static_cast<Derived*>(this)->PrintMe(); // This would call "Print: Derived"
}
};

Now, you could derive classes from the above class like this,

class Derived : public Base<Derived>{ };
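For completeness, here is a hedged, self-contained sketch of the whole idea (the class and function names are illustrative); the derived class supplies its own PrintMe, and a function template invokes it through the base without any virtual dispatch:

#include <iostream>

template<typename Derived>
class Base
{
public:
    void PrintMe()
    {
        // Statically bound call into the derived class; no v-table involved.
        // Note: the derived class must define its own PrintMe, otherwise this
        // call resolves back to Base::PrintMe and recurses forever.
        static_cast<Derived*>(this)->PrintMe();
    }
};

class Derived : public Base<Derived>
{
public:
    void PrintMe()
    {
        std::cout << "Print: Derived" << std::endl;
    }
};

template<typename T>
void TestStaticPolymorphism(Base<T> *ptrBase)
{
    ptrBase->PrintMe();   // prints "Print: Derived" when T is Derived
}

int main()
{
    Derived derived;
    TestStaticPolymorphism(&derived);
    return 0;
}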

Deriving from a template class specialized with the deriving class's own type is called CRTP, the curiously recurring template pattern. Even though the above example is very minimalistic, there are several uses for this pattern. One of the great examples, as given by Wikipedia, is a Counter base class. In order to get statistics about the objects of a particular type, we can implement a Counter class as below.

template<typename T>
class Counter
{
public:
static int GetTotalObjectsCreated()
{
return nObjectsCreated;
}
protected:
Counter() { nObjectsCreated++;}
~Counter() { nObjectsCreated--; }
private:
static int nObjectsCreated;
};

template<typename T>
int Counter<T>::nObjectsCreated = 0;

Let us assume we need to get the number of objects created for a particular type, Test. Derive Test from Counter, specialized with Test itself.

class Test : private Counter<Test> { };
// Private derivation is to show that it is not 'is a' relationship.

Now at any point of time, in program execution, we could get the number of objects alive using the call like this.

int nObjects = Counter<Test>::GetTotalObjectsCreated();

CRTP is a neat, positive side effect of C++'s code-replication-based generics mechanism (templates); it has no direct counterpart in the generics of Java and .NET.



Thursday, October 6, 2011

Doors Toggling Problem.

Today we will discuss a puzzle which is very interesting as its end result is mathematically elegant and intuitive.

There are 100 doors, initially all kept closed. One has to go through 100 iterations, and in each iteration toggle a certain set of doors given by the rule below:


In the i-th iteration, he has to toggle the doors i, 2i, 3i, 4i, ..., n*i, where n*i <= 100 < (n+1)*i. For instance, in the 3rd iteration, the doors to be toggled are 3, 6, 9, ..., 99.

By toggling, it means if the door is closed it has to be opened and if it is opened, it has to be closed.

Given that, the question is: after 100 iterations, which doors are in the open state? Please stop reading here if you want to try this puzzle on your own.

First off, a door will be in the OPEN state if it has been toggled an odd number of times; otherwise it will be in the CLOSED state. I hope this is clear, as it is essential for understanding what follows.

Let us try some simple cases now. The first door isn't toggled again after the 1st iteration, which means one of the doors open after 100 iterations is the first one. The second door is toggled in two iterations, the first and the second, so it ends up closed. The third is toggled in iterations 1 and 3 and ends up closed. The fourth is toggled in iterations 1, 2 and 4 and ends up open. Now you can see the pattern clearly.

The N-th door is toggled in iteration i exactly when i is a divisor of N. The number of times a door is toggled is therefore the number of divisors of the door's number. For door 1, there is only one divisor, which is 1 itself. For door 4, the divisors are 1, 2 and 4.

So, this question boils down to a simplified version: which numbers (up to 100) have an odd number of divisors?

In number-theoretic terms, for which numbers n is tau(n) odd? I will discuss number-theoretic functions in a different post.

Generally, if a number N is divisible by n, then it is also divisible by N/n. That means the divisors come in pairs. But if the number is a square, like 25, whose divisor 5 pairs with 25/5 which is again 5, then it has an odd number of divisors.

All square numbered doors like 1, 4, 9 ... 81, 100 would be in OPEN state and others in CLOSED state.


So, the number of open doors after 100 iterations is 10. If you simulate the puzzle on a computer, the program has complexity O(n * H(n)), where H(n) is the n-th harmonic number.

http://en.wikipedia.org/wiki/Harmonic_number

In this naive approach, 1st iteration has to toggle 100 doors, 2nd 50 doors, 3rd 33 doors .... 100th 1 door.

So, the total number of toggles is (100 + 50 + 33 + 25 + ... + 1), i.e., roughly 100 * (1 + 1/2 + 1/3 + ... + 1/100), which is 100 * H(100). I am attaching the code in Java; remove the comments if you want to check the complexity in terms of iterations.


public static void main(String[] args)
{
List<Boolean> doorState = new ArrayList<Boolean>();
for(int i = 0; i <= 100; i++)
doorState.add(false);
// int totalIterations = 0;
for( int i = 1; i <= 100; i++)
{
// int innerIterations = 0;
for( int j = i; j <= 100; j += i)
{
Boolean state = doorState.get(j);
doorState.set(j, !state);
// totalIterations++;
// innerIterations++;
}
// System.out.println(innerIterations);
}
// System.out.println(totalIterations);
System.out.println("Following doors would be open: ");
for(int i = 1; i <= 100; i++)
{
if(doorState.get(i))
{
System.out.println(i);
}
}
}


Thanks for Reading.

Tuesday, September 27, 2011

C++ RAII - Resource acquisition is initialization.

RAII, Resource Acquisition Is Initialization, is one of the most powerful idioms in C++; it makes code exception-safe and maintainable. The idiom makes use of the C++ guarantee that destructors of stack-allocated objects are called at the end of their scope. Generally, resources can be allocated and deallocated at any point in the implementation, which is acceptable in all-goes-well scenarios. But if an exception is thrown during execution after the resource has been allocated and before it has been released, there is a real possibility that the resource is never released properly.

At the end of a scope in C++, all stack-allocated objects are destructed for sure, even if an exception is thrown. The RAII idiom makes use of this guarantee by wrapping the resource in a class: the resource is acquired in the constructor and released in the destructor. This is exception-safe, and it puts all the cleanup in a single place, the destructor, instead of sprinkling that logic everywhere.

Let me give a simple example to illustrate this concept using a file stream. Consider a function which takes two arguments, a string and a file name; its job is to write the string into the file. One may code it as given below.

fstream stream;
stream.open(fileName, ios::binary | ios::out | ios::app);

if(stream.is_open())
{
stream << str; // Assume there is an exception here ...
}

stream.close();

If an exception is thrown at some point while writing, stream.close() won't get called at all. One way to solve this is to add a try/catch around the writing part, but that leads to unnecessary maintenance issues and makes the program harder to read. Let us rewrite the same using RAII, as below.

class SmartFile
{
public:
SmartFile(string fileName)
{
cout << "Stream Opening" << endl;
stream.open(fileName, ios::binary | ios::out | ios::app);
}
~SmartFile()
{
cout << "Stream Closing" << endl; stream.close();
}
void WriteString(string str)
{
if(stream.is_open())
{
cout << "File Writing" << endl; stream << str;
}
else
{
cout << "Write Error" << endl;
}
}
private:
fstream stream;
};

void WriteStringToFile(string str, string fileName)
{
SmartFile file(fileName);
file.WriteString(str);
}

The above function uses the RAII class SmartFile and is reliable and exception-safe. You can try the code below to verify the same.

void WriteStringToFile(string str, string fileName)
{
SmartFile file(fileName);
throw 10;
file.WriteString(str);
}
Even though the above code throws an exception before the write, SmartFile's destructor is still called and the stream is closed.

RAII is used in smart pointers, synchronization lock classes, etc. In a nutshell, this idiom asks us to put all resource acquisition in constructors and all release logic in destructors, to make the code exception-safe and maintainable.
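For instance, a lock guard follows exactly the same shape as SmartFile. The sketch below is illustrative; CriticalSection here is just a stand-in for whatever mutex wrapper you actually use:

// A stand-in critical section; in real code this would wrap a pthread mutex,
// a Win32 CRITICAL_SECTION, or similar.
class CriticalSection
{
public:
    void Lock()   { /* acquire the underlying mutex */ }
    void Unlock() { /* release the underlying mutex */ }
};

class ScopedLock
{
public:
    ScopedLock(CriticalSection &cs) : section(cs)
    {
        section.Lock();      // acquire in the constructor
    }
    ~ScopedLock()
    {
        section.Unlock();    // release at end of scope, even if an exception is thrown
    }
private:
    CriticalSection &section;
};

void UpdateSharedState(CriticalSection &sharedSection)
{
    ScopedLock guard(sharedSection);
    // ... touch shared data; the lock is released when 'guard' goes out of scope.
}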

Thanks for Reading.

Saturday, September 24, 2011

Nagle's Algorithm - Intuitive understanding.

In the TCP/IP stack, at every layer some header information is added to make sure the information is identifiable at the corresponding layer in the destination node. For instance, at the TCP layer a TCP header is added, which contains the information needed to make sure TCP segments reach the destination properly, that congestion control is provided, and so on. Without the TCP header, these functions would have to be handled at an upper layer, say the application layer, which mainly deals with the real data being sent.

The layered architecture imposes disparate responsibilities on the layers. The application layer deals with the semantics of the data; the transport layer deals with transporting data in a reliable way; IP passes packets to the next hop, and so on. So, the real data is treated in different ways in different layers. In the transport layer (TCP) the data unit is the segment; in IP it is the packet. This leads to varying levels of treatment of the same information.

Applications can't determine how data is transmitted to the destination, which in most cases is not a problem. For instance, a file transfer hands a large amount of data to the TCP layer, and TCP transfers it to the destination in FIFO (first in, first out) order. TCP internally divides the file into several chunks whose size matches the MSS, the maximum segment size. This behavior is neat, since chunks are sent in FIFO order and the chunk size depends on several network parameters like the destination node's capability, the channel's bandwidth, etc. As long as the file is transmitted reliably, this approach is close to optimal. The only overhead in segment-based transfer is the TCP header plus the IP header, which is 40 bytes in general. If the MSS is 1000 bytes, this overhead is acceptable, since segments will usually be of MSS size. Now consider a Telnet session, in which the command data is smaller than the header size. In this scenario the overhead is unacceptable, as it is far bigger than the real data being passed: if the real data sent is 1 byte, the overhead is 4000%. In a low-bandwidth environment, this is a bottleneck.

Nagle's algorithm solves this issue by buffering the data to be sent, which is called Nagling. The algorithm is very intuitive: if the buffered data is enough to fill a segment of size MSS, it is sent across; and if all previously sent data has already been acknowledged, there is no need to buffer the current data at all.

Let us see how this algorithm works.

The issue discussed is not a problem at all in high-bandwidth environments, where data should be passed on however small it is; in several real-time systems this is essential behaviour. In high-bandwidth environments, most of the data sent across is acknowledged almost immediately, so Nagling rarely happens.

In low-bandwidth environments, data is accumulated by TCP until it fills a segment of size MSS and is then sent across. This avoids the overhead we discussed. The real issue with Nagling comes up when real-time, low-volume traffic has to be sent over a low-bandwidth link. In this scenario, data transfer doesn't happen immediately but is buffered, which can lead to an irritating user experience.

One more issue with Nagling is "delayed ACKs" on the receiving side (the side that sends the ACKs), which hold back the acknowledgement of segments for a certain time to optimize network utilization.

The delayed-ACK idea undermines the Nagling heuristic, since the latter depends on receiving ACKs to decide whether to buffer. Delayed ACKs make Nagling happen regardless of network bandwidth, whereas Nagling should ideally kick in only on low-bandwidth networks. So Nagle's heuristic is not really applicable when delayed ACKs are enabled at the destination.

Almost all TCP implementations allow Nagling to be disabled using the TCP_NODELAY option, even though doing so is not generally suggested. At the application level, the issues of Nagling can be resolved in several ways, as explained on the Wikipedia page.
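For reference, on a BSD-style socket API disabling Nagling looks roughly like this (error handling omitted; sock is assumed to be an already created TCP socket descriptor):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Disable Nagling on an existing TCP socket; returns 0 on success.
int DisableNagle(int sock)
{
    int flag = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}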

Thanks for Reading.

Friday, September 9, 2011

Wubi installer - Neat way to install linux in Windows System

After I decided to install Linux on my Windows system, I had a few options, like Debian and Ubuntu. When I tried Debian, the one I had got from the "Linux For You" magazine, I suffered a lot to make it work along with Windows: I had to allocate a separate partition for the Debian OS, take care of dual boot, and so on. Even though I had worked on Linux in my college days, I found a lot of hurdles in this process. After that, I came across the Wubi installer for Ubuntu, which makes installation alongside an existing Windows OS seamless. Believe me, I didn't find any difficulty with the installation. It installs on a Windows machine as if it were just another Windows application, and it configures the boot options itself. No need to worry about any configuration at all.

Installer can be downloaded from http://www.ubuntu.com/download/ubuntu/windows-installer.

Even though Wubi warns that this particular way of installing may hurt disk-access performance, I couldn't notice any difference at all. This is far better than having Linux run in a virtual machine inside Windows, another way to work seamlessly alongside Windows. This version of Ubuntu has an amazing UI, I would say comparable with the Mac; the UI is really intuitive. I just tried IDLE for Python, and I would dare say the Linux version of IDLE is far superior to the Windows one. I feel like programming a lot now. I will come back with some more findings on this.

Thanks a lot for reading.

Thursday, June 30, 2011

Random Shuffling - intuitive Fisher Yates algorithm.

Today we will discuss an algorithm for random shuffling.

Random shuffling of an ordered set of numbers ends up with a different order; mathematically, the result is a permutation. A random shuffle also requires that the algorithm produce every possible permutation with equal probability. Say 1, 2, 3 is our input; then the probability of getting the result 3, 2, 1 should equal that of getting 2, 3, 1. So, getting any specific output for a given input of cardinality n should have probability 1/(n!).

The Fisher-Yates algorithm for the above problem is very elegant. The idea is to pick one element out of n possibilities, the next out of n-1 possibilities, and so on. So any particular output occurs with probability 1/n!.

Please find the code-snippet, below for the same.

for( int i = Length - 1; i > 0; i--)
{
int currentRandomIndex = GetNextRandomNumber(0, i);
Swap(inArray[i], inArray[currentRandomIndex]);
}

If you write this algorithm on your own, it is easy to make subtle mistakes, like the one described in this link: http://www.codinghorror.com/blog/2007/12/the-danger-of-naivete.html

Another interesting variant is collecting n random samples out of a huge set of cardinality N. I have discussed the same in another post, on reservoir sampling.




Friday, June 24, 2011

Divide and Conquer - Intuitive understanding of Master theorem

Today we will discuss about Master Theorem, in an intuitive way.

Divide and conquer is a way to solve a problem by dividing the problem space recursively and combining the sub-solutions to get the complete solution. Merge sort uses this approach to solve the sorting problem. Given a problem, how can we make sure divide and conquer will help solve it? For instance, finding the minimum in a set of n numbers can be solved in O(n) using the naive approach. Can we use divide and conquer for this problem? We can, but it won't be more efficient; in practice it may even make the program slower than the naive approach, which makes divide and conquer unsuitable for some problems. The answer lies in the Master theorem, which makes the asymptotic analysis of certain recurrence relations easy. Let us formally analyse the complexity of merge sort and of finding the minimum.

Merge Sort:
T(n) = 2 T(n/2) + Θ(n), where Θ(n) is the cost of merging the results of the divided sub-problems; in this case the problem space is divided in two recursively.

Let us try to expand T(n),

T(n) = 2 T(n/2) + n, with Θ dropped for ease of understanding
     = 4 T(n/4) + 2 * (n/2) + n
     = 8 T(n/8) + 4 * (n/4) + 2 * (n/2) + n
     ....
     = n * T(1) + n * log2(n), after log2(n) levels of expansion
     = n + n * log2(n), with T(1) assumed to be 1.
Finally, T(n) is Θ(n * log2(n)), since n * log2(n) dominates.

In the case of our contrived example of finding the minimum using divide and conquer, the recurrence is
T(n) = 2 T(n/2) + Θ(1), which expands to n * T(1) + (n - 1).
So it is Θ(n), since the n term dominates.

Let us define the general recurrence relation, T(n) = a T(n / b) + f(n)
Intuitively, T(n) is determined by whichever of n^(log_b a) and f(n) dominates, as per the discussion above.

Master theorem formulates the above general problem, in three cases as given below.

Case 1:

If f(n) = O(n^(log_b a - e)) for some constant e > 0, then T(n) = Θ(n^(log_b a)).
Intuitively, n^(log_b a) dominates f(n) and hence determines T(n).

Case 2:

If f(n) = Θ(n^(log_b a) * (log n)^k) with k ≥ 0, then T(n) = Θ(n^(log_b a) * (log n)^(k+1)).

You can correlate the result of the merge sort analysis above with this case (there a = 2, b = 2 and k = 0).

Case 3:

If f(n) = Ω(n^(log_b a + e)) for some constant e > 0, then T(n) = Θ(f(n)). This case needs the regularity condition that a * f(n/b) <= c * f(n) for some constant c < 1.

This is, again, the inverse of Case 1. The important thing to note is the regularity condition; if it is not satisfied, this case cannot be applied directly, and T(n) may grow closer to f(n) * log_b n.

So, the bottom line is that the complexity analysis of divide-and-conquer algorithms can easily be done with the Master theorem. As an exercise, you could try analysing the binary search algorithm using the Master theorem.
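For the record, here is how that exercise works out. Binary search discards half of the range with a constant amount of work per step, so

T(n) = T(n/2) + Θ(1), i.e., a = 1, b = 2, f(n) = Θ(1).

Here n^(log_b a) = n^(log2 1) = n^0 = 1, so f(n) = Θ(n^(log_b a) * (log n)^0) and Case 2 applies with k = 0, giving T(n) = Θ(log n).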

One important thing to note is that the Master theorem applies only when f(n) is polynomially comparable with n^(log_b a); that is what the constant e > 0 in Cases 1 and 3 demands.

Thanks for reading. Suggestions and Questions are welcome.

Wednesday, June 22, 2011

Segment Tree.

Today we will discuss a data structure, the segment tree, which, like the Binary Indexed Tree, deals with range query and update.

Range Minimum Query (RMQ) is a classic computer science problem which asks, repeatedly, for the minimum value in a given range. Let us assume an array { 2, 4, 6, 7, 5, 3, 1, 8 }. We would get queries like: minimum in the range 4 - 5, in the range 0 - 3, etc. The minimum operation here can be replaced by anything similar, like maximum, sum, etc.

One easy way to attack this problem (RMQ) is to pre-process information. In the case of minimum, the pre-processed info can be an array whose value at an index is the minimum value encountered up to that index in the input array.

ProcessedArray[i] = min{ inputArray[0], inputArray[1], ..., inputArray[i] }. A simple dynamic programming approach does this pre-processing in O(n). But the problem is that updating a value in inputArray can make the processed array completely invalid in the worst case. So, updates would be very costly.

The naive approach is to check all the elements in the range for the minimum value; the complexity in that case is O(n) in the worst case, which is unaffordable when the number of queries is huge.

Segment trees are a better candidate for this problem, as they store the minimum values of intermediate ranges in internal nodes and the original values in the leaf nodes. This leads to a space cost of roughly twice the number of elements in the original input, but the complexity of a query reduces to O(log N), which is promising. A segment tree is a balanced and complete binary tree. It can be viewed like a heap, as it has the completeness property and a parent node's value is a function of its children's values; in RMQ that function is minimum. So it can be stored in an array, as we do with a binary heap.



Each node stores the minimum value of its range; the range is depicted inside the circle. For instance, [0, 9] means the node stores the minimum value of the range 0 - 9. Now a query for a range can easily be done in O(log N) time. Say we need to query the range 0 - 6: it can be divided into 0 - 4 and 5 - 6, which can be read from tree nodes directly. I am adding the source code, written in C++, for the same below. Questions are welcome.

Code below, is for Demo Purpose, not of production quality.

Thanks for Reading.

double Log2(double in)
{
return log(in)/ log(double(2));
}

int NearestPowerOfTwo(int in)
{
return pow(2, ceil(Log2(in)));
}

class SegmentTree
{
public:
SegmentTree()
{
treeArray = NULL;
}
~SegmentTree()
{
delete [] treeArray; treeArray = NULL;
}
void CreateTree(int *inArray, int length);
void InitializeTree(int internalIndex, int *inArray, int start, int end);
int QueryTree(int internalIndex, int start, int end, int rangeStart, int rangeEnd)
{
// Pre-condition check.
if( rangeStart > end || rangeEnd < start)
return -1;

// Check the present range overlap
if( start >= rangeStart && end <= rangeEnd)
return treeArray[internalIndex];
int left = QueryTree(2 * internalIndex + 1, start, (start + end) / 2, rangeStart, rangeEnd);
int right = QueryTree(2 * internalIndex + 2, (start + end) / 2 + 1, end, rangeStart, rangeEnd);

if( left == -1)
return right;
else if( right == -1)
return left;
else
return (left < right) ? left : right;
}

void UpdateTree(int index, int value);

private:
int *treeArray;
int treeLength;
};

void SegmentTree::CreateTree(int *inArray, int length)
{
// Create Tree here...

treeLength = 2 * NearestPowerOfTwo(length) - 1;
treeArray = new int[treeLength];

InitializeTree(0, inArray, 0, length - 1);
}

void SegmentTree::InitializeTree(int internalIndex, int *inArray, int start, int end)
{
// Base condition.
if( start >= end)
{
treeArray[internalIndex] = inArray[start];
return;
}
InitializeTree(internalIndex * 2 + 1, inArray, start, (start + end) / 2);
InitializeTree(internalIndex * 2 + 2, inArray, (start + end) / 2 + 1, end);

if( treeArray[internalIndex * 2 + 1] < treeArray[internalIndex * 2 + 2])
treeArray[internalIndex] = treeArray[internalIndex * 2 + 1];
else
treeArray[internalIndex] = treeArray[internalIndex * 2 + 2];
}

void SegmentTree::UpdateTree(int index, int value)
{
int indexToModify = treeLength / 2 + index;
treeArray[indexToModify] = value;

int indexToCorrect = (indexToModify - 1) / 2;

while(indexToCorrect >= 0)
{
if( treeArray[indexToCorrect * 2 + 1] < treeArray[indexToCorrect * 2 + 2])
treeArray[indexToCorrect] = treeArray[indexToCorrect * 2 + 1];
else
treeArray[indexToCorrect] = treeArray[indexToCorrect * 2 + 2];
if( indexToCorrect == 0) break;
indexToCorrect = (indexToCorrect - 1) / 2;
}
}
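A quick usage sketch with the example array from the beginning of the post (the expected results in the comments come from inspecting the array by hand; the usual headers and using namespace std are assumed):

int main()
{
    int input[] = { 2, 4, 6, 7, 5, 3, 1, 8 };

    SegmentTree tree;
    tree.CreateTree(input, 8);

    // Query minimum over ranges of the original indices 0..7.
    cout << tree.QueryTree(0, 0, 7, 4, 5) << endl;   // range [4, 5] -> 3
    cout << tree.QueryTree(0, 0, 7, 0, 3) << endl;   // range [0, 3] -> 2

    tree.UpdateTree(6, 9);                           // change value at index 6 from 1 to 9
    cout << tree.QueryTree(0, 0, 7, 0, 7) << endl;   // whole range -> 2
    return 0;
}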

Handy function for Log Base 2 and nearest power of two in C++

Since there is no direct way to find the logarithm base 2 of a number in C++, the function given below comes in handy in several situations.

// Essential Header file - math.h

// Log2 Function

double Log2(double in)
{
return log(in)/ log(double(2));
}

Let us use the above function for calculating nearest power of two easily, as given below.

int NearestPowerOfTwo(int in)
{
return pow(2, ceil(Log2(in)));
}

Monday, January 31, 2011

Binary Indexed Tree.

Let us discuss an interesting problem today, related to cumulative frequency. Consider a hotel database which stores time intervals and the number of phone calls that arrived during each interval.

Time => Number of calls

10 - 12 => 30

12 - 14 => 5

14 - 16 => 12

16 - 18 => 25

...

In this example, the cumulative number of calls made before 18 means the number of calls made in the interval 16-18 plus those in (14-16), ..., plus those in (10-12). It can be considered a range sum. Understanding this problem is essential to realizing the need for a Binary Indexed Tree.

Let us assume that this information had been stored in an array, which has interval mapped as index, and number of calls as values. Mapping from time interval to array index is not very important for this discussion.

Now the problem is to find a suitable data structure which could make updates, insertions and queries on this information in an efficient way. Let us think of naive solutions first.

Recursion Approach:

If we have to find out number of calls made before 18, then we will have to add up the values prior to that interval.

Mathematically, CumulativeSumTill(n) = valueAt(n) + CumulativeSumTill(n-1), a recursive relation. This can be implemented with recursion with complexity O(n), as it needs to traverse all elements before the given one. Finding the cumulative sum at an index is therefore not efficient this way.

If we have to add a new interval's information, we can simply append to the source array in constant time, since additions generally happen at the end of the array; the same goes for updating any interval's info. So, updates happen in constant time: very efficient.

Iterative or Dynamic Programming Approach:

With dynamic programming, the iterative approach, we memoize the intermediate results in a separate table, making the complexity of a query O(1), a constant. In the code given below, we memoize the cumulative sums in a separate table, so getting the cumulative sum at an index is very efficient. But think of an update at index 0, which renders all the stored results invalid. Every update has complexity O(n): very inefficient.

void MemoizeCumulativeSum()
{
CumulativeSum[0] = SourceArray[0];

for( int i = 1; i < lengthOfSource; i++)
{
CumulativeSum[i] = CumulativeSum[i-1] + SourceArray[i];
}
}

Summary of Naive Solution:
The dynamic programming approach, storing intermediate results, makes queries fast but updates slow. The on-the-fly, recursion-based approach makes updates fast but queries slow. This is a perfect instance of the space-time trade-off. So, in order to have both queries and updates happen efficiently, we need a different data structure which combines the above two approaches in the right way.

Binary Indexed Tree:
We are going to discuss a data structure which helps to query and update in logarithmic time.

A Binary Indexed Tree, or Fenwick tree, can be realized as an array which holds complete or partial cumulative sums as its values. Every index of this array holds either a partial or a complete cumulative sum, according to the BIT algorithm. This array makes query and update happen in logarithmic time, as described below.

Basics:

Any number can be expressed as a sum of powers of 2, so the number of such power-of-2 terms in a number n is at most log2(n). Presto, this is exactly the property that the BIT makes use of.

Initially, let us assume an auxiliary table, as in the iterative approach, containing all zeroes. When updating some value in the source array, we need to make sure all the indices in the auxiliary table which contain a partial or complete cumulative sum involving that value are updated. This may look like the naive iterative approach, but the beauty of this data structure is that it doesn't need to update all the values as naive memoization does; it can be considered an educated, or intelligent, memoization. On average it needs about log n updates in the auxiliary table, which makes updates efficient. In the same way, a query needs the sum of only certain values in the auxiliary table, unlike the naive recursive approach which processes all values in the source array; the BIT processes only about log n values per query. So, both query and update are efficient in this data structure.

BIT Core Algorithm:

A complete implementation of BIT which supports update and query, has been given at the end of this blog.

At the start of the algorithm, prepare an auxiliary array of the source array's size and initialize all its elements to zero. Now loop through all the elements in the source array and update the auxiliary table according to the code below. The algorithm calculates the indices to be updated using the logic nextIndex += (nextIndex & -nextIndex). ANDing a value with its two's-complement negation isolates the lowest set bit: if an index is "xxx1000", then index & -index results in "0001000". The new index "xxx1000" + "0001000" then clears that lowest set bit and carries into the next higher bit. Hence this algorithm, in the worst case, updates log2(n) indices in the auxiliary table.


The algorithm below is used to reflect any modification made to the source array; if the value at some index of the source array has been incremented, this must be called in order to keep the BIT up to date. The same complexity argument as for initialization holds for updates too. If you can't yet get a hold of the algorithm, no worries; we will discuss it intuitively later.
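In sketch form, the core of the update step (this mirrors the IncrementValue method of the class given at the end of this post) is:

void IncrementValue(int value, int index)
{
    int indexToModify = index + 1;          // index 0 of the BIT array is a sentinel
    while (indexToModify < arraySize)
    {
        internalArray[indexToModify] += value;
        indexToModify += (indexToModify & -indexToModify);  // move to the next responsible node
    }
}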


In the same way, the query algorithm is as follows:
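Again mirroring the Query method of the class below, the core loop is:

int Query(int index)
{
    int indexToRetrieve = index + 1;        // sentinel offset
    int retValue = 0;
    while (indexToRetrieve)
    {
        retValue += internalArray[indexToRetrieve];
        indexToRetrieve -= (indexToRetrieve & -indexToRetrieve);  // strip the lowest set bit
    }
    return retValue;   // cumulative sum of source indices 0 .. index
}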



Intuitive Analysis:

Actually, after all updates, BIT or auxiliary table will contain either partial or complete cumulative frequency depends on the index.

Indices which are powers of two contain complete cumulative frequency of that index. Others would contain partial.

'6' can be written as a sum of powers of two: 4 + 2. Now take the value at index 4 and the value at index 6 (which covers the 2 elements after 4) and add them up to get the cumulative frequency of 6.

In the same way, '7' can be written as 4 + 2 + 1. Take the value at index 4, the value at index 4+2 and the value at index 4+2+1, and add them up. Intuitively, this is exactly what happens in the query algorithm.

Same could be applied for Update algorithm also. Only complexity of this algorithm is to understand the procedure. :)

BIT Class implementation:

class BinaryIndexedTree
{
public:
BinaryIndexedTree(vector<int> inputArray)
{
internalArray = NULL;
InitializeBIT(inputArray);
}
void IncrementValue(int value, int index)
{
int indexToModify = index + 1;

while(indexToModify < arraySize)
{
internalArray[indexToModify] += value;
indexToModify += (indexToModify & -indexToModify);
}
}
int Query(int index)
{
int indexToRetrieve = index + 1;
int retValue = 0;
while(indexToRetrieve)
{
retValue += internalArray[indexToRetrieve];
indexToRetrieve -= (indexToRetrieve & -indexToRetrieve);
}
return retValue;
}
private:
void InitializeBIT(vector<int> inputArray)
{
// Initialize internal array;
// Zeroth index is Sentinel.
arraySize = inputArray.size() + 1;
internalArray = new int[arraySize];
for(int i = 0; i < arraySize; i++)
internalArray[i] = 0;
for(int i = 1; i < arraySize; i++)
{
int valueToBeAdded = inputArray[i - 1];
internalArray[i] += valueToBeAdded;
int k = i + (i & -i);
while( k < arraySize)
{
// Propagate the value to the parents that also cover index i.
internalArray[k] += valueToBeAdded;
k += (k & -k);
}
}
}
int *internalArray;
int arraySize;
};
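To close the loop with the hotel example from the beginning, here is a hedged usage sketch (the values are the ones from the table above; the usual headers and using namespace std are assumed):

int main()
{
    // Calls per interval: 10-12, 12-14, 14-16, 16-18.
    vector<int> calls;
    calls.push_back(30);
    calls.push_back(5);
    calls.push_back(12);
    calls.push_back(25);

    BinaryIndexedTree tree(calls);
    cout << tree.Query(3) << endl;    // calls made before 18:00 -> 72

    tree.IncrementValue(3, 1);        // 3 more calls in the 12-14 slot
    cout << tree.Query(3) << endl;    // -> 75
    return 0;
}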




Thanks for Reading.

Sunday, January 30, 2011

Random Samples from a huge set - Intuitive reservoir sampling

Consider a set of cardinality N. We are given a problem of collecting 'n' random samples from this set. This can be easily done with standard random number generators.

Please review my another blog, if you are not comfortable with Fisher-Yates algorithm for Random shuffling. http://karthikpresumes.blogspot.com/2011/06/random-shuffling-intuitive-fisher-yates.html

Now a bit more complex sampling: what if the number N is not known a priori? Then we can't simply generate random indices, as we don't know the range.
More interestingly, what if the parent set keeps growing, i.e., N is time-varying?

Two Pass algorithm:

Let us consider the situation where N is not varying but is not known a priori. In this case, N can be found with a single pass over the set. After this pass, the problem reduces to the initial one we discussed. Cool, but the number of passes needed is two.

Single Pass Reservoir Sampling algorithm:

Is there any way to make it one pass? Yes. Reservoir sampling.


In this post, I will try to explain this algorithm in an intuitive way.

Base case:

Let us assume the samples set as reservoir of 'n' numbers. If n is equal to N, problem is base case, which is trivial. Reservoir in this case would have all the samples, perhaps in a different order.

Simple case:

Now, assume N = n+1, we need n samples out of n + 1.

Initially, fill the reservoir with the first n elements. After processing the element at index n+1, every element should be in the reservoir with probability n/(n+1). In particular, the (n+1)-th element should be placed with probability n/(n+1). This can be achieved by generating a random number between 1 and n+1 and placing the element in the reservoir whenever the generated number is not n+1, i.e., with probability n/(n+1).

Placing an element in a full reservoir means some other element gets removed. Please remember: in the end, the probability of each element being in the reservoir should be n/N.

The probability of an element x already in the reservoir getting removed is the probability of the (n+1)-th element being selected for the reservoir times the probability of x being chosen for replacement, which is n/(n+1) * 1/n = 1/(n+1). So, the probability of x staying is 1 - 1/(n+1) = n/(n+1), which is the required probability. This easily generalizes to any index above n.

Generalization:

For a general index i, the probability of the i-th element being placed in the reservoir is n/i. The probability of a particular reservoir element being the one replaced, given that a replacement happens, is 1/n; so the combined probability of a given element being replaced at step i is 1/i. The probability of that element staying at step i is therefore (i-1)/i, and it was in the reservoir after step i-1 with probability n/(i-1). So the combined probability is n/(i-1) * (i-1)/i = n/i.

Luckily, this also works when N keeps varying with time.
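Here is a minimal C++ sketch of the single-pass algorithm (the input is modelled as a vector purely for illustration, and rand() stands in for a proper random number generator):

#include <cstdlib>
#include <vector>
using namespace std;

// Returns n random samples from 'stream'; the stream length need not be known
// in advance, since we only ever look at the current element and its index.
vector<int> ReservoirSample(const vector<int> &stream, int n)
{
    vector<int> reservoir;
    for (int i = 0; i < (int)stream.size(); i++)
    {
        if (i < n)
        {
            reservoir.push_back(stream[i]);   // fill the reservoir with the first n items
        }
        else
        {
            int j = rand() % (i + 1);         // uniform index in [0, i]
            if (j < n)
                reservoir[j] = stream[i];     // keep element i with probability n / (i + 1)
        }
    }
    return reservoir;
}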

Thanks for reading.

Thursday, January 27, 2011

Dealing with Really big numbers.

Yesterday evening, I was trying to solve a problem which deals with multiplying really huge integers. As we know, in most programming languages the integer data type has a limit on its size; generally it depends on the processor word size, and on 32-bit machines it is 32 bits.
So, the biggest unsigned integer that can be held is 4294967295, just 10 digits. What if we have to deal with 100-digit numbers?

Problem: Find the sum of natural numbers till N, N will be really huge, say 80 digits.

Obviously, we couldn't work with basic data type Integer. We can implement our own Abstract Data Type for this. Still, choosing perfect algorithm for processing is difficult.

There are some languages which support big integers natively, like Haskell and Pike. Some languages, like Java, support big integers through a separate data type rather than the usual integer type. Haskell's integer limit depends only on main memory, and Pike also supports integers of huge size.

I evaluated Pike yesterday. Certain characteristics of Pike are really interesting: its syntax is almost equivalent to C/C++, and it is a garbage-collected, object-oriented language. I am just showing a Pike program for the above problem here.

int main()
{
string countStr; int count;
countStr = Stdio.stdin->gets();
sscanf(countStr, "%d", count);
while( count--)
{
int realInt;
string strInt;
strInt = Stdio.stdin->gets();
sscanf(strInt, "%d", realInt);
int res = realInt * (realInt + 1) / 2;
string outStr = sprintf("%d\n", res);
write(outStr);
}

return 0;

}

You can see there is not much difference in the syntax if you are a C/C++ programmer. It is very fast in handling really big integers. I tested and played with some other library functions in Pike. I hope you will also try and play with this innovative language.

In next blog, I will explain my experiences with Haskell.

Thanks for reading.