Monday, October 24, 2011

Visitor Pattern in C++ - Intuitive understanding.

Today we will discuss the Visitor pattern in an intuitive way, as on the face of it, it can seem pretty intricate. First off, we need to understand the scenarios in which Visitor is of great help. Let me start with basic examples and move on to the finer details of the Visitor pattern.

When we design any container, our focus should be on the data structure which holds the data, not on the algorithms which work on the contained data. For instance, STL containers are designed in such a way that algorithms can be developed independent of the data containers. STL achieves this through the concept of iterators and through the uniformity of the type of the contained elements. Every container that supports iterators can be used seamlessly with the algorithms.

What if the contained elements are of disparate types? Iterators wouldn't help in this situation, as the types are varying. More importantly, if the contained elements need to be treated differently per type, even deriving them from the same base class wouldn't help much. Consider an example of Shape objects like Rectangle and Ellipse. Even though both can be described by bounding rectangles, the calculation of area differs. So a common algorithm for area is not possible, even though both are Shape objects. This is one of the scenarios in which Visitor can help.

In a nutshell, the Visitor pattern can be used to add a new algorithm that works on a container without modifying the container itself. Now, let us translate the above line into OOP terms.

Visitor is a way to add a new behavior to the existing class without modifying the class itself.

Perhaps this is one more way to achieve the Open-Closed principle of OOP. Let us explore.

Let us implement some trivial examples to proceed further. The implementations are not of production quality; proper memory management is absent. They are just to explain the concepts.

Let us define an interface, IShape, from which Rectangle and Ellipse are derived, as given below.

struct Bounds
{
    Bounds() : left(0), right(0), top(0), bottom(0)
    {
    }
    Bounds(int ileft, int itop, int iright, int ibottom)
        : left(ileft), top(itop), right(iright), bottom(ibottom)
    {
    }
    int left;
    int right;
    int top;
    int bottom;
};

class IShape
{
public:
    virtual ~IShape() {} // Virtual destructor, since we delete through IShape*.
    virtual Bounds GetBounds() = 0;
    virtual void Draw() = 0;
};

class Rectangle : public IShape
{
public:
    Rectangle(Bounds bounds) : currentBounds(bounds)
    {
    }
    Bounds GetBounds()
    {
        return currentBounds;
    }
    void Draw()
    {
        // Draw Rectangle here.
    }
private:
    Bounds currentBounds;
};

class Ellipse : public IShape
{
public:
    Ellipse(Bounds bounds) : currentBounds(bounds)
    {
    }
    Bounds GetBounds()
    {
        return currentBounds;
    }
    void Draw()
    {
        // Draw Ellipse here.
    }
private:
    Bounds currentBounds;
};

Let us define GraphicsDesigner, a composite class for shapes. At run-time, several shape objects can be added to this toy graphics designer, and Draw can be invoked to draw all the elements.

class GraphicsDesigner : public IShape
{
public:
    void AddRectangle(Bounds inBounds)
    {
        currentElements.push_back(new Rectangle(inBounds));
    }

    void AddEllipse(Bounds inBounds)
    {
        currentElements.push_back(new Ellipse(inBounds));
    }

    void Draw()
    {
        // Enumerate through the currentElements and Draw each.
    }

    // Stub so the class is concrete; a real implementation would
    // return the union of the children's bounds.
    Bounds GetBounds() { return Bounds(); }

private:
    vector<IShape*> currentElements;
};

Given a class like GraphicsDesigner, how do we add a new algorithm that works on the contained elements? For instance, suppose we need to find the minimal rectangle which covers all the elements. To achieve this, we would have to add a method to the container which enumerates the elements and executes the algorithm. This violates the Open-Closed principle. And every time we need a new algorithm, we end up adding yet another method to the container class.

Algorithms are getting bound to the data container.

We will analyse further to understand the problem well.

Adding methods to the container is not a good idea, as the algorithms depend only upon the contained elements, not on the state of the container.

Nor can the needed algorithms be defined beforehand; just think of the possible algorithms on a container of integers. The list is practically endless, right?

Given that, we need to find out a way to dynamically add the behaviors on to the containers. Without further ado, we will define Visitor pattern and see how it resolves this.

For every algorithm, we derive a class from a Visitor interface. The Visitor interface contains a collection of overloaded Visit methods, one for each type of visitable class. The visitable classes in our example are Rectangle and Ellipse (and the composite Shape as well). Every visitable class must define a method, "Accept".

Let me define a simple PrintVisitor to clarify the needs of all these interfaces.

Before that, given below are the example classes after adding the Visit and Accept interfaces.

class Rectangle;
class Ellipse;

class IShapesVisitor
{
public:
    virtual ~IShapesVisitor() {}
    virtual void Visit(Rectangle*) = 0;
    virtual void Visit(Ellipse*) = 0;
};

class IShape
{
public:
    virtual ~IShape() {}
    virtual Bounds GetBounds() = 0;
    virtual void Draw() = 0;
    virtual void Accept(IShapesVisitor* visitor) = 0;
};

class Rectangle : public IShape
{
public:
    Rectangle(Bounds bounds) : currentBounds(bounds)
    {
    }
    Bounds GetBounds()
    {
        return currentBounds;
    }
    void Draw()
    {
        // Draw Rectangle here.
    }
    void Accept(IShapesVisitor* visitor)
    {
        visitor->Visit(this);
    }
private:
    Bounds currentBounds;
};

class Ellipse : public IShape
{
public:
    Ellipse(Bounds bounds) : currentBounds(bounds)
    {
    }
    Bounds GetBounds()
    {
        return currentBounds;
    }
    void Draw()
    {
        // Draw Ellipse here.
    }
    void Accept(IShapesVisitor* visitor)
    {
        visitor->Visit(this);
    }
private:
    Bounds currentBounds;
};

class GraphicsDesigner : public IShape
{
public:
    void AddRectangle(Bounds inBounds)
    {
        currentElements.push_back(new Rectangle(inBounds));
    }

    void AddEllipse(Bounds inBounds)
    {
        currentElements.push_back(new Ellipse(inBounds));
    }

    void Draw()
    {
        // Enumerate through the currentElements and Draw each.
    }

    Bounds GetBounds() { return Bounds(); }

    void Accept(IShapesVisitor* visitor)
    {
        vector<IShape*>::iterator shapeItr = currentElements.begin();
        for (; shapeItr != currentElements.end(); ++shapeItr)
        {
            (*shapeItr)->Accept(visitor);
        }
    }

private:
    vector<IShape*> currentElements;
};

Now, we will define a simple PrintVisitor to show the power of Visitor pattern.

class PrintVisitor : public IShapesVisitor
{
public:
    void Visit(Rectangle* inShape)
    {
        cout << "This is a Rectangle" << endl;
    }

    void Visit(Ellipse* inShape)
    {
        cout << "This is an Ellipse" << endl;
    }
};

Of course, "Print" is too trivial to count as an algorithm that works on the data, but it clearly avoids the need for a virtual "Print" function in the Shape-derived classes. Now let us consider a more substantial algorithm. Suppose we need to find the total area occupied by all shapes in GraphicsDesigner. Without the Visitor pattern, we would have to add a new member function to GraphicsDesigner. With the Visitor pattern, things become very easy: we merely add a new class to the visitor hierarchy, as given below.

class TotalAreaVisitor : public IShapesVisitor
{
public:
    TotalAreaVisitor() : TotalArea(0.0)
    {
    }
    void Visit(Rectangle* inShape)
    {
        Bounds bounds = inShape->GetBounds();
        int width = bounds.right - bounds.left;
        int height = bounds.bottom - bounds.top;
        int currentArea = width * height;
        TotalArea += currentArea;
    }

    void Visit(Ellipse* inShape)
    {
        Bounds bounds = inShape->GetBounds();
        int width = bounds.right - bounds.left;
        int height = bounds.bottom - bounds.top;
        double currentArea = width * height * 3.14 / 4;
        TotalArea += currentArea;
    }
    double GetTotalArea()
    {
        return TotalArea;
    }
private:
    double TotalArea;
};
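To see the pieces working together, here is a condensed, self-contained recap of the example followed by a hypothetical driver. It is a sketch of the classes above, not the exact listings: Draw is omitted, and the visitor exposes GetTotal instead of GetTotalArea.

```cpp
#include <vector>
using namespace std;

struct Bounds {
    int left, right, top, bottom;
    Bounds(int l = 0, int t = 0, int r = 0, int b = 0)
        : left(l), right(r), top(t), bottom(b) {}
};

class Rectangle;
class Ellipse;

class IShapesVisitor {
public:
    virtual void Visit(Rectangle*) = 0;
    virtual void Visit(Ellipse*) = 0;
    virtual ~IShapesVisitor() {}
};

class IShape {
public:
    virtual Bounds GetBounds() = 0;
    virtual void Accept(IShapesVisitor*) = 0;
    virtual ~IShape() {}
};

class Rectangle : public IShape {
public:
    Rectangle(Bounds b) : b(b) {}
    Bounds GetBounds() { return b; }
    void Accept(IShapesVisitor* v) { v->Visit(this); }  // double dispatch
private:
    Bounds b;
};

class Ellipse : public IShape {
public:
    Ellipse(Bounds b) : b(b) {}
    Bounds GetBounds() { return b; }
    void Accept(IShapesVisitor* v) { v->Visit(this); }
private:
    Bounds b;
};

class GraphicsDesigner {
public:
    void AddRectangle(Bounds b) { shapes.push_back(new Rectangle(b)); }
    void AddEllipse(Bounds b)   { shapes.push_back(new Ellipse(b)); }
    void Accept(IShapesVisitor* v) {   // forwards the visitor to every child
        for (size_t i = 0; i < shapes.size(); ++i)
            shapes[i]->Accept(v);
    }
private:
    vector<IShape*> shapes;
};

class TotalAreaVisitor : public IShapesVisitor {
public:
    TotalAreaVisitor() : total(0.0) {}
    void Visit(Rectangle* r) {
        Bounds b = r->GetBounds();
        total += (b.right - b.left) * (b.bottom - b.top);
    }
    void Visit(Ellipse* e) {
        Bounds b = e->GetBounds();
        total += (b.right - b.left) * (b.bottom - b.top) * 3.14 / 4;
    }
    double GetTotal() const { return total; }
private:
    double total;
};
```

A rectangle spanning (0, 0)-(10, 10) contributes 100, and an ellipse inscribed in the same bounds about 78.5, so a designer holding one of each reports roughly 178.5.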

I hope the need for the Visitor pattern is now clear. For every new algorithm, instead of adding a new method to the container class, we add a new visitor. This helps us achieve one of the SOLID principles, the Open-Closed principle. There are disadvantages too: a new Shape subclass can't be added easily, as doing so requires modifying every existing visitor.

So, the Visitor pattern trades the ease of adding new behaviours to a visitable hierarchy against the difficulty of adding a new class to that hierarchy.

Thanks for Reading.

Friday, October 21, 2011

Bloom filter implementation in C++ - NikBloomFilter

A Bloom filter can be considered a succinct representation of a set of elements. The elements themselves might be stored in files or in database tables, and the set may support operations like insertion, deletion and retrieval.

For this discussion, let us assume a dictionary of passwords: a huge set of strings stored in several files according to their starting character (A-list, B-list, etc.). We assume we don't keep all this information in main memory. So, to find whether a string is present in the dictionary, the program has to fetch the right file according to the starting character and load it into main memory for a binary search. Generally we don't require frequent insertions and deletions on this set, only queries for the presence of an element. The only way to make the queries fast is to cache the complete set of elements in main memory. But there are scenarios in which such caching is unaffordable.

A Bloom filter, as I said earlier, is a succinct data structure which registers the presence of elements in a set. Whenever we insert an element into the set, the Bloom filter must be updated. A Bloom filter is basically a bit array; a small subset of the bits in that array corresponds to each element of the set. So, when an element is inserted, the corresponding bits are set. A query is then simply a check of those bits' state: if all the bits corresponding to the element are set, the query returns true. But this "true" may not be a real true. Since every element maps to a subset of bits in the bit array, overlaps may occur, and their cumulative effect can cause a false positive. Before going further, we will analyse this structure mathematically.

A Bloom filter is a probabilistic data structure. When an element is inserted, certain bits in the bit array have to be set. To find which bits to set, we hash the content of the element; the hash values are then used to construct the indices of the bits. For instance, the MD5 of a byte array yields 16 bytes, which can be used to construct four 4-byte indices. We will get into implementation details later in this discussion. As we discussed, this data structure may give false positives. Luckily, we can control this by trading off space.

Let us get into the analysis. Assume the Bloom filter has M bits and N elements have been inserted into the set, and hence into the Bloom filter as well.

Now, the probability that a bit B[i] is unset (zero) is (1 - 1/M) ** kN, where ** represents the "power of" operator and k is the number of bits set per inserted element. The formula is derived as follows: since N elements have been inserted, bits have been set kN times in total. The probability that one particular bit is not set in a single step is (M-1)/M; for two consecutive steps it is ((M-1)/M) * ((M-1)/M), and so on. The underlying assumption is that the bit-setting events are statistically independent. Asymptotically this is true, so we will assume it in this discussion.

P{ B[i] == 0 } = (1 - 1/M) ** kN, approximately e ** (-kN/M).

A false positive is the condition in which the Bloom filter reports a string as present in the set when it is not actually present. The cause is that the bits corresponding to the string were set by other, independent strings, with the cumulative effect described earlier. The probability of a false positive is the probability that all k bits corresponding to the string are set, which is,

P{ False Positive } = (1 - e ** (-kN / M)) ** k

Since the false positive probability depends on the factors k, N and M, we minimize it with respect to k (taking the derivative and equating it to zero). The analysis shows the function reaches its minimum when k = (M / N) * ln 2, where ln 2 is the natural logarithm of 2.

Substituting k = (M/N) * ln 2 into the false positive probability, we get,

P{ False Positive } = (1 - e ** -(ln 2)) ** k = (1/2) ** k = (0.6185) ** (M/N).

The above result is the linchpin of Bloom filter implementations, as it gives a way to control false positives. For instance, if we wish to have a false positive rate of about 2% (0.02), then M/N must be around 8 (since 0.6185 ** 8 is roughly 0.02). N is the number of elements in the set; given N, the required bit-array size M is easily calculated.
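The analysis translates directly into code. Below is a minimal, illustrative Bloom filter sketch (not the NikBloomFilter implementation itself); for simplicity it derives the k bit indices from two std::hash values rather than from MD5 as described above.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Minimal, illustrative Bloom filter; not the NikBloomFilter code itself.
class BloomFilter {
public:
    BloomFilter(std::size_t mBits, std::size_t k) : bits(mBits, false), k(k) {}

    void Insert(const std::string& s) {
        for (std::size_t i = 0; i < k; ++i)
            bits[Index(s, i)] = true;
    }

    // true means "possibly present"; false means "definitely absent".
    bool PossiblyContains(const std::string& s) const {
        for (std::size_t i = 0; i < k; ++i)
            if (!bits[Index(s, i)])
                return false;
        return true;
    }

private:
    // i'th index = h1 + i * h2, a standard way to fake k hash functions.
    std::size_t Index(const std::string& s, std::size_t i) const {
        std::size_t h1 = std::hash<std::string>()(s);
        std::size_t h2 = std::hash<std::string>()(s + "#");  // crude second hash
        return (h1 + i * h2) % bits.size();
    }

    std::vector<bool> bits;
    std::size_t k;
};
```

For a real deployment, M and k would be chosen from the formulas above; for example, M/N around 8 with k = (M/N) * ln 2, roughly 6, gives about a 2% false positive rate.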

An implementation of a Bloom filter in C++ has been done and checked in to GitHub.

This implementation doesn't yet support deletion or merging with another Bloom filter, as it is the initial release. Any comments on it are welcome.

Thanks for Reading.

Saturday, October 15, 2011

Prototype Pattern in C++, Dynamic instantiation.

Before starting the discussion of the Prototype pattern, we will look at the problem of dynamic instantiation.

Given a type name (a class name) in C++ as a string, how can an object of that type be created?

First off, there is no language-level support for dynamic instantiation in C++, unlike Java and C#. Since C++ has no common base class for all classes, general dynamic instantiation is not possible. MFC, a C++ framework, provides support for dynamic instantiation because all its classes derive from CObject. Let us try along the same lines, making use of a common base class and the Prototype pattern to achieve dynamic instantiation.

The Prototype pattern requires a class to have a method, Clone, whose sole purpose is to generate a new object of the same type. With a clone-able object, we can generate new objects of that type. To achieve this generically, all classes must derive from a common base class which declares a pure virtual member function (interface), Clone. All classes derived from it are then clone-able.

Let us call that common base class "Object" as in Java. My minimalistic implementation of the Object is as given here,

class Object
{
public:
    virtual ~Object() {}
    virtual Object* Clone() const = 0;
    static Object* MakeObject(std::string type);
    static void AddNewClassInfo(std::string type, Object* in);
    static std::map<std::string, Object*> objectsTable;
};

Clone is the key method of this abstract class. Implementation of the same could be like the one given below.

class Derived : public Object
{
public:
    Object* Clone() const { return new Derived(); }
};

So, all classes derived from Object can be cloned through this method. In order to "Clone" an object, we must have one initial object, a prototypical instance, and we must be able to look up that prototypical instance from the type name. If we can do that, then making an object from the type name alone is a cakewalk. The "objectsTable" map in Object holds the aforesaid mapping: a map from type name to prototypical instance.

Adding a prototypical instance can be done through the method, "AddNewClassInfo", whose implementation can be like this.

// Populate map with type-name and its corresponding prototype.
void Object::AddNewClassInfo(std::string type, Object* in)
{
    objectsTable[type] = in;
}

Now the implementation of "MakeObject" must be easily understandable.

Object* Object::MakeObject(std::string type)
{
    std::map<std::string, Object*>::iterator itr = objectsTable.find(type);
    if (itr != objectsTable.end())
    {
        return itr->second->Clone(); // Clone the prototypical instance.
    }
    return NULL;
}

All the essential implementation of "Object" is done. Since the class has a static member, that member must be defined in a CPP file, as given here,

std::map<std::string, Object*> Object::objectsTable;

OK. Now let us create a derived class from Object and test this implementation.

#include "Object.h"

class DynamicInstantiable : public Object
{
public:
    DynamicInstantiable() {}
    void SayHello()
    {
        std::cout << "Hello! " << std::endl;
    }

    Object* Clone() const
    {
        return new DynamicInstantiable();
    }
};

The above class implements the Clone method and derives from Object. Now this type must be registered in the static map in the "Object" class using Object::AddNewClassInfo. We have several options here, but I would like to keep things simple, so I have added a new header file with a global function, "InitializeDynamicObjects". This function must be called on the very first line of "main()".

A sample implementation is given below.

void InitializeDynamicObjects()
{
    Object::AddNewClassInfo("DynamicInstantiable", new DynamicInstantiable());
}

Here is the sample Main function, I have written to test the code.

#include "DynamicInstantiable.h"
#include "ObjectInitializer.h"
using namespace std;

int main()
{
    InitializeDynamicObjects();

    DynamicInstantiable* newInst =
        dynamic_cast<DynamicInstantiable*>(Object::MakeObject("DynamicInstantiable"));

    if (newInst != NULL)
    {
        newInst->SayHello();
    }

    cout << "Hello World" << endl;
}


I hope this post explains the Prototype pattern clearly. Comments and queries are welcome.

Thanks for Reading.


Friday, October 14, 2011

Curiously recurring template Pattern, CRTP - Static Polymorphism in C++

Let us start this discussion with some minimalistic C++ classes which don't convey any meaning but serve the purpose of the discussion.

class Base
{
public:
    void PrintMe()
    {
        std::cout << "Print: Base" << std::endl;
    }
};

class Derived : public Base
{
public:
    void PrintMe()
    {
        std::cout << "Print: Derived" << std::endl;
    }
};

void TestPolymorphism()
{
    Base *ptrBase = new Derived();
    ptrBase->PrintMe(); // This would print "Print: Base"
}

In the above code, ptrBase is initialized with an object of Derived. Ideally, PrintMe should print "Print: Derived", but it prints "Print: Base". Any C++ programmer can identify the issue: early binding according to the static type. When we write ptrBase->PrintMe(), the compiler checks whether the called function is virtual. If it is not, it binds the call to the address of the function defined in the static type of the pointer; in this case, Base.

Dynamic binding can be achieved with the "virtual" specifier in the method declaration. When we mark a function virtual, the compiler doesn't do early binding; it resolves the call through the v-table. Every class with a virtual method has a table of function pointers, and a hidden pointer to that table is inserted into the class and initialized during construction of its objects. So a virtual method invocation on an object does two things: fetching the right function address from the v-table and calling the function it points to. Even though this extra indirection doesn't cause much performance degradation, in several scenarios it can easily be avoided. Please find the snippet below, after adding the virtual specifiers.

class Base
{
public:
    virtual void PrintMe()
    {
        std::cout << "Print: Base" << std::endl;
    }
};

class Derived : public Base
{
public:
    virtual void PrintMe()
    {
        std::cout << "Print: Derived" << std::endl;
    }
};

void TestPolymorphism()
{
    Base *ptrBase = new Derived();
    ptrBase->PrintMe(); // This would now print "Print: Derived"
}

As we discussed earlier, even though dynamic polymorphism doesn't cause much performance overhead, it makes a function non-inlinable. Sometimes inlining a simple function improves performance noticeably, especially if it is called many times. Since an ordinary function call breaks the flow of execution, CPU-level optimizations such as instruction caching and pipelining are hindered. With static polymorphism, we retain the ability to inline. Now let us get into the topic of static polymorphism.

Let us try to make the Base class function a bit more intelligent.

In the Base class, if we could cast the "this" pointer to the right derived type, we could solve the problem without dynamic binding; that is, without the virtual specifier.

The below code snippet won't compile, but it gives the idea of what to do.

class Base
{
public:
    void PrintMe()
    {
        static_cast<Derived*>(this)->PrintMe(); // This would call "Print: Derived"
    }
};

We have successfully removed the virtual keyword, so no v-table is created and the function can be inlined. There are some problems, though. First, this won't compile, as the compiler doesn't know the Derived class yet. Second, it is specialized for a single derived class. Templates remove that specialization. Yes, that is the whole idea.

template<typename Derived>
class Base
{
public:
    void PrintMe()
    {
        static_cast<Derived*>(this)->PrintMe(); // This would call "Print: Derived"
    }
};

Now, you could derive classes from the above class like this,

class Derived : public Base<Derived>{ };
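Putting the pieces together, here is a complete, runnable sketch of the technique. To make the result easy to check, GetName returns a string instead of printing like PrintMe does; the names Circle, Square and Describe are illustrative, not from the discussion above.

```cpp
#include <string>

// Complete CRTP sketch: the base dispatches statically to the derived class.
template <typename Derived>
class Base {
public:
    std::string GetName() {
        // Static dispatch: resolved at compile time, no v-table involved.
        return static_cast<Derived*>(this)->GetNameImpl();
    }
};

class Circle : public Base<Circle> {
public:
    std::string GetNameImpl() { return "Circle"; }
};

class Square : public Base<Square> {
public:
    std::string GetNameImpl() { return "Square"; }
};

// Generic code written against Base<T>; the calls can be inlined.
template <typename T>
std::string Describe(Base<T>& obj) { return obj.GetName(); }
```

Unlike the virtual version, each Base<T> is a distinct type, so Describe is a template rather than a single function taking a common base pointer.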

Deriving a class from a template base class that is specialized with the derived class itself is called CRTP, the curiously recurring template pattern. Even though the above example is very minimalistic, the pattern has several uses. One of the great examples, given by Wikipedia, is a Counter base class. To collect statistics about the objects of a particular type, we can implement a Counter class as below.

template<typename T>
class Counter
{
public:
    static int GetTotalObjectsCreated()
    {
        return nObjectsCreated;
    }
protected:
    Counter() { nObjectsCreated++; }
    ~Counter() { nObjectsCreated--; }
private:
    static int nObjectsCreated;
};

template<typename T>
int Counter<T>::nObjectsCreated = 0;

Let us assume we need to count the objects of a particular type, Test. Derive Test from Counter, specialized with Test itself.

class Test : private Counter<Test> { };
// Private derivation shows that this is not an 'is a' relationship.

Now, at any point during program execution, we can get the number of objects alive with a call like this.

int nObjects = Counter<Test>::GetTotalObjectsCreated();
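For completeness, here is a runnable variant of the Counter example (with Widget standing in for Test). Note that because the destructor decrements the count, nObjectsCreated actually tracks the number of objects currently alive, as stated above; a copy constructor is added so that copies are counted too.

```cpp
// Runnable variant of the Counter example above.
template <typename T>
class Counter {
public:
    static int GetTotalObjectsCreated() { return nObjectsCreated; }
protected:
    Counter() { nObjectsCreated++; }
    Counter(const Counter&) { nObjectsCreated++; }  // count copies too
    ~Counter() { nObjectsCreated--; }               // so this tracks live objects
private:
    static int nObjectsCreated;
};

template <typename T>
int Counter<T>::nObjectsCreated = 0;

class Widget : private Counter<Widget> { };
```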

CRTP is a neat, positive side effect of C++'s code-replication-based generics mechanism, unlike the generics of Java and .NET.



Thursday, October 6, 2011

Doors Toggling Problem.

Today we will discuss a puzzle which is very interesting as its end result is mathematically elegant and intuitive.

There are 100 doors, all initially closed. One has to go through 100 iterations, and in each iteration toggle a certain set of doors given by the rule below:


In the i'th iteration, toggle the doors i, 2i, 3i, ..., n * i, where n * i <= 100 < (n + 1) * i. For instance, in the 3rd iteration the doors to be toggled are 3, 6, 9, ..., 99.

Toggling means: if the door is closed, open it; if it is open, close it.

Given that, the question is: after 100 iterations, which doors are in the open state? Please stop reading here if you want to try the puzzle on your own.

First off, a door ends up OPEN if it has been toggled an odd number of times; otherwise it ends up CLOSED. I hope this is clear; it is essential for understanding what follows.

Let us try some simple cases. The first door is never toggled again after the 1st iteration, which means one of the doors open after 100 iterations is the first. The second door is toggled in two iterations, the first and the second, so it ends up closed. The third is toggled in iterations 1 and 3 and ends up closed. The fourth is toggled in iterations 1, 2 and 4 and ends up open. Now you can see the pattern clearly.

The N'th door is toggled in iteration i exactly when i is a divisor of N. So the number of times a door is toggled equals the number of divisors of the door's number. For door 1, the only divisor is 1. For door 4, the divisors are 1, 2 and 4.

So the question boils down to a simplified version: which numbers (up to 100) have an odd number of divisors?

In number-theoretic terms: for which numbers n is tau(n) odd? I will discuss number-theoretic functions in a different post.

Generally, if a number N is divisible by n, it is also divisible by N / n, so divisors come in pairs. But if the number is a perfect square, like 25, one divisor pairs with itself (5 and 25 / 5 are both 5), giving an odd number of divisors.

All square-numbered doors, 1, 4, 9, ..., 81, 100, end up OPEN; the others end up CLOSED.


So the number of open doors after 100 iterations is 10. If you simulate the puzzle on a computer, the naive program has complexity O(n * H(n)), where H(n) is the n'th harmonic number; since H(n) grows like ln n, this is O(n log n).

http://en.wikipedia.org/wiki/Harmonic_number

In this naive approach, the 1st iteration toggles 100 doors, the 2nd 50, the 3rd 33, ..., and the 100th just 1.

So the total number of toggles is (100 + 50 + 33 + 25 + ... + 1), approximately 100 * (1 + 1/2 + 1/3 + ... + 1/100) = 100 * H(100). I am attaching the code in Java. Uncomment the commented lines if you want to check the complexity in terms of iterations.


public static void main(String[] args)
{
    List<Boolean> doorState = new ArrayList<Boolean>();
    for (int i = 0; i <= 100; i++)
        doorState.add(false);
    // int totalIterations = 0;
    for (int i = 1; i <= 100; i++)
    {
        // int innerIterations = 0;
        for (int j = i; j <= 100; j += i)
        {
            Boolean state = doorState.get(j);
            doorState.set(j, !state);
            // totalIterations++;
            // innerIterations++;
        }
        // System.out.println(innerIterations);
    }
    // System.out.println(totalIterations);
    System.out.println("Following doors would be open: ");
    for (int i = 1; i <= 100; i++)
    {
        if (doorState.get(i))
        {
            System.out.println(i);
        }
    }
}
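Since the closed-form answer is "exactly the perfect squares", the simulation can also be bypassed entirely with an O(sqrt(n)) computation, sketched here in C++:

```cpp
#include <vector>

// The analysis says the open doors are exactly the perfect squares,
// so they can be listed directly without simulating any toggles.
std::vector<int> OpenDoors(int n) {
    std::vector<int> open;
    for (int k = 1; k * k <= n; ++k)
        open.push_back(k * k);
    return open;
}
```

For n = 100 this yields the ten doors 1, 4, 9, ..., 81, 100.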


Thanks for Reading.

Tuesday, September 27, 2011

C++ RAII - Resource acquisition is initialization.

RAII, Resource Acquisition Is Initialization, is one of the most powerful idioms in C++; it makes code exception-safe and maintainable. The idiom exploits the C++ guarantee that the destructors of stack-allocated objects are called at the end of their scope. Generally, resources can be allocated and deallocated at any point in an implementation, which is acceptable in all-goes-well scenarios. But if an exception is thrown after a resource is allocated and before it is released, there is a strong possibility the resource is never released properly.

At the end of a scope in C++, all stack-allocated objects are destructed for sure, even if an exception is thrown. The RAII idiom exploits this by wrapping the resource in a class: the constructor allocates the resource and the destructor releases it. This is exception-safe. It also gathers all the cleanup in a single place, the destructor, instead of sprinkling that logic everywhere.

Let me give a simple example to illustrate the concept using file streams. Consider a function which takes two arguments, a string and a file name, and whose job is to write the string to the file. One may code it as given below.

fstream stream;
stream.open(fileName, ios::binary | ios::out | ios::app);

if (stream.is_open())
{
    stream << str; // Assume an exception is thrown here ...
}

stream.close();

If an exception is thrown at some point while writing, stream.close() won't get called at all. One solution is to add a try/catch around the writing part, but that leads to unnecessary maintenance issues, and the program is not easily readable either. Let us rewrite the same thing using RAII, as below.

class SmartFile
{
public:
    SmartFile(string fileName)
    {
        cout << "Stream Opening" << endl;
        // c_str(): pre-C++11 fstream::open takes a const char*, not a std::string.
        stream.open(fileName.c_str(), ios::binary | ios::out | ios::app);
    }
    ~SmartFile()
    {
        cout << "Stream Closing" << endl;
        stream.close();
    }
    void WriteString(string str)
    {
        if (stream.is_open())
        {
            cout << "File Writing" << endl;
            stream << str;
        }
        else
        {
            cout << "Write Error" << endl;
        }
    }
private:
    fstream stream;
};

void WriteStringToFile(string str, string fileName)
{
    SmartFile file(fileName);
    file.WriteString(str);
}

The above function uses the RAII class SmartFile and is reliable and exception-safe. You can try the code given below to verify this.

void WriteStringToFile(string str, string fileName)
{
    SmartFile file(fileName);
    throw 10;
    file.WriteString(str);
}

Even though the above code throws abnormally, SmartFile's destructor is still called.

RAII is used in smart pointers, synchronization lock classes, etc. In a nutshell, the idiom asks you to put all resource acquisition in constructors and all release logic in destructors, which makes the code exception-safe and maintainable.
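As an illustration of the smart-pointer use just mentioned, here is a minimal scoped-pointer sketch, the RAII idea that C++11 later standardized as std::unique_ptr. The Probe type is only there to make the destructor's timing observable.

```cpp
// Minimal scoped-pointer sketch: RAII applied to heap memory.
template <typename T>
class ScopedPtr {
public:
    explicit ScopedPtr(T* p) : p(p) {}
    ~ScopedPtr() { delete p; }           // release happens at end of scope, always
    T* operator->() const { return p; }
    T& operator*() const { return *p; }
private:
    ScopedPtr(const ScopedPtr&);             // non-copyable (pre-C++11 style)
    ScopedPtr& operator=(const ScopedPtr&);
    T* p;
};

// Helper type used only to observe the destructor running.
struct Probe {
    explicit Probe(bool& flag) : flag(flag) {}
    ~Probe() { flag = true; }
    bool& flag;
};
```

Just as with SmartFile, the wrapped resource (here, the heap allocation) is released when the ScopedPtr goes out of scope, whether normally or via an exception.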

Thanks for Reading.

Saturday, September 24, 2011

Nagle's Algorithm - Intuitive understanding.

In the TCP/IP stack, every layer adds some header information to make sure the data is identifiable at the corresponding layer on the destination node. For instance, at the TCP layer a TCP header is added, containing the information needed to deliver TCP segments to the destination properly, to provide congestion control, and so on. Without the TCP header, these functions would have to be handled at an upper layer, say the application layer, which should mainly deal with the real data being sent.

The layered architecture imposes disparate responsibilities on the layers: the application layer deals with the semantics of the data, the transport layer with moving the data reliably, IP with passing packets to the next hop, and so on. So the real data is treated differently at different layers: in the transport layer (TCP) the data unit is the segment; in IP it is the packet.

Applications can't determine how their data is transmitted to the destination, and in many cases this isn't a problem. For instance, a file transfer hands a large amount of data to the TCP layer, and TCP transfers it to the destination in FIFO (first in, first out) order. TCP internally divides the file into chunks whose size matches the MSS, the maximum segment size. This behaviour is neat: the chunks are sent in FIFO order, and the chunk size depends on network parameters like the destination node's capability and the channel's bandwidth. As long as the file is transmitted reliably, this approach is optimal. The only overhead of segment-based transfer is the TCP header plus the IP header, about 40 bytes in general. If the MSS is 1000 bytes, this overhead is acceptable, since segments are usually of full MSS size. But consider a Telnet session, in which the command data is smaller than the header. Here the overhead is unacceptable, as it is far bigger than the real data: if the real data is 1 byte, the overhead is 4000%. In a low-bandwidth environment, this is a bottleneck.

Nagle's algorithm solves this issue by buffering the data to be sent, called Nagling. The algorithm is very intuitive: if the buffered data is enough to fill a segment of size MSS, it is sent across; and if all previously sent data has already been acknowledged, there is no need to buffer the current data at all, so it is sent immediately.
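The decision rule can be sketched as a toy model. This is illustrative pseudologic, not a real TCP stack; the names mss, buffered and unackedBytes are assumptions of the sketch.

```cpp
#include <cstddef>

// Toy model of Nagle's send decision; illustrative only.
class NagleSender {
public:
    explicit NagleSender(std::size_t mss) : mss(mss), buffered(0), unackedBytes(0) {}

    // Returns true if the write (plus anything already buffered) goes on the wire now.
    bool Send(std::size_t nBytes) {
        buffered += nBytes;
        if (buffered >= mss || unackedBytes == 0) {  // full segment, or no data in flight
            unackedBytes += buffered;
            buffered = 0;
            return true;
        }
        return false;  // Nagling: hold small writes while ACKs are outstanding
    }

    void AckReceived() { unackedBytes = 0; }  // everything in flight acknowledged

private:
    std::size_t mss;
    std::size_t buffered;
    std::size_t unackedBytes;
};
```

With this model, the first small write goes out immediately, subsequent small writes are held until either an ACK arrives or a full MSS worth of data accumulates.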

Let us see how this algorithm works.

The issue discussed is not a problem at all in high-bandwidth environments, where data should be passed on however small it is; several real-time systems rely on this behaviour. In high-bandwidth environments, data sent across is most probably acknowledged almost immediately, so Nagling rarely happens.

In low-bandwidth environments, the data is accumulated by TCP until it reaches a segment of size MSS and is then sent across. This avoids the overhead we discussed. The real issue with Nagling comes up when real-time, low-volume traffic must be sent over a low-bandwidth link: data transfer doesn't happen immediately but is buffered, which leads to an irritating user experience.

One more issue with Nagling is "delayed ACKs" on the receiver side, which hold back the acknowledgements of segments for a certain time to optimize network utilization.

The delayed-ACK idea undermines Nagle's optimization heuristic, since the latter relies on incoming ACKs to decide whether to buffer. With delayed ACKs, Nagling tends to happen regardless of network bandwidth, whereas it should happen only on low-bandwidth links. Nagle's heuristic is therefore not applicable when delayed ACKs are enabled on the destination.

Almost all TCP implementations allow Nagling to be disabled using the TCP_NODELAY socket option, even though doing so is generally discouraged. At the application level, the issues of Nagling can be mitigated in several ways, as explained in the Wikipedia page.

Thanks for Reading.