How to Apply Ant Colony Clusters in Predictive Analysis - dummies

# How to Apply Ant Colony Clusters in Predictive Analysis

A natural example of a self-organizing group whose behavior you can apply in predictive analysis is a colony of ants hunting for food. The ants collectively optimize their track so that it takes the shortest possible route to a food target.

Even if you try to disturb a marching colony of ants and prevent them from getting to the food target, they get back on track quickly and (again) find the shortest way possible to the food target, all of them avoiding the same obstacles while looking for food. This uniformity of behavior is possible because every ant deposits a trail of pheromones on the ground.

Consider an army of ants idle in their nest. When they start looking for food, they have absolutely no information about where to find it. They march randomly until an individual ant finds food; now the lucky ant (call it Ant X) has to communicate its find to the rest of the ants — and to do that, it must find its way back to the nest.

Fortunately, Ant X was producing its own pheromones the whole time it was looking for food; it can follow its own trail of pheromones back to the nest. On its way back to the nest, following its own pheromone trail, Ant X puts more pheromones on the same trail.

As a result, the scent on Ant X’s trail will be the strongest among all the other ants’ trails. The strongest trail of pheromones will attract all the other ants that are still searching for food. They’ll follow the strongest scent. As more ants join Ant X’s trail, they add more pheromones to it; the scent becomes stronger. Pretty soon, all the other ants have a strong scent to follow.

If several ants have discovered the same source of food, the ants that took the shortest path will make more trips than the ants that follow longer paths, so more pheromones will be deposited on the shortest path. This link between individual behavior (each ant reinforcing its own trail) and collective behavior (the colony converging on the shortest path) is an enlightening natural example of self-organization.
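The reinforcement effect can be illustrated with a deliberately simple sketch. It assumes one ant shuttling along each of two paths and a fixed deposit per completed trip; the function name, path lengths, and deposit size are all made up for illustration and are not taken from any particular ant-colony library.

```python
def simulate_two_paths(short_len=2, long_len=4, steps=40, deposit=1.0):
    """Toy model: one ant per path shuttles between nest and food."""
    pheromone = {"short": 0.0, "long": 0.0}
    lengths = {"short": short_len, "long": long_len}
    for t in range(1, steps + 1):
        for path, length in lengths.items():
            # An ant completes a one-way trip every `length` time steps,
            # and each completed trip reinforces that path's trail.
            if t % length == 0:
                pheromone[path] += deposit
    return pheromone

trails = simulate_two_paths()
total = sum(trails.values())
# Share of the total scent on each path: a rough stand-in for how
# attractive each path looks to an ant that is still searching.
share = {path: amount / total for path, amount in trails.items()}

print(trails)                     # {'short': 20.0, 'long': 10.0}
print(round(share["short"], 2))   # 0.67
```

In the same amount of time the short path collects twice the pheromone of the long one, which is why a newly arriving ant is more likely to follow it.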

In the example grid, every dot represents a document. Assume that the black dots are documents about predictive analytics and the white dots are documents about anthropology. Dots representing the different types of documents are randomly distributed in a grid of five cells.

“Ants” are deployed randomly in the grid to search for similar documents. Every cell with a value in it represents an instance of a “pheromone.” Using the document matrix, each cell’s “pheromone” value is calculated from the corresponding document.

Okay, how does an ant colony’s collective intelligence produce a model for effectively clustering data? The answer lies in a simple analogy: Ants are searching for food in their environment, much as we’re searching for clusters in a dataset — looking for similar documents within a large set of documents.

Consider a dataset of documents that you want to organize by topic. Similar documents will be grouped in the same cluster. Here’s where the ant colony can provide hints on how to group similar documents.

Imagine a two-dimensional (2D) grid where you can represent documents as dots. The 2D grid is divided into cells. Each cell has a “pheromone” (value) associated with it. Briefly, the “pheromone” value distinguishes each document in a given cell.

The dots are initially distributed randomly — and every dot in the grid represents a unique document. The next step is to deploy other dots randomly on the 2D grid, simulating the ant colony’s search for food in its environment. Those dots are initially scattered in the same 2D grid with the documents.
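A minimal sketch of this setup step might look like the following. It flattens the 2D grid into a plain list of five cells to keep the example short, and it uses topic labels in place of real documents; `build_grid` and its parameters are hypothetical names chosen for illustration.

```python
import random

def build_grid(num_cells, documents, num_ants, seed=0):
    """Scatter documents and ants randomly over the cells of a grid."""
    rng = random.Random(seed)  # fixed seed so the layout is repeatable
    cells = [[] for _ in range(num_cells)]
    for doc in documents:
        # Every document (dot) lands in a randomly chosen cell.
        cells[rng.randrange(num_cells)].append(doc)
    # Each ant is stored simply as the index of the cell it stands in.
    ants = [rng.randrange(num_cells) for _ in range(num_ants)]
    return cells, ants

# Stand-in documents: six about analytics, six about anthropology.
documents = ["analytics"] * 6 + ["anthropology"] * 6
cells, ants = build_grid(num_cells=5, documents=documents, num_ants=3)

for index, cell in enumerate(cells):
    print(index, cell)
print("ants at cells:", ants)
```

A real implementation would more likely keep (row, column) coordinates for each dot, but a flat list is enough to show the random initial scatter.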

Each new dot added to the grid represents an ant. Those “ants,” often referred to in the ant-colony algorithm as agents, move around the 2D grid. Each “ant” will either pick up or drop off the other dots (documents), depending on where the documents best belong. In this analogy, the “food” takes the form of documents sufficiently similar that they can be clustered.

An “ant” walks randomly in the grid; if it encounters a document, it can perform one of two actions: pick or drop. Each cell has a “pheromone intensity” that indicates how similar the document is to the other documents (dots) residing near the document in question — the one an “ant” is about to either pick up or drop.
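The article does not give a formula for the pick-or-drop decision. One form commonly used in ant-based clustering turns the local similarity `f` (between 0 and 1) into two probabilities, as sketched below. The constants `k1` and `k2` are hypothetical tuning values, and the label-matching similarity is a simplification of comparing document vectors.

```python
def local_similarity(doc, neighbors):
    """Fraction of nearby documents with the same label as `doc`."""
    if not neighbors:
        return 0.0
    return sum(1 for other in neighbors if other == doc) / len(neighbors)

def pick_probability(f, k1=0.1):
    """High when the document is unlike its neighbors (f near 0)."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.3):
    """High when the carried document resembles the neighbors (f near 1)."""
    return (f / (k2 + f)) ** 2

# A black document surrounded mostly by white ones is a good candidate to pick up.
f_misfit = local_similarity("black", ["white", "white", "white", "black"])
# The same document next to other black ones is a good place to drop.
f_match = local_similarity("black", ["black", "black", "black", "white"])

print(pick_probability(f_misfit) > pick_probability(f_match))   # True
print(drop_probability(f_match) > drop_probability(f_misfit))   # True
```

An ant would then compare each probability against a random number to decide whether to act, so that poorly placed documents tend to get moved and well-placed ones tend to stay.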

Note that the “ant” in Cell 3 will pick up the black-dotted document because the white “pheromone” value dominates that cell, and will then move to a cell where the value is close (similar) to that document, such as Cell 4, which holds several black dots. The search keeps iterating until the clusters form.

In effect, the “ant” moves documents from one cell to another to form clusters by performing either one of only two actions: picking up a document or dropping a document.

As the “ants” move randomly on the grid, encountering a dot (document) results in the “ant” picking up the document from its current cell, moving with it, and dropping it into a cell where it is sufficiently similar to the other documents to fit.
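One pick-move-drop move can be sketched as follows. This is a simplified, deterministic variant: the ant always takes the minority label in its cell and carries it straight to the cell where that label is most dominant, rather than wandering randomly. The five-cell layout is invented for illustration and is not a reproduction of the article's figure.

```python
from collections import Counter

def pick_move_drop(cells, ant_cell):
    """Move one out-of-place document; return the ant's new cell index."""
    counts = Counter(cells[ant_cell])
    if len(counts) < 2:
        return ant_cell  # cell is empty or already uniform: nothing to do
    # Pick up a document carrying the minority label of this cell.
    label = min(counts, key=counts.get)
    cells[ant_cell].remove(label)

    def share(i):
        # Fraction of cell i's documents that carry the same label.
        return cells[i].count(label) / len(cells[i]) if cells[i] else 0.0

    # Drop it in the other cell where that label is most dominant.
    others = (i for i in range(len(cells)) if i != ant_cell)
    target = max(others, key=share)
    cells[target].append(label)
    return target

cells = [
    ["white", "white"],            # Cell 1
    ["white", "black"],            # Cell 2
    ["white", "white", "black"],   # Cell 3: one black misfit
    ["black", "black", "black"],   # Cell 4: mostly black
    ["white"],                     # Cell 5
]
new_cell = pick_move_drop(cells, ant_cell=2)  # the ant starts in Cell 3
print(new_cell)   # 3, that is, Cell 4
print(cells[2])   # ['white', 'white']
print(cells[3])   # ['black', 'black', 'black', 'black']
```

Repeating this move for many ants and many iterations gradually sorts the documents into cells that each hold a single topic.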

How would an “ant” determine the best cell in which to drop a document? The answer is that the values in the cells act like “pheromones” — and every cell in the 2D grid contains a numerical value that can be calculated in a way that represents a document in the cell.

Remember that each document is represented as a vector of numerical values. The “intensity of the pheromone” (the numerical value) increases when more documents are dropped into the cell and decreases when documents are moved out of it.
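As a rough sketch of how such a value could be maintained, the code below assumes that a cell's “pheromone” is the sum of the vectors of the documents it holds, that its intensity is the length of that sum, and that an “ant” chooses a drop cell by cosine similarity. None of these choices comes from the article; they are one plausible way to make the description concrete, and the `Cell` class and two-term vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class Cell:
    """A grid cell whose "pheromone" is the sum of its document vectors."""
    def __init__(self, dim):
        self.pheromone = [0.0] * dim

    def drop(self, doc):
        # Dropping a document adds its vector to the cell's value.
        self.pheromone = [p + d for p, d in zip(self.pheromone, doc)]

    def pick(self, doc):
        # Picking a document up subtracts its vector again.
        self.pheromone = [p - d for p, d in zip(self.pheromone, doc)]

    def intensity(self):
        return math.sqrt(sum(p * p for p in self.pheromone))

def best_cell(doc, cells):
    """Index of the cell whose pheromone vector is most similar to doc."""
    return max(range(len(cells)), key=lambda i: cosine(doc, cells[i].pheromone))

# Hypothetical two-term vectors: [count of "model", count of "culture"].
cells = [Cell(2), Cell(2)]
cells[0].drop([3, 0])
cells[0].drop([2, 1])
cells[1].drop([0, 4])

new_doc = [4, 1]
target = best_cell(new_doc, cells)
before = cells[target].intensity()
cells[target].drop(new_doc)
after = cells[target].intensity()
print(target, before < after)   # 0 True
```

The analytics-heavy document lands in the cell that already holds analytics-heavy documents, and that cell's intensity rises as a result; calling `pick` with the same vector would lower it back to its previous value.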