How to Utilize Bird Flock Clusters in Predictive Analysis

By Anasse Bari, Mohamed Chaouchi, Tommy Jung

Imagine birds’ flocking behavior as a model for your company’s predictive analysis data. Each data item corresponds to a single bird in the flock; an appropriate visual application can show the flock in action in an imaginary visual space.

Your dataset corresponds to the flock. The natural flocking behavior corresponds to data patterns that might otherwise go undiscovered. The aim is to detect swarms (data clusters) among the flocking birds (data elements).

Flocking behavior has been used in real-life applications such as robotics-based rescue operations and computer animation. For example, the makers of the movie Batman Returns used mathematically generated flocking behavior to simulate bat swarms and penguin flocks.

The use of flocking behavior as a predictive analytics technique — analyzing a company’s data as flocks of similar data elements — is based on the dynamics behind flocking behavior as it appears in nature.

Flocking behavior of birds, fish, flies, bees, and ants is a self-organizing system; the individuals tend to move in accordance with both their environment and neighboring individuals.

In a flock of birds, each bird applies three main rules while flocking:

  • Separation keeps a bird apart from its nearest flock mates.

  • Alignment allows a bird to move along the same average heading as that of its flock mates.

  • Cohesion keeps the bird within the local flock.

Each bird in a flock moves according to these rules. A bird’s flock mates are birds found within a certain distance of the bird, and a certain distance from each other. To avoid collision between birds, a minimum distance must be kept; it can also be mathematically defined. Such are the rules that orchestrate flocking behavior; using them to analyze data is a natural next step.
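The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name flock_step, the neighborhood radius, the minimum distance, the rule weights, and the speed cap are all assumptions chosen for the example.

```python
import math

def flock_step(positions, velocities, radius=5.0, min_dist=1.0,
               w_sep=1.5, w_ali=1.0, w_coh=1.0, max_speed=1.0):
    """Advance every bird one step using separation, alignment, and cohesion.

    positions and velocities are lists of (x, y) tuples. All parameter
    values are illustrative assumptions, not values from the article.
    """
    new_velocities = []
    for i, (p, v) in enumerate(zip(positions, velocities)):
        sep_x = sep_y = ali_x = ali_y = coh_x = coh_y = 0.0
        mates = 0
        for j, (q, u) in enumerate(zip(positions, velocities)):
            d = math.dist(p, q)
            if j == i or d > radius:
                continue  # not a flock mate
            mates += 1
            ali_x += u[0]  # alignment: accumulate mates' headings
            ali_y += u[1]
            coh_x += q[0]  # cohesion: accumulate mates' positions
            coh_y += q[1]
            if 0 < d < min_dist:
                # separation: push away from a mate that is too close
                sep_x += (p[0] - q[0]) / d
                sep_y += (p[1] - q[1]) / d
        vx, vy = v
        if mates:
            vx += (w_sep * sep_x
                   + w_ali * (ali_x / mates - v[0])
                   + w_coh * (coh_x / mates - p[0]))
            vy += (w_sep * sep_y
                   + w_ali * (ali_y / mates - v[1])
                   + w_coh * (coh_y / mates - p[1]))
        speed = math.hypot(vx, vy)
        if speed > max_speed:
            # cap the speed so birds do not overshoot their flock mates
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        new_velocities.append((vx, vy))
    new_positions = [(p[0] + v[0], p[1] + v[1])
                     for p, v in zip(positions, new_velocities)]
    return new_positions, new_velocities

# Two birds that start at rest, four units apart, drift toward each other.
positions, velocities = flock_step([(0.0, 0.0), (4.0, 0.0)],
                                   [(0.0, 0.0), (0.0, 0.0)])
print(positions)
```

Calling flock_step repeatedly moves the birds step by step; tuning the weights changes how tightly the flock holds together.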

Consider a dataset of online social network users. Data clustering can identify social communities that share the same interests. Identifying social communities in a social network is a valuable tool that can transform how organizations think, act, operate, and manage their marketing strategies.

How do you obtain a dataset of social network users? Well, some of the data and tools are already available: Major social networks and micro-blog websites such as Facebook and Twitter provide an application programming interface (API) that allows you to develop programs that can obtain public data posted by users.

Those APIs offered by Twitter are referred to as Twitter Streaming APIs. They come in three main types: public, user, and site streams:

  • Public streams allow a user to collect public tweets about a specific topic or user, or support an analytics purpose.

  • User streams allow a user to collect tweets that are accessible by the user’s account.

  • Site streams are for large-scale servers that connect to Twitter on behalf of many users.

Now, suppose you use such a program to download users’ data and organize it into a tabular format such as the matrix shown in the table. It is a simple matrix that records the online interactions of Zach’s online friends over two different weeks. This dataset consists of seven elements (the members) and seven features. The features, shown as the table columns, are the numbers of interactions between each member and the other members.

There are many ways to apply the bird-flocking behavior to discover clusters in large datasets. One of the most recent variations is the Flock by Leader machine-learning clustering algorithm, inspired by the discovery of bird leaders in the pigeon species. The algorithm predicts data elements that could potentially lead another group of data objects.

A leader is assigned, and then the leader initiates and leads the flocking behavior. Over the course of the algorithm, leaders can become followers or outliers. In essence, this algorithm works in a way that follows the rules of “survival of the fittest.”

The Flock by Leader algorithm was first introduced by Abdelghani Bellaachia and Anasse Bari in “Flock by Leader: A Novel Machine Learning Biologically Inspired Clustering Algorithm,” published as a chapter in the proceedings of the 2012 Advances in Swarm Intelligence conference.

The following tables show one possible way to represent data generated by online social exchanges over two weeks, with one table per week. The first table shows, for example, that Mike interacted 56 times with Kellie and five times with Arthur.

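Week 1: number of interactions between each pair of members (each pair is listed once; a hyphen marks a cell with no entry)

Social Network Member | John | Mike | Zach | Emma | Kellie | Nicole | Arthur
John                  | -    | 10   | 10   | 12   | 4      | 4      | 10
Mike                  | -    | -    | 5    | 5    | 56     | 57     | 5
Zach                  | -    | -    | -    | 6    | 41     | 4      | 4
Emma                  | -    | -    | -    | -    | 28     | 8      | 8
Kellie                | -    | -    | -    | -    | -      | 5      | 5
Nicole                | -    | -    | -    | -    | -      | -      | 4
Arthur                | -    | -    | -    | -    | -      | -      | -

Week 2: number of interactions between each pair of members (same layout)

Social Network Member | John | Mike | Zach | Emma | Kellie | Nicole | Arthur
John                  | -    | 10   | 12   | 10   | 0      | 10     | 8
Mike                  | -    | -    | 50   | 2    | 0      | 0      | 5
Zach                  | -    | -    | -    | 9    | 0      | 1      | 3
Emma                  | -    | -    | -    | -    | 2      | 2      | 1
Kellie                | -    | -    | -    | -    | -      | 4      | 9
Nicole                | -    | -    | -    | -    | -      | -      | 1
Arthur                | -    | -    | -    | -    | -      | -      | -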
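To work with interaction counts like these in a program, you could load them into a symmetric matrix. The sketch below uses the first week's numbers and rests on two assumptions: each row lists only the members named after it in the header (so every pair appears once), and an interaction counts equally for both members. The names to_symmetric and top_contact are made up for this example.

```python
MEMBERS = ["John", "Mike", "Zach", "Emma", "Kellie", "Nicole", "Arthur"]

# First week's counts. Each row holds the interactions with the members
# that come after that row's member in MEMBERS (assumed table layout).
WEEK1_UPPER = [
    [10, 10, 12, 4, 4, 10],  # John
    [5, 5, 56, 57, 5],       # Mike
    [6, 41, 4, 4],           # Zach
    [28, 8, 8],              # Emma
    [5, 5],                  # Kellie
    [4],                     # Nicole
]

def to_symmetric(upper, n):
    """Expand the one-entry-per-pair rows into a full n-by-n matrix."""
    matrix = [[0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, count in enumerate(row, start=1):
            j = i + offset
            matrix[i][j] = matrix[j][i] = count  # assume symmetric counts
    return matrix

def top_contact(matrix, members, name):
    """Return the member that `name` interacted with most, and the count."""
    i = members.index(name)
    j = max((k for k in range(len(members)) if k != i),
            key=lambda k: matrix[i][k])
    return members[j], matrix[i][j]

week1 = to_symmetric(WEEK1_UPPER, len(MEMBERS))
print(top_contact(week1, MEMBERS, "Emma"))
```

A full matrix like week1 gives every member one row of seven features, which is the form a clustering algorithm typically expects.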

Here is an example which outlines how to apply the bird-flocking algorithm to analyze social network data. As depicted, each member is represented by a bird in virtual space. Notice that

  • The birds are initially dispersed randomly in the virtual space.

  • Each bird has a velocity and a position associated with it.

  • Velocity and position are calculated for each bird, using three vectors: separation, attraction (cohesion), and alignment.

  • Each bird moves according to the three vectors, and this movement produces the flocking behavior seen in nature.


Here interaction data is analyzed weekly to find similar social network users. Each week the birds can be visualized in a simple grid. The positions of these birds reflect the interactions of actual individuals in the real world.
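One way to picture how interaction counts can steer the birds is the simplified sketch below. It is not the Flock by Leader algorithm and it leaves out the alignment vector. It assumes only that each bird is attracted to the others in proportion to how often the two members interact, and that birds closer than a minimum distance push apart. The function names, the toy weights, and the starting positions are all hypothetical.

```python
import math
import random

def flock_by_interaction(weights, steps=200, step_size=0.05,
                         min_dist=0.5, start=None, seed=0):
    """Move birds toward the members they interact with most.

    weights[i][j] is the interaction count between members i and j.
    Simplified sketch: attraction and separation only, no alignment.
    """
    n = len(weights)
    if start is not None:
        pos = [list(p) for p in start]
    else:
        rng = random.Random(seed)  # random dispersal in a 10-by-10 space
        pos = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(n)]
    for _ in range(steps):
        new_pos = []
        for i in range(n):
            total = sum(weights[i][j] for j in range(n) if j != i)
            move_x = move_y = 0.0
            for j in range(n):
                if j == i:
                    continue
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                d = math.hypot(dx, dy)
                if d == 0:
                    continue
                if d < min_dist:
                    # separation: step directly away from a bird too close
                    move_x -= dx / d
                    move_y -= dy / d
                elif total > 0:
                    # attraction: pull scaled by the share of interactions
                    share = weights[i][j] / total
                    move_x += share * dx
                    move_y += share * dy
            new_pos.append([pos[i][0] + step_size * move_x,
                            pos[i][1] + step_size * move_y])
        pos = new_pos
    return pos

def proximity_clusters(pos, radius):
    """Group birds that are linked by chains of neighbors within radius."""
    unvisited = set(range(len(pos)))
    clusters = []
    while unvisited:
        first = min(unvisited)
        unvisited.remove(first)
        group, frontier = [first], [first]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(pos[i], pos[j]) <= radius]
            for j in near:
                unvisited.remove(j)
            group.extend(near)
            frontier.extend(near)
        clusters.append(sorted(group))
    return clusters

# Toy data (hypothetical): members 0 and 1 interact heavily, as do 2 and 3.
toy_weights = [[0, 50, 1, 1], [50, 0, 1, 1], [1, 1, 0, 50], [1, 1, 50, 0]]
toy_start = [[0, 0], [3, 9], [10, 1], [8, 10]]
final = flock_by_interaction(toy_weights, start=toy_start)
print(proximity_clusters(final, radius=1.0))
```

Running the same steps on each week's interaction matrix would give one arrangement of birds per week, which you could then compare to see how the communities shift.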