How to Visualize Predictive Analysis’ Raw Data

By Anasse Bari, Mohamed Chaouchi, Tommy Jung

A picture is worth a thousand words, especially when you’re trying to get a good handle on your predictive analysis data. During pre-processing, while you’re preparing your data, it’s common practice to visualize what you have on hand before moving on to the next step.

You can start by using a spreadsheet such as Microsoft Excel to create a data matrix, in which each row is a data object and each column is a candidate data feature (also referred to as an attribute). Several business intelligence software packages (such as Tableau) can give you a preliminary overview of the data to which you’re about to apply analytics.

How to use tabular visualizations for predictive analysis

Tables are the simplest, most basic pictorial representation of data. A table (such as a spreadsheet) consists of rows and columns, which correspond, respectively, to the data objects and the attributes that make up your data. For instance, in online social network data, a data object could represent a user, and that user’s attributes, such as Gender, Zip Code, or Date of Birth, serve as the column headings.

Each cell in a table holds the value of one attribute for one data object, so a table makes it easy to spot data objects with missing attribute values.
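
As a minimal sketch, assuming the rows are held in plain Python dictionaries (the user IDs and values below are hypothetical), the following prints a small table, showing missing values as "?", and lists which data objects are missing each attribute:

```python
# Hypothetical social network records; None marks a missing attribute value.
users = [
    {"User": "u1", "Gender": "F", "Zip Code": "10001", "Date of Birth": "1990-04-12"},
    {"User": "u2", "Gender": "M", "Zip Code": None, "Date of Birth": "1985-11-30"},
    {"User": "u3", "Gender": None, "Zip Code": "94105", "Date of Birth": None},
]
columns = ["User", "Gender", "Zip Code", "Date of Birth"]


def print_table(rows, cols):
    """Print rows as a fixed-width table, showing missing values as '?'."""
    widths = {c: max(len(c), *(len(r[c] or "?") for r in rows)) for c in cols}
    print(" | ".join(c.ljust(widths[c]) for c in cols))
    for r in rows:
        print(" | ".join((r[c] or "?").ljust(widths[c]) for c in cols))


def missing_by_column(rows, cols):
    """Return {column: [user IDs whose value in that column is missing]}."""
    return {c: [r["User"] for r in rows if r[c] is None] for c in cols}


print_table(users, columns)
print(missing_by_column(users, columns))
```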

Tables also give you the flexibility to add new attributes derived from existing ones. For instance, in social network data, you can add a column called Age, which is easily calculated, as a derived attribute, from the existing Date of Birth attribute. In the tabular social network data, the new Age column is created from the existing Date of Birth column.
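
Deriving Age from Date of Birth takes one extra step: subtract a year if the birthday hasn’t happened yet in the reference year. A minimal sketch, assuming dates are stored as Python `date` objects and using a fixed, hypothetical reference date so the output is reproducible (the helper name `age_on` is illustrative):

```python
from datetime import date


def age_on(date_of_birth: date, reference: date) -> int:
    """Whole years between date_of_birth and reference (the derived Age attribute)."""
    years = reference.year - date_of_birth.year
    # Subtract one if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hypothetical rows; the reference date is fixed so reruns give the same column.
rows = [
    {"User": "u1", "Date of Birth": date(1990, 4, 12)},
    {"User": "u2", "Date of Birth": date(1985, 11, 30)},
]
reference = date(2024, 6, 1)

for row in rows:
    row["Age"] = age_on(row["Date of Birth"], reference)  # add the derived column

print([(r["User"], r["Age"]) for r in rows])
```

Using a fixed reference date rather than today’s date keeps the derived column consistent when the analysis is rerun later.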

How to use bar charts for predictive analysis

Bar charts can help you spot spikes or anomalies in your data. You can draw one for each attribute to quickly see its minimum and maximum values. Bar charts can also serve as a starting point for discussing how to normalize your data.
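
Charting libraries such as matplotlib are the usual tool for this. To stay dependency-free, the sketch below draws a crude text bar chart of a hypothetical Age attribute (one of the values is deliberately negative) and reports which records hold the minimum and maximum values; the `scale` parameter is an illustrative assumption:

```python
# Hypothetical Age values per user; u4 holds a data-entry error (a negative age).
ages = {"u1": 34, "u2": 38, "u3": 27, "u4": -5, "u5": 52}


def text_bar_chart(values, scale=2):
    """One text bar per record: '#' for positive values, '-' for negative ones.

    Each symbol stands for `scale` units of the attribute.
    """
    lines = []
    for key, value in values.items():
        bar = ("#" if value >= 0 else "-") * (abs(value) // scale)
        lines.append(f"{key:>4} {value:>4} {bar}")
    return "\n".join(lines)


print(text_bar_chart(ages))

# Records holding the minimum and maximum values of the attribute.
low, high = min(ages, key=ages.get), max(ages, key=ages.get)
print("min:", low, ages[low], "max:", high, ages[high])
```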

Normalization is the adjustment of some or all attribute values to a common scale that makes the data more usable. A bar chart also makes errors obvious: in the social network data, for example, the Age bar for one record is negative. That anomaly is easier to spot in a bar chart than in a table of data.
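
The text doesn’t prescribe a normalization method; min-max scaling, which maps each value x to (x - min) / (max - min) so results fall between 0 and 1, is one common choice. A minimal sketch with hypothetical Age values that sets the negative entry aside first, since a single bad value would otherwise become the minimum and shift every normalized result:

```python
def min_max_normalize(values):
    """Rescale values linearly to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical: nothing to spread out
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical Age column; -5 is the kind of error a bar chart exposes.
ages = [34, 38, 27, -5, 52]

# Separate invalid ages before normalizing so they don't distort the scale.
invalid = [a for a in ages if a < 0]
valid = [a for a in ages if a >= 0]

print("invalid:", invalid)
print("normalized:", min_max_normalize(valid))
```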

Basics of pie charts for predictive analysis

Pie charts are used mainly to show percentages. They can easily illustrate the distribution of several items and highlight the most dominant one. In the pie chart of the raw social network data, records are represented according to the Gender attribute. Notice that the chart shows not only a clear distribution of males versus females, but also a probable error: R as a value for gender, possibly introduced when the data was collected.
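
The percentages behind a pie chart are straightforward to compute. A minimal sketch, using a hypothetical Gender column and assuming M and F are the only valid codes, that computes each slice’s share and flags unexpected values such as R:

```python
from collections import Counter

# Hypothetical Gender column; "R" is an unexpected code like the one in the chart.
genders = ["M", "F", "F", "M", "F", "R", "M", "F"]
expected = {"M", "F"}  # assumed set of valid codes

counts = Counter(genders)
total = sum(counts.values())

# Each pie slice is the category's share of the total, in percent.
shares = {g: 100 * n / total for g, n in counts.items()}
unexpected = sorted(set(counts) - expected)

for g, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{g}: {pct:.1f}%")
print("unexpected values:", unexpected)
```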

How to use graph charts for predictive analysis

Graph theory provides a set of powerful algorithms that can analyze data structured and represented as a graph. In computer science, a graph is a data structure: a way to organize data that represents relations between pairs of data objects. A graph consists of two main parts:

  • Vertices, also known as nodes

  • Edges, which connect pairs of nodes
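
One common way to store such a graph in code is an adjacency list that maps each vertex to its neighbors. The sketch below is a hypothetical example: the member names are invented, each edge is directed, and each weight is assumed to count interactions between two members:

```python
# A minimal weighted, directed graph stored as an adjacency list:
# graph[u][v] is the weight of the edge u -> v.
graph: dict[str, dict[str, float]] = {}


def add_edge(g, u, v, weight=1.0):
    """Add a directed edge u -> v, creating both vertices if needed."""
    g.setdefault(u, {})[v] = weight
    g.setdefault(v, {})  # make sure v exists as a vertex even with no outgoing edges


add_edge(graph, "Alice", "Bob", 3)  # hypothetical: Alice interacted with Bob 3 times
add_edge(graph, "Bob", "Carol", 1)
add_edge(graph, "Alice", "Carol", 2)

vertices = sorted(graph)
edges = sorted((u, v, w) for u, nbrs in graph.items() for v, w in nbrs.items())
print(vertices)
print(edges)
```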

Edges can be directed (drawn as arrows) and can have weights. In social network data, for instance, the nodes (circles) can represent members, and you can place an edge (arrow) between two members who are connected as friends.

The arrow’s direction indicates who “friends” whom first, or who initiates interactions most of the time.
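
If each edge records who initiated the connection, counting outgoing edges per member (the out-degree) gives a rough measure of who initiates most often, and incoming edges (the in-degree) show who receives the most connections. A minimal sketch over hypothetical friend-request data:

```python
from collections import Counter

# Hypothetical directed "friend request" edges: (initiator, recipient).
requests = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Bob", "Carol"),
    ("Dave", "Alice"),
    ("Alice", "Dave"),
]

out_degree = Counter(src for src, _ in requests)  # connections each member initiated
in_degree = Counter(dst for _, dst in requests)   # connections each member received

top_initiator, initiated = out_degree.most_common(1)[0]
print(top_initiator, initiated)
print(dict(in_degree))
```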

Basics of word clouds for predictive analysis

Consider a list of words or concepts arranged as a word cloud: a graphic representation of all the words on the list, in which the size of each word is proportional to a metric that you specify. For instance, if you have a spreadsheet of words and their numbers of occurrences and you’d like to identify the most important words, try a word cloud.

Word clouds are useful because much of the data organizations collect is text; a common example is Twitter’s use of trending terms. Every term in this representation has a weight that determines its size as an indicator of its relative importance.

One way to define that weight is by the number of times a word appears in your data collection. The more frequently a word appears, the “heavier” its weight, and the larger it appears in the cloud.
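
A minimal sketch of frequency-based weighting, assuming a hypothetical set of short documents and an assumed maximum font size of 40 points; each word’s size is proportional to its count, so the most frequent word gets the full size:

```python
import re
from collections import Counter

# Hypothetical text collection.
documents = [
    "data drives predictive analysis",
    "clean data before analysis",
    "visualize data to spot errors",
]

MAX_SIZE = 40  # assumed font size (points) for the most frequent word


def word_sizes(docs, max_size=MAX_SIZE):
    """Weight each word by its frequency; size is proportional to that weight."""
    counts = Counter(w for d in docs for w in re.findall(r"[a-z]+", d.lower()))
    top = max(counts.values())
    return {w: max_size * c / top for w, c in counts.items()}


sizes = word_sizes(documents)
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{word}: {size:.1f}pt")
```

In practice you’d usually filter out common stop words (such as “to”) before counting so they don’t crowd the cloud.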

How to use flocking birds representation for predictive analysis

Natural flocking behavior in general is a self-organizing system in which objects (in particular, living things) tend to behave according to (a) the environment they belong to and (b) their responses to other existing objects. The flocking behavior of natural societies such as those of bees, flies, birds, fish, and ants — or, for that matter, people — is also known as swarm intelligence.

Birds follow natural rules when they behave as a flock. Flock-mates are birds located within a certain distance of each other; those birds are considered similar. Each bird moves according to the three main rules that organize flocking behavior:

  • Separation: Flock-mates must not collide with each other.

  • Alignment: Flock-mates move in the same average direction as their neighbors.

  • Cohesion: Flock-mates move toward the average position of their flock-mates.
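
These three rules are the ones used in Craig Reynolds’ classic “boids” flocking model. The sketch below is a simplified, illustrative version in two dimensions; the neighbor radius, minimum distance, and rule weights are assumptions chosen for readability, not tuned values:

```python
import math
from dataclasses import dataclass


@dataclass
class Bird:
    x: float
    y: float
    vx: float
    vy: float


def step(birds, radius=5.0, min_dist=1.0, w_sep=0.05, w_ali=0.05, w_coh=0.01):
    """Advance every bird one time step using separation, alignment, and cohesion.

    Flock-mates are the other birds within `radius`. All parameters are
    illustrative assumptions.
    """
    new = []
    for b in birds:
        mates = [o for o in birds
                 if o is not b and math.dist((b.x, b.y), (o.x, o.y)) < radius]
        vx, vy = b.vx, b.vy
        if mates:
            n = len(mates)
            # Separation: steer away from flock-mates that are too close.
            for o in mates:
                if math.dist((b.x, b.y), (o.x, o.y)) < min_dist:
                    vx += w_sep * (b.x - o.x)
                    vy += w_sep * (b.y - o.y)
            # Alignment: steer toward the flock-mates' average velocity.
            vx += w_ali * (sum(o.vx for o in mates) / n - b.vx)
            vy += w_ali * (sum(o.vy for o in mates) / n - b.vy)
            # Cohesion: steer toward the flock-mates' average position.
            vx += w_coh * (sum(o.x for o in mates) / n - b.x)
            vy += w_coh * (sum(o.y for o in mates) / n - b.y)
        new.append(Bird(b.x + vx, b.y + vy, vx, vy))
    return new


# Usage: three nearby birds and one isolated bird, simulated for 10 steps.
flock = [Bird(0, 0, 1, 0), Bird(2, 0, 0, 1), Bird(0, 2, 1, 1), Bird(50, 50, -1, 0)]
for _ in range(10):
    flock = step(flock)
for b in flock:
    print(f"({b.x:.2f}, {b.y:.2f}) velocity ({b.vx:.2f}, {b.vy:.2f})")
```

Computing every bird’s new velocity from the previous positions, rather than updating birds in place, keeps the result independent of the order in which birds are processed.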

Modeling those three rules can enable an analytical system to simulate flocking behaviors. Using the self-organized natural behavior of flocking birds, you can convert a straightforward spreadsheet into a visualization. The key is to define the notion of similarity as part of your data. Start with a couple of questions:

  • What makes two data objects in your data similar?

  • Which attributes can best drive the similarity between two data records?
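
One way to answer those questions is to write an explicit similarity function. The sketch below is hypothetical: it scores two user records on Age, Zip Code, and Relationship Status with equal weights, and treats records as flock-mates when the score reaches an assumed threshold of 0.6:

```python
# Hypothetical user records with a few social network attributes.
alice = {"Age": 34, "Zip Code": "10001", "Relationship Status": "Single"}
bob = {"Age": 36, "Zip Code": "10001", "Relationship Status": "Married"}
carol = {"Age": 61, "Zip Code": "94105", "Relationship Status": "Single"}


def similarity(a, b, max_age_gap=50):
    """Score in [0, 1]; the chosen attributes and equal weights are assumptions."""
    age_score = max(0.0, 1 - abs(a["Age"] - b["Age"]) / max_age_gap)
    zip_score = 1.0 if a["Zip Code"] == b["Zip Code"] else 0.0
    status_score = 1.0 if a["Relationship Status"] == b["Relationship Status"] else 0.0
    return (age_score + zip_score + status_score) / 3


def are_flock_mates(a, b, threshold=0.6):
    """Treat two records as flock-mates when they are similar enough."""
    return similarity(a, b) >= threshold


print(round(similarity(alice, bob), 3), are_flock_mates(alice, bob))
print(round(similarity(alice, carol), 3), are_flock_mates(alice, carol))
```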

For instance, in social network data, the data records represent individual users; the attributes that describe them can include Age, Zip Code, Relationship Status, List of Friends, Number of Friends, Habits, Events