How to Use Python to Develop Graphs for Data Science

By John Paul Mueller, Luca Massaron

Graphs are useful for data scientists. A graph is a depiction of data that uses lines to show the connections between data points, and Python makes such graphs easy to build. The purpose is to show that some data points relate to other data points, but not to all the data points that appear on the graph.

Think about a map of a subway system. Each of the stations connects to other stations, but no single station connects to all the stations in the subway system. Graphs are a popular data science topic because of their use in social media analysis. When performing social media analysis, you depict and analyze networks of relationships, such as friends or business connections, from social hubs such as Facebook, Google+, Twitter, or LinkedIn.

The two common depictions of graphs are undirected, where the graph simply shows lines between data elements, and directed, where arrows added to the line show that data flows in a particular direction. For example, consider a depiction of a water system. The water would flow in just one direction in most cases, so you could use a directed graph to depict not only the connections between sources and targets for the water but also to show water direction by using arrows.
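
The difference between the two depictions can be sketched in a few lines with the NetworkX package that the examples later in this article rely on. This is only an illustrative sketch: the station and water-system names are hypothetical and don't come from a real data set.

```python
import networkx as nx

# Undirected: a subway-style link between two stations works both ways.
subway = nx.Graph()
subway.add_edge("Central", "Harbor")

# Directed: water flows from the reservoir to the plant, not back.
water = nx.DiGraph()
water.add_edge("Reservoir", "Plant")

# An undirected edge has no orientation, so the reversed query succeeds.
print(subway.has_edge("Harbor", "Central"))   # True

# A directed edge exists only in the direction in which it was added.
print(water.has_edge("Reservoir", "Plant"))   # True
print(water.has_edge("Plant", "Reservoir"))   # False
```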

Developing undirected graphs

An undirected graph simply shows connections between nodes. The output doesn’t provide a direction from one node to the next. For example, when establishing connectivity between web pages, no direction is implied. The following example shows how to create an undirected graph.

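import networkx as nx
import matplotlib.pyplot as plt
# Create two empty undirected graphs.
G = nx.Graph()
H = nx.Graph()
# Add nodes one at a time, from a list, and from a range.
G.add_node(1)
G.add_nodes_from([2, 3])
G.add_nodes_from(range(4, 7))
# Import the nodes of another graph.
H.add_node(7)
G.add_nodes_from(H)
# Add a single edge, a self-loop, and a list of edges.
G.add_edge(1, 2)
G.add_edge(1, 1)
G.add_edges_from([(2,3), (3,6), (4,6), (5,6)])
# Import the edges of another graph.
H.add_edges_from([(4,7), (5,7), (6,7)])
G.add_edges_from(H.edges())
# Draw the graph with node labels and display it.
nx.draw_networkx(G)
plt.show()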

This example builds the graph using a number of different techniques. It begins by importing the NetworkX package. To create a new undirected graph, the code calls the Graph() constructor, which can take a number of input arguments to use as attributes. However, you can build a perfectly usable graph without using attributes, which is what this example does.
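
As a small sketch of what passing attributes to the constructor looks like, the following code supplies two keyword arguments to Graph(). The attribute names and values (name, city) are made up for illustration. NetworkX stores such keyword arguments in the graph's graph dictionary.

```python
import networkx as nx

# Keyword arguments passed to Graph() become graph-level attributes.
G = nx.Graph(name="Subway map", city="Example City")
print(G.graph["name"])
print(G.graph["city"])

# Attributes can also be added or changed after the graph exists.
G.graph["year"] = 2024
print(G.graph)
```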

The easiest way to add a node is to call add_node() with a node number. You can also add a list, dictionary, or range() of nodes using add_nodes_from(). In fact, you can import nodes from other graphs if you want.
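
The node-adding techniques just described can be tried without drawing anything. This sketch goes slightly beyond the example by adding nodes from a dictionary (an assumption about how you might hold your data; iterating a dictionary yields its keys, so the keys become the nodes) and then printing the result.

```python
import networkx as nx

G = nx.Graph()
G.add_node(1)                        # a single node
G.add_nodes_from([2, 3])             # a list of nodes
G.add_nodes_from(range(4, 7))        # a range: 4, 5, 6
G.add_nodes_from({7: "x", 8: "y"})   # a dictionary: its keys 7 and 8

# Import the nodes of another graph.
H = nx.Graph()
H.add_node(9)
G.add_nodes_from(H)

print(sorted(G.nodes()))             # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(G.number_of_edges())           # 0, because no edges exist yet
```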

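Even though the nodes used in the example rely on numbers, you don’t have to use numbers for your nodes. A node can use a single letter, a string, or even a date. Nodes do have some restrictions. A node must be a hashable Python object, so you can’t use a list as a node, and NetworkX doesn’t accept None as a node. Be careful with Boolean values as well: True and 1 compare as equal in Python, so they refer to the same node.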
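
Here is a short sketch of node types in practice. The specific values (the station name, the date, and the coordinate pair) are arbitrary examples. The last part shows two pitfalls: True collides with the node 1 because the two values are equal and hash alike in Python, and an unhashable object such as a list is rejected.

```python
import datetime
import networkx as nx

G = nx.Graph()
G.add_node("A")                         # a single letter
G.add_node("Central Station")           # a string
G.add_node(datetime.date(2024, 1, 1))   # a date
G.add_node((40.7, -74.0))               # a tuple is hashable, so it works

# True == 1 in Python, so this second call does not create a new node.
G.add_node(1)
G.add_node(True)
print(G.number_of_nodes())              # 5

# A list is not hashable, so it can't serve as a node.
rejected = False
try:
    G.add_node([1, 2])
except TypeError as err:
    rejected = True
    print("Rejected:", err)
```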

Nodes don’t have any connectivity at the outset. You must define connections (edges) between them. To add a single edge, you call add_edge() with the two nodes that you want to connect. As with nodes, you can use add_edges_from() to create more than one edge at a time, using a list of node pairs or the edges of another graph as input. Here’s the output from this example (your output may differ slightly but should have the same connections).

Undirected graphs connect nodes together to form patterns.
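
Because the layout of the drawing changes from run to run, it can help to check the connections in text form. The following sketch rebuilds the same edges as the example and queries them. It skips the explicit add_node() calls because adding an edge creates any node that doesn't exist yet.

```python
import networkx as nx

G = nx.Graph()
G.add_edge(1, 2)
G.add_edge(1, 1)    # a self-loop: node 1 connects to itself
G.add_edges_from([(2, 3), (3, 6), (4, 6), (5, 6)])

# Build the remaining edges in a second graph and import them.
H = nx.Graph()
H.add_edges_from([(4, 7), (5, 7), (6, 7)])
G.add_edges_from(H.edges())

print(G.number_of_nodes())      # 7
print(G.number_of_edges())      # 9
print(sorted(G.neighbors(6)))   # [3, 4, 5, 7]
print(G.has_edge(6, 3))         # True: undirected edges work both ways
```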

Developing directed graphs

You use directed graphs when you need to show a direction, say from a start point to an end point. When you get a map that shows you how to get from one specific point to another, the starting node and ending node are marked as such, and the lines between these nodes (and all the intermediate nodes) show direction.

Your graphs need not be boring. You can dress them up in all sorts of ways so that the viewer gains additional information in different ways. For example, you can create custom labels, use specific colors for certain nodes, or rely on color to help people see the meaning behind your graphs.

You can also change edge line weight and use other techniques to mark a specific path between nodes as the better one to choose. The following example shows many (but not nearly all) of the ways in which you can dress up a directed graph and make it more interesting:

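import networkx as nx
import matplotlib.pyplot as plt
# Create an empty directed graph.
G = nx.DiGraph()
G.add_node(1)
G.add_nodes_from([2, 3])
G.add_nodes_from(range(4, 6))
# Add nodes 6, 7, and 8 along with the edges 6->7 and 7->8.
# Current NetworkX releases provide add_path() as a function.
nx.add_path(G, [6, 7, 8])
G.add_edge(1, 2)
G.add_edges_from([(1,4), (4,5), (2,3), (3,6), (5,6)])
# One color, label, and size per node, in the order the nodes were added.
colors = ['r', 'g', 'g', 'g', 'g', 'm', 'm', 'r']
labels = {1:'Start', 2:'2', 3:'3', 4:'4',
   5:'5', 6:'6', 7:'7', 8:'End'}
sizes = [800, 300, 300, 300, 300, 600, 300, 800]
# Draw diamond-shaped nodes with the custom colors, labels, and sizes.
nx.draw_networkx(G, node_color=colors, node_shape='D',
     with_labels=True, labels=labels,
     node_size=sizes)
plt.show()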

The example begins by creating a directed graph using the DiGraph() constructor. You should note that the NetworkX package also supports MultiGraph() and MultiDiGraph() graph types. The NetworkX documentation provides a listing of all the graph types.
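
As a brief sketch of how the multigraph types differ from Graph() and DiGraph(), the following code adds the same edge twice to each kind of graph. The node names are placeholders. A multigraph keeps parallel edges between the same pair of nodes, whereas a simple graph stores only one.

```python
import networkx as nx

# A simple graph keeps at most one edge between two nodes.
simple = nx.Graph()
simple.add_edge("A", "B")
simple.add_edge("A", "B")
print(simple.number_of_edges())     # 1

# A multigraph keeps each repeated edge as a parallel edge.
multi = nx.MultiGraph()
multi.add_edge("A", "B")
multi.add_edge("A", "B")
print(multi.number_of_edges())      # 2

# A directed multigraph also tracks the direction of every edge.
multi_di = nx.MultiDiGraph()
multi_di.add_edge("A", "B")
multi_di.add_edge("A", "B")
multi_di.add_edge("B", "A")
print(multi_di.number_of_edges())   # 3
```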

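Adding nodes is much like working with an undirected graph. You can add single nodes using add_node() and multiple nodes using add_nodes_from(). The add_path() call lets you create nodes and edges at the same time. In current NetworkX releases, add_path() is a function that takes the graph as its first argument, as in nx.add_path(G, [6, 7, 8]); older releases also offered it as the G.add_path() method. The order of nodes in the call is important. The flow from one node to another is from left to right in the list supplied to the call.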

Adding edges is much the same as working with an undirected graph, too. You can use add_edge() to add a single edge or add_edges_from() to add multiple edges at one time. However, the order of the node numbers is important. The flow goes from the left node to the right node in each pair.
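
To see how node order determines direction, you can query a directed graph instead of drawing it. This sketch uses a subset of the example's edges and assumes a NetworkX release that provides the nx.add_path() function.

```python
import networkx as nx

G = nx.DiGraph()
nx.add_path(G, [6, 7, 8])            # creates the edges 6->7 and 7->8
G.add_edge(1, 2)                     # creates the edge 1->2
G.add_edges_from([(2, 3), (3, 6)])   # creates the edges 2->3 and 3->6

print(list(G.successors(6)))         # [7]: node 6 flows to node 7
print(list(G.predecessors(6)))       # [3]: node 3 flows to node 6
print(G.has_edge(7, 6))              # False: no edge runs backward

# Travel is possible from node 1 to node 8, but not the other way.
print(nx.has_path(G, 1, 8))          # True
print(nx.has_path(G, 8, 1))          # False
```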

This example adds special node colors, labels, a shape (only one shape is used), and sizes to the output. You still call on draw_networkx() to perform the task. However, adding the parameters shown changes the appearance of the graph. Note that with_labels must be True in order to see the labels provided by the labels parameter. True is the default for draw_networkx(), so the example sets it only to make the intent explicit. Here is the output from this example.

Use directed graphs to show direction between nodes.
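
The example doesn't demonstrate the edge line weight technique mentioned earlier, so here is one possible sketch of it. It assumes that the path worth marking is a shortest path from node 1 to node 8. The draw_highlighted() helper is a hypothetical name used only for this illustration; the width, edge_color, and edgelist arguments are passed through to the NetworkX edge-drawing code.

```python
import networkx as nx
import matplotlib.pyplot as plt

# Rebuild the directed graph from the example.
G = nx.DiGraph()
nx.add_path(G, [6, 7, 8])
G.add_edges_from([(1, 2), (1, 4), (4, 5), (2, 3), (3, 6), (5, 6)])

# Pick the path to emphasize: a shortest path from Start (1) to End (8).
path = nx.shortest_path(G, source=1, target=8)
path_edges = set(zip(path, path[1:]))

# One width and one color per edge, in the order the edges are drawn.
edges = list(G.edges())
widths = [3.0 if edge in path_edges else 1.0 for edge in edges]
edge_colors = ['r' if edge in path_edges else 'k' for edge in edges]

def draw_highlighted():
    """Draw G with the chosen path in thick red lines (hypothetical helper)."""
    pos = nx.spring_layout(G, seed=42)   # fixed seed for a repeatable layout
    nx.draw_networkx(G, pos=pos, edgelist=edges,
                     width=widths, edge_color=edge_colors)
    plt.show()
```

Calling draw_highlighted() displays the graph. Two routes of equal length lead from node 1 to node 8 in this graph, so which one appears highlighted depends on the route that shortest_path() returns.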