Defining Useful Python Iterators for Data Science

By John Paul Mueller, Luca Massaron

You can use all kinds of techniques with Python to access individual values in the various data structures used in data science. Here, you use two simple lists, defined as follows:

ListA = ['Orange', 'Yellow', 'Green', 'Brown']
ListB = [1, 2, 3, 4]

The simplest method of accessing a particular value is to use an index. For example, if you type ListA[1] and press Enter, you see Yellow as the output. All indexes in Python are zero-based, which means that the first entry is at index 0, not 1.
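A short Python 3 sketch of zero-based indexing on the sample list. The name last_index is only an illustrative variable, and the IndexError check shows what happens if you count from 1 instead of 0:

```python
# The sample list from the text, written with straight quotes
ListA = ['Orange', 'Yellow', 'Green', 'Brown']

print(ListA[1])  # second item: Yellow
print(ListA[0])  # first item, at index 0: Orange

# The last valid index is one less than the length of the list
last_index = len(ListA) - 1
print(ListA[last_index])  # Brown

# Index 4 does not exist in a four-item list
try:
    ListA[4]
except IndexError:
    print('Index 4 is out of range for a four-item list')
```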

Ranges (Python calls them slices) present another simple method of accessing values. For example, if you type ListB[1:3] and press Enter, the output is [2, 3]. You can also use a range as the input to a for loop, such as:

for Value in ListB[1:3]:
    print(Value)

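Instead of the entire list, you see just 2 and 3 as outputs, printed on separate lines. The range has two values separated by a colon: the starting index, which is included, and the ending index, which is excluded. That's why ListB[1:3] returns the items at indexes 1 and 2 but not the item at index 3. Both values are optional. For example, ListB[:3] outputs [1, 2, 3]. When you leave out the first value, the range starts at the beginning of the list; when you leave out the second value, the range runs to the end of the list.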
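The following Python 3 sketch collects the slicing rules in one place: the start index is included, the stop index is excluded, and either one can be omitted. The full-copy form ListB[:] is an extra case not mentioned above:

```python
ListB = [1, 2, 3, 4]

# Start index included, stop index excluded
print(ListB[1:3])  # [2, 3]

# Omit the start: the slice begins at index 0
print(ListB[:3])   # [1, 2, 3]

# Omit the stop: the slice runs to the end of the list
print(ListB[2:])   # [3, 4]

# Omit both: a shallow copy of the whole list
print(ListB[:])    # [1, 2, 3, 4]
```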

Sometimes you need to process two lists in parallel. The simplest method of doing this is to use the zip() function. Here’s an example of the zip() function in action:

for Value1, Value2 in zip(ListA, ListB):
    print(Value1, '\t', Value2)

This code processes both ListA and ListB at the same time. The processing ends as soon as the shorter of the two lists runs out of items. Because both lists have four items here, you see the following:

Orange 1
Yellow 2
Green 3
Brown 4
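The two sample lists have the same length, so they don't show where zip() stops. This Python 3 sketch pairs ListA with a hypothetical two-item list, ShortB, to show that zip() stops when the shorter list runs out. It then uses itertools.zip_longest, a standard-library alternative, which pads the shorter list with a fill value instead:

```python
from itertools import zip_longest

ListA = ['Orange', 'Yellow', 'Green', 'Brown']
ShortB = [1, 2]  # hypothetical shorter list, for illustration only

# zip() stops as soon as the shorter input is exhausted
pairs = list(zip(ListA, ShortB))
print(pairs)   # [('Orange', 1), ('Yellow', 2)]

# zip_longest() continues to the end of the longer input, filling gaps
padded = list(zip_longest(ListA, ShortB, fillvalue=None))
print(padded)  # [('Orange', 1), ('Yellow', 2), ('Green', None), ('Brown', None)]
```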

This is just the tip of the iceberg. Data science relies on a host of iterator types. The idea is to make it possible to retrieve just the items you want, rather than all of the items in a list or other data structure.
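As a rough illustration of that idea, here is a Python 3 sketch using only built-ins and the standard library. iter() and next() step through a list one item at a time, which is what a for loop does behind the scenes. A generator expression yields only the items that pass a test. itertools.islice takes a range from any iterable without building an intermediate list. The even-number filter is just an example condition:

```python
from itertools import islice

ListA = ['Orange', 'Yellow', 'Green', 'Brown']
ListB = [1, 2, 3, 4]

# Step through a list manually with an iterator
it = iter(ListA)
first = next(it)
second = next(it)
print(first, second)  # Orange Yellow

# A generator expression produces only the items that pass the test
evens = (value for value in ListB if value % 2 == 0)
selected = list(evens)
print(selected)  # [2, 4]

# islice() takes items 1 and 2 from any iterable, like ListA[1:3]
middle = list(islice(ListA, 1, 3))
print(middle)  # ['Yellow', 'Green']
```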