Python for Data Science For Dummies Cheat Sheet

Python Essentials For Dummies

Python is an incredible programming language that you can use to perform data science tasks with a minimum of effort. The huge number of available libraries means that the low-level code you normally need to write is likely already available from some other source.

All you need to focus on is getting the job done. With that in mind, this Cheat Sheet helps you access the most commonly needed reminders for making your programming experience fast and easy.

The 8 most common Python programming errors

Every developer on the planet makes mistakes. However, knowing about common mistakes will save you time and effort later. The following list tells you about the most common errors that developers experience when working with Python:

Using the incorrect indentation: Many Python features rely on indentation. For example, when you create a new class, everything in that class is indented under the class declaration. The same is true for decision, loop, and other structural statements. If you find that your code is executing a task when it really shouldn’t be, start reviewing the indentation you’re using.
Relying on the assignment operator instead of the equality operator: When performing a comparison between two objects or value, you just use the equality operator (==), not the assignment operator (=). The assignment operator places an object or value within a variable and doesn’t compare anything.
Placing function calls in the wrong order when creating complex statements: Python always executes functions from left to right. So the statement MyString.strip().center(21, "*") produces a different result than MyString.center(21, "*").strip(). When you encounter a situation in which the output of a series of concatenated functions is different from what you expected, you need to check function order to ensure that each function is in the correct place.
Misplacing punctuation: You can put punctuation in the wrong place and create an entirely different result. Remember that you must include a colon at the end of each structural statement. In addition, the placement of parentheses is critical. For example, (1 + 2) * (3 + 4), 1 + ((2 * 3) + 4), and 1 + (2 * (3 + 4)) all produce different results.
Using the incorrect logical operator: Most of the operators don’t present developers with problems, but the logical operators do. Remember to use and to determine when both operands must be True and or when either of the operands can be True.
Creating count-by-one errors on loops: Remember that a loop doesn’t count the last number you specify in a range. So, if you specify the range [1:11], you actually get output for values between 1 and 10.
Using the wrong capitalization: Python is case sensitive, so MyVar is different from myvar and MYVAR. Always check capitalization when you find that you can’t access a value you expected to access.
Making a spelling mistake: Even seasoned developers suffer from spelling errors at times. Ensuring that you use a common approach to naming variables, classes, and functions does help. However, even a consistent naming scheme won’t always prevent you from typing MyVer when you meant to type MyVar.
Misunderstanding how function defaults work: A function’s default value is set at the time it’s first evaluated, rather than each time you call it. Consequently, a function declaration like this:def myFunc(list=[]):list.append(“value”)
return list

will only provide an empty list the first time you call it, rather than every time you call it without providing a value for list. Subsequent calls will simply add “value” to an ever growing list. So if you call myFunc() three times, list will actually equal [“value”, “value”, “value”]. The workaround for this issue is to check the input value every time in the code and act accordingly, such as:

def myFunc(list=None):

if list is None:

list = []

list.append(“value”)

return list
Modifying a list while iterating over it: If a developer is lucky, this particular mistake results in an index out-of-range error. At least there is some indication of where to look. However, when working with some data science problems that don’t use the entire list, but simply iterate over parts of it, the mistake can introduce all sorts of data skewing and analysis problems that can be extremely difficult to locate (assuming that you know there is a problem at all). Using list comprehensions is a common method of avoiding this problem.
Creating a module name that clashes with a Python standard library module: If you create a module that has the same name as an existing Python module, Python may import your module instead of the one you wanted, leading to some difficult-to-find errors. The best way to avoid this issue is to ensure that you use module names that are guaranteed to be unique, such as prepending your organization name to the module name.

Line plot styles

Whenever you create a plot, you need to identify the sources of information using more than just the lines. Creating a plot that uses differing line types and data point symbols makes the plot much easier for other people to use. The following table lists the line plot styles.

Color		Marker		Style
Code	Line Color	Code	Marker Style	Code	Line Style
b	blue	.	point	–	Solid
g	green	o	circle	:	Dotted
r	red	x	x-mark	-.	dash dot
c	cyan	+	plus	—	Dashed
m	magenta	*	star	(none)	no line
y	yellow	s	square
k	black	d	diamond
w	white	v	down triangle
		^	up triangle
		<	left triangle
		>	right triangle
		p	5-point star
		h	6-point star

Remember that you can also use these styles with other kinds of plots. For example, a scatter plot can use these styles to define each of the data points. When in doubt, try the styles to see whether they’ll work with your particular plot.

Common IPython Magic Functions

It’s kind of amazing to think that IPython provides you with magic, but that’s precisely what you get with the magic functions. Most magic functions begin with either a % or %% sign. Those with a % sign work within the environment, and those with a %% sign work at the cell level.

There are a few specialized functions, such as the system command escape (!), that require a special symbol or technique. Of these, the system command escape is the most essential to know. Another useful alternative is variable expansion ($), which is used like $(myVar), to provide a value without retyping it.

Note that the magic functions work best with Jupyter Notebook. People using alternatives, such as Google Colab, may find that some magic functions fail to provide the desired result.

The following list gives you a few of the most common magic functions and their purposes. To obtain a full list, type %quickref and press Enter in Jupyter Notebook or Google Colab, or check out the full list.

Magic Function	Type Alone Provides Status?	Description
%%timeit or %%prun	No	Calculates the best time performance for all the instructions in a cell, apart from the one placed on the same cell line as the cell magic (which could therefore be an initialization instruction). The %%prun variant provides more detailed information because it relies on the Python profiler output.
%%writefile	No	Writes the contents of a cell to the specified file.
%alias	Yes	Assigns or displays an alias for a system command.
%autocall	Yes	Enables you to call functions without including the parentheses. The settings are Off, Smart (default), and Full. The Smart setting applies the parentheses only if you include an argument with the call.
%automagic	Yes	Enables you to call the line magic functions without including the % sign. The settings are False (default) and True.
%bookmark	No	Sets, lists, or clears bookmarks used to track the current location within a drive’s directory system. This article provides additional information on using this magic.
%cd	Yes	Changes directory to a new storage location. You can also use this command to move through the directory history or to change directories to a bookmark.
%cls or %clear	No	Clears the screen.
%colors	No	Specifies the colors used to display text associated with prompts, the information system, and exception handlers. You can choose between NoColor (black and white), Linux (default), and LightBG.
%config	Yes	Enables you to configure IPython.
%debug or %%debug	Yes	Starts the Python interactive debugger so that it’s possible to debug an application within the Notebook environment.
%dhist	Yes	Displays a list of directories visited during the current session.
%env	Yes	Gets, sets, or lists environment variables.
%file	No	Outputs the name of the file that contains the source code for the object.
%hist	Yes	Displays a list of magic function commands issued during the current session.
%install_ext	No	Installs the specified extension.
%load	No	Loads application code from another source, such as an online example.
%load_ext	No	Loads a Python extension using its module name.
%lsmagic	Yes	Displays a list of the currently available magic functions.
%matplotlib	Yes	Sets the backend processor used for plots. Using the inline value displays the plot within the cell for an IPython Notebook file. The possible values are ‘gtk’, ‘gtk3’, ‘inline’, ‘nbagg’, ‘osx’, ‘qt’, ‘qt4’, ‘qt5’, ‘tk’, and ‘wx’.
%more	No	Displays a file through the pager so that it’s possible to scan a data file while working in it in code.
%paste	No	Pastes the content of the clipboard into the IPython environment.
%pdef	No	Shows how to call the object (assuming that the object is callable).
%pdoc	No	Displays the docstring for an object.
%pinfo	No	Displays detailed information about the object (often more than provided by help alone).
%pinfo2	No	Displays extra detailed information about the object (when available).
%psource	No	Displays the source code for the object (assuming that the source is available).
%reload_ext	No	Reloads a previously installed extension.
%timeit or %prun	No	Calculates the best performance time for an instruction. The %prun variant provides more detailed information because it relies on the Python profiler output.
%unalias	No	Removes a previously created alias from the list.
%unload_ext	No	Unloads the specified extension.

About This Article

About the book author:

John Paul Mueller is a freelance author and technical editor. He has writing in his blood, having produced 100 books and more than 600 articles to date. The topics range from networking to home security and from database management to heads-down programming. John has provided technical services to both Data Based Advisor and Coast Compute magazines.

Luca Massaron is a data scientist specialized in organizing and interpreting big data and transforming it into smart data by means of the simplest and most effective data mining and machine learning techniques. Because of his job as a quantitative marketing consultant and marketing researcher, he has been involved in quantitative data since 2000 with different clients and in various industries, and is one of the top 10 Kaggle data scientists.