Python for Data Science For Dummies Cheat Sheet
Python is an incredible programming language that you can use to perform data science tasks with a minimum of effort. The huge number of available libraries means that the low-level code you normally need to write is likely already available from some other source. All you need to focus on is getting the job done. With that in mind, this cheat sheet helps you access the most commonly needed reminders for making your programming experience fast and easy.
The 8 Most Common Python Programming Errors
Every developer on the planet makes mistakes. However, knowing about common mistakes will save you time and effort later. The following list tells you about the most common errors that developers experience when working with Python:
Using the incorrect indentation: Many Python features rely on indentation. For example, when you create a new class, everything in that class is indented under the class declaration. The same is true for decision, loop, and other structural statements. If you find that your code is executing a task when it really shouldn’t be, start reviewing the indentation you’re using.
Relying on the assignment operator instead of the equality operator: When performing a comparison between two objects or values, you must use the equality operator (==), not the assignment operator (=). The assignment operator places an object or value within a variable and doesn’t compare anything.
Placing function calls in the wrong order when creating complex statements: Python always executes chained function calls from left to right. So the statement MyString.strip().center(21, "*") produces a different result than MyString.center(21, "*").strip(). When the output of a series of chained functions is different from what you expected, check the function order to ensure that each function is in the correct place.
Misplacing punctuation: You can put punctuation in the wrong place and create an entirely different result. Remember that you must include a colon at the end of each structural statement. In addition, the placement of parentheses is critical. For example, (1 + 2) * (3 + 4), 1 + ((2 * 3) + 4), and 1 + (2 * (3 + 4)) all produce different results (21, 11, and 15, respectively).
Using the incorrect logical operator: Most of the operators don’t present developers with problems, but the logical operators do. Remember to use and when both operands must be True, and to use or when either of the operands can be True.
Creating count-by-one (off-by-one) errors on loops: Remember that a loop doesn’t count the last number you specify in a range. So, if you specify the range [1:11] (or call range(1, 11)), you actually get output for the values 1 through 10, not 1 through 11.
Using the wrong capitalization: Python is case sensitive, so MyVar is different from myvar and MYVAR. Always check capitalization when you find that you can’t access a value you expected to access.
Making a spelling mistake: Even seasoned developers suffer from spelling errors at times. Ensuring that you use a common approach to naming variables, classes, and functions does help. However, even a consistent naming scheme won’t always prevent you from typing MyVer when you meant to type MyVar.
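Several of the errors in the list above are easy to see in a few lines of code. The following is a minimal sketch; the sample string and variable names are hypothetical and chosen only for illustration.

```python
# Hypothetical sample value used only for illustration.
MyString = "  Hello World  "

# Function order: chained calls run left to right, so order changes the result.
strip_first = MyString.strip().center(21, "*")
center_first = MyString.center(21, "*").strip()
print(strip_first)   # *****Hello World*****
print(center_first)  # ***  Hello World  ***

# Punctuation: moving the parentheses changes the value.
print((1 + 2) * (3 + 4))    # 21
print(1 + ((2 * 3) + 4))    # 11
print(1 + (2 * (3 + 4)))    # 15

# Logical operators: and needs both operands True, or needs only one.
print(True and False)  # False
print(True or False)   # True

# Count-by-one: the ending number of a range is not included.
values = list(range(1, 11))
print(values[0], values[-1])  # 1 10

# Capitalization: MyVar and myvar are different names.
MyVar = 10
try:
    print(myvar)
except NameError as error:
    print("NameError:", error)
```

Using = where == is required inside an if statement is caught even earlier: Python reports a SyntaxError before the code runs.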
Line Plot Styles
Whenever you create a plot, you need to identify the sources of information using more than just the lines. Creating a plot that uses differing line types and data point symbols makes the plot much easier for other people to use. The following table lists the line plot styles.
|Code||Line Color||Code||Marker Style||Code||Line Style|
Remember that you can also use these styles with other kinds of plots. For example, a scatter plot can use these styles to define each of the data points. When in doubt, try the styles to see whether they’ll work with your particular plot.
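As a rough illustration of how the three kinds of codes combine, the sketch below builds a Matplotlib-style format string from one marker code, one line style code, and one color code. The dictionaries hold only a few of the standard Matplotlib codes, not the complete set, and make_format is a hypothetical helper, not part of Matplotlib.

```python
# A few standard Matplotlib style codes (deliberately incomplete).
LINE_COLORS = {"b": "blue", "g": "green", "r": "red", "k": "black"}
MARKERS = {"o": "circle", "s": "square", "^": "triangle up", "*": "star"}
LINE_STYLES = {"-": "solid", "--": "dashed", "-.": "dash-dot", ":": "dotted"}


def make_format(marker, line_style, color):
    """Join one code of each kind into a single format string (hypothetical helper)."""
    if marker not in MARKERS:
        raise ValueError(f"unknown marker code: {marker!r}")
    if line_style not in LINE_STYLES:
        raise ValueError(f"unknown line style code: {line_style!r}")
    if color not in LINE_COLORS:
        raise ValueError(f"unknown color code: {color!r}")
    return marker + line_style + color


fmt = make_format("o", "--", "r")
print(fmt)  # o--r  (circle markers, dashed line, red)

# Assuming Matplotlib is installed, the string is passed after the data:
# import matplotlib.pyplot as plt
# plt.plot([1, 2, 3], [2, 4, 1], fmt)
# plt.show()
```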
Common IPython Magic Functions
It’s kind of amazing to think that IPython provides you with magic, but that’s precisely what you get with the magic functions. A magic function begins with either a % or %% sign. Those with a % sign (line magics) apply to the single line on which they appear, and those with a %% sign (cell magics) apply to the entire cell.
Note that the magic functions work best with Jupyter Notebook. People using alternatives, such as Google Colab, may find that some magic functions fail to provide the desired result.
The following list gives you a few of the most common magic functions and their purpose. To obtain a full list, type %quickref and press Enter in the IPython console.
|Magic Function||Type Alone Provides Status?||Description|
|%%timeit||No||Calculates the best time performance for all the instructions in a cell. A statement placed on the same line as the cell magic runs as setup code (for example, an initialization instruction) and isn’t timed.|
|%%writefile||No||Writes the contents of a cell to the specified file.|
|%alias||Yes||Assigns or displays an alias for a system command.|
|%autocall||Yes||Enables you to call functions without including the parentheses. The settings are Off, Smart (default), and Full. The Smart setting applies the parentheses only if you include an argument with the call.|
|%automagic||Yes||Enables you to call the line magic functions without including the % sign. The settings are False and True (the default in current IPython releases).|
|%cd||Yes||Changes directory to a new storage location. You can also use this command to move through the directory history or to change directories to a bookmark.|
|%cls||No||Clears the screen.|
|%colors||No||Specifies the colors used to display text associated with prompts, the information system, and exception handlers. You can choose between NoColor (black and white), Linux (default), and LightBG.|
|%config||Yes||Enables you to configure IPython.|
|%dhist||Yes||Displays a list of directories visited during the current session.|
|%file||No||Outputs the name of the file that contains the source code for the object.|
|%hist||Yes||Displays a list of the commands, including magic function commands, issued during the current session.|
|%install_ext||No||Installs the specified extension.|
|%load||No||Loads application code from another source, such as an online example.|
|%load_ext||No||Loads an IPython extension using its module name.|
|%lsmagic||Yes||Displays a list of the currently available magic functions.|
|%matplotlib||Yes||Sets the backend processor used for plots. Using the inline value displays the plot within the cell for an IPython Notebook file. The possible values are ‘gtk’, ‘gtk3’, ‘inline’, ‘nbagg’, ‘osx’, ‘qt’, ‘qt4’, ‘qt5’, ‘tk’, and ‘wx’.|
|%paste||No||Pastes the content of the clipboard into the IPython environment.|
|%pdef||No||Shows how to call the object (assuming that the object is callable).|
|%pdoc||No||Displays the docstring for an object.|
|%pinfo||No||Displays detailed information about the object (often more than provided by help alone).|
|%pinfo2||No||Displays extra detailed information about the object (when available).|
|%reload_ext||No||Reloads a previously installed extension.|
|%source||No||Displays the source code for the object (assuming that the source is available).|
|%timeit||No||Calculates the best performance time for an instruction.|
|%unalias||No||Removes a previously created alias from the list.|
|%unload_ext||No||Unloads the specified extension.|
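Outside IPython, the standard-library timeit module offers a rough equivalent of %timeit. The sketch below is only an approximation: IPython picks the loop count automatically and formats its report differently, and the statement being timed here is a hypothetical placeholder.

```python
import timeit

# Hypothetical statement to time; any short expression would do.
statement = "sum(range(1000))"
loops = 1000

# Run the statement `loops` times per trial, for 5 trials.
trials = timeit.repeat(stmt=statement, repeat=5, number=loops)

# Report the best trial as the time for a single execution.
best_per_loop = min(trials) / loops
print(f"best of 5: {best_per_loop * 1e6:.2f} microseconds per loop")
```

Taking the minimum of the trials reduces the influence of other activity on the machine, which is why the best time is the one usually quoted.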
Scikit-Learn Method Summary
Scikit-learn is a focal point for data science work with Python, so it pays to know which methods you need most. The following table provides a brief overview of the most important methods used for data analysis.
||Cross-validation phase||Estimate the cross-validation score|
||Cross-validation phase||Divide the dataset into k folds for cross validation|
||Cross-validation phase||Stratified validation that takes into account the distribution of the classes you predict|
||Cross-validation phase||Split your data into training and test sets|
||Dimensionality reduction||Principal component analysis (PCA)|
||Dimensionality reduction||Principal component analysis (PCA) using randomized SVD|
||Preparing your data||The hashing trick, allowing you to accommodate a large number of features in your dataset|
||Preparing your data||Convert text documents into a matrix of count data|
||Preparing your data||Directly convert your text using the hashing trick|
||Preparing your data||Creates a dataset of TF-IDF features|
||Feature selection||Automatic feature selection|
||Optimization||Exhaustive search over a grid of parameter values in order to maximize the performance of a machine learning algorithm|
||Prediction||Linear logistic regression|
||Solution evaluation||Accuracy classification score|
||Solution evaluation||Compute the F1 score, balancing precision and recall|
||Solution evaluation||Mean absolute error regression loss|
||Solution evaluation||Mean squared error regression loss|
||Solution evaluation||Compute Area Under the Curve (AUC) from prediction scores|
||Prediction||Multinomial Naïve Bayes|
||Preparing your data||Create binary variables (feature values to 0 or 1)|
||Preparing your data||Missing values imputation|
||Preparing your data||Create variables bound by a minimum and maximum value|
||Preparing your data||Transform categorical integer features into binary ones|
||Preparing your data||Variable standardization by removing the mean and scaling to unit variance|
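To make the solution evaluation rows more concrete, the following sketch works through the underlying formulas in plain Python. The function names and sample values are hypothetical; in practice you would rely on the tested versions in scikit-learn's sklearn.metrics module (accuracy_score, f1_score, mean_absolute_error, and mean_squared_error).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives, so precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical classification labels and predictions.
labels = [1, 0, 1, 1, 0, 1]
predicted = [1, 0, 0, 1, 1, 1]
print(accuracy(labels, predicted))  # 4 of 6 correct
print(f1(labels, predicted))        # 0.75

# Hypothetical regression targets and predictions.
targets = [3.0, 5.0, 2.0]
estimates = [2.5, 5.0, 4.0]
print(mean_absolute_error(targets, estimates))
print(mean_squared_error(targets, estimates))
```

Accuracy counts every correct label equally, whereas the F1 score looks only at how well the positive class is found, which is why the two can differ on the same predictions.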