Algorithms For Dummies Cheat Sheet
Algorithms need not be boring or hard to use. In fact, algorithms surround you in many ways that you might not have thought about, and you use them every day to perform important tasks. However, you need to be able to use algorithms without having to become a mathematician.
Programming languages enable you to describe the steps used to create an algorithm. Some languages are better than others at performing this task in a way that people can understand without becoming computer scientists. Python makes using algorithms easier because it comes with a lot of built-in and extended support (through the use of packages, datasets, and other resources). This Cheat Sheet helps you access the most commonly needed tips for making your use of algorithms fast and easy.
Locating the Algorithm You Need
The following table describes algorithms and algorithm types that you might find useful for various types of data analysis. (You can find discussions of all these algorithms in Algorithms For Dummies.)
Algorithm | Description | Helpful Link |
A* Search | The algorithm tracks the cost of nodes as it explores them using the equation f(n) = g(n) + h(n), where n is the node identifier, g(n) is the cost of reaching the node so far, h(n) is the estimated cost to reach the goal from the node, and f(n) is the estimated cost of the path from n to the goal. The idea is to search the most promising paths first and avoid expensive paths. | Stanford.edu |
Balanced Tree | A kind of tree that maintains a balanced structure through reorganization so that it can provide reduced access times. The number of elements on the left side differs from the number on the right side by one at most. | Webdocs |
Bidirectional Search | This technique searches simultaneously from the root node and the goal node until the two search paths meet in the middle. An advantage of this approach is that it’s time efficient because it finds the solution faster than many other brute-force solutions. In addition, it uses memory more efficiently than other approaches and always finds a solution. The main disadvantage is complexity of implementation. | Planning.cs |
Binary Tree | This is a type of tree containing nodes that connect to zero (leaf nodes), one, or two (branch nodes) other nodes. Each node defines the three elements that it must include to provide connectivity and store data: data storage, left connection, and right connection. | cs.cmu.edu |
Breadth-First Search | This technique begins at the root node, explores each of the child nodes first, and only then moves down to the next level. It progresses level by level until it finds a solution. The disadvantage of this algorithm is that it must store every node in memory, which means that it uses a considerable amount of memory for a large number of nodes. This technique can check for duplicate nodes, which saves time, and it always comes up with a solution. | Khan Academy |
Brute Force | This is a technique of problem solving in which someone tries every possible solution, looking for the best problem solution. Brute-force techniques do guarantee a best-fit solution when one exists but are so time consuming to implement that most people avoid them. | Igm.univ |
Depth-First Search | This technique begins at the root node and explores a set of connected child nodes until it reaches a leaf node. It progresses branch by branch until it finds a solution. The disadvantage of this algorithm is that it can’t check for duplicate nodes, which means that it could traverse the same node paths more than once. In fact, this algorithm may not find a solution at all, which means that you must define a cutoff point to keep the algorithm from searching infinitely. An advantage of this approach is that it’s memory efficient. | Hacker Earth |
Divide and Conquer | This is a technique of problem solving in which the problem is broken into the smallest possible pieces and solved using the simplest approach possible. This technique saves considerable time and resources when compared to other approaches, such as brute force. However, it doesn’t always guarantee a best-fit result. | Khan Academy |
Dijkstra | This is an algorithm used for finding the shortest path in a directed, weighted (having positive weights) graph. | Geeks for Geeks |
Graph | A graph is a sort of a tree extension. As with trees, you have nodes that connect to each other to create relationships. However, unlike binary trees, a graph can have more than one or two connections. In fact, graph nodes often have a multitude of connections. You see graphs used in places like maps for GPS and all sorts of other places for which the top-down approach of a tree won’t work. | Tutorials |
Greedy Algorithms | This is a technique of problem solving in which the solution relies on the best answer for every step of the problem-solving process. Greedy algorithms generally make two assumptions: making a single optimal choice at a given step is possible, and choosing the optimal selection at each step makes finding an optimal solution for the overall problem possible. | Tutorials |
Greedy Best-First Search (BFS) | The algorithm always chooses the path that is closest to the goal using the equation: f(n) = h(n). This particular algorithm can find solutions quite quickly, but it can also get stuck in loops, so many people don’t consider it an optimal approach to finding a solution. | Centurion2 |
Hashing | This is a method of predicting the location of a particular data item in the data structure (whatever that structure might be) before actually looking for it. This approach relies on the use of keys placed into an index. A hash function turns the key into a numeric value that the algorithm places into a hash table. A hash table provides the means to create an index that points to elements in a data structure so that an algorithm can easily predict the location of the data. | Tutorials |
Heap | This is a sophisticated tree that allows data insertions into the tree structure. The use of data insertion makes sorting faster. You can further classify these trees as max heaps and min heaps, depending on the tree’s capability to immediately provide the maximum or minimum value present in the tree. | Tutorials |
Heuristics | This is a technique of problem solving that relies on self-discovery and produces sufficiently useful results (not necessarily optimal, but good enough) to address a problem well enough that a better solution isn’t necessary. Self-discovery is the process of allowing the algorithm to show you a potentially useful path to a solution (but you must still count on human intuition and understanding to know whether the solution is the right one). | Northwest.edu |
MapReduce | This is a framework for making algorithms work using computations in parallel (using multiple computers connected together in a network), allowing algorithms to complete their solutions faster. | Hadoop Apache |
Mergesort | Mergesort is a general-purpose, comparison-based method of sorting data. It depends on a divide-and-conquer approach to perform its task. | Geeks for Geeks |
Nash Equilibrium | This is a game theory concept in which each player knows the equilibrium strategies of the other players, so no one has anything to gain by changing his or her personal strategy alone. This concept sees use in any hostile situation in which a player must account for the decisions made by all of the other players in order to win the game. | Khan Academy |
PageRank | PageRank is an algorithm for measuring the importance of a node in a graph. This algorithm is at the root of Google's core algorithms for powering relevant searches to users. | Princeton.edu |
Pure Heuristic Search | This algorithm expands nodes in order of their cost. It maintains two lists: the closed list contains the nodes that it has already explored, and the open list contains the nodes it must yet explore. In each iteration, the algorithm expands the node with the lowest cost, moves it to the closed list, and calculates the cost of each of its child nodes. Child nodes with a low cost go onto the open list for later expansion; child nodes with a high cost are discarded. Consequently, the algorithm performs an intelligent, cost-based search for the solution. | World of Computing |
Quicksort | This is a general-purpose sorting strategy based on partitioning arrays of data into smaller arrays. It depends on a divide-and-conquer approach to performing its task. | Tutorials |
Unbalanced Tree | This is a tree that places new data items wherever necessary in the tree without regard to balance. This method of adding items makes building the tree faster but reduces access speed when searching or sorting. | Quora |
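As a concrete illustration of the Breadth-First Search entry in the table above, here is a minimal sketch in Python. The sample graph and its node names are invented for illustration; real problems would supply their own graph structure.

```python
from collections import deque

def breadth_first_search(graph, start, goal):
    """Return a shortest path (by edge count) from start to goal, or None."""
    frontier = deque([[start]])   # queue of paths, explored level by level
    visited = {start}             # duplicate check keeps memory use bounded
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None                   # no path exists

# Hypothetical graph: each key maps a node to its outgoing connections.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
}
print(breadth_first_search(graph, "A", "F"))  # ['A', 'B', 'D', 'F']
```

Because the queue is processed level by level, the first path that reaches the goal is guaranteed to be among the shortest, which matches the table's claim that this technique always comes up with a solution when one exists.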
Differentiating Algorithms from Other Math Structures
If you’re like most people, you often find yourself scratching your head when it comes to math structures because no one seems to know how to use the terms correctly. It’s as though people are purposely trying to make things hard! After all, what is an equation and why is it different from an algorithm? Well, fear no more: The following table provides the definitive guide to math structures that you might encounter but have been afraid to ask about.
Structure | Description |
Equation | Numbers and symbols that, when taken as a whole, equate to a specific value. An equation always contains an equals sign so that you know that the numbers and symbols represent the specific value on the other side of the equals sign. Equations generally contain variable information presented as a symbol, but they aren’t required to use variables. |
Formula | A combination of numbers and symbols used to express information or ideas. A formula normally presents mathematical or logical concepts, such as to define the Greatest Common Divisor (GCD) of two integers (the video at Khan Academy tells how this works). Generally, a formula shows the relationship between two or more variables. Most people see a formula as a special kind of equation. |
Algorithm | A sequence of steps used to solve a problem. The sequence presents a unique method of addressing an issue by providing a particular solution. An algorithm need not represent mathematical or logical concepts, even though the presentations in this book often do fall into that category because people most commonly use algorithms in this manner. Some special formulas are also algorithms, such as the quadratic formula. For a process to represent an algorithm, it must be the following:
Finite: The algorithm must eventually solve the problem.
Well-defined: The series of steps must be precise and present steps that are understandable, especially by computers, which must be able to create a usable algorithm from them.
Effective: An algorithm must solve all cases of the problem for which someone defined it. An algorithm should always solve the problem it has to solve. Even though you should anticipate some failures, the incidence of failure is rare and occurs only in situations that are acceptable for the intended algorithm use. |
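Euclid's method for finding the Greatest Common Divisor (mentioned in the Formula entry above) is a compact example of a formula that is also an algorithm, satisfying all three properties. A minimal Python sketch:

```python
def gcd(a, b):
    """Euclid's algorithm: a finite, well-defined, effective process."""
    while b != 0:          # finite: b strictly decreases toward zero
        a, b = b, a % b    # well-defined: each step is a precise operation
    return a               # effective: correct for all non-negative integer pairs

print(gcd(48, 18))  # 6
```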
Amazing Ways to Use Algorithms
People actually use algorithms all the time. For example, making toast follows an algorithm, as explained in this blog post. Making toast isn't an amazing algorithm, but the ones in the following table, which use a computer to perform tasks, are.
Task | Why It’s Amazing |
Cryptography | Keeping data safe is an ongoing battle with hackers constantly attacking data sources. Algorithms enable you to analyze data, put it into some other form, and then return it to its original form later. |
Graph analysis | The capability to decide on the shortest line between two points finds all sorts of uses. For example, in a routing problem, your GPS couldn’t function without this particular algorithm because it could never direct you along city streets using the shortest route from point A to point B. |
Pseudorandom number generation | Imagine playing games that never varied. You start at the same place and perform the same steps in the same manner every time you play. Boring! Without the capability to generate seemingly random numbers, many computer tasks become pointless or impossible. |
Scheduling | Making the use of resources fair to all concerned is another way in which algorithms make their presence known in a big way. For example, timing lights at intersections are no longer simple devices that count down the seconds between light changes. Modern devices consider all sorts of issues, such as the time of day, weather conditions, and flow of traffic. Scheduling comes in many forms, however. Consider how your computer runs multiple tasks at the same time. Without a scheduling algorithm, the operating system might grab all the available resources and keep your application from doing any useful work. |
Searching | Locating information or verifying that the information you see is the information you want is an essential task. Without this capability, many tasks you perform online wouldn’t be possible, such as finding the website on the Internet that sells the perfect coffee pot for your office. |
Sorting | Determining the order in which to present information is important because most people today suffer from information overload, and need to reduce the onrush of data. Imagine going to Amazon, finding more than a thousand coffee pots for sale, and yet not being able to sort them according to price or most positive review. Moreover, many complex algorithms require data in the proper order to work dependably, so sorting is an important requisite for solving more problems. |
Transforming | Converting one kind of data to another kind of data is critical to understanding and using the data effectively. For example, you might understand imperial weights just fine, but all your sources use the metric system. Converting between the two systems helps you understand the data. Likewise, the Fast Fourier Transform (FFT) converts signals between the time domain and the frequency domain, enabling things like your WiFi router to work. |
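The Sorting and Searching entries in the table above work together in practice: once data is sorted, you can search it in logarithmic time. A minimal Python sketch using the standard bisect module; the price list is invented for illustration:

```python
import bisect

# Hypothetical coffee-pot prices, initially in no particular order.
prices = [89.0, 24.5, 129.99, 24.5, 56.0, 18.75]

prices.sort()                              # sorting first ...
index = bisect.bisect_left(prices, 56.0)   # ... makes binary search possible

print(prices)               # [18.75, 24.5, 24.5, 56.0, 89.0, 129.99]
print(index, prices[index]) # 3 56.0
```

This is why the Sorting entry notes that many complex algorithms require data in the proper order: the binary search here depends entirely on the list being sorted first.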
Dealing with Algorithm Complexity
You already know that algorithms are complex. However, you need to know how complex an algorithm is because the more complex one is, the longer it takes to run. The following table helps you understand the various levels of complexity presented in order of running time (from fastest to slowest).
Complexity | Description |
Constant complexity O(1) | Provides an unvarying execution time no matter how much input you provide: the algorithm always requires the same fixed number of operations. |
Logarithmic complexity O(log n) | The number of operations grows much more slowly than the input, so the larger the input, the greater the advantage over a linear approach. A typical algorithm of this class is the binary search. |
Linear complexity O(n) | Operations grow with the input in a 1:1 ratio. A typical algorithm is iteration, when you scan input once and apply an operation to each element of it. |
Linearithmic complexity O(n log n) | Complexity is a mix between logarithmic and linear complexity. It is typical of some smart algorithms used to order data, such as Mergesort, Heapsort, and Quicksort. |
Quadratic complexity O(n^{2}) | Operations grow as a square of the number of inputs. When you have one iteration inside another iteration (called nested iterations in computer science), you have quadratic complexity. For instance, you have a list of names and, in order to find the most similar ones, you compare each name against all the other names. Some less efficient ordering algorithms present such complexity: bubble sort, selection sort, and insertion sort. This level of complexity means that your algorithms may run for hours or even days before reaching a solution. |
Cubic complexity O(n^{3}) | Operations grow even faster than quadratic complexity because now you have multiple nested iterations. When an algorithm has this order of complexity and you need to process a modest amount of data (100,000 elements), your algorithm may run for years. When the number of operations is the input size raised to a fixed power, it is common to refer to the algorithm as running in polynomial time. |
Exponential complexity O(2^{n}) | The algorithm takes twice the number of previous operations for every new element added. When an algorithm has this complexity, even small problems may take forever. Many algorithms doing exhaustive searches have exponential complexity. However, the classic example for this level of complexity is the naive recursive calculation of Fibonacci numbers. |
Factorial complexity O(n!) | This algorithm presents a real nightmare of complexity because of the large number of possible combinations between the elements. Just imagine: If your input is 100 objects and an operation on your computer takes 10^{-6} seconds (a reasonable speed for a modern computer), you will need about 10^{140} years to complete the task successfully (an impossible amount of time, given that the age of the universe is estimated at about 10^{10} years). A famous factorial complexity problem is the traveling salesman problem, in which a salesman has to find the shortest route for visiting many cities and coming back to the starting city. |
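To get a feel for how quickly these classes diverge, the following Python sketch tabulates rough operation counts for several of the complexity classes above at a few input sizes. The cutoff guarding the 2^n case is an arbitrary choice to avoid printing astronomically large numbers:

```python
import math

def operations(n):
    """Rough operation counts for common complexity classes at input size n."""
    return {
        "O(log n)": math.log2(n),
        "O(n)": n,
        "O(n log n)": n * math.log2(n),
        "O(n^2)": n ** 2,
        # Arbitrary cutoff: beyond n = 30 the count is effectively unbounded.
        "O(2^n)": 2 ** n if n <= 30 else float("inf"),
    }

for n in (10, 100, 1000):
    print(n, operations(n))
```

Even at n = 1000, the quadratic class already needs a million operations while the logarithmic class needs about ten, which is why the table orders the classes from fastest to slowest running time.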