Data Analytics & Visualization All-in-One Cheat Sheet

A wide range of tools is available that are designed to help big businesses and small take advantage of the data science revolution. Among the most essential of these tools are Microsoft Power BI, Tableau, SQL, and the R and Python programming languages.

Comparing Microsoft Power BI and Excel

Microsoft markets Power BI as a way to connect and visualize data using a unified, scalable platform that offers self-service and enterprise business intelligence that can help you gain deep insights into data. You may already have Excel, which will perform many of the functions you need, and wonder if upgrading to Power BI is worth the effort.

Here are some advantages of upgrading to Power BI:

Power BI supplies an array of high-level analytics offerings that Excel doesn’t include, such as the ability to create dashboards, key performance indicators (KPI), visualizations, and alerts.
Power BI has significant collaboration capabilities, whereas Excel has limited data collaboration options.
Though Excel can help when it comes to creating advanced reports, if you want to build data models that include predictive and machine learning assets, you have to turn to specific versions of Power BI.
There is no single free version of Excel. On the other hand, you can start with Power BI for free. You can also purchase premium alternatives if you need advanced features — from a few dollars per month to several thousand.
Power BI integrates business intelligence (BI) and data visualization so that users can create custom and interactive dashboards, KPIs, and reports. Microsoft Excel is limited in handling data analytics, mathematical operations, or data organization using a spreadsheet.
Power BI can extract and format data from more than a single data source type. Because Power BI handles extensive data ingestion — the uploading of data from an external source, in other words —the process is, by nature, much faster.
Because Power BI can connect with various data sources, the range of outputs, including dashboards and reports, is more interactive, whereas Excel is limited in scope. Above all, Power BI is a tool for data visualization and analysis that allows for collaboration. Excel limits sharing and data analysis to a limited number of end users.

Engaging Tableau users based on user type

Tableau is a tool used for Enterprise BI but heavily leveraged in communities where data is regulated such as banking, healthcare, insurance, and government. Users access, author, prepare, interact, collaborate, and govern their data across Tableau Desktop, Tableau Prep, and Tableau Cloud based on their user type. Following is a handy “quick reference” for those times when you need to know a Tableau user’s limitations based on their user type.

Access

Tableau recognizes the following two discriminating access types:

Key access capabilities

	Creator	Explorer	Viewer
Web and mobile	✓	✓	✓
Embedded content	✓	✓	✓

Author

Authors in Tableau leverage the Tableau platform to make decisions by digging into the available data sources to create visualizations for themselves or manage those for others in a power-user capacity, as noted in the following table.

Key author capabilities offered in tableau

	Creator	Explorer	Viewer
Edit existing workbooks and visualizations	✓	✓
Create and publish new workbooks from existing published data sources only	✓	✓
Explore existing published data sources with Ask Data, a natural language engine for analytics analysis	✓	✓
Create and publish new workbooks with one or more new data sources	✓
Create and publish new data sources	✓
Create new workbooks based on Dashboard Starters, a way to integrate with other enterprise software applications such as Salesforce CRM or SAP ERP (Tableau Cloud only)	✓

Prepare

Data preparation is one area that stands out for those requiring the development functionality found in Tableau Desktop. Unless you are merely the orchestrator of data, which includes scheduling the data for dissemination, all data preparation actions fall under the Creator user type. The following table summarizes key capabilities for data preparation offered in Tableau.

Key preparation capabilities offered In Tableau

	Creator	Explorer	Viewer
Create new data flow files (.tfl) or .hypher file	✓
Edit and modify data flow files	✓
Export data files (.tde, .hyper, .csv)	✓
Publish and run flows	✓
Schedule flows	✓	✓

Interact

Interaction is a big part of the sales pitch with the Tableau brand, so it’s not surprising that all license types include a bevy of interaction options. The noticeable difference is the ability to download summaries as opposed to full data, as shown in the following table. If you need to dig into the weeds on a data source, the Creator license is non-negotiable.

Key interaction capabilities offered in Tableau

	Creator	Explorer	Viewer
Interact with data using a variety of visualization types	✓	✓	✓
Create and share views	✓	✓	✓
Download visualizations as static images (.pdf, .png, .jpg)	✓	✓	✓
Download summary data	✓	✓	✓
Download full data	✓	✓

Collaborate

Except for allowing one or more parties to share, a Viewer has all the same collaboration features as a Creator and Explorer. As shown in the following table, the various collaboration features enable subscriptions and alerts for others as part of the programmatic process, which a developer or power user often completes.

Key collaboration capabilities offered in Tableau

	Creator	Explorer	Viewer
Comment on any visualization, including dashboards, reports, KPIs, and stories	✓	✓	✓
Create subscriptions for yourself	✓	✓	✓
Receive alert notifications	✓	✓	✓
Create subscriptions for others	✓	✓
Create alert notifications for others	✓	✓

Govern

Govern is the fancy term for system administration. Viewers have no administrative capabilities, whereas an Explorer, the “power user,” can limit user access. But when it comes to managing enterprise security for data sources and integrating with security tenants, a way to isolate privileged and secure organizational data using an identity management platform such as Microsoft Azure Directory, you must be a Creator, as noted in the following table.

Key governance capabilities offered in Tableau

	Creator	Explorer	Viewer
Manage users and permissions	✓	✓
Manage content and certify data sources	✓	✓
Perform server administration	✓
Conduct fine-grained security management	✓

SQL data types

SQL is a querying language that is used with proprietary and open-source data analytics and visualization platforms. The following table summarizes commonly used SQL data types and gives an example of each.

SQL data types

*Data Type*	*Example Value*
`CHARACTER (20)`	`'Amateur Radio'`
`VARCHAR (20)`	`'Amateur Radio'`
`CLOB (1000000)`	`'This character string is a million characters long … '`
`SMALLINT, BIGINT, or INTEGER`	`7500`
`NUMERIC or DECIMAL`	`3425.432`
`REAL, FLOAT, or DOUBLE PRECISION`	`6.626E-34`
`BINARY`	`'1011001110101010'`
`BINARY VARYING`	`'10110'`
`BLOB (1000000)`	`'1001001110101011010101010101… '`
`BOOLEAN`	`'true'`
`DATE`	`1957-08-14`
`TIME WITHOUT TIME ZONE (2)¹`	`12:46:02.43`
`TIME WITH TIME ZONE (3)`	`12:46:02.432-08:00`
`TIMESTAMP WITHOUT TIME ZONE (0)`	`1957-08-14 12:46:02`
`TIMESTAMP WITH TIME ZONE (0)`	`1957-08-14 12:46:02-08:00`
`INTERVAL DAY`	`INTERVAL '4' DAY`
`ROW`	`ROW (Street VARCHAR (25), City VARCHAR (20), State CHAR (2), PostalCode VARCHAR (9))`
`ARRAY`	`INTEGER ARRAY [15]`
`MULTISET`	`Phone VARCHAR (15) MULTISET [4]`
`REF`	`Not an ordinary type, but a pointer to a referenced type`
`USER DEFINED TYPE`	`Currency type based on DECIMAL`

¹Argument specifies number of fractional digits.

R statistical functions

R is an open-source programming language that can be configured for use with Power BI and Tableau, but is more commonly used with open-source (free) platforms like Jupyter Notebook and Anaconda to conceive data analytics outputs and visualizations. Unlike Power BI and Tableau, open-source tools leveraging programming languages are used in academic settings or by analysts requiring technologies that are data intensive.

Here’s a selection of statistical functions that come with the standard R installation. You’ll find many others in R packages.

Central tendency and variability

Function	What it calculates
mean(x)	Mean of the numbers in vector x
median(x)	Median of the numbers in vector x
var(x)	Estimated variance of the population from which the numbers in vector x are sampled
sd(x)	Estimated standard deviation of the population from which the numbers in vector x are sampled
scale(x)	Standard scores (z-scores) for the numbers in vector x

Relative standing


Function	What it calculates
sort(x)	The numbers in vector x in increasing order
sort(x)[n]	The nth smallest number in vector x
rank(x)	Ranks of the numbers (in increasing order) in vector x
rank(-x)	Ranks of the numbers (in decreasing order) in vector x
rank(x, ties.method= “average”)	Ranks of the numbers (in increasing order) in vector x, with tied numbers given the average of the ranks that the ties would have attained
rank(x, ties.method= “min”)	Ranks of the numbers (in increasing order) in vector x, with tied numbers given the minimum of the ranks that the ties would have attained
rank(x, ties.method = “max”)	Ranks of the numbers (in increasing order) in vector x, with tied numbers given the maximum of the ranks that the ties would have attained
quantile(x)	The 0th, 25th, 50th, 75th, and 100th percentiles (the quartiles, in other words) of the numbers in vector x. (That’s not a misprint: quantile(x) returns the quartiles of x.)

t-tests


Function	What it calculates
t.test(x,mu=n, alternative = “two.sided”)	Two-tailed t-test that the mean of the numbers in vector x is different from n.
t.test(x,mu=n, alternative = “greater”)	One-tailed t-test that the mean of the numbers in vector x is greater than n.
t.test(x,mu=n, alternative = “less”)	One-tailed t-test that the mean of the numbers in vector x is less than n.
t.test(x,y,mu=0, var.equal = TRUE, alternative = “two.sided”)	Two-tailed t-test that the mean of the numbers in vector x is different from the mean of the numbers in vector y. The variances in the two vectors are assumed to be equal.
t.test(x,y,mu=0, alternative = “two.sided”, paired = TRUE)	Two-tailed t-test that the mean of the numbers in vector x is different from the mean of the numbers in vector y. The vectors represent matched samples.

Analysis of variance (ANOVA)

Function	What it calculates
aov(y~x, data = d)	Single-factor ANOVA, with the numbers in vector y as the dependent variable and the elements of vector x as the levels of the independent variable. The data are in data frame d.
aov(y~x + Error(w/x), data = d)	Repeated Measures ANOVA, with the numbers in vector y as the dependent variable and the elements in vector x as the levels of an independent variable. Error(w/x) indicates that each element in vector w experiences all the levels of x. (In other words, x is a repeated measure.) The data are in data frame d.
aov(y~x*z, data = d)	Two-factor ANOVA, with the numbers in vector y as the dependent variable and the elements of vectors x and z as the levels of the two independent variables. The data are in data frame d.
aov(y~x*z + Error(w/z), data = d)	Mixed ANOVA, with the numbers in vector z as the dependent variable and the elements of vectors x and y as the levels of the two independent variables. Error(w/z) indicates that each element in vector w experiences all the levels of z. (In other words, z is a repeated measure.) The data are in data frame d.

Correlation and regression

Function	What it calculates
cor(x,y)	Correlation coefficient between the numbers in vector x and the numbers in vector y
cor.test(x,y)	Correlation coefficient between the numbers in vector x and the numbers in vector y, along with a t-test of the significance of the correlation coefficient.
lm(y~x, data = d)	Linear regression analysis with the numbers in vector y as the dependent variable and the numbers in vector x as the independent variable. Data are in data frame d.
coefficients(a)	Slope and intercept of linear regression model a.
confint(a)	Confidence intervals of the slope and intercept of linear regression model a.
lm(y~x+z, data = d)	Multiple regression analysis with the numbers in vector y as the dependent variable and the numbers in vectors x and z as the independent variables. Data are in data frame d.

When you carry out an ANOVA or a regression analysis, store the analysis in a list — for example: a <- lm(y~x, data = d). Then, to see the tabled results, use the summary() function: summary(a)

Python line plot styles

Like R, Python is an open-source programming language that can be configured for use with Power BI and Tableau, but is more commonly used with open-source (free) platforms such as like Jupyter Notebook and Anaconda.

When you use Python to create a plot, you need to identify the sources of information using more than just the lines. Creating a plot that uses differing line types and data point symbols makes the plot much easier for other people to use. Following is a table that lists the line plot styles.

Color		Marker		Style
Code	Line Color	Code	Marker Style	Code	Line Style
b	blue	.	point	–	Solid
g	green	o	circle	:	Dotted
r	red	x	x-mark	-.	dash dot
c	cyan	+	plus	—	Dashed
m	magenta	*	star	(none)	no line
y	yellow	s	square
k	black	d	diamond
w	white	v	down triangle
		^	up triangle
		<	left triangle
		>	right triangle
		p	5-point star
		h	6-point star

Remember that you can also use these styles with other kinds of plots. For example, a scatter plot can use these styles to define each of the data points. When in doubt, try the styles to see whether they’ll work with your particular plot.

Article Categories

Book Categories

Collections

Data Analytics & Visualization All-in-One Cheat Sheet

Comparing Microsoft Power BI and Excel

Engaging Tableau users based on user type

Access

Author

Prepare

Interact

Collaborate

Govern

SQL data types

SQL data types

R statistical functions

Central tendency and variability

Relative standing

t-tests

Analysis of variance (ANOVA)

Correlation and regression

Python line plot styles

About This Article

About the book author:

This article can be found in the category:

Article Categories

Book Categories

Collections

Data Analytics & Visualization All-in-One Cheat Sheet

Comparing Microsoft Power BI and Excel

Engaging Tableau users based on user type

Access

Author

Prepare

Interact

Collaborate

Govern

SQL data types

SQL data types

R statistical functions

Central tendency and variability

Relative standing

t-tests

Analysis of variance (ANOVA)

Correlation and regression

Python line plot styles

About This Article

This article is from the book:

About the book author:

This article can be found in the category: