Excel 2007 Data Analysis For Dummies
Excel 2007 has great data analysis tools. You can create any statistic you want; just make sure that you know what the statistic means. For that, you need to know the statistical measures available, the Boolean expressions Excel 2007 uses, and the Excel 2007 database functions.
Excel 2007 Database Functions
Excel 2007's data analysis tools include a set of database functions for making statistical calculations using information from lists. The following table shows these functions and explains what each does:
|DAVERAGE||Calculates arithmetic mean|
|DCOUNT||Counts the number of cells with values|
|DCOUNTA||Counts the number of cells that aren’t empty|
|DGET||Returns a value from a database list|
|DMAX||Finds the largest value in a list|
|DMIN||Finds the smallest value in a list|
|DPRODUCT||Calculates the product of values matching criteria|
|DSTDEV||Calculates the standard deviation of a sample|
|DSTDEVP||Calculates the standard deviation of a population|
|DSUM||Calculates the sum of values matching criteria|
|DVAR||Calculates the variance of a sample|
|DVARP||Calculates the variance of a population|
All these database functions use a standard three-argument syntax. For example, the DAVERAGE function looks like this: =DAVERAGE(database,field,criteria)where database is a range reference to the Excel list that holds the value you want to examine, field tells Excel which column in the database to examine, and criteria is a range reference that identifies the fields and values used to define your selection criteria. The field argument can be a cell reference holding the field name, the field name enclosed in quotation marks, or a number that identifies the column (1 for the first column, 2 for the second column, and so on).
Excel 2007's Quick Data Analysis from the Status Bar
Excel 2007 can tell you quite a bit about a selected range of cells just from the status bar. When you select cells that have data in them, Excel's status bar can tell you some useful information about those cells. When you right-click the status bar, Excel displays a pop-up menu that includes several statistical measures that you can make on the selected range; they’re shown in the following table:
|Statistical Measures Option||What It Does||Statistical Measures Option||What It Does|
|[None]||Tells Excel that you don't want it to calculate and then show a statistic on the status bar.||Maximum||Finds the largest value in the selected range|
|Average||Finds the mean of values in selected range.||Minimum||Finds the smallest value in the selected range|
|Count||Tallies the cells that hold labels, values, or formulas. Use this when you want to count the number of cells that aren’t empty.||Sum||Adds the values in the selected range|
|Numerical Count||Tallies the number of cells in a selected range that hold values or formulas.|
Using Boolean Expressions in Data Analysis with Excel 2007
Excel 2007 data analysis tools include Boolean expressions, of course. To construct a Boolean expression, such as when you filter criteria, use a comparison operator and then a value in the comparison (>5, for example). The following table shows the Excel comparision operators and what they represent:
|Comparison Operator||What It Means|
|>=||Greater than or equal to|
|<=||Less than or equal to|
|<>||Not equal to|
Statistics Terms to Know when Using Excel 2007 Data Analysis Tools
With the data analysis tools available in Excel 2007, you can create spreadsheets that show the details of any statistic you can create a formula to find — and you can find any number. It helps to know what you’re looking for and what to expect, and the terms in the following list help you understand what kinds of statistics you can produce.
average: Typically, an average is the arithmetic mean for a set of values. Excel supplies several average functions.
chi-square: Use chi-squares to compare observed values with expected values, returning the level of significance, or probability (also called a p-value). A p-value helps you to assess whether differences between the observed and expected values represent chance.
cross-tabulation: This is an analysis technique that summarizes data in two or more ways. Summarizing sales information both by customer and product is a cross-tabulation.
descriptive statistics: Descriptive statistics just describe the values in a set. For example, if you sum a set of values, that sum is a descriptive statistic. Finding the biggest value or the smallest value in a set of numbers is also a descriptive statistic.
exponential smoothing: Exponential smoothing calculates the moving average but weights the values included in the moving average calculations so that more recent values have a bigger effect.
inferential statistics: Inferential statistics are based on the very useful, intuitive idea that if you look at a sample of values from a population and the sample is representative and large enough, you can draw conclusions about the population based on characteristics of the sample.
kurtosis: This is a measure of the tails in a distribution of values.
median: The median is the middle value in a set of values. Half of the values fall below the median, and half of the values fall above the median.
mode: Mode is the most common value in a set.
moving average: A moving average is calculated using only a specified set of values, such as an average based on just the last three values.
normal distribution: Also known as a Gaussian distribution, normal distribution is the infamous bell curve.
p-value: A p-value is the level of significance, or probability.
regression analysis: Regression analysis involves plotting pairs of independent and dependent variables in an XY chart and then finding a linear or exponential equation that best describes the plotted data.
skewness: This is a measure of the symmetry of a distribution of values.
standard deviation: A standard deviation describes dispersion about the data set’s mean. You can kind of think of a standard deviation as an average deviation from the mean.
variance: A variance describes dispersion about the data set’s mean. The variance is the square of the standard deviation; the standard deviation is the square root of the variance.
z-value: This is the distance between a value and the mean in terms of standard deviations.