## Articles From Deborah J. Rumsey

Cheat Sheet / Updated 01-19-2023

Statistics involves a lot of data analysis, and analysis is built with math notation and formulas — but never fear, your cheat sheet is here to help you organize, understand, and remember the notation and formulas so that when it comes to putting them into practice or to the test, you’re ahead of the game!

Article / Updated 10-06-2022

If you know the standard deviation for a population, then you can calculate a confidence interval (CI) for the mean, or average, of that population. When a statistical characteristic that’s being measured (such as income, IQ, price, height, quantity, or weight) is numerical, most people want to estimate the mean (average) value for the population. You estimate the population mean, μ, by using a sample mean, x̄, plus or minus a margin of error. The result is called a confidence interval for the population mean, μ.

When the population standard deviation is known, the formula for a confidence interval (CI) for a population mean is x̄ ± z* σ/√n, where x̄ is the sample mean, σ is the population standard deviation, n is the sample size, and z* represents the appropriate z*-value from the standard normal distribution for your desired confidence level.

z*-values for Various Confidence Levels

| Confidence Level | z*-value |
|---|---|
| 80% | 1.28 |
| 90% | 1.645 (by convention) |
| 95% | 1.96 |
| 98% | 2.33 |
| 99% | 2.58 |

The above table shows values of z* for the given confidence levels. Note that these values are taken from the standard normal (Z-) distribution. The area between each z* value and the negative of that z* value is (approximately) the confidence percentage. For example, the area between z = –1.28 and z = 1.28 is approximately 0.80. Hence this chart can be expanded to other confidence percentages as well; it shows only the confidence percentages most commonly used.

In this case, the data either have to come from a normal distribution, or if not, then n has to be large enough (at least 30 or so) for the Central Limit Theorem to apply, allowing you to use z*-values in the formula.

To calculate a CI for the population mean (average) under these conditions, do the following:

1. Determine the confidence level and find the appropriate z*-value. Refer to the above table.
2. Find the sample mean (x̄) for the sample size (n). Note: The population standard deviation is assumed to be a known value, σ.
3. Multiply z* times σ and divide that by the square root of n. This calculation gives you the margin of error.
4. Take x̄ plus or minus the margin of error to obtain the CI. The lower end of the CI is x̄ minus the margin of error, whereas the upper end of the CI is x̄ plus the margin of error.

For example, suppose you work for the Department of Natural Resources and you want to estimate, with 95 percent confidence, the mean (average) length of all walleye fingerlings in a fish hatchery pond.

1. Because you want a 95 percent confidence interval, your z*-value is 1.96.
2. Suppose you take a random sample of 100 fingerlings and determine that the average length is 7.5 inches; assume the population standard deviation is 2.3 inches. This means x̄ = 7.5, σ = 2.3, and n = 100.
3. Multiply 1.96 times 2.3 divided by the square root of 100 (which is 10). The margin of error is, therefore, ± 1.96(2.3/10) = 1.96 × 0.23 ≈ 0.45 inches.
4. Your 95 percent confidence interval for the mean length of walleye fingerlings in this fish hatchery pond is 7.5 inches ± 0.45 inches. (The lower end of the interval is 7.5 – 0.45 = 7.05 inches; the upper end is 7.5 + 0.45 = 7.95 inches.)

After you calculate a confidence interval, make sure you always interpret it in words a non-statistician would understand. That is, talk about the results in terms of what the person in the problem is trying to find out; statisticians call this interpreting the results “in the context of the problem.” In this example you can say: “With 95 percent confidence, the average length of walleye fingerlings in this entire fish hatchery pond is between 7.05 and 7.95 inches, based on my sample data.” (Always be sure to include appropriate units.)
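The steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article: the function name `z_confidence_interval` is hypothetical, and it computes z* from the standard library's `statistics.NormalDist` rather than reading it from the table, so the unrounded z* (about 1.96 for 95 percent) is used.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for a population mean when sigma is known.

    Hypothetical helper for illustration; assumes normal data or n of about 30+.
    """
    # z* leaves (1 - confidence) / 2 of the area in each tail of the Z-curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# Walleye fingerling example from the text: x̄ = 7.5, σ = 2.3, n = 100.
lower, upper, margin = z_confidence_interval(7.5, 2.3, 100, confidence=0.95)
print(f"margin of error = {margin:.2f} inches")
print(f"95% CI = ({lower:.2f}, {upper:.2f}) inches")
```

Passing a different `confidence` value (for example 0.90 or 0.99) changes only z*, which is why the interval widens as the confidence level rises.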

Article / Updated 09-22-2022

You can calculate a confidence interval (CI) for the mean, or average, of a population even if the standard deviation is unknown or the sample size is small. When a statistical characteristic that’s being measured (such as income, IQ, price, height, quantity, or weight) is numerical, most people want to estimate the mean (average) value for the population. You estimate the population mean, μ, by using a sample mean, x̄, plus or minus a margin of error. The result is called a confidence interval for the population mean, μ.

In many situations, you don’t know σ, so you estimate it with the sample standard deviation, s. In others, the sample size is small (less than 30), and you can’t be sure your data came from a normal distribution. (In the latter case, the Central Limit Theorem can’t be used.) In either situation, you can’t use a z*-value from the standard normal (Z-) distribution as your critical value anymore; you have to use a larger critical value than that, because of not knowing what σ is and/or having less data.

The formula for a confidence interval for one population mean in this case is x̄ ± t* s/√n, where t* is the critical t*-value from the t-distribution with n – 1 degrees of freedom (where n is the sample size).

The t*-values for common confidence levels are found using the last row of the t-table. The t-distribution has a shape similar to the Z-distribution except it’s flatter and more spread out. For small values of n and a specific confidence level, the critical values on the t-distribution are larger than on the Z-distribution, so when you use the critical values from the t-distribution, the margin of error for your confidence interval will be larger. As the values of n get larger, the t*-values get closer to the z*-values.

To calculate a CI for the population mean (average) under these conditions, do the following:

1. Determine the confidence level and degrees of freedom and then find the appropriate t*-value. Refer to the t-table.
2. Find the sample mean (x̄) and the sample standard deviation (s) for the sample.
3. Multiply t* times s and divide that by the square root of n. This calculation gives you the margin of error.
4. Take x̄ plus or minus the margin of error to obtain the CI. The lower end of the CI is x̄ minus the margin of error, whereas the upper end of the CI is x̄ plus the margin of error.

#### Here's an example of how this works

Suppose you work for the Department of Natural Resources and you want to estimate, with 95 percent confidence, the mean (average) length of all walleye fingerlings in a fish hatchery pond. You take a random sample of 10 fingerlings and determine that the average length is 7.5 inches and the sample standard deviation is 2.3 inches.

1. Because you want a 95 percent confidence interval, you determine your t*-value as follows: The t*-value comes from a t-distribution with 10 – 1 = 9 degrees of freedom. This t*-value is found by looking at the t-table. Look in the last row, where the confidence levels are located, and find the confidence level of 95 percent; this marks the column you need. Then find the row corresponding to df = 9. Intersect the row and column, and you find t* = 2.262. This is the t*-value for a 95 percent confidence interval for the mean with a sample size of 10. (Notice this is larger than the z*-value, which would be 1.96 for the same confidence level.)
2. You know that the average length is 7.5 inches, the sample standard deviation is 2.3 inches, and the sample size is 10. This means x̄ = 7.5, s = 2.3, and n = 10.
3. Multiply 2.262 times 2.3 divided by the square root of 10. The margin of error is, therefore, ± 2.262(2.3/√10) ≈ 1.645 inches.
4. Your 95 percent confidence interval for the mean length of all walleye fingerlings in this fish hatchery pond is 7.5 inches ± 1.645 inches. (The lower end of the interval is 7.5 – 1.645 = 5.86 inches; the upper end is 7.5 + 1.645 = 9.15 inches.)

Notice this confidence interval is wider than it would be for a large sample size. In addition to having a larger critical value (t* versus z*), the smaller sample size increases the margin of error, because n is in its denominator. With a smaller sample size, you don’t have as much information to “guess” at the population mean. Hence, keeping with 95 percent confidence, you need a wider interval than you would have needed with a larger sample size in order to be 95 percent confident that the population mean falls in your interval.

#### Now, say it in a way others can understand

After you calculate a confidence interval, make sure you always interpret it in words a non-statistician would understand. That is, talk about the results in terms of what the person in the problem is trying to find out; statisticians call this interpreting the results “in the context of the problem.” In this example you can say: “With 95 percent confidence, the average length of walleye fingerlings in this entire fish hatchery pond is between 5.86 and 9.15 inches, based on my sample data.” (Always be sure to include appropriate units.)
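A short Python sketch of the same calculation follows. It is illustrative only: the helper name `t_confidence_interval` is hypothetical, and the critical value t* = 2.262 is passed in by hand from the t-table (the standard library has no t-distribution), so this is not a general-purpose routine.

```python
from math import sqrt


def t_confidence_interval(sample_mean, s, n, t_star):
    """Return (lower, upper, margin) for a mean when sigma is unknown.

    Hypothetical helper: t_star must be looked up by the caller for
    n - 1 degrees of freedom and the desired confidence level.
    """
    margin = t_star * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# Small-sample walleye example from the text: x̄ = 7.5, s = 2.3, n = 10,
# with t* = 2.262 (df = 9, 95 percent confidence) taken from the t-table.
lower, upper, margin = t_confidence_interval(7.5, 2.3, 10, t_star=2.262)
print(f"margin of error = {margin:.3f} inches")
print(f"95% CI = ({lower:.3f}, {upper:.3f}) inches")
```

Because the code does not round the margin of error before subtracting and adding it, the endpoints can differ from the hand calculation in the last digit.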

Article / Updated 09-22-2022

You can find a confidence interval (CI) for the difference between the means, or averages, of two populations, even if the population standard deviations are unknown and/or the sample sizes are small. The goal of many statistical surveys and studies is to compare two populations, such as men versus women, low versus high income families, and Republicans versus Democrats. When the characteristic being compared is numerical (for example, height, weight, or income), the object of interest is the amount of difference in the means (averages) for the two populations.

For example, you may want to compare the difference in average age of Republicans versus Democrats, or the difference in average incomes of men versus women. You estimate the difference between two population means, μ1 – μ2, by taking a sample from each population (say, sample 1 and sample 2) and using the difference of the two sample means, x̄1 – x̄2, plus or minus a margin of error. The result is a confidence interval for the difference of two population means, μ1 – μ2.

There are two situations where you cannot use z* when computing the confidence interval. The first is when you do not know σ1 and σ2. In this case you need to estimate them with the sample standard deviations, s1 and s2. The second situation is when the sample sizes are small (less than 30). In this case you can’t be sure whether your data came from a normal distribution.

In either of these situations, a confidence interval for the difference in the two population means is (x̄1 – x̄2) ± t* sp √(1/n1 + 1/n2), with sp = √[((n1 – 1)s1² + (n2 – 1)s2²)/(n1 + n2 – 2)], where t* is the critical value from the t-distribution with n1 + n2 – 2 degrees of freedom; n1 and n2 are the two sample sizes, respectively; and s1 and s2 are the two sample standard deviations. This t*-value is found on the t-table by intersecting the row for df = n1 + n2 – 2 with the column for the confidence level you need, as indicated by looking at the last row of the table.

To calculate a CI for the difference between two population means, do the following:

1. Determine the confidence level and degrees of freedom (n1 + n2 – 2) and find the appropriate t*-value. Refer to the t-table.
2. Identify x̄1, n1, and s1.
3. Identify x̄2, n2, and s2.
4. Find the difference, (x̄1 – x̄2), between the sample means.
5. Calculate the confidence interval using the equation (x̄1 – x̄2) ± t* sp √(1/n1 + 1/n2).

Suppose you want to estimate with 95% confidence the difference between the mean (average) lengths of the cobs of two varieties of sweet corn (allowing them to grow the same number of days under the same conditions). Call the two varieties Corn-e-stats (group 1) and Stats-o-sweet (group 2). Assume that you don’t know the population standard deviations, so you use the sample standard deviations instead; suppose they turn out to be s1 = 0.40 and s2 = 0.50 inches, respectively. Suppose the sample sizes, n1 and n2, are each only 15.

1. To calculate the CI, you first need to find the t*-value on the t-distribution with (15 + 15 – 2) = 28 degrees of freedom. Using the t-table, you look at the row for 28 degrees of freedom and the column representing a confidence level of 95% (see the labels on the last row of the table); intersect them and you see t* = 2.048.
2. For both groups, you took a random sample of 15 cobs, with the Corn-e-stats variety averaging 8.5 inches, and Stats-o-sweet 7.5 inches. So the information you have is: x̄1 = 8.5, s1 = 0.40, n1 = 15, x̄2 = 7.5, s2 = 0.50, and n2 = 15.
3. The difference between the sample means is 8.5 – 7.5 = +1 inch. This means the average for Corn-e-stats minus the average for Stats-o-sweet is positive, making Corn-e-stats the larger of the two varieties, in terms of this sample. Is that difference enough to generalize to the entire population, though? That’s what this confidence interval is going to help you decide.
4. Using the rest of the information you are given, find the confidence interval for the difference in mean cob length for the two varieties: sp = √[(14(0.40)² + 14(0.50)²)/28] = √0.205 ≈ 0.4528, so the margin of error is 2.048 × 0.4528 × √(1/15 + 1/15) ≈ 2.048 × 0.4528 × 0.3651 ≈ 0.3386 inches.

Your 95% confidence interval for the difference between the average lengths for these two varieties of sweet corn is 1 inch, plus or minus 0.3386 inches. (The lower end of the interval is 1 – 0.3386 = 0.6614 inches; the upper end is 1 + 0.3386 = 1.3386 inches.) Notice all the values in this interval are positive. That means Corn-e-stats is estimated to be longer than Stats-o-sweet, based on your data.

The temptation is to say, “Well, I knew Corn-e-stats corn was longer because its sample mean was 8.5 inches and Stats-o-sweet was only 7.5 inches on average. Why do I even need a confidence interval?” All those two numbers tell you is something about those 30 ears of corn sampled. You also need to factor in variation using the margin of error to be able to say something about the entire populations of corn.
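As a cross-check, here is a hedged Python sketch of a two-sample interval. It assumes the pooled-standard-deviation form of the t interval, which is the usual formula when the degrees of freedom are n1 + n2 – 2; that choice, the function name `pooled_t_interval`, and its parameter names are assumptions of this sketch rather than something the listing confirms. The critical value t* = 2.048 is supplied from the t-table.

```python
from math import sqrt


def pooled_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """Return (lower, upper, margin) for mu1 - mu2 using a pooled SD.

    Assumption: both populations share a common variance, so the sample
    variances are pooled with n1 + n2 - 2 degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    margin = t_star * pooled_sd * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin, margin


# Sweet corn example: Corn-e-stats (group 1) versus Stats-o-sweet (group 2),
# with t* = 2.048 for df = 28 at 95 percent confidence.
lower, upper, margin = pooled_t_interval(8.5, 0.40, 15, 7.5, 0.50, 15, t_star=2.048)
print(f"margin of error = {margin:.4f} inches")
print(f"95% CI = ({lower:.4f}, {upper:.4f}) inches")
```

If the whole interval lies above zero, the data support the conclusion that the first variety is longer on average; an interval that straddles zero would not.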

Article / Updated 08-10-2022

When you perform a hypothesis test in statistics, a p-value helps you determine the significance of your results. Hypothesis tests are used to test the validity of a claim that is made about a population. This claim that’s on trial, in essence, is called the null hypothesis. The alternative hypothesis is the one you would believe if the null hypothesis is concluded to be untrue. The evidence in the trial is your data and the statistics that go along with it.

All hypothesis tests ultimately use a p-value to weigh the strength of the evidence (what the data are telling you about the population). The p-value is a number between 0 and 1 and is interpreted in the following way:

- A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis.
- A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.
- p-values very close to the cutoff (0.05) are considered to be marginal (could go either way). Always report the p-value so your readers can draw their own conclusions.

#### Hypothesis test example

Suppose a pizza place claims their delivery times are 30 minutes or less on average, but you think it’s more than that. You conduct a hypothesis test because you believe the null hypothesis, H0, that the mean delivery time is 30 minutes max, is incorrect. Your alternative hypothesis (Ha) is that the mean time is greater than 30 minutes. You randomly sample some delivery times and run the data through the hypothesis test, and your p-value turns out to be 0.001, which is much less than 0.05.

In real terms, if the pizza place’s claim were true (a mean delivery time of 30 minutes or less), there would be a probability of only 0.001 of getting sample results at least as extreme as yours. Since typically we are willing to reject the null hypothesis when this probability is less than 0.05, you conclude that the pizza place is wrong; their delivery times are in fact more than 30 minutes on average, and you want to know what they’re gonna do about it! (Of course, you could be wrong by having sampled an unusually high number of late pizza deliveries just by chance.)
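The decision rule described in this summary can be written as a tiny Python function. This is only a sketch: the name `decide` and the default cutoff `alpha=0.05` are illustrative choices, and the rule deliberately ignores the "marginal" zone near the cutoff, for which the text gives no numeric definition.

```python
def decide(p_value, alpha=0.05):
    """Apply the usual cutoff rule to a p-value (hypothetical helper)."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    # Small p-value: strong evidence against H0. Large p-value: weak evidence.
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Pizza delivery example: the test produced a p-value of 0.001.
print(decide(0.001))
```

Reporting the p-value itself alongside the decision lets readers apply a different cutoff if they prefer one.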

Article / Updated 08-10-2022

In statistics, when you test a hypothesis about a population, you use your test statistic to find a p-value and decide whether to reject the null hypothesis, H0. A p-value is a probability associated with your test statistic. It measures the chance of getting results at least as strong as yours if the claim (H0) were true. (The critical value, by contrast, depends on the probability you are allowing for a Type I error.)

The article’s figure shows the possible locations of a test statistic and their corresponding conclusions. Note that if the alternative hypothesis is the less-than alternative, you reject H0 only if the test statistic falls in the left tail of the distribution (below –2 in the figure). Similarly, if Ha is the greater-than alternative, you reject H0 only if the test statistic falls in the right tail (above 2 in the figure).

To find the p-value for your test statistic:

1. Look up your test statistic on the appropriate distribution; in this case, on the standard normal (Z-) distribution (see the Z-table).
2. Find the probability that Z is beyond (more extreme than) your test statistic:
   - If Ha contains a less-than alternative, find the probability that Z is less than your test statistic (that is, look up your test statistic on the Z-table and find its corresponding probability). This is the p-value. (Note: In this case, your test statistic is usually negative.)
   - If Ha contains a greater-than alternative, find the probability that Z is greater than your test statistic (look up your test statistic on the Z-table, find its corresponding probability, and subtract it from one). The result is your p-value. (Note: In this case, your test statistic is usually positive.)
   - If Ha contains a not-equal-to alternative, find the probability that Z is beyond your test statistic and double it. There are two cases:
     - If your test statistic is negative, first find the probability that Z is less than your test statistic (look up your test statistic on the Z-table and find its corresponding probability). Then double this probability to get the p-value.
     - If your test statistic is positive, first find the probability that Z is greater than your test statistic (look up your test statistic on the Z-table, find its corresponding probability, and subtract it from one). Then double this result to get the p-value.

Suppose you are testing a claim that the percentage of all women with varicose veins is 25%, and your sample of 100 women had 20% with varicose veins. Then the sample proportion is p̂ = 0.20. The standard error for your sample percentage is the square root of p̂(1 – p̂)/n, which equals 0.04 or 4%. You find the test statistic by taking the proportion in the sample with varicose veins, 0.20, subtracting the claimed proportion of all women with varicose veins, 0.25, and then dividing the result by the standard error, 0.04. These calculations give you a test statistic (standard score) of –0.05 divided by 0.04 = –1.25. This tells you that your sample results and the population claim in H0 are 1.25 standard errors apart; in particular, your sample results are 1.25 standard errors below the claim.

When testing H0: p = 0.25 versus Ha: p < 0.25, you find the p-value for the test statistic of –1.25 by finding the probability that Z is less than –1.25. When you look this number up on the Z-table, you find a probability of 0.1056 of Z being less than this value. Note: If you had been testing the two-sided alternative, the p-value would be 2 × 0.1056, or 0.2112.

If the results are likely to have occurred under the claim, then you fail to reject H0 (like a jury decides not guilty). If the results are unlikely to have occurred under the claim, then you reject H0 (like a jury decides guilty).
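The lookup procedure can be mirrored in Python with the standard library's `statistics.NormalDist`, whose `cdf` method plays the role of the Z-table. The function name `z_test_p_value` and the labels `"less"`, `"greater"`, and `"two-sided"` are illustrative assumptions of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(z, alternative):
    """p-value for a z test statistic (hypothetical helper).

    alternative: "less", "greater", or "two-sided" (the form of Ha).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":
        return std_normal.cdf(z)            # area to the left of z
    if alternative == "greater":
        return 1 - std_normal.cdf(z)        # area to the right of z
    if alternative == "two-sided":
        return 2 * std_normal.cdf(-abs(z))  # double the tail area
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Varicose veins example: claimed p = 0.25, sample proportion 0.20, n = 100.
p_hat, p0, n = 0.20, 0.25, 100
standard_error = sqrt(p_hat * (1 - p_hat) / n)  # follows the text's calculation
z = (p_hat - p0) / standard_error
p_less = z_test_p_value(z, "less")
p_two_sided = z_test_p_value(z, "two-sided")
print(f"z = {z:.2f}")
print(f"one-sided p-value = {p_less:.4f}")
print(f"two-sided p-value = {p_two_sided:.4f}")
```

Doubling the unrounded one-sided value can differ from doubling the table value in the fourth decimal place.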

Article / Updated 08-08-2022

You can use the z-table to find a full set of "less-than" probabilities for a wide range of z-values. To use the z-table to find probabilities for a statistical sample with a standard normal (Z-) distribution, follow the steps below.

#### Using the Z-table

1. Go to the row that represents the ones digit and the first digit after the decimal point (the tenths digit) of your z-value.
2. Go to the column that represents the second digit after the decimal point (the hundredths digit) of your z-value.
3. Intersect the row and column from Steps 1 and 2. This result represents p(Z < z), the probability that the random variable Z is less than the value z (also known as the percentage of z-values that are less than the given z-value).

For example, suppose you want to find p(Z < 2.13). Using the z-table, find the row for 2.1 and the column for 0.03. Intersect that row and column to find the probability: 0.9834. Therefore p(Z < 2.13) = 0.9834.

Because the total area under any normal curve (including the standardized normal curve) is 1, it follows that p(Z < 2.13) + p(Z > 2.13) = 1. Therefore, p(Z > 2.13) = 1 – p(Z < 2.13) = 1 – 0.9834 = 0.0166.

#### Symmetry in the distribution

Suppose you want to look for p(Z < –2.13). You find the row for –2.1 and the column for 0.03. Intersect the row and column and you find 0.0166; that means p(Z < –2.13) = 0.0166. Observe that this happens to equal p(Z > +2.13). This is because the normal distribution is symmetric. So the tail of the curve below –2.13 representing p(Z < –2.13) looks exactly like the tail above 2.13 representing p(Z > +2.13).
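The same probabilities can be checked without a printed table. The sketch below uses the Python standard library's `statistics.NormalDist`, whose `cdf` method returns the "less-than" probability that the z-table tabulates; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_below = Z.cdf(2.13)       # p(Z < 2.13), the table entry
p_above = 1 - p_below       # p(Z > 2.13), by the total-area rule
p_below_neg = Z.cdf(-2.13)  # p(Z < -2.13), equal to p_above by symmetry

print(f"p(Z < 2.13)  = {p_below:.4f}")
print(f"p(Z > 2.13)  = {p_above:.4f}")
print(f"p(Z < -2.13) = {p_below_neg:.4f}")
```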

Cheat Sheet / Updated 02-25-2022

This cheat sheet is for you to use as a quick resource for finding important basic statistical formulas such as mean, standard deviation, and Z-values; important and always useful probability definitions such as independence and rules such as the multiplication rule and the addition rule; and 10 quick ways to spot statistical mistakes either in your own work, or out there in the media as a consumer of statistical information.

Cheat Sheet / Updated 02-23-2022

Statistics II elaborates on Statistics I and moves into new territories, including multiple regression, analysis of variance (ANOVA), Chi-square tests, nonparametric procedures, and other key topics. Knowing which data analysis to use and why is important, as is familiarity with computer output if you want your numbers to give you dependable results.

Article / Updated 12-28-2021

In statistics, you can easily find probabilities for a sample mean if it has a normal distribution. Even if it doesn’t have a normal distribution, or the distribution is not known, you can find probabilities if the sample size, n, is large enough.

The normal distribution is a very friendly distribution that has a table for finding probabilities and anything else you need. For example, you can find probabilities for x̄ by converting x̄ to a z-value and finding probabilities using the Z-table.

The general conversion formula from x-values to z-values is z = (x – μ)/σ. Substituting the appropriate values of the mean and standard error of x̄, the conversion formula becomes z = (x̄ – μ)/(σ/√n), where μ and σ are the mean and standard deviation of X.

Don’t forget to divide by the square root of n in the denominator of z. Always divide by the square root of n when the question refers to the average of the x-values.

For example, suppose X is the time it takes a randomly chosen clerical worker in an office to type and send a standard letter of recommendation. Suppose X has a normal distribution, and assume the mean is 10.5 minutes and the standard deviation 3 minutes. You take a random sample of 50 clerical workers and measure their times. What is the chance that their average time is less than 9.5 minutes?

This question translates to finding P(x̄ < 9.5). As X has a normal distribution to start with, you know x̄ also has an exact (not approximate) normal distribution. Converting to z, you get z = (9.5 – 10.5)/(3/√50) ≈ –2.36.

So you want P(Z < –2.36). Using the Z-table, you find that P(Z < –2.36) = 0.0091. So the probability that a random sample of 50 clerical workers averages less than 9.5 minutes to complete this task is 0.91% (very small).

How do you find probabilities for x̄ if X is not normal, or unknown? As a result of the Central Limit Theorem (CLT), the distribution of X can be non-normal or even unknown, and as long as n is large enough, you can still find approximate probabilities for x̄ using the standard normal (Z-) distribution and the process described above. That is, convert to a z-value and find approximate probabilities using the Z-table.

When you use the CLT to find a probability for x̄ (that is, when the distribution of X is not normal or is unknown), be sure to say that your answer is an approximation. You also want to say the approximate answer should be close because you’ve got a large enough n to use the CLT. (If n is not large enough for the CLT, you can use the t-distribution in many cases.)
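The conversion of a sample mean to a z-value can be sketched in Python as follows. The helper name `prob_sample_mean_below` is hypothetical, and the standard library's `NormalDist().cdf` stands in for the Z-table. The code keeps the unrounded z-value, so its probability may differ slightly from a table lookup at z = –2.36.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_below(x_bar, mu, sigma, n):
    """Return (z, P(sample mean < x_bar)) for a normal X or a large n.

    Hypothetical helper: mu and sigma describe the population of X.
    """
    standard_error = sigma / sqrt(n)  # standard deviation of the sample mean
    z = (x_bar - mu) / standard_error
    return z, NormalDist().cdf(z)


# Clerical worker example: mu = 10.5 minutes, sigma = 3 minutes, n = 50.
z, p = prob_sample_mean_below(9.5, 10.5, 3, 50)
print(f"z = {z:.2f}")
print(f"P(sample mean < 9.5) = {p:.4f}")
```

When X is not known to be normal, the same call gives only an approximation that relies on n being large enough for the Central Limit Theorem.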
