Chapter 6 Probability Distributions
In this chapter and the next we build a bridge between the one sample we almost always have to work with and the population from which our sample was drawn. This is a crucial stage of our statistics knowledge-building for one major reason. Specifically, it is always easy to guess (correctly) what the sample will look like if we know what the population looks like. For example, imagine I show you a jar filled with 100 black marbles. I then blindfold you and ask you to pick 20 marbles. What color are they? Of course, all 20 marbles are black since that is the population. I then show you another jar, this one with 60 black, 20 white, and 20 red marbles. I then blindfold you again and ask you to pick 20 marbles and guess their colors. You might guess that the sample will show the same mix as the population: You might have 12 black, 4 white, and 4 red marbles. You could have a different mix but 12 black, 4 white and 4 red would be a reasonable guess.
Now I blindfold you, place a jar before you, and ask you to pick 20 marbles without telling you the distribution of colored marbles in the jar. What would you expect in your sample? You have no clue, nor should you, since the population is unknown. This is what you face in statistics all the time: a population beyond your purview, flying blind so to speak. How then can you really generalize from your sample to the population? Only by way of the theory of sampling distributions, by knowing how much drift to expect between your sample and the population it represents. Understanding the finer points of the theory of sampling distributions is our goal in this chapter. To ease our understanding we start by grasping what is meant by random variables that are discrete versus continuous, looking at some discrete versus continuous probability distributions, and then working our way to the concepts of a standard error, the central limit theorem, and confidence intervals.
6.1 Random Variables
A random variable is a numerical description of the outcome of an experiment. A discrete random variable assumes discrete values while a continuous random variable may assume any value in an interval or collection of intervals.
Random Variable | Possible Values |
---|---|
No. of defective iPhones | (0, 1, 2, 3,…, 49) |
Sex of car buyer | (Male, Female) |
No. of Mountain Lions seen | (0, 1, 2, 3, …, 419) |
Gene length (in nucleotides) | (60, …, 100000) |
Random Variable | Possible Values |
---|---|
Gene length (in nucleotides) | 60 <= x <= 100000 |
Spending per week | 0 <= x <= infinity |
Travel times to CMH | 55.4 <= x <= 118.5 |
Undulation rates of gliding snakes | 0 <= x <= 1.9 |
Petal length of the virginica Iris | 1 <= x <= 6.7 |
6.2 Probability Distributions
A probability distribution is a list of the probabilities of all mutually exclusive events (aka outcomes) of a random trial. While both discrete and continuous variables have probability distributions, discrete probability distributions are easier to understand so let us begin with them.
6.2.1 Discrete Probability Distributions
Suppose we have an experiment whose outcome \((x)\) depends on chance (i.e., is a random variable) and the sample space of the experiment is, as usual, the set of all possible outcomes of the experiment. If the sample space is either finite or countably infinite, the random variable is said to be discrete. A probability distribution of a discrete random variable (\(x\)) describes how probabilities are distributed over the values of the random variable, and is denoted by \(f(x)\). Discrete probability functions must meet two conditions …
The table below shows the number of persons waiting for service at a local office of the state’s Department of Motor Vehicles when the first employee walks up to open the counter. Let us assume that these data were gathered for a random stretch of 300 business days, and span two years.
No. waiting (x) | Frequency (f) | Relative frequency f(x) |
---|---|---|
0 | 54 | 0.18 |
1 | 117 | 0.39 |
2 | 72 | 0.24 |
3 | 42 | 0.14 |
4 | 12 | 0.04 |
5 | 3 | 0.01 |
Note that \(\sum f = 300\) and \(\sum f(x) = 1.00\), where each relative frequency is \(f(x) = f / 300\).
If you scan the table or the bar chart it is obvious what you should expect if you were the individual manning the counter. On any given day you should expect to see 1 person waiting to be served. Why? Because this is what seems to happen most often (117 out of the 300 days you kept logs).
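The reasoning above can be reproduced in a few lines. The sketch below is only an illustration: the variable names are my own, and the frequencies are copied from the table. It converts the frequencies into relative frequencies, picks out the most common value, and also computes the weighted average \(\sum x f(x)\) for comparison.

```python
# Frequencies copied from the DMV table (300 business days in total).
frequency = {0: 54, 1: 117, 2: 72, 3: 42, 4: 12, 5: 3}

total_days = sum(frequency.values())

# Relative frequency f(x) = frequency / total number of days.
rel_freq = {x: f / total_days for x, f in frequency.items()}

# The value observed most often (the mode).
mode = max(frequency, key=frequency.get)

# The weighted average, sum of x * f(x).
mean_waiting = sum(x * p for x, p in rel_freq.items())

for x, p in rel_freq.items():
    print(f"{x} waiting: f(x) = {p:.2f}")
print("most common number waiting:", mode)
print("average number waiting:", round(mean_waiting, 2))
```

The most common value is 1 person, while the weighted average is somewhat higher because a few days have 3, 4, or 5 people waiting.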
What if you were rolling dice instead?
Note, again, that the first row, with columns 1, 2, 3, 4, 5, and 6, shows the numbers that can turn up on a roll of Die 1, and the first column, with rows 1, 2, 3, 4, 5, and 6, shows the numbers that can turn up on a roll of Die 2. Every other cell is the sum of the two faces.
You have seen Table 6.4 before so it should be easy to figure out what would be the most likely sum of the numbers showing up on the faces of two dice you roll: \(7\). Why? Because that happens most often.
Die 2 \ Die 1 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | 6 | 7 |
2 | 3 | 4 | 5 | 6 | 7 | 8 |
3 | 4 | 5 | 6 | 7 | 8 | 9 |
4 | 5 | 6 | 7 | 8 | 9 | 10 |
5 | 6 | 7 | 8 | 9 | 10 | 11 |
6 | 7 | 8 | 9 | 10 | 11 | 12 |
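If you would rather count than eyeball the table, the following minimal sketch (the names are mine, not part of the original example) enumerates all 36 equally likely pairs of faces and tallies how often each sum appears.

```python
from collections import Counter
from itertools import product

# All 36 equally likely (die 1, die 2) pairs, tallied by their sum.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

most_likely_sum, ways = sums.most_common(1)[0]

for total in sorted(sums):
    print(f"sum {total:2d}: {sums[total]} ways, probability {sums[total] / 36:.4f}")
print("most likely sum:", most_likely_sum)
```

A sum of 7 can occur in 6 of the 36 ways, more than any other sum, which is why it is the best guess.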
6.2.1.1 The Binomial Distribution
Let us take up another example, this time of flipping a coin. Say you flip a coin 10 times. So long as you know that it is a fair coin, what should you expect in terms of the distribution of the number of heads you would see? One way to answer this question would be to set up all possible outcomes such as no heads, 1 head, 2 heads, 3 heads, and so on, but this would be a very tedious way of doing things. Instead, we rely on the binomial distribution, which characterizes the distribution of binary outcomes, with the outcome of interest being tagged as a success and the other category tagged as a failure. Note that success could mean a patient survives (versus dies), a candidate for political office wins (versus loses), a tax audit catches a tax evader (versus fails to detect evasion), the job-training program works (versus it does not), and so on. The binomial distribution is premised upon some assumptions:
- The number of trials (\(n\)) is fixed
- Each trial is independent of all other trials
- Only two mutually exclusive and collectively exhaustive outcomes are possible in any given trial, with one outcome defined as success and the other defined as failure
- The probability of observing a success (\(p\)) does not vary across trials. Because there are only two outcomes, this means the probability of observing a failure (\(q\)) also does not vary across trials. Further, \(p = 1 - q\) and \(q = 1 - p\)
Mathematically, the probability of observing \(x\) successes in \(n\) trials of a binomial process is given by
\[\begin{eqnarray*} P\left[x \text{ successes}\right] = \binom{n}{x}p^{x}\left(1 - p\right)^{n-x} \\ \text{where } \binom{n}{x} = \dfrac{n!}{x!(n-x)!} \text{ and } n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1 \end{eqnarray*}\]
Let us understand this distribution with a simple example. If I toss a coin 2 times, what is the probability of getting exactly 1 head? Let success be the variable \(x\) and then, in our present example, \(x=1\). Now we know that for unbiased coins there is a 50:50 chance of getting heads on any single toss, i.e., \(p(Heads)=0.50\). We are also conducting \(n=2\) independent trials since we are flipping a coin twice and what happens on the first toss has no impact on what happens in the second toss.
How many outcomes are possible in our 2 independent trials? We know this to be \((2)^{2} = 4\) … these are \(\left[HH, HT, TH, TT \right]\). In how many ways can we get 1 head out of 2 tosses? … \(\left[HT, TH \right]\). So the probability of getting exactly 1 head in 2 tosses is \(\dfrac{2}{4} = 0.5\). This was easy to calculate manually, by spelling out all possible outcomes and then seeing how many of these outcomes match our definition of success. The binomial distribution would have given you the same answer:
\[\begin{eqnarray*} P\left[x \text{ Successes}\right] = \binom{n}{x}p^{x}\left(1 - p\right)^{n-x} \\ \therefore P\left[1 \text{ Success}\right] = \binom{2}{1}(0.50)^{1}\left(1 - 0.50\right)^{2-1} = \binom{2}{1}(0.50)^{1}\left(0.50\right)^{1} \\ \binom{2}{1} = \dfrac{2\times 1}{\left(1\right) \left(1 \right)} = 2 \\ \therefore, P\left[1 \text{ Success}\right] = (2) \times (0.5) \times (0.5) = 0.50 \end{eqnarray*}\]
If I toss a coin 3 times, what is the probability of getting exactly 1 head? Let \(x=1\). We know for unbiased coins \(p(Heads)=0.50\). We are also conducting \(n=3\) independent trials. How many outcomes are possible in 3 independent trials? We know this to be \((2)^{3} = 8\) … these are \(\left[HHH, HHT, HTH, HTT, TTT, TTH, THT, THH \right]\). In how many ways can we get 1 head out of 3 tosses? … \(\left[HTT, THT, TTH \right]\). So the probability of getting exactly 1 head in 3 tosses is \(\dfrac{3}{8} = 0.375\). Using the binomial distribution,
\[\begin{eqnarray*} P\left[x \text{ Successes}\right] = \binom{n}{x}p^{x}\left(1 - p\right)^{n-x} \\ \therefore P\left[1 \text{ Success}\right] = \binom{3}{1}(0.50)^{1}\left(1 - 0.50\right)^{3-1} = \binom{3}{1}(0.50)^{1}\left(0.50\right)^{2} \\ \binom{3}{1} = \dfrac{3 \times 2\times 1}{\left(1\right) \left(2 \times 1 \right)} = 3 \\ \therefore, P\left[1 \text{ Success}\right] = (3) \times (0.5) \times (0.25) = 0.375 \end{eqnarray*}\]
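Both routes, listing every outcome and applying the formula, are easy to check by machine. The sketch below is one possible way to do so; the function names `binomial_pmf` and `count_by_enumeration` are my own and not from any particular library.

```python
from itertools import product
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P[x successes] in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def count_by_enumeration(x: int, n: int) -> float:
    """Share of the 2**n equally likely H/T sequences with exactly x heads (fair coin only)."""
    outcomes = list(product("HT", repeat=n))
    hits = sum(1 for outcome in outcomes if outcome.count("H") == x)
    return hits / len(outcomes)


for n in (2, 3):
    print(n, "tosses:", count_by_enumeration(1, n), binomial_pmf(1, n, 0.5))
```

For a fair coin the two columns agree, but only the formula remains practical once the number of tosses grows.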
If I had to now answer the question we began with – what would the distribution of heads look like if I flipped a coin 10 times? – and I went the manual route I would be doing a lot of tedious calculations. Instead, I can use the binomial to calculate the probability of 0 heads, 1 head, 2 heads, and so on. For example, for 0 heads it would be
\[\begin{eqnarray*} P\left[x \text{ Successes}\right] = \binom{n}{x}p^{x}\left(1 - p\right)^{n-x} \\ \therefore P\left[0 \text{ Successes}\right] = \binom{10}{0}(0.50)^{0}\left(1 - 0.50\right)^{10-0} = \binom{10}{0}(0.50)^{0}\left(0.50\right)^{10} \\ \binom{10}{0} = \dfrac{10 \times 9 \times 8 \times \ldots\times 2\times 1}{\left(0!\right) \left(10 \times 9 \times 8 \times \ldots \times 2 \times 1 \right)} = 1 \text{ since } 0! = 1 \\ \therefore, P\left[0 \text{ Successes}\right] = (1) \times (1) \times (0.0009765625) = 0.0009765625 \end{eqnarray*}\]
For 1 head it would be
\[\begin{eqnarray*} P\left[x \text{ Successes}\right] = \binom{n}{x}p^{x}\left(1 - p\right)^{n-x} \\ \therefore P\left[1 \text{ Success}\right] = \binom{10}{1}(0.50)^{1}\left(1 - 0.50\right)^{10-1} = \binom{10}{1}(0.50)^{1}\left(0.50\right)^{9} \\ \binom{10}{1} = \dfrac{10 \times 9 \times 8 \times \ldots\times 2\times 1}{\left(1\right) \left(9 \times 8 \times \ldots \times 2 \times 1 \right)} = 10 \\ \therefore, P\left[1 \text{ Success}\right] = (10) \times (0.5) \times (0.001953125) = 0.009765625 \end{eqnarray*}\]
and so on. The rest of the calculations are done similarly and listed in the table below:
No. of Heads | Relative frequency f(x) |
---|---|
0 | 0.0010 |
1 | 0.0098 |
2 | 0.0439 |
3 | 0.1172 |
4 | 0.2051 |
5 | 0.2461 |
6 | 0.2051 |
7 | 0.1172 |
8 | 0.0439 |
9 | 0.0098 |
10 | 0.0010 |
These relative frequencies (the probabilities) can also be seen in the plot below.
Now the question is, if you flip a coin 10 times, how many heads should you expect to see, on average? In the language of statistics we speak of probabilistic outcomes in terms of expected value, where the expected value of a random variable is a measure of the central tendency of the random variable, and is given by \(E(x) = \mu = \Sigma xf(x)\). For our 10 flips, the expected value would be …
No. of Heads (x) | Relative Frequency f(x) | x * f(x) |
---|---|---|
0 | 0.0010 | 0.0000000 |
1 | 0.0098 | 0.0097656 |
2 | 0.0439 | 0.0878906 |
3 | 0.1172 | 0.3515625 |
4 | 0.2051 | 0.8203125 |
5 | 0.2461 | 1.2304688 |
6 | 0.2051 | 1.2304687 |
7 | 0.1172 | 0.8203125 |
8 | 0.0439 | 0.3515625 |
9 | 0.0098 | 0.0878906 |
10 | 0.0010 | 0.0097656 |
Summing the last column gives \(E(x) = 5\). Makes sense, doesn’t it? On average you would expect to see 5 heads in 10 flips of an unbiased coin. However, as the table makes quite clear, exactly 5 heads turns up only about 24.6% of the time, so there is a sizable probability of ending up with fewer or more heads as well.
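The full table for 10 flips, and the expected value built from it, can be regenerated with a short script. This is a sketch using only the binomial formula given earlier; the variable names are illustrative.

```python
from math import comb

n, p = 10, 0.5  # ten flips of a fair coin

# f(x) for x = 0, 1, ..., 10 heads.
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# E(x) = sum of x * f(x).
expected_heads = sum(x * fx for x, fx in enumerate(pmf))

for x, fx in enumerate(pmf):
    print(f"{x:2d}  f(x) = {fx:.4f}  x*f(x) = {x * fx:.7f}")
print("E(x) =", expected_heads)
```

Changing `p` to something other than 0.5 shows how the expected value moves with the probability of success.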
How does any of this apply to the real world? In more ways than one can imagine. For example, based on more than a century of birth statistics we know that the probability that the next birth will be a boy is 0.512 and hence the probability that it will be a girl is 0.488. So if you wanted to predict the sex of the child born next, the better bet would be that it is a boy. Let us put the binomial distribution into action with a few examples.
Example 1
Your city council has been charged with sex discrimination in hiring. In the last fiscal year your city council hired only 3 women out of a candidate pool numbering 12 qualified applicants. On average, women tend to make up some 40% of the qualified applicant pool for these positions. If the city council was not discriminating based on the applicant’s sex, what is the probability of the city hiring three or fewer women?
We know the number of experiments/trials is \(n=12\). We also know that the number of successes, defined as the number of women hired, is \(x=3\). The probability of success, here the probability of hiring a qualified woman applicant, is known to be \(p=0.40\). Since we need \(P(x \leq 3)\), we calculate the probabilities of 0, 1, 2, and 3 successes and add them up. Using the binomial distribution, the probability of hiring no qualified woman applicant is given by
\[\begin{eqnarray*} P\left[x \text{ Successes}\right] = \binom{n}{x}p^{x}\left(1 - p\right)^{n-x} \\ \therefore P\left[0 \text{ Successes}\right] = \binom{12}{0}(0.40)^{0}\left(1 - 0.40\right)^{12-0} = \binom{12}{0}(0.40)^{0}\left(0.60\right)^{12} \\ \binom{12}{0} = \dfrac{12 \times 11 \times 10 \times \ldots\times 2\times 1}{\left(0!\right) \left(12 \times 11 \times 10 \times \ldots \times 2 \times 1 \right)} = 1 \text{ since } 0! = 1 \\ \therefore, P\left[0 \text{ Successes}\right] = (1) \times (1) \times (0.002176782) = 0.002176782 \end{eqnarray*}\]
Similarly, calculating the probability of 1, 2 and 3 successes yields estimates of 0.01741426, 0.06385228 and 0.141894, respectively. Thus, the probability of seeing 3 or fewer women hired works out to \(0.002176782 + 0.01741426 + 0.06385228 + 0.141894 = 0.2253373\). Formally, then, \(P(x \leq 3) = 0.2253373\).
Note that we could have used the online binomial calculator to do these calculations for us, saving precious time.
Example 2
Seaman David Brady is one of 16 seamen in Petty Officer Rickels’ unit. Daily, four seamen are assigned to chip paint while the rest are assigned to screen movies suitable for viewing on the naval base. Assume that duty assignments are independent across days. Seaman Brady believes Rickels does not like him because he has ended up chipping paint 16 of the past 20 days.
- What is the probability that this would happen if Rickels was not discriminating against Brady?
The probability of being assigned to chip paint is 0.25 for any individual. Setting \(p=0.25; n = 20; x = 16\) in the calculator yields \(P(x=16)\) \(< 0.000001\).
- Do the data suggest Rickels is discriminating against Brady? Why do you conclude as you do?
Yes, given the almost zero probability of this happening by chance it seems as if Brady is being picked on by Rickels.
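Both binomial examples can also be checked without an online calculator. The sketch below is only an illustration with names of my own choosing. It sums the binomial probabilities for the hiring example and, for Seaman Brady, reports both the probability of exactly 16 paint-chipping days and the probability of 16 or more.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P[x successes] in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Example 1: 12 hires, 40% of qualified applicants are women, P(x <= 3).
p_three_or_fewer = sum(binomial_pmf(x, 12, 0.40) for x in range(4))

# Example 2: 4 of 16 seamen chip paint each day, so p = 0.25 over 20 days.
p_chip = 4 / 16
p_exactly_16 = binomial_pmf(16, 20, p_chip)
p_16_or_more = sum(binomial_pmf(x, 20, p_chip) for x in range(16, 21))

print(f"P(x <= 3 women hired)  = {p_three_or_fewer:.7f}")
print(f"P(x = 16 days)         = {p_exactly_16:.3e}")
print(f"P(x >= 16 days)        = {p_16_or_more:.3e}")
```

The tail probability \(P(x \geq 16)\) is arguably the more natural quantity for Brady's complaint, since an even more lopsided record would also count as evidence; it is still far below one in a million.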
6.2.1.2 The Poisson Distribution
We can also track hurricanes, cyclones, earthquakes, etc. and measure their intensity. Or perhaps you are interested in figuring out the probability of a Shark Attack in a given year in a given place. You could look at the historical data and figure out the average number of attacks in this place per year. These data are available here.
Say I am only interested in incidents in the USA and Australia over the 1900-2017 period and want to know the number of incidents per year per country. This is evident in the plot that follows.
The average number of incidents per country turns out to be almost 10 for Australia and about 17 for the USA. This would tell us that in any given year we should expect, on average, these many shark incidents in each country, respectively. Of course, something has gone on since about 2000 as reflected by the surge in the number of incidents. If we calculated the averages just for the 2000-2017 period we would likely see higher averages than for the pre-2000 period. In passing, note the surprising results both in the plot and in these means: Australia has fewer shark incidents, on average, than does the USA, but the news media usually carry stories of attacks in Australian waters.
As it turns out, the variable of interest – number of shark incidents per country per year – is a discrete variable that can never drop below 0 but can assume any value \(x = 0, 1, 2, 3, 4, 5, \ldots\). Variables such as these are called count variables and are often modeled with a specific distribution: the Poisson distribution. Mathematically, the Poisson probability distribution is expressed as:
\[f(x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}\]
where \(x = 0, 1, 2, 3, \ldots\), \(\lambda > 0\) is both the mean and the variance of the distribution, and \(e \approx 2.71828\).
Several discrete random variables are assumed to be Poisson distributed – number of highway fatalities, number of persons in a household, number of patient visits to the Emergency Room, number of traffic stops, number of terrorist attacks, number of hurricanes rated Category 5 on the Saffir-Simpson Hurricane Wind Scale, the number of citizen complaints filed with the City Clerk, the number of hazardous waste sites per county, the number of parolees violating the conditions of their parole per month, and so on. Say, for example, that on average 5 complaints are filed per month with the City Clerk. If you pick a month at random, what would be the probability of seeing no complaints? 1 complaint? 2 complaints? \(\ldots\) 15 complaints? These turn out to be:
No. of Complaints | Probability |
---|---|
0 | 0.0067379 |
1 | 0.0336897 |
2 | 0.0842243 |
3 | 0.1403739 |
4 | 0.1754674 |
5 | 0.1754674 |
6 | 0.1462228 |
7 | 0.1044449 |
8 | 0.0652780 |
9 | 0.0362656 |
10 | 0.0181328 |
11 | 0.0082422 |
12 | 0.0034342 |
13 | 0.0013209 |
14 | 0.0004717 |
15 | 0.0001572 |
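The probabilities in the table can be regenerated from the Poisson formula \(f(x) = e^{-\lambda}\lambda^{x}/x!\). Note that the tabulated values correspond to a mean of \(\lambda = 5\) (for instance, \(e^{-5} \approx 0.0067379\)), so that is the value assumed in this sketch; the function name `poisson_pmf` is my own.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x events when the mean count is lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 5  # assumed mean; this value reproduces the tabulated probabilities

for x in range(16):
    print(f"{x:2d}  {poisson_pmf(x, lam):.7f}")
```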
Dispersion versus Clumping
If a discrete random variable is indeed Poisson generated, the mean should equal the variance. In passing, note that the Poisson distribution is premised on the following assumptions:
- The probability \((p)\) of observing a success in a small space or interval of time is approximately proportional to the area of the space or the length of the time interval
- The probability of two successes occurring in the same narrow interval of time or in a small space is negligible
- The probability \((p)\) of observing a success in a small space or interval of time does not vary across space or time
- The probability \((p)\) of observing a success in a small space or interval of time is independent of the probability \((p)\) of observing a success in the next small space or interval of time
If the mean exceeds the variance then we have a dispersed variable (i.e., events occur farther apart in space/time than would be expected by chance). An example of dispersion might be observed in the behavior of territorial animals such as lions and tigers that mark their own territory and guard it from competitors. So if you see one “success” (say, a lion), the probability of seeing another nearby/soon decreases. If the variance exceeds the mean we have clumping (i.e., events occur closer together in space/time than would be expected by chance). An example of clumping might be social animals (that herd together) and contagious diseases (outbreaks will occur within a spatial group). So if you see one “success” (say, a wildebeest), the probability of seeing another nearby/soon increases.
Before we move on, note what happens to the distribution as the mean \((\lambda)\) increases. The more commonly the event occurs, the more “normal” the distribution looks. The lower the mean, the more strongly positively skewed the distribution is; the higher the mean, the more symmetric it becomes. A Poisson distribution is never negatively skewed.
When \(n \rightarrow \infty\) and \(p \rightarrow 0\) with \(np = \lambda\) held fixed, the Binomial distribution approaches the Poisson distribution, which is why the Poisson serves as a good approximation to the Binomial when \(n\) is large and \(p\) is small. Likewise, as \(\lambda \rightarrow \infty\), the Poisson distribution approaches the Normal distribution. This pattern is visible in the plots that follow.
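A quick numerical check of these limiting statements is sketched below. The choice of \(\lambda = 5\) and the particular values of \(n\) are illustrative assumptions of mine. The first loop holds \(np = \lambda\) fixed while \(n\) grows and reports the largest gap between the Binomial and Poisson probabilities over \(x = 0, \ldots, 10\). The second part uses the standard result that the skewness of a Poisson distribution is \(1/\sqrt{\lambda}\), which is always positive and shrinks toward zero as the mean grows.

```python
from math import comb, exp, factorial, sqrt


def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    return exp(-lam) * lam**x / factorial(x)


lam = 5  # illustrative mean, chosen for this sketch
gaps = []
for n in (10, 100, 10_000):
    p = lam / n  # keep n * p fixed at lam while n grows and p shrinks
    gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))
    gaps.append(gap)
    print(f"n = {n:6d}, p = {p:.4f}, largest gap = {gap:.6f}")

# Skewness of a Poisson distribution: 1 / sqrt(lambda).
means = (1, 5, 25, 100)
skews = [1 / sqrt(m) for m in means]
for m, s in zip(means, skews):
    print(f"lambda = {m:3d}, skewness = {s:.3f}")
```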
Poisson distributions have been used to tackle some very fascinating problems in both academic and applied research, including, but not limited to, traffic signal design and operation, authorship of the Federalist Papers, highway traffic, radioactive decay, German air-raids on London during World War II, and positions of glowworms on the ceiling of the Waitomo cave in New Zealand. Two practical applications of the Poisson distribution follow.
Example 1
The mean number of crime reports filed per hour during Halloween by the Athens City Police Department is 0.833.
- What is the probability that in a 24 hour period the ACPD will see 10 or fewer crime reports filed?
\[P(x \leq 10) = \sum_{k=0}^{10} \dfrac{e^{-\lambda}\lambda^{k}}{k!}, \qquad \lambda = 0.833 \times 24 = 19.992\]
Let us use the online calculator for the Poisson distribution. We have \(\lambda=19.992\) since the mean for a 24-hour period would be \(0.833 \times 24 = 19.992\), and are given \(x = 10\). Entering these values into the calculator yields \(P(x \leq 10) = 0.0108583424514613 \approx 0.0109\). Note that the calculator also generates other probabilities: \(P(x = 10) = 0.0058396136640344; P(x < 10) = 0.0050187287874269; P(x > 10) = 0.989141657548539; P(x \geq 10) = 0.9949812712125\). The answer: There is a 1.09% chance of seeing 10 or fewer reports filed within a 24-hour period.
- What is the probability of 20 or more crime reports being filed in the 24 hour period?
Again, using the online calculator will yield \(P(x \geq 20) = 0.529031908826421\); there is 52.90% chance of seeing 20 or more reports filed within a 24-hour period. Note: The high probability shouldn’t be surprising given that the average is almost 20 (\(19.992\) to be exact).
Example 2
In Berksdale, Nebraska, the average number of fires per day is 0.071. What is the probability of seeing no fires in any given week (7 days)? What is the probability of seeing more than four fires in this week?
For the week we know \(\lambda = 0.071 \times 7 = 0.497\) and \(x=0\). The calculator yields \(P(x=0) = 0.608352983811176 \approx 0.6084\). Similarly, \(P(x > 4) = 0.000167426629467005 \approx 0.0002\)
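Both Poisson examples can be verified without the online calculator by summing the Poisson formula directly. The sketch below is one way to do it; the helper names `poisson_pmf` and `poisson_cdf` are my own.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x events when the mean count is lam."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """Probability of x or fewer events."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


# Example 1: 0.833 crime reports per hour, scaled to a 24-hour period.
lam_day = 0.833 * 24
p_10_or_fewer = poisson_cdf(10, lam_day)
p_20_or_more = 1 - poisson_cdf(19, lam_day)

# Example 2: 0.071 fires per day, scaled to a 7-day week.
lam_week = 0.071 * 7
p_no_fires = poisson_pmf(0, lam_week)
p_more_than_4 = 1 - poisson_cdf(4, lam_week)

print(f"P(x <= 10 reports) = {p_10_or_fewer:.7f}")
print(f"P(x >= 20 reports) = {p_20_or_more:.7f}")
print(f"P(no fires)        = {p_no_fires:.7f}")
print(f"P(x > 4 fires)     = {p_more_than_4:.9f}")
```

Note that the mean is rescaled to the length of the interval in question (24 hours, 7 days) before any probability is computed.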
6.3 Continuous Probability Distributions
In contrast to discrete random variables, continuous random variables take on an uncountably infinite number of values. Unlike in the case of discrete random variables, where we could calculate the probability of a specific value \((P(X = x))\), we now have to calculate the probability of a value falling in an interval, i.e., \(P(a \leq x \leq b)\), with \(a\) and \(b\) being the lower and upper limits of the interval. Why do we need to do this, you might wonder. Think about it this way: Since the total number of possible values is infinite and probability is calculated with the total number of possible outcomes as the denominator, if you pick any single value of \(x\) and calculate \(P(X=x) = \dfrac{1}{\infty}\), you will end up with \(0\).
Examples of continuous random variables abound … household incomes, county unemployment rates, scores on a standardized test, individuals’ heights and weights, flight times between New York City and Los Angeles, temperature, humidity, population size of each city, town, village, and township in the USA, undulation rates of gliding snakes, and much more. The probability distributions for continuous random variables then assume a different form, and there are several such distributions one could use. However, we will start with the simplest one of these: The Normal distribution.
6.3.1 The Normal Distribution
The Normal distribution is the bell-shaped distribution that we have all read about or seen in one way or another. Formally, though, we say that a continuous random variable \(X\) follows a normal distribution if its probability density function is defined as:
\[f(X) =\dfrac{1}{\sigma{\sqrt{2\pi}}}e^{-(X-\mu)^{2}/2\sigma^{2}}\]
where \(\mu\) is the mean of \(X\), \(\sigma\) is the standard deviation of \(X\), \(\pi \approx 3.14159\), and \(e \approx 2.71828\). Two parameters describe a Normal distribution – the mean and the standard deviation. As such, different means and/or standard deviations generate different Normal distributions. The examples below show you three different Normal distributions, each with its own unique combination of \(\mu\) and \(\sigma\).
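To see how the two parameters shape the curve, the density function can be coded directly from its definition. In this sketch the three \((\mu, \sigma)\) pairs are hypothetical values picked for illustration; they are not the ones used in the chapter's figures.

```python
from math import exp, pi, sqrt


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the Normal density with mean mu and standard deviation sigma at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


# Hypothetical parameter pairs, chosen only to illustrate the effect of mu and sigma.
for mu, sigma in [(0, 1), (0, 2), (3, 0.5)]:
    peak = normal_pdf(mu, mu, sigma)
    one_sd_out = normal_pdf(mu + sigma, mu, sigma)
    print(f"mu = {mu}, sigma = {sigma}: height at mean = {peak:.4f}, "
          f"height one sd above = {one_sd_out:.4f}")
```

Changing \(\mu\) slides the curve left or right, while a larger \(\sigma\) flattens and widens it.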
Similarly, we could have a completely different set of Normal distributions:
In summary, then, there are an infinite number of Normal distributions, each with its own mean and standard deviation. Since it would be impossible for us to know which Normal distribution we are drawing a sample from, we rely on the Standard Normal Distribution, also known as the z-score Distribution.
6.3.2 The Standard Normal Distribution
The Standard Normal Distribution is essentially a distribution generated by converting the raw values into their corresponding z-scores. Recall that \(z = \dfrac{x - \mu}{\sigma}\). The beauty of this distribution is that no matter what the raw metric is, income, age, height, population size, per capita gross domestic product, and so on, once we create z-scores, they are guaranteed to have \(\mu = 0\) and \(\sigma = 1\). A positive/negative z-score indicates that the observation lies above/below the mean. The absolute value of the z-score indicates how many standard deviation units above/below the mean an observation falls. In brief, z-scores allow us to identify the relative location of an observation in a data-set by telling us how many standard deviation units above or below the mean a particular value \(x_{i}\) falls. The beauty of the z-score is that it allows us to compare scores drawn from distributions with dissimilar variability (see below, a redux of the Caribou (ME) versus Boston (MA) example):
Place | Mean | Std. Dev. | December 2016 | z-score |
---|---|---|---|---|
Caribou (ME) | 110 | 30 | 125 | +0.50 |
Boston (MA) | 24 | 5 | 39 | +3.00 |
If you know your geography or have had the good/bad luck to be in Caribou (ME) during the dead of winter, you know that Caribou gets a lot more snowfall than does Boston, just by virtue of Caribou’s latitude and longitude. So how could we really compare what seems to be an apple (Boston, MA) and an orange (Caribou, ME)? We cannot unless we convert snowfall into a z-score, using the \(\mu\) and \(\sigma\) for each of the two places: \(z = (125 - 110)/30 = +0.50\) for Caribou and \(z = (39 - 24)/5 = +3.00\) for Boston. What these z-scores tell us is that for Caribou, compared to its usual average and variability, the snowfall it received in December 2016 was only slightly above what it usually gets. On the other hand, Boston had very heavy snowfall relative to its average and the variability around this average.
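The conversion is a one-line calculation. A minimal sketch follows, with the means, standard deviations, and December 2016 values copied from the table; the function name `z_score` is mine.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


# (mean, standard deviation, December 2016 snowfall) for each place.
snowfall = {
    "Caribou (ME)": (110, 30, 125),
    "Boston (MA)": (24, 5, 39),
}

for place, (mu, sigma, observed) in snowfall.items():
    print(f"{place}: z = {z_score(observed, mu, sigma):+.2f}")
```

Both places were 15 units above their respective means, yet the z-scores differ by a factor of six because Boston's snowfall varies far less from year to year.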
One of the additional strengths of the Standard Normal Distribution is the fact that we can safely assume:
- about 68% of the data values fall within \(\pm\) 1 standard deviation of the mean
- about 95% of the data values fall within \(\pm\) 2 standard deviations of the mean
- about 99.7% of the data values fall within \(\pm\) 3 standard deviations of the mean
- z-scores beyond \(\pm{3}\) are indicative of outliers
Graphically, we could depict the areas demarcated by the 1, 2 and 3 standard deviations as follows:
Given the symmetry of the distribution, if 68% of the data fall within \(\pm 1\) standard deviation units of the mean, then it must be that 34% of the data fall between the mean \((0)\) and \(-1\) and 34% of the data fall between the mean \((0)\) and \(+1\). Likewise, 47.5% of the data fall between the mean \((0)\) and \(-2\) and another 47.5% between the mean and \(+2\) standard deviation units, while about 49.85% of the data fall between the mean \((0)\) and \(-3\) and another 49.85% between the mean and \(+3\) standard deviation units. Indeed, given any interval of z-scores we can easily find the area between these two z-scores via the Standard Normal Distribution table.
z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 |
---|---|---|---|---|---|---|
-3.4 | 0.00034 | 0.00032 | 0.00031 | 0.00030 | 0.00029 | 0.00028 |
-3.3 | 0.00048 | 0.00047 | 0.00045 | 0.00043 | 0.00042 | 0.00040 |
-3.2 | 0.00069 | 0.00066 | 0.00064 | 0.00062 | 0.00060 | 0.00058 |
-3.1 | 0.00097 | 0.00094 | 0.00090 | 0.00087 | 0.00084 | 0.00082 |
-3.0 | 0.00135 | 0.00131 | 0.00126 | 0.00122 | 0.00118 | 0.00114 |
-2.9 | 0.00187 | 0.00181 | 0.00175 | 0.00169 | 0.00164 | 0.00159 |
-2.8 | 0.00256 | 0.00248 | 0.00240 | 0.00233 | 0.00226 | 0.00219 |
-2.7 | 0.00347 | 0.00336 | 0.00326 | 0.00317 | 0.00307 | 0.00298 |
-2.6 | 0.00466 | 0.00453 | 0.00440 | 0.00427 | 0.00415 | 0.00402 |
-2.5 | 0.00621 | 0.00604 | 0.00587 | 0.00570 | 0.00554 | 0.00539 |
-2.4 | 0.00820 | 0.00798 | 0.00776 | 0.00755 | 0.00734 | 0.00714 |
-2.3 | 0.01072 | 0.01044 | 0.01017 | 0.00990 | 0.00964 | 0.00939 |
-2.2 | 0.01390 | 0.01355 | 0.01321 | 0.01287 | 0.01255 | 0.01222 |
-2.1 | 0.01786 | 0.01743 | 0.01700 | 0.01659 | 0.01618 | 0.01578 |
-2.0 | 0.02275 | 0.02222 | 0.02169 | 0.02118 | 0.02068 | 0.02018 |
-1.9 | 0.02872 | 0.02807 | 0.02743 | 0.02680 | 0.02619 | 0.02559 |
-1.8 | 0.03593 | 0.03515 | 0.03438 | 0.03362 | 0.03288 | 0.03216 |
-1.7 | 0.04457 | 0.04363 | 0.04272 | 0.04182 | 0.04093 | 0.04006 |
-1.6 | 0.05480 | 0.05370 | 0.05262 | 0.05155 | 0.05050 | 0.04947 |
-1.5 | 0.06681 | 0.06552 | 0.06426 | 0.06301 | 0.06178 | 0.06057 |
-1.4 | 0.08076 | 0.07927 | 0.07780 | 0.07636 | 0.07493 | 0.07353 |
-1.3 | 0.09680 | 0.09510 | 0.09342 | 0.09176 | 0.09012 | 0.08851 |
-1.2 | 0.11507 | 0.11314 | 0.11123 | 0.10935 | 0.10749 | 0.10565 |
-1.1 | 0.13567 | 0.13350 | 0.13136 | 0.12924 | 0.12714 | 0.12507 |
-1.0 | 0.15866 | 0.15625 | 0.15386 | 0.15151 | 0.14917 | 0.14686 |
-0.9 | 0.18406 | 0.18141 | 0.17879 | 0.17619 | 0.17361 | 0.17106 |
-0.8 | 0.21186 | 0.20897 | 0.20611 | 0.20327 | 0.20045 | 0.19766 |
-0.7 | 0.24196 | 0.23885 | 0.23576 | 0.23270 | 0.22965 | 0.22663 |
-0.6 | 0.27425 | 0.27093 | 0.26763 | 0.26435 | 0.26109 | 0.25785 |
-0.5 | 0.30854 | 0.30503 | 0.30153 | 0.29806 | 0.29460 | 0.29116 |
-0.4 | 0.34458 | 0.34090 | 0.33724 | 0.33360 | 0.32997 | 0.32636 |
-0.3 | 0.38209 | 0.37828 | 0.37448 | 0.37070 | 0.36693 | 0.36317 |
-0.2 | 0.42074 | 0.41683 | 0.41294 | 0.40905 | 0.40517 | 0.40129 |
-0.1 | 0.46017 | 0.45620 | 0.45224 | 0.44828 | 0.44433 | 0.44038 |
0.0 | 0.50000 | 0.49601 | 0.49202 | 0.48803 | 0.48405 | 0.48006 |
The standard normal table shows you the area below a specific z-score. For example, the area below \(z = -3.40\) is 0.0003369. We would denote this as \(P(z \leq -3.40) = 0.0003369\). The meaning of “area” should now be taken as the proportion of z-scores that fall at or below a specific z-score. Indeed, we might as well speak in terms of probability: The probability of finding a z-score less than or equal to -3.40 is 0.0003369. Now look at the table and figure out how much of the area falls at or below \(z = -1.00\). This turns out to be 0.1586552, highlighted for you in panel (a) of the figure.
Given the symmetry of the curve, this means that the area at or above \(z = +1.00\) must also be 0.1586552. If we add these two areas we see that the combined area below \(z = -1.00\) and above \(z = +1.00\) works out to \(0.1586552 + 0.1586552 = 0.3173\). Since the total area under the curve is 1, this means the area between \(z = -1.00\) and \(z = +1.00\) must be \(1 - 0.3173 = 0.6827\). That is, 68.27% of z-scores lie in the interval given by \(z = \pm 1\); see the shaded portion in panel (b) below. Similarly, we could calculate areas between \(z = \pm 2.00\) (see panel (c) below), \(z = \pm 3.00\) (see panel (d) below), and any other pair of z-scores.
Take, for example, the area between z-scores of -0.52 and +1.26. To calculate \(P(-0.52 \leq z \leq +1.26)\) we start by calculating the area at or below \(z = -0.52\). \(P(z \leq -0.52)\) turns out to be 0.30153. We then calculate the area at or above \(z = +1.26\). \(P(z \geq +1.26)\) turns out to be 0.10383. So the total area at or beyond these two z-scores is \(0.30153 + 0.10383 = 0.40536\). This isn’t the area we need; we need \(P(-0.52 \leq z \leq +1.26)\) and this can now be calculated as \(1 - P(z \leq -0.52) - P(z \geq +1.26) = 1 - 0.30153 - 0.10383 = 1 - 0.40536 = 0.59464\). The figure below shows this area:
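Areas like these can also be computed rather than looked up. The sketch below relies on the standard identity \(P(Z \leq z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]\) and Python's built-in `math.erf`; the function name `standard_normal_cdf` is my own.

```python
from math import erf, sqrt


def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve at or below z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Area between -0.52 and +1.26, as in the worked example.
between = standard_normal_cdf(1.26) - standard_normal_cdf(-0.52)
print(f"P(-0.52 <= z <= 1.26) = {between:.5f}")

# Areas within 1, 2, and 3 standard deviations of the mean.
within = {k: standard_normal_cdf(k) - standard_normal_cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"P(-{k} <= z <= +{k}) = {area:.4f}")
```

Subtracting the lower cumulative area from the upper one gives the same answer as the "one minus both tails" route used in the text.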
Online tables are available as well; see one excellent applet by David M. Lane here as well as the popular one at surfstat. If using the David M. Lane calculator, be sure to enter values of the desired area Above/Below/Between/Outside. The same table will also allow you to enter an area and find the z-score(s) associated with that area if you select Value from an area. If using surfstat, select the appropriate graph of the distribution (the area in red shows you what area will be calculated), and then either enter the z-score to find the area/probability or the area/probability (as a proportion) to get the z-score.
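If you prefer software to a table or an applet, the same two lookups (a z-score to an area, and an area back to a z-score) can be sketched in a few lines. The example below is one possible approach using Python's standard library, not the method used by either online calculator; the function names are hypothetical helpers chosen to mirror the Above/Below/Between options.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_below(z):
    """P(Z <= z): the quantity the printed table reports."""
    return std_normal.cdf(z)


def area_above(z):
    """P(Z >= z): the complement of the area below z."""
    return 1.0 - std_normal.cdf(z)


def area_between(z_low, z_high):
    """P(z_low <= Z <= z_high): area below z_high minus area below z_low."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


def z_for_area_below(area):
    """Inverse lookup: the z-score that leaves `area` at or below it."""
    return std_normal.inv_cdf(area)


# The worked example from the text: area between z = -0.52 and z = +1.26.
print(round(area_between(-0.52, 1.26), 4))

# Inverse lookup: which z-score leaves an area of 0.0003369 below it?
print(round(z_for_area_below(0.0003369), 2))
```

Because software does not round intermediate table entries, results may differ from hand calculations in the fifth or sixth decimal place.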
Working with the Standard Normal Distribution
Assume that for all traditional public schools in your state, the mean dropout rate in 2017 was \(\mu = 5\) percent, and the standard deviation was \(\sigma = 1.25\) percentage points. Given these parameters, if a school had a dropout rate of 7%, what percent of schools in the state did as badly or worse? The answer could be obtained as follows:
\[\begin{eqnarray*} \left.\begin{aligned} z = \dfrac{x - \mu}{\sigma} \\ z = \dfrac{7 - 5}{1.25} = 1.6 \\ \text{What is } P(z \geq 1.6)? \\ P(z \geq 1.6) = 0.05480 \end{aligned}\right. \end{eqnarray*}\]
and thus some 5.48% of schools in the state had dropout rates of 7% or worse (i.e., higher dropout rates).
What if the school had a dropout rate of 9%?
\[\begin{eqnarray*} \left.\begin{aligned} z = \dfrac{x - \mu}{\sigma} \\ z = \dfrac{9 - 5}{1.25} = 3.2 \\ \text{What is } P(z \geq 3.2)? \\ P(z \geq 3.2) = 0.00069 \end{aligned}\right. \end{eqnarray*}\]
i.e., 0.069% of schools in the state had dropout rates of 9% or worse.
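Both dropout-rate calculations follow the same two steps: convert the rate to a z-score, then find the area at or above that z-score. A minimal Python sketch of those steps, assuming the \(\mu = 5\) and \(\sigma = 1.25\) given in the example (the function names are ours, for illustration only):

```python
from statistics import NormalDist

MU = 5.0      # mean dropout rate (percent), from the example
SIGMA = 1.25  # standard deviation, from the example


def z_score(x, mu=MU, sigma=SIGMA):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma


def share_at_or_above(x, mu=MU, sigma=SIGMA):
    """P(Z >= z) for the z-score of x, i.e. the upper-tail area."""
    return 1.0 - NormalDist().cdf(z_score(x, mu, sigma))


for rate in (7.0, 9.0):
    z = z_score(rate)
    pct = 100 * share_at_or_above(rate)
    print(f"rate = {rate}%  z = {z:.2f}  share at or above = {pct:.3f}%")
```

The printed shares should agree with the table-based answers of roughly 5.48% and 0.069%.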
We can also ask a different question: What dropout rates demarcate the top 10% and the bottom 10% of schools, respectively? We first find the z-scores that demarcate the top 10% and the bottom 10%, respectively. These turn out to be \(z = +1.28\) and \(z = -1.28\), respectively.
\[\begin{eqnarray*} z = \dfrac{x - \mu}{\sigma} \\ x = \left(z \times \sigma \right) + \mu \\ x = \mu + \left(z \times \sigma \right) \\ x_{top} = 5 + \left(1.28 \times 1.25 \right) = 5 + 1.6 = 6.6 \\ x_{bottom} = 5 + \left(-1.28 \times 1.25 \right) = 5 - 1.6 = 3.4 \end{eqnarray*}\]
i.e., dropout rates of 6.6% and 3.4% separate the top 10% and bottom 10% of schools from all schools in the state. The important point about this particular example is the manner in which we rearranged the formula for a z-score to recover the actual dropout rates associated with the top/bottom 10% of schools.
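The rearranged formula \(x = \mu + (z \times \sigma)\) is easy to mirror in code. The sketch below, again assuming Python's standard library and a hypothetical helper named `cutoff`, finds the z-score for a given area and converts it back to a dropout rate.

```python
from statistics import NormalDist

MU = 5.0      # mean dropout rate (percent), from the example
SIGMA = 1.25  # standard deviation, from the example

std_normal = NormalDist()


def cutoff(area_below, mu=MU, sigma=SIGMA):
    """Value of x that leaves `area_below` of the distribution at or below it."""
    z = std_normal.inv_cdf(area_below)  # z-score for the requested area
    return mu + z * sigma               # rearranged z-score formula


x_top = cutoff(0.90)     # 10% of schools lie above this rate
x_bottom = cutoff(0.10)  # 10% of schools lie below this rate

print(f"top 10% cutoff:    {x_top:.1f}%")
print(f"bottom 10% cutoff: {x_bottom:.1f}%")
```

The software uses an unrounded z-score (about 1.2816 rather than 1.28), so the cutoffs match 6.6 and 3.4 only after rounding to one decimal place.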
The z-score is a versatile entity, often used when looking to combine or compare phenomena measured on different scales. A classic and important example comes from the health ranking dashboards compiled by various organizations. Take the Health Value Dashboard, County Health Rankings, America’s Health Rankings and the Commonwealth Fund, for example. Most of these dashboards are combining information from such disparate measures as the high school graduation rate, percent of adults with access to health care, percent of kids living in safe neighborhoods, child poverty, air pollution levels, access to bike and other alternative transportation options, and so on. How can you really combine disparate elements? Via the z-score, of course, by converting each measure into a z-score and then adding these z-scores to come up with an overall rank for a state or Washington DC. This is not the only benefit of z-scores; they are often used in regression analyses to ease interpretation of the findings, a benefit we will see in action once we start working with regression models.
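To make the idea of combining measures concrete, here is a small sketch in Python. Everything in it is hypothetical: the three regions, the two measures, and their values are made up for illustration and do not come from any of the dashboards named above. It also assumes, as our own simplification, that measures where a higher value is worse have their z-scores sign-flipped before adding; actual dashboards may weight or orient their measures differently.

```python
from statistics import mean, pstdev

# Hypothetical values for three made-up regions (NOT real data).
measures = {
    "graduation_rate": {"A": 90.0, "B": 85.0, "C": 80.0},  # percent
    "child_poverty": {"A": 12.0, "B": 18.0, "C": 24.0},    # percent
}
# Assumption: flip the sign of measures where a higher value is worse.
higher_is_better = {"graduation_rate": True, "child_poverty": False}


def z_scores(values):
    """Convert one measure's raw values to z-scores across regions."""
    raw = list(values.values())
    mu, sigma = mean(raw), pstdev(raw)
    return {region: (x - mu) / sigma for region, x in values.items()}


# Add up the (sign-adjusted) z-scores for each region.
composite = {region: 0.0 for region in measures["graduation_rate"]}
for name, values in measures.items():
    sign = 1.0 if higher_is_better[name] else -1.0
    for region, z in z_scores(values).items():
        composite[region] += sign * z

# Rank regions from best (highest composite score) to worst.
ranking = sorted(composite, key=composite.get, reverse=True)
print(composite)
print(ranking)
```

Because each measure is expressed in standard deviations from its own mean, a percentage-point gap in graduation rates and a percentage-point gap in child poverty end up on a common scale before they are added.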
6.4 Practice Problems
Problem 1
Calculate the following areas:
- \(P(z \leq 1.96)\)
- \(P(z \leq -1.96)\)
- \(P(z \geq -0.87)\)
- \(P(-1.5 \leq z \leq 1.5)\)
- \(P(-1.5 \leq z \leq -1.0)\)
- \(P(z \geq 2.58)\)
- \(P(z \leq -2.58)\)
- \(P(-1.96 \leq z \leq 1.96)\)
Problem 2
Find the z-scores that leave the respective area above/below/between/beyond:
- \(P(z \geq ?) = 0.02\)
- \(P(z \leq ?) = 0.35\)
- \(P(z_{low} \leq ? \leq z_{high} ) = 0.95\) in the middle of the distribution
- \(P(z_{low} \leq ? \leq z_{high} ) = 0.99\) in the middle of the distribution
- \(P(z_{low} \leq ? \leq z_{high} ) = 0.90\) in the middle of the distribution
- \(P(z_{low} \leq ? \leq z_{high} ) = 0.50\) in the middle of the distribution
Problem 3
Babies born in singleton births in the United States have birth weights (in kilograms) that are distributed normally with \(\mu = 3.296; \sigma = 0.560\).
- What is the probability of a baby weighing more than 5 kilograms at birth?
- What is the probability of the baby weighing between 3 and 4 kilograms at birth?
- What fraction of babies will have birth weights more than 1.5 standard deviations from the mean in either direction?
- What fraction of the babies will have birth weights more than 1.5 kilograms from the mean in either direction?
Problem 4
A survey of European mitochondrial DNA variation has found that the most common haplotype (genotype), known as “H”, occurs in 40% of people. If we sampled 400 Europeans, what is the probability that
- At least 180 are haplotype H?
- At least 130 are haplotype H?
- Between 115 and 170 (inclusive) are haplotype H?
Problem 5
NASA excludes anyone under 62 inches in height and anyone over 75 inches in height from being an astronaut pilot. In metric units these cutoffs are 157.5 cm and 190.5 cm, respectively. Assume that heights are normally distributed with means and standard deviations of 177.6 cm and 9.7 cm for 20-29 year-old men, and 163.2 cm and 10.1 cm for 20-29 year-old women. What proportion of men and women in these age groups would be excluded from being NASA astronaut pilots?
Problem 6
The most famous geyser in the world, Old Faithful in Yellowstone National Park, has a mean time between eruptions of 85 minutes. The interval of time between eruptions is normally distributed with a standard deviation of 21.25 minutes.
- What is the probability that a randomly selected time interval between eruptions is longer than 95 minutes?
- What is the probability that a randomly selected time interval between eruptions is shorter than 95 minutes?
- What is the probability that a randomly selected time interval between eruptions falls in the interval given by 75 and 95 minutes?
Problem 7
The label on a one-gallon jug of milk states that the volume of milk is 128 fluid ounces (fl. oz.). Federal law mandates that the jug must contain no less than the stated volume. The actual amount of milk in the jugs is normally distributed with mean \(\mu = 129\) fl. oz. and standard deviation \(\sigma = 0.8\) fl. oz.
- Find the z-score corresponding to a jug containing 128 fl. oz. of milk.
- What is the probability that a randomly selected jug will contain less than 128 fl. oz. of milk?
Problem 8
In 2003, the U.S. Bureau of Labor Statistics reported mean annual household expenditure on food and drinks to be $5,700, with a standard deviation of $1,500. Assume these expenditures are normally distributed.
- How much do the 10% of families with the lowest annual household expenditures on food and drinks spend annually on food and drinks?
- What percentage of families spend more than $7,000 annually on food and drinks?
- How much do the families with the top 5% of annual expenditures on food and drinks spend annually?
Problem 9
The mean time that a manager at the U.S. Bureau of Economic Analysis spends on annual performance reviews is 45 minutes, with a standard deviation of 9 minutes. Assume the times spent on annual performance reviews are normally distributed.
- What percentage of annual performance reviews take more than 60 minutes?
- What percentage of annual performance reviews take between 30 and 60 minutes?
- What review times demarcate the top 5% and bottom 5% of annual performance reviews, respectively?
Problem 10
Ohio University allows its faculty and staff to hold university-provided credit cards that can be used for authorized work-related charges. In a given fiscal year the average amount charged by a university employee is $1000, with a standard deviation of $200.
- What percent of employees charge more than $1000?
- What percent of employees charge more than $2000?
- What charge amount puts an employee in the \(99^{th}\) percentile?
- What charge amount puts an employee in the \(9^{th}\) percentile?
Problem 11
The mean number of teletype machines that break down in an hour is 0.625. What is the probability of 2 machines breaking down in an hour? What is the probability of more than 2 machines breaking down in an hour?
Problem 12
Acme Call Centers’ customer service representatives receive customer service calls at 48 per hour.
- What is the probability of receiving 10 calls in a 15 minute interval?
- What is the probability of receiving at least 10 calls in a 15 minute interval?
- Suppose no calls are currently on hold. If the agent takes 15 minutes to complete the current call, how many calls do you expect to be waiting by that time? What is the probability of no calls waiting by that time?
- If no calls are currently being processed, what is the probability of the agent taking three minutes for a coffee break without being interrupted by a call?
Problem 13
The National Highway Traffic Safety Administration’s (NHTSA) traffic fatality data for 1994-2014 are available in this Excel file.
- Calculate and report the average annual number of fatalities.
- Based on this annual average, and assuming fatalities are evenly distributed across the 12 months in a year, how many fatalities should we expect per month?
- What is the probability of seeing less than 3500 fatalities in a month?
- What is the probability of seeing at least 3500 fatalities in a month?
Problem 14
The 2014 FIFA World Cup in Brazil saw 32 teams participate, a total of 64 matches played, and 171 goals scored.
- Calculate and report the number of goals scored per match.
- What is the probability of exactly 3 goals being scored in a match?
- What is the probability of 5 or more goals being scored in a match?
- What is the probability of no goal being scored in a match?
- Are goals clumped or dispersed?
Problem 17
Fire Engine Ladder Company 81 in Los Angeles responds to 20 calls per Saturday night in the month of July. Given the drought, they have been exceptionally busy in the preceding months and the crew is tired.
- What is the probability that the company will receive no calls on the upcoming Saturday night and catch some well-deserved rest?
- What is the probability of having to respond to fewer than 5 calls?
- Are calls clumped or dispersed?
Problem 18
The National Oceanic and Atmospheric Administration (NOAA) provides the following data on tropical cyclones per year and type in the Atlantic Basin. Major Hurricanes are those rated 3, 4, or 5 on the Saffir-Simpson Hurricane Scale.
- Given these data, how many major hurricanes should you expect in the next calendar year?
- What is the probability of seeing no major hurricanes next year?
- What is the probability of seeing more than three major hurricanes next year?
- Are major hurricanes clumped or dispersed?
Problem 19
A bookstore in a Texas county was indicted by a grand jury on several counts of selling racist books. The jury was composed of 22 Baptists and 8 other individuals. Some 40% of the county is Baptist. Given their representation in the population, what is the probability of 22 Baptists serving on the grand jury?
Problem 20
The Gallup Poll reports that “employees whose manager involves them in goal setting are four times more likely to be engaged than other employees. Yet this basic expectation only occurs for 30% of employees.” If you selected 20 random employees of state governmental agencies,
- What is the probability of fewer than 3 saying their manager involves them in goal setting?
- What is the probability of exactly 3 saying their manager involves them in goal setting?
- What would be the probability of exactly 6 saying their manager involves them in goal setting if you drew a random sample of 40 employees?
Problem 21
The MPA program at a major university finds that one in five students withdraws before completing the mandatory introduction to statistics course. Assume that in the next Fall term 20 students are enrolled.
- What is the probability of two students withdrawing?
- What is the probability of more than two students withdrawing?
- What is the probability of no student withdrawing?
- What is the expected number of withdrawals?
Problem 22
Military radar and missile detection systems are supposed to identify an attack and issue a warning. What matters though is their reliability. Assume the most state-of-the-art system has a reliability of 90%, meaning that it correctly detects an attack with a probability of 0.9.
- What is the probability that a single detection system will detect an attack?
- If two detection systems are installed in the same area and operate independently, what is the probability that at least one will detect an attack?
- What if three are installed?
- Would you recommend that multiple systems be installed? Explain why.
Problem 23
On any given day in July of any year there is a 40% chance of a thunderstorm between 10:00 AM and 4:00 PM in the Grand Canyon. A hiker is considering a seven-day hike.
- What is the probability of her not running into any day with thunderstorms?
- What is the probability of her running into at most two days with thunderstorms?
Problem 24
True/False multiple choice tests involving 10 questions, half of which are false, are poor tests because of the guess factor. An unprepared student can easily flip a coin to decide whether to mark True or False for each question. If a student pursues this strategy,
- What is the probability that he/she will earn a passing grade (defined as seven or more right answers)?
- What is the probability that he/she will earn an A (defined as nine or more right answers)?