Chapter 5 Probability

Probability theory is one of the hardest fields to master. People often get tripped up by what should be simple calculations, assuming that any striking coincidence must have a fantastically small probability of occurring. For example, on the first anniversary of 9/11, the winning pick-three number in the second draw was 9-1-1. This lit up the news, of course, because it seemed to be an astonishing coincidence. As it turns out, the probability of this happening was far higher than intuition suggests, roughly \(\dfrac{1}{500}\). Several pages have been devoted to the Monty Hall Problem, wherein the contestant sees three doors, two with a goat behind them and one with a car, and picks one. The host opens one of the other doors, which turns out to have a goat behind it. The contestant is then given a choice: stick with the original door or switch to the remaining unopened door? Probability also comes in handy with party tricks: with 23 people in a room there is roughly a 50-50 chance of two people having the same birthday! In a more serious vein, NASA engineers knew that the probability of a Challenger disaster on flight day was unacceptably high (13%) but most folks never understood this before the shuttle launch was given the green light. How probability works its magic in any of these examples would be elementary if you understood the core principles of probability theory. However, remember, we are not looking to become probability theorists. Not at all. Our task is simpler: to grasp enough of the mechanics of probability that we can build a bridge between the samples we gather and the population those samples are meant to represent.

5.1 Basic Concepts and Terminology of Probability Theory

In statistics, we think of every study as an experiment: not a laboratory experiment, but a real-world experiment that nature carries out without revealing all the truths to us. For example, a district tries to boost literacy by spending a lot of money on programs that promise to make every third grader proficient in English language arts. The district has just carried out an experiment whose outcomes are unknown (i.e., will the program work, worsen the situation, or have no impact?). A prison implements a prisoner rehabilitation program with wrap-around services; will it have the desired effect? Patients battling Parkinson’s disease are given a new drug designed to control tremors; will it work as designed? Experiments surround us, every day of our lives, with consequences big and small.

All experiments have outcomes that we can identify a priori (before the experiment is conducted). For example, a drug either makes the patient better (outcome 1), worse (outcome 2), or has no impact (outcome 3). If we could list all possible outcomes of an experiment we would have the sample space, denoted by the symbol \(S\). Let us run a few simple experiments.

  1. I toss a coin knowing it can come up Heads or Tails. Only two outcomes are possible in this experiment and so the sample space can be written as \(S = \{Heads \text{, } Tails\} = \{H,T\}\)
  2. I roll a six-sided die knowing it can come up with the face showing one of the following numbers: \(S = \{1,2,3,4,5,6\}\).
  3. A state trooper pulls over a driver she thinks is driving while intoxicated and administers the field sobriety test. The driver is either sober or above the legal limit. That is, \(S=\{Sober \text{, } Drunk\}\)
  4. A politician runs for elected office. She either wins or loses; \(S=\{Win \text{, } Lose\}\) (see the note at the end of this chapter)

Each outcome of an experiment is a sample point, and all possible outcomes together make up the sample space. One or more sample points of the sample space can make up an event. For example, we could define event \(A\) as seeing an odd number show up when the die is rolled once. That is, \(A=\{1, 3, 5\}\). Or we could define an event \(B\) as having a perfect square show up on the face of the die, i.e., \(B=\{1, 4\}\).

TABLE 5.1: Some Experiments and their Outcomes
Experiment Outcomes
Coin Toss (Head, Tail)
Sales Call (Sale, No Sale)
Roll a Die ({1,2,3,4,5,6})
Product Test (Defective, Not defective)
Health Campaign (Behavior modified, Not modified)
Green Eyes (Yes, No)

Each outcome occurs with some probability. For example, we all recognize that if I toss a fair coin, it could land either Heads or Tails. Since the coin is fair, not biased towards Heads or Tails, only one of two equally likely outcomes can occur in a single toss. Therefore, there is a 50:50 chance of seeing a Heads or a Tails in a coin toss. However, you could flip a coin ten times and see mostly (or even all) Heads or Tails. How does that square with a 50:50 chance? It does not, unless you think about probability in a very specific way. In statistics, we define probability as a numerical measure of the likelihood that an event will occur. It is, essentially, the proportion of times the event would occur if we repeated the random trial under identical conditions an infinite number of times. See the example below, where I toss a coin 10 times, 100 times, 1,000 times, and then 10,000,000 (ten million) times, counting in each case the proportion of Heads and Tails.

FIGURE 5.1: Simulated Coin Flips

It is only in the case of 10 million coin flips that I approach anything close to a 50:50 split between the proportion of Heads and the proportion of Tails, and even here the split is about 50.02% Heads to 49.98% Tails. What is the point of this demonstration? That the probability of an outcome only makes sense in terms of a long run of identical trials/experiments. In the coming chapters I will keep pointing out how to see your single sample “as if” you had tossed it back into the population and then drawn a random sample of exactly the same size, done your analysis, tossed the sample back, drawn another random sample of exactly the same size, and so on until all unique samples of identical size had been drawn and analyzed.
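
If you would like to see this convergence for yourself, here is a minimal sketch (in Python, illustrative only and not tied to how Figure 5.1 was actually produced) that flips a simulated fair coin at the same sample sizes and reports the proportion of Heads:

```python
import random

random.seed(42)  # fix the seed so the run is reproducible

for n in (10, 100, 1_000, 10_000_000):
    # Each flip is Heads with probability 0.5
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>10,} flips: proportion of Heads = {heads / n:.4f}")
```

As the number of flips grows, the proportion of Heads drifts ever closer to 0.50, which is exactly the long-run sense in which the 50:50 claim is true.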

Before we move on, though, notice that we can define probability as a proportion: the ratio of the number of outcomes favorable to the event to the total number of outcomes in the sample space. By definition, the probability of any event must lie between 0 and 1, i.e., \(0 \le P(Event) \le 1\). If an event has a probability of 1 it always occurs; if an event has a probability of 0 it never occurs. Many outcomes have well-defined probabilities. For instance, if I toss a fair coin once, I know a priori that \(P(Heads)=P(Tails)=\dfrac{1}{2}=0.50\). Similarly, if it is a very tight political contest between three candidates and no polls are available to forecast the outcome, I might as well assume that \(P(\text{Candidate A wins})=P(\text{Candidate B wins})=P(\text{Candidate C wins})\) so that the probability of any one candidate winning is \(\dfrac{1}{3} \approx 0.33\).

5.2 Counting Rules with Permutations and Combinations

Calculating the number of possible outcomes is easy when you have a simple experiment such as flipping a coin once. But what if you were flipping 10 coins 10 times each? Counting rules simplify these calculations a great deal.

Consider tossing one coin twice; the possible outcomes are four in total: \(S=\{(H,H),(H,T),(T,H),(T,T)\}\). Generally, in a multi-step experiment with \(k\) sequential steps with \(n_{1}\) outcomes in Step 1, \(n_{2}\) outcomes in Step 2, \(n_{3}\) outcomes in Step 3, \(\ldots\) and \(n_{k}\) outcomes in Step \(k\), the total number of experimental outcomes is given by:

\[(n_{1})(n_{2})(n_{3})\cdots(n_{k})\]

For example, tossing one coin twice yields \((n_{1})(n_{2})=(2)(2)=4\) outcomes while tossing one coin six times yields \((n_{1})(n_{2})(n_{3})(n_{4})(n_{5})(n_{6})=(2)(2)(2)(2)(2)(2)=64\) outcomes. In all of these cases of coin flips we have said nothing about whether the order matters at all; we do not care whether Heads or Tails shows up first.

Sometimes you may have to select \(n\) objects from a larger pool of \(N\) objects. In these situations, we can calculate the number of experimental outcomes possible via

\[ C^{N}_{n} = \binom{N}{n} = \frac{N!}{n!(N-n)!} \]

You may not remember this from the good old high school days but \(!\) stands for factorial such that \(N! = N(N-1)(N-2)\cdots(2)(1)\) and \(n! = n(n-1)(n-2)\cdots(2)(1)\). Note that \(0! = 1\), \(3! = 3 \times 2 \times 1 = 6\), and \(5! = 5 \times 4 \times 3 \times 2 \times 1 = 120\). How is this useful?

For instance, I work in the quality control department of a manufacturing plant that makes noise cancelling headphones. My job is to randomly select two headphones out of a batch of five and check for defects. In how many ways can I select two headphones from a batch of five headphones? Let us use the formula:

\[ C^{N}_{n} = \binom{N}{n} = \frac{N!}{n!(N-n)!} \]

\[C^{5}_{2} = \frac{5!}{2!(5 - 2)!} = \frac{5 \times 4 \times \cancel{3} \times \cancel{2} \times \cancel{1}}{(2 \times 1) (\cancel{3} \times \cancel{2} \times \cancel{1})} = \frac{5 \times 4}{2 \times 1} = \frac{20}{2} = 10\]

You could label the headphones A, B, C, D, and E, and then count the combinations: AB, AC, AD, AE, BC, BD, BE, CD, CE, and DE, until no unique pair is left; this is the sample space. Counting by hand would be a tedious and error-prone way to get the result if either \(n\) or \(N\) were large; the formula eases our task a great deal. Before we proceed any further, note that we draw no distinction here between whether A is the first headphone picked or the second. That is, AB counts the same as BA, and hence BA does not show up as a separate outcome. Situations where the order of the outcomes is of no interest are called combinations; if the order matters, then we have permutations. Here is an example of permutations in action. Assume now that we do want to distinguish between cases such as AB versus BA, AC versus CA, and so on. In how many ways could we gather two headphones out of five? The formula is slightly different here:

\[ P^{N}_{n} = n!\binom{N}{n} = \frac{N!}{(N-n)!} \]

\[ P^{5}_{2} = \frac{5!}{(5-2)!} = \frac{5!}{3!} = \frac{(5)(4)(3)(2)(1)}{(3)(2)(1)} = \frac{(5)(4)\cancel{(3)}\cancel{(2)}\cancel{(1)}}{\cancel{(3)}\cancel{(2)}\cancel{(1)}} = 20 \]

Spelling these out would give us AB, BA, AC, CA, AD, DA, AE, EA, BC, CB, BD, DB, BE, EB, CD, DC, CE, EC, DE, and ED; this is the Sample Space.
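
A quick way to verify these counts is to compute them directly and, for small cases, enumerate the sample space. The sketch below is in Python; `math.comb` and `math.perm` are standard-library functions (Python 3.8+), and the headphone labels mirror the A-E labels used above:

```python
from itertools import combinations, permutations
from math import comb, perm

headphones = ["A", "B", "C", "D", "E"]

# Order does not matter: combinations of 2 out of 5
print(comb(5, 2))                                     # 10
print(["".join(c) for c in combinations(headphones, 2)])

# Order matters: permutations of 2 out of 5
print(perm(5, 2))                                     # 20
print(["".join(p) for p in permutations(headphones, 2)])
```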

5.3 Assigning Probabilities to Events

Thus far we’ve seen how to use counting rules to establish the sample space. Now let us think about the probabilities associated with each outcome. We can assign probabilities to outcomes so long as we follow two rules:

  1. For an event \(E_{i}\), \(0\leq P(E_{i})\leq 1\)
  2. Given \(n\) outcomes, \(P(E_{1})+P(E_{2})+P(E_{3})+\cdots + P(E_{n}) = 1\)

For example, in tossing a fair coin, \(P(H) = 0.5; P(T) = 0.5\). Therefore \(P(H)+P(T)=1\). Likewise, in rolling a fair die, \(P(1) = \frac{1}{6}; P(2) = \frac{1}{6}; \cdots P(6) = \frac{1}{6}\). Therefore, \(P(1)+P(2)+\cdots+P(6)=1\). This method of assigning probabilities, in which all outcomes are assumed to be equally likely, is known as the classical method.

Assume a clerk in the student health center at a university tracks how many patients are on the waiting list at 9 AM on 20 successive days. These data are given below:

TABLE 5.2: Example of the Relative Frequency Method
No. Waiting No. of Days Proportion
0 2 2/20 = 0.10
1 5 5/20 = 0.25
2 6 6/20 = 0.30
3 4 4/20 = 0.20
4 3 3/20 = 0.15
Total 20 20/20 = 1.00

It seems, based on his observations, that on any given day he should expect 2 patients on the waiting list since this outcome has the highest relative frequency, \(0.30\). Note how the data, and the frequencies and relative frequencies calculated from them, have led to this conclusion; they have enabled us to answer what would be most likely if we picked a day at random. This is the relative frequency method of assigning probabilities.
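
The same bookkeeping is easy to script. The following Python sketch simply retypes the counts from Table 5.2 and converts them into relative frequencies:

```python
# Observed number of days on which 0, 1, 2, 3, 4 patients were waiting (Table 5.2)
days_observed = {0: 2, 1: 5, 2: 6, 3: 4, 4: 3}
total_days = sum(days_observed.values())  # 20

# Relative frequency = proportion of days on which each outcome occurred
rel_freq = {waiting: n / total_days for waiting, n in days_observed.items()}
for waiting, p in rel_freq.items():
    print(f"P({waiting} waiting) = {p:.2f}")

print("Most likely outcome:", max(rel_freq, key=rel_freq.get))  # 2 patients
```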

Here is another example, where a company looking to build a bridge over an expanse of water gathers data on how long the two phases (design versus construction) took for similar-sized bridges built in the recent past. These data are shown below:

TABLE 5.3: Bridge-building and Relative Frequencies
Design Construction Time/Phase Total Time Frequency P(Time)
2 6 (2,6) 8 6 6/40 = 0.15
2 7 (2,7) 9 6 6/40 = 0.15
2 8 (2,8) 10 2 2/40 = 0.05
3 6 (3,6) 9 4 4/40 = 0.10
3 7 (3,7) 10 8 8/40 = 0.20
3 8 (3,8) 11 2 2/40 = 0.05
4 6 (4,6) 10 2 2/40 = 0.05
4 7 (4,7) 11 4 4/40 = 0.10
4 8 (4,8) 12 6 6/40 = 0.15

Note that the single most common outcome was three months for design and seven months for construction (a relative frequency of 0.20); hence the company feels this is the most likely outcome for its own project.

In some instances, we may have no hard data to go by, only our own subjective perceptions (right or wrong) of what might happen. Here, for instance, we have a couple bidding on a house. Each has their own estimate of whether the bid will be accepted or rejected.

TABLE 5.4: Subjective Probabilities
Person P(Acceptance) P(Rejection) Sum
Pat 0.8 0.2 1
Chris 0.6 0.4 1

Subjective probabilities are used often in international security. Today we might think of a drone strike on a potentially high-value target ordered on the basis of hazy intelligence, but a classic example comes from the infamous Christmas bombing of Hanoi. Here is an extract from Politico, though you should read the original piece in its entirety to appreciate the value of what has come to be known as “The Madman Theory.”

In the summer of 1969, Nixon knew he needed to do something bold about Vietnam. He’d been elected with a promise to end the war, but months later, the conflict continued to consume his presidency, and peace was nowhere in sight. When the Paris peace talks had collapsed earlier that summer, the North Vietnamese had declared that they’d sit silently “until the chairs rot.” Nixon and Kissinger sought to restart the negotiations by pushing the Soviets to lean on the North Vietnamese. And so, Nixon turned to what came to be known as the “Madman Theory” — a game-theory based approach he had witnessed as Dwight Eisenhower’s vice president that was meant to raise uncertainty in the Soviet mind about whether Nixon would launch his nuclear weapons if provoked. The White House needed to convince the Soviets that Nixon would resort to anything — including a nuclear attack — to get peace in Vietnam. As Defense Secretary Melvin Laird said later, “He never [publicly] used the term ‘madman,’ but he wanted adversaries to have the feeling that you could never put your finger on what he might do next. Nixon got this from Ike, who always felt this way.”

Game theory enters the story in the fact that reasonable individuals disagreed over whether the North Vietnamese would return to the negotiating table and sue for peace after their sudden withdrawal from the talks on December 13, 1972. Some felt that the Russians would bring the North Vietnamese back to the table but others, including Nixon and Kissinger, thought otherwise. Consequently, on December 18, Operation Linebacker II began, lasting two weeks and involving 741 B-52 sorties that rained 20,000 tons of munitions on Hanoi and Haiphong, knocking out almost all of the electricity supply grid and killing 1,600 civilians. Regardless of who held the more accurate subjective beliefs about the North Vietnamese’s intentions, by December 29 they had agreed to resume the talks, and the Paris Peace Accords that ended direct American involvement in the war were signed soon thereafter.

Thus far we have covered a good bit of ground, putting into place several rules of probability theory, but not really applied them to concrete problems. Let us do so, starting with a simple example.

5.3.1 Example 1: Venture Capital Funding

Of 2,374 venture capital awards disbursed nationwide in 2016, 1,434 went to CA, 390 to MA, 217 to NY, and 112 to CO. Some 22% went to companies in early stages and 55% to expanding companies.

What is P(Company from CA)? \(P(California)=\frac{1434}{2374}=0.60\)

What is P(Company from other state)? Number that went to states other than these four \(=2374 - (1434+390+217+112)=221\). So, P(Other states) \(=\frac{221}{2374} = 0.09\).

What is P(Company not in early stage)? P(Not in early stage) \(=1-0.22 = 0.78\)

How many MA companies were in early stages, assuming early-stage companies were evenly distributed across states? Approximately \((0.22)(390) \approx 86\) companies.

If the total funding amount was $32.4 billion, how much went to CO? Assume fund amounts were distributed in proportion to the relative distribution of awards. Then, \(\frac{112}{2374}(\$32.4 \text{ billion}) \approx \$1.53 \text{ billion}\)

5.3.2 Example 2: Powerball

Powerball is played twice per week in 23 states, the Virgin Islands, and DC. A player buys a $1 ticket and picks five numbers from \(1 \ldots 53\) and one Powerball number from \(1 \ldots 42\). Lottery officials then draw (i) 5 White balls out of a drum with 53 White balls, and (ii) 1 Red ball from a drum with 42 Red balls. The jackpot winner must match the 5 numbers on the White balls (in any order) and the number on the Red Powerball. In August 2001, 4 winners shared $295 million by matching \(8, 17, 22, 42, 47\) plus the Powerball number \(21\). A minor prize of $100,000 is given if only the 5 White ball numbers are matched. In how many ways can the five White ball numbers be selected?

\[C^{53}_{5}=\dfrac{53!}{5!(53-5)!}=\dfrac{53!}{5!(48)!} = \dfrac{(53)(52)(51)(50)(49)}{(5)(4)(3)(2)(1)}=2,869,685\]

What is P(winning $100,000)? \(=\dfrac{1}{2,869,685}\)

What is the probability of picking the Red Powerball?

\[C^{42}_{1}=\dfrac{42!}{1!(42-1)!}=\dfrac{42!}{41!} = 42; \therefore P(\text{picking the Red ball}) = \dfrac{1}{42}\]

What is the probability of winning the Powerball jackpot? \(P(A \text{ and } B) = P(A) \times P(B) = \left(\dfrac{1}{2869685}\right)\times\left(\dfrac{1}{42}\right)=\dfrac{1}{120,526,770}\)
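
Because every quantity here is a combination, the whole calculation fits in a few lines. The sketch below (Python, using the 53-white-ball and 42-red-ball counts quoted above, which are not the current Powerball rules) verifies the counts:

```python
from math import comb

white_ways = comb(53, 5)       # ways to pick the five White balls
red_ways = comb(42, 1)         # ways to pick the Red Powerball

print(white_ways)              # 2,869,685  -> P(win $100,000) = 1 / 2,869,685
print(white_ways * red_ways)   # 120,526,770 -> P(jackpot) = 1 / 120,526,770
```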

5.3.3 Example 3: Rolling Two Dice

Two dice are rolled and we are interested in the sum of the face values showing on the 2 dice. Possible outcomes are (1,1), (1,2), (1,3), (1,4), (1,5), (1,6) … (6,1), (6,2), (6,3), (6,4), (6,5), (6,6). In other words, \(C^{6}_{1} \times C^{6}_{1} =(6)(6)=36\) outcomes are possible.

TABLE 5.5: Sum of Two Dice
1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

Note that the first row, with columns 1 through 6, gives the numbers that show up on a roll of Die 1, and the first column, with rows 1 through 6, gives the numbers that show up on a roll of Die 2; each cell is the sum of the two.

What is P(value of 7)? \(P(7)=\dfrac{\{(1,6), (6,1), (2,5), (5,2), (3,4), (4,3)\}}{36} = \dfrac{6}{36}=\dfrac{1}{6}\)

What is P(value \(\geq 9\))? \(P(\geq 9)=\dfrac{10}{36}=\dfrac{5}{18}\)

Will the sum of the two dice show even values more often than odd values? No because P(Odd) = P(Even) = \(\dfrac{18}{36} = \dfrac{1}{2}\)

How did you assign probabilities? Using the classical method, because each outcome has an identical probability of occurring.
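
Rather than building Table 5.5 by hand, we can enumerate all 36 equally likely rolls and count. A small Python sketch:

```python
from itertools import product
from fractions import Fraction

rolls = list(product(range(1, 7), repeat=2))   # all 36 (die 1, die 2) outcomes
n = len(rolls)

p_seven = Fraction(sum(a + b == 7 for a, b in rolls), n)
p_nine_plus = Fraction(sum(a + b >= 9 for a, b in rolls), n)
p_even = Fraction(sum((a + b) % 2 == 0 for a, b in rolls), n)

print(p_seven)        # 1/6
print(p_nine_plus)    # 5/18
print(p_even)         # 1/2
```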

5.3.4 Example 4: Fortune 500 Companies

The table below shows the number of Fortune 500 companies headquartered in selected states.

TABLE 5.6: Headquarters of Fortune 500 Companies
State Headquartered No. of Fortune 500 Cos. Proportion
New York 56 0.112
California 53 0.106
Texas 43 0.086
Illinois 37 0.074
Ohio 28 0.056
Pennsylvania 28 0.056

If I pick a company at random,

What is P(NY)? \(P(NY)=\frac{56}{500}=0.112\)

What is P(TX)? \(P(TX)=\frac{43}{500}=0.086\)

What is P(in any of these six states)? \(=\frac{(56+53+43+37+28+28)}{500}=\frac{245}{500}=0.49\)

5.4 The Complement of an Event

Given an event A, its complement is defined as the event consisting of all sample points that do not belong to (i.e., are not in) event A, and is denoted as \(A^c\).

\[\begin{eqnarray*} P(A) + P(A^c) = 1 \\ \therefore P(A) = 1 - P(A^c) \\ \therefore P(A^c) = 1 - P(A) \end{eqnarray*}\]

Toss a coin once: \(P(H) = 0.5; P(H^c) = 1- P(H) = 1-0.5 = 0.5\)

Roll two dice: \(P(\text{value } \geq 9) = \dfrac{5}{18}\)

\(P(\text{value } < 9) = 1 - P(\text{value } \geq 9) = 1 - \dfrac{5}{18} = \dfrac{13}{18}\)

Why is this notion of the complement of an event useful? Because the probabilities of an event happening and not happening must sum to unity, i.e., \(1\); given one of the two probabilities we can always calculate the other.

5.5 Mutually Exclusive Events

Two events are mutually exclusive if both cannot occur simultaneously. That is, if event A occurs then event B cannot occur, and vice-versa. Say I toss a coin once. It can only come up Heads or Tails. Let, for example,

\[\begin{eqnarray*} A = \{Heads\}; B = \{Tails\} \\ P(A \text{ and } B) = 0 \\ \ldots \text{ because there is no overlap ... A and B are mutually exclusive} \end{eqnarray*}\]

Similarly, say I roll a die once. Let, for example,

\[\begin{eqnarray*} A = \{1, 3, 5\}; B = \{2, 4, 6\} \\ P(A \text{ and } B) = 0 \\ \ldots \text{ because there is no overlap ... A and B are mutually exclusive} \end{eqnarray*}\]

What if I again roll a die once and this time

\[\begin{eqnarray*} A = \{2, 4, 6\}; B = \{1, 4\} \\ P(A \text{ and } B) \neq 0 \\ \ldots \text{ because there is an overlap ... A and B are not mutually exclusive} \end{eqnarray*}\]

Knowing whether or not two events are mutually exclusive is critical when calculating the probability of an outcome that involves more than one event. You can see this in action in the addition rules for mutually exclusive versus non-mutually exclusive events.

5.6 The Addition Rule for Mutually Exclusive Events

For two mutually exclusive events, A and B, the probability that either A or B occurs is given by \(P(A \text{ or } B) = P(A) + P(B)\). The rule also extends to more than 2 events so long as they are all mutually exclusive events.

A die is rolled once. What is \(P(3 \text{ or more})\)?

\[\begin{eqnarray*} = P(3) + P(4) + P(5) + P(6) \\ = \dfrac{1}{6} + \dfrac{1}{6} + \dfrac{1}{6} + \dfrac{1}{6} \\ = \dfrac{4}{6} \\ = \dfrac{2}{3} \end{eqnarray*}\]

What is \(P(\text{not rolling 3 or more})\)? … \(1 - P(3 \text{ or more}) = 1 - \dfrac{2}{3} = \dfrac{1}{3}\)

5.7 Addition Rule for Non-Mutually Exclusive Events

For non-mutually exclusive events we calculate the probability that event A or B occurs as \(P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)\)

Assume that on a typical day in a plant with 50 workers, 5 complete the work late, 6 assemble a defective product, and 2 both complete the work late and produce a defective product.

Let L = Late completion. Then, \(P(L)=\dfrac{5}{50} =0.10\)

Let D = Defective product. Then, \(P(D)=\dfrac{6}{50} =0.12\)

\(P(L \text{ and } D) = \dfrac{2}{50}=0.04\)

What is the probability that a randomly selected worker either is late or produces a defective product? \(P(L \text{ or } D)=P(L)+P(D)-P(L \text{ and } D) = 0.10 + 0.12 - 0.04 = 0.18\)
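
With counts this small, the addition rule can also be checked by brute force with sets. In the Python sketch below the worker labels are purely hypothetical; only the group sizes and the overlap of 2 workers come from the example:

```python
from fractions import Fraction

workers = set(range(50))        # label the 50 workers 0..49 (hypothetical labels)
late = set(range(5))            # 5 workers complete the work late
defective = set(range(3, 9))    # 6 workers produce a defective product; 2 are also late

p_late = Fraction(len(late), len(workers))               # 0.10
p_def = Fraction(len(defective), len(workers))           # 0.12
p_both = Fraction(len(late & defective), len(workers))   # 0.04
p_either = Fraction(len(late | defective), len(workers)) # 0.18

assert p_either == p_late + p_def - p_both   # the addition rule holds
print(float(p_either))                       # 0.18
```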

5.8 Independent Events and the Multiplication Rule

We also need to consider whether two or more events are independent or dependent, i.e., whether the probability of one event occurring is unaffected by whether the other event(s) occurred, or whether it is conditioned on the other event(s) occurring. Specifically, two events, A and B, are independent events if P(A) is not influenced by whether event B has occurred (and vice versa). For instance, (1) rolling a 4, and then rolling a 1 on a second roll of the same die, and (2) picking the Ace of Spades from a fair deck of 52 cards, replacing it, and then picking the Ace of Spades again are both pairs of independent events; what happens in the first step does not influence what happens in the second step.

If two events A and B are independent, then the probability that both A and B occur is given by \(P(A \text{ and } B) = P(A) \times P(B)\). For example,

for (1) above \(P(A \text{ and } B) = \dfrac{1}{6} \times \dfrac{1}{6} = \dfrac{1}{36}\)

for (2) above \(P(A \text{ and } B) = \dfrac{1}{52} \times \dfrac{1}{52} = \dfrac{1}{2704}\)

Here is a substantive example: Assume we are told that for a randomly chosen adult, \(P(\text{smoking})=0.17\) and \(P(\text{high blood pressure}) = 0.22\). If smoking and high blood pressure are independent, what is \(P(\text{smoking and high blood pressure})\)? This would be \(P(\text{smoking and high blood pressure}) = P(A) \times P(B) = 0.17 \times 0.22 = 0.037\).

5.9 Decision Trees

Trees are handy ways to depict sequential events and their probabilities and are often used to understand sequential outcomes. Game theorists, for example, put decision trees to good use. In the example below we see a tree built around two-child families. Specifically, the tree outlines for us the probabilities of a family having both children of a given sex versus having one child of each sex.

What is \(P(1 \text{ Boy and } 1 \text{ Girl})\)? \(=(0.249856 + 0.249856) = 0.499712\)

What is \(P(\text{at least } 1 \text{ Girl})\)? \(=(1 - 0.262144) = 0.737856\). This is also \(1 - P( \text{both are Boys})\)

What is \(P(\text{at least } 1 \text{ Boy})\)? \(=(1 - 0.238144) = 0.761856\). This is also \(1 - P(\text{both are Girls})\)

What is \(P(\text{both are of the same sex})\)? \(=(0.262144 + 0.238144) = 0.500288\)
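
The probabilities in the tree are consistent with each birth independently being a boy with probability 0.512 and a girl with probability 0.488 (since \(0.512^2 = 0.262144\) and \(0.488^2 = 0.238144\)). Under those assumed per-birth probabilities, the short Python sketch below reproduces all four answers:

```python
p_boy, p_girl = 0.512, 0.488    # per-birth probabilities implied by the tree (assumed)

p_two_boys = p_boy * p_boy            # 0.262144
p_two_girls = p_girl * p_girl         # 0.238144
p_one_each = 2 * p_boy * p_girl       # boy-then-girl plus girl-then-boy

print(p_one_each)                     # P(1 Boy and 1 Girl) = 0.499712
print(1 - p_two_boys)                 # P(at least 1 Girl)  = 0.737856
print(1 - p_two_girls)                # P(at least 1 Boy)   = 0.761856
print(p_two_boys + p_two_girls)       # P(same sex)         = 0.500288
```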

5.10 Conditional Probability

The conditional probability of an event is the probability of that event occurring given that a condition is met (i.e., some other event is known to have occurred). Conditional probabilities are denoted as \(P(A | B)\) (i.e., the probability of A given that B has occurred) and \(P(A | B) = \dfrac{P(A \text{ and } B)}{P(B)}\). Similarly, we have \(P(B | A) = \dfrac{P(A \text{ and } B)}{P(A)}\), (i.e., the probability of B given that A has occurred).

For example, let B be the event of getting a perfect square when a die is rolled. Let A be the event that the number on the die is odd. What is \(P(B|A)\)? The sample space is \(S=\{1,2,3,4,5,6\}\); \(A =\{1,3,5\}\); \(B=\{1,4\}\). Then, \(P(A \text{ and } B)=\dfrac{1}{6}; P(A)=\dfrac{1}{2}; P(B)=\dfrac{1}{3}\), and hence \(P(B|A)=\dfrac{P(A \text{ and } B)}{P(A)}=\dfrac{\dfrac{1}{6}}{\dfrac{1}{2}}=\dfrac{1}{6} \times \dfrac{2}{1}=\dfrac{2}{6}=\dfrac{1}{3}\)

For dependent events, the multiplication rule that determines the probability that both A and B occur becomes:

\[\begin{array}{l} P(A \text{ and } B) = P(A) \times P(B | A), \text{ and} \\ P(A \text{ and } B) = P(B) \times P(A | B) \end{array}\]

Here is a specific example that pulls together various elements of basic probability theory we have covered thus far, and especially so the concept of conditional probability.

5.10.1 The Gender-bias Example

Say we have a city police department and there have been claims that men and women are not promoted at the same rate. Let us assume that when candidates are eligible for promotion they are equally qualified, regardless of sex. You are hired to investigate whether this claim of a gender bias has any merit to it. You ask for data on all officers currently serving in the city and for each officer whether they were promoted or not. The data are shown below.

TABLE 5.7: An Example of Conditional Probability
Action Men Women Total
Promoted 288 36 324
Denied Promotion 672 204 876
Total 960 240 1200

If men and women had the same probability of being promoted, then for both groups the probability of promotion should equal the overall rate, \(P(A)=\dfrac{324}{1200}= 0.27\). That is, we should find \(P(A | M) = P(A | W) = 0.27\). We can now ask: What is the probability that an officer is promoted given that the officer is a man? \(P(A|M) = \dfrac{P(A \text{ and } M)}{P(M)} = \dfrac{288/1200}{960/1200} = \dfrac{288}{960} = 0.30\). Similarly, we might ask: What is the probability of being denied a promotion given that the officer is a woman? \(P(A^{c}|W) = \dfrac{P(A^{c} \text{ and } W)}{P(W)} = \dfrac{204/1200}{240/1200} = \dfrac{204}{240} = 0.85\)

TABLE 5.8: Probabilities for the preceding table
Action Men Women Total
Promoted P(.) = 0.24 P(.) = 0.03 P(.) = 0.27
Denied Promotion P(.) = 0.56 P(.) = 0.17 P(.) = 0.73
Total P(.) = 0.80 P(.) = 0.20 P(.) = 1.00

What would you conclude and report back to the city? Unfortunately, that female officers do not seem to be promoted at the same rate as male officers: \(P(A|M) = 0.30\) while \(P(A|W) = \dfrac{36}{240} = 0.15\). Notice how the notions of conditional probability and independent events allowed you to tackle this question!
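
The conditional probabilities fall straight out of the contingency table. The Python sketch below retypes the counts from Table 5.7 and compares each group's promotion rate with the overall rate of 0.27:

```python
promoted = {"men": 288, "women": 36}
denied = {"men": 672, "women": 204}

for group in ("men", "women"):
    total = promoted[group] + denied[group]
    p_promoted = promoted[group] / total      # P(promoted | group)
    p_denied = denied[group] / total          # P(denied | group)
    print(f"{group}: P(promoted) = {p_promoted:.2f}, P(denied) = {p_denied:.2f}")

overall = sum(promoted.values()) / (sum(promoted.values()) + sum(denied.values()))
print(f"overall promotion rate = {overall:.2f}")   # 0.27
```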

5.11 Dependent Events

Many events are not independent of one another; the probability of event A changes if event B has occurred (and vice versa). Take the fascinating behavior of Nasonia, for example. The jewel wasp Nasonia vitripennis is a parasite that lays its eggs on the pupae of flies. The larval Nasonia hatch inside the pupal case, feed on the live host, and grow until they emerge as adults from the now dead, emaciated host. Emerging males and females, possibly brothers and sisters, mate on the spot.

Nasonia females have a remarkable ability to manipulate the sex of the eggs that they lay; if a female fertilizes an egg with stored sperm the offspring will be female. When a female finds a fresh host (i.e., one not yet parasitized), she lays mainly female eggs and just the few sons needed to fertilize all her daughters. If she finds the host to be already parasitized she produces a higher proportion of sons. Thus the state of the host (parasitized or not) and the sex of an egg are dependent events.

Let the probability that a host already has eggs be 0.20. If it is a fresh host, the female lays a male egg with a probability of 0.05 and a female egg with a probability of 0.95. If the host already has eggs, the female lays a male egg with a probability of 0.90 and a female egg with a probability of 0.10. What is the probability that an egg, chosen at random, is male?

What is \(P(\text{egg is male})\)? … \(0.18 + 0.04 = 0.22\). What is \(P(\text{egg is female})\)? … \(0.02 + 0.76 = 0.78\). The calculations below show where these numbers come from.

The Law of Total Probability stipulates that the total probability of an event \(A\) is given by \(P(A) = \left[P(A|B) \times P(B)\right] + \left[ P(A|B^c) \times P(B^c)\right]\)

The Multiplication Rule: The probability that both of two events occur is given by \(P(A \text{ and } B) = P(A) \times P(B | A)\). If the two events are independent then \(P(B|A) = P(B)\) and this reduces to \(P(A \text{ and } B) = P(A) \times P(B)\). So what is \(P(\text{egg is male})\)? Well, we don’t know whether the host is parasitized, so …

\(P(\text{egg is male}) = P(\text{host parasitized}) \times P(\text{egg is male } | \text{ host parasitized}) + P(\text{host not parasitized}) \times P(\text{egg is male } | \text{ host is not parasitized})\)

\(P(\text{egg is male}) = (0.20 \times 0.90) + (0.80 \times 0.05) = 0.22\)

What is \(P(\text{egg is female})\)? We don’t know if the host is parasitized so …

\(P(\text{egg is female}) = P(\text{host parasitized}) \times P(\text{egg is female } | \text{ host parasitized}) + P(\text{host not parasitized}) \times P(\text{egg is female } | \text{ host is not parasitized})\)

\(P(\text{egg is female}) = (0.20 \times 0.10) + (0.80 \times 0.95) = 0.78\)
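
The law of total probability is just a weighted sum, so the calculation is easy to check in code. A minimal Python sketch using the probabilities given above:

```python
p_parasitized = 0.20              # P(host already has eggs)
p_male_given_parasitized = 0.90   # P(male egg | parasitized host)
p_male_given_fresh = 0.05         # P(male egg | fresh host)

# Law of total probability: weight each conditional by the probability of its condition
p_male = (p_parasitized * p_male_given_parasitized
          + (1 - p_parasitized) * p_male_given_fresh)
p_female = (p_parasitized * (1 - p_male_given_parasitized)
            + (1 - p_parasitized) * (1 - p_male_given_fresh))

print(p_male)     # 0.22
print(p_female)   # 0.78
```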

5.11.1 The Monty Hall Problem

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind each of the other doors, a goat. You pick a door, say No.1, and the host, who knows what’s behind the doors, opens another door, say No.3, which has a goat. He then says to you, “Do you want to pick door No.2?” Well, what should you do? Stay with your original pick or switch? The winning strategy is to always switch to the still closed door you didn’t originally pick. Why? How?

TABLE 5.9: The Monty Hall Problem
Scenarios Door 1 Door 2 Door 3 Host’s Action
Scenario 1 Goat Car Goat Opens Door 3
Scenario 2 Car Goat Goat Opens Door 2
Scenario 3 Goat Goat Car Opens Door 2

Say you pick Door 1 and you know there are three possibilities here. Either the car is behind Door 1, it is behind Door 2, or it is behind Door 3. So the probability the car is behind Door 1 is \(\dfrac{1}{3}\), behind Door 2 is \(\dfrac{1}{3}\) and behind Door 3 is \(\dfrac{1}{3}\). You can also think of this as there is a \(\dfrac{1}{3}\) probability the prize is behind Door 1 and hence there is a \(\dfrac{2}{3}\) probability that the prize is behind Door 2 or 3. That is, \(P(\text{Car behind Door 2 or Door 3}) = \dfrac{2}{3}\).

Once the host opens one of the doors you did not pick, say Door 3 (and this door will always have a goat), \(P(\text{Car behind Door 2 or Door 3}) = \dfrac{2}{3}\) is still true, BUT now you know the car is not behind Door 3 and hence \(P(\text{Car behind Door 2 or Door 3}) = P(\text{Car behind Door 2}) = \dfrac{2}{3}\); you had better switch!

This puzzle has spawned a book and several pages of explanations. If you remain unconvinced, run a simulation yourself, as sketched below; switching wins about two-thirds of the time.
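
Here is one such simulation, a minimal Python sketch (illustrative only) that plays the game many times under both strategies; the win rates settle near 1/3 for sticking and 2/3 for switching:

```python
import random

def play(switch: bool) -> bool:
    """Play one round of the Monty Hall game; return True if the contestant wins."""
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Host opens a door that is neither the contestant's pick nor the car
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

random.seed(1)
trials = 100_000
print("stick :", sum(play(False) for _ in range(trials)) / trials)  # about 0.33
print("switch:", sum(play(True) for _ in range(trials)) / trials)   # about 0.67
```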

5.12 Bayes’ Theorem

Having established some basic rules of probability theory, we can now turn to one of the most exciting developments, one that occurred a long time ago but has been seen in a new light only in recent decades: Bayesian statistics. Who knew an eighteenth-century Presbyterian minister could wreak such positive havoc! Indeed, his early insights have applications in areas as diverse as medical testing, ecology, Google’s self-driving cars, the spam filters that protect your email, and even saving a fisherman’s life. Let us appreciate Bayes’ Theorem with reference to an often-used problem.

Say I present you with the following information: 1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A 40-year old woman had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

TABLE 5.10: Breast Cancer and Bayes Theorem
Test Result Will have cancer Will not have cancer
Positive 0.800 0.096
Negative ? ?

TABLE 5.11: Breast Cancer and Bayes Theorem (Part 2)
Test Result Will have cancer Will not have cancer
Positive 0.800 0.096
Negative 0.200 0.904
Total 1.000 1.000

The question we are tasked with is this: If a woman has a positive test result, what is the probability that she has breast cancer? That is, what is \(P(\text{Breast Cancer} \mid +\text{Mammography})\), i.e., \(P(B | A)\)? We might struggle if it weren’t for the clean logic that Bayes encapsulated into an elegant mathematical statement. For now, let us break the problem down based on the information given to us.

Start by assuming a certain population size of 40-year-old women who are tested, say \(N = 100,000\). We know that 1% of this population will have breast cancer, which amounts to \(1,000\) women. Some 80% of women who have breast cancer will test positive for breast cancer, i.e., \(0.80 \times 1000 = 800\), which means \(200\) women who have breast cancer will test negative.

We are also told that 9.6% of women without breast cancer will also get a positive test result. This amounts to \(0.096 \times 99{,}000 = 9,504\) women. This means that \(89,496\) women will have a negative test and have no breast cancer.

Entering these values into the table we can calculate the probability that a woman with a positive test result actually has breast cancer to be \(\dfrac{800}{10304} = 0.0776\) or 7.76%. What about the probability that a woman has breast cancer given a negative test result? This would be \(\dfrac{200}{89696} = 0.0022\) or 0.22%. The table below is populated with the estimates.

TABLE 5.12: Breast Cancer and Bayes Theorem (Part 3)
Test Result Will have cancer Will not have cancer Row Total Probability
Positive 800 9504 10304 800/10304 = 0.0776
Negative 200 89496 89696 200/89696 = 0.0022
Total 1000 99000 100000

One could arrive at the same conclusion via applying Bayes’ Theorem, as shown below.

\[\begin{eqnarray*} P(B | A) = \dfrac{P(A | B) \times P(B)}{\left[ P(A | B) \times P(B) \right] + \left[ P(A | B^c) \times P(B^c) \right]} \\ = \dfrac{0.800 \times 0.01}{\left[ 0.800 \times 0.01 \right] + \left[0.096 \times 0.990 \right]} = \dfrac{0.008}{\left[0.008 + 0.09504 \right]} = \dfrac{0.008}{0.10304} = 0.0776 \end{eqnarray*}\]

\(P(B|A) = \dfrac{P(A|B) \times P(B)}{P(A)}\)

What is \(P(A|B)\)? … This is the probability that a woman with breast cancer gets a positive mammography, which is \(0.80\).

What is \(P(B)\)? … This is the prior probability of breast cancer, which is 0.01.

What is \(P(A)\)? … This is the probability of getting a positive mammography, which can happen as follows:

  1. when she has Breast Cancer … \((0.80 \times 0.01)\), or
  2. when she doesn’t have Breast Cancer … \((0.096 \times 0.99)\)

\[\begin{eqnarray*} P(A) = (0.80 \times 0.01) + (0.096 \times 0.99) = 0.10304 \\ \therefore P(B|A) = \dfrac{P(A|B) \times P(B)}{P(A)} \\ = \dfrac{0.80 \times 0.01}{0.10304} = \dfrac{0.008}{0.10304} = 0.07763975 \end{eqnarray*}\]
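
The same arithmetic can be packaged as a small function. The Python sketch below (the function name and arguments are mine, purely for illustration) applies Bayes' Theorem to the mammography numbers used above:

```python
def posterior(prior: float, p_pos_given_true: float, p_pos_given_false: float) -> float:
    """P(condition | positive test), computed via Bayes' Theorem."""
    # Law of total probability for the denominator, P(positive test)
    p_positive = p_pos_given_true * prior + p_pos_given_false * (1 - prior)
    return p_pos_given_true * prior / p_positive

# 1% prevalence, 80% sensitivity, 9.6% false-positive rate
print(posterior(prior=0.01, p_pos_given_true=0.80, p_pos_given_false=0.096))  # ~0.0776
```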

5.12.1 Another Example

Suppose a patient exhibits symptoms that make her physician concerned that she may have a particular disease. The disease is relatively rare in this population, with a prevalence of 0.2% (meaning it affects 2 out of every 1,000 persons). The physician recommends a screening test that costs $250 and requires a blood sample. Before agreeing to the screening test, the patient wants to know what will be learned from the test; specifically, she wants to know the probability of disease given a positive test result, i.e., P(Disease | Screen Positive). The physician reports that the screening test is widely used and has a reported sensitivity of 85%. In addition, the test comes back positive 8% of the time and negative 92% of the time.

TABLE 5.13: Testing Positive and Bayes Theorem
Test Result Sick Healthy Total Probability
Positive 170 7830 8000 170/8000 = 0.02125
Negative 30 91970 92000 30/92000 = 0.000326087
Total 200 99800 100000

What is \(P(Sick | +Test)\)?

We know that \(P(Sick | +Test) = \dfrac{P(+{Test} | S) \times P(Sick)}{P(+{Test})}\)

We know that \(P(+{Test} | Sick) = 0.85\), that \(P(Sick) = 0.002\), and that \(P(+{Test}) = 0.08\)

Therefore, \(P(Sick | +Test) = \dfrac{0.85 \times 0.002}{0.08} = 0.021\) … There is a 2.1% chance the patient is actually sick if the test comes back positive.

What about a false positive, i.e., what is the chance that the test is positive even though the patient is actually healthy? This requires calculating \(P(+Test \mid \text{Healthy}) = \dfrac{P(\text{Healthy} \mid +Test) \times P(+Test)}{P(\text{Healthy})} = \dfrac{(1 - 0.021) \times 0.08}{1 - 0.002} = 0.078\) or 7.8%; a healthy patient can still test positive for the disease 7.8% of the time.

5.12.2 Extending Bayes Rule

Bayes’ Theorem is not limited to simple \(2 \times 2\) contingency tables, as the example below demonstrates. An aircraft emergency locator transmitter (ELT) is a device designed to transmit a signal in the case of a crash. The Altigauge Manufacturing Company makes 80% of the ELTs, the Bryant Company makes 15% of them, and the Chartair Company makes the other 5%. The ELTs made by Altigauge have a 4% rate of defects, the Bryant ELTs have a 6% rate of defects, and the Chartair ELTs have a 9% rate of defects. What is \(P(Altigauge|Defective)\)?

TABLE 5.14: Extending Bayes Theorem
Manufacturer Defective Not Defective Total Probability
Altigauge 320 7680 8000 320/455 = 0.7032967
Bryant 90 1410 1500 90/455 = 0.1978022
Chartair 45 455 500 45/455 = 0.0989011
Total 455 9545 10000

\[\begin{eqnarray*} P(A|D) = \dfrac{P(D|A) \times P(A)}{\left[P(D|A) \times P(A) \right] + \left[P(D|B) \times P(B) \right] + \left[P(D|C) \times P(C) \right]} \\ = \dfrac{0.80 \times 0.04}{\left[ 0.80 \times 0.04 \right] + \left[0.15 \times 0.06 \right] + \left[0.05 \times 0.09 \right]} \\ = \dfrac{0.0320}{0.0320 + 0.0090 + 0.0045} = \dfrac{0.0320}{0.0455} = 0.7032967 \end{eqnarray*}\]
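
The same pattern extends to any number of mutually exclusive sources. A brief Python sketch with the shares and defect rates given above:

```python
share = {"Altigauge": 0.80, "Bryant": 0.15, "Chartair": 0.05}         # P(manufacturer)
defect_rate = {"Altigauge": 0.04, "Bryant": 0.06, "Chartair": 0.09}   # P(defective | manufacturer)

# Law of total probability: overall defect rate across all manufacturers
p_defective = sum(share[m] * defect_rate[m] for m in share)            # 0.0455

for m in share:
    posterior = share[m] * defect_rate[m] / p_defective                # P(manufacturer | defective)
    print(m, round(posterior, 7))
# Altigauge 0.7032967, Bryant 0.1978022, Chartair 0.0989011
```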

5.13 Key Things to Remember

  1. The Addition Rule

    1. Non-Mutually Exclusive Events: \(P(\text{A or B}) = P(A) + P(B) - P(\text{A and B})\)
    2. Mutually Exclusive Events: \(P(\text{A or B}) = P(A) + P(B)\)
  2. Multiplication Rule

    1. Independent Events: \(P(\text{A and B}) = P(A) \times P(B)\)
    2. Dependent Events: \(P(\text{A and B}) = P(A) \times P(B | A)\) and \(P(\text{B and A}) = P(B) \times P(A | B)\)
  3. A and B are mutually exclusive if \(P(\text{A and B}) = 0\)

  4. A and B are independent if \(P(A|B) = P(A)\); \(P(B|A) = P(B)\)

  5. Total Probability: \(P(B) = P(A)\times P(B|A) + P(A^c)\times P(B|A^c)\)

  6. Bayes’ Rule: \(P(A|B) = \dfrac{P(A)\times P(B|A)}{P(B)}\); and \(P(B|A) = \dfrac{P(B)\times P(A|B)}{P(A)}\)

5.14 Chapter 5 Practice Problems

Problem 1

In the first hour of a hunting trip, the probability that a pride of Serengeti lions will encounter a Cape buffalo is 0.035. If the pride encounters a buffalo, the probability that it successfully captures the buffalo is 0.40. What is the probability that, in the next one-hour stretch of a hunt, the pride will successfully capture a Cape buffalo?

Problem 2

You graduated with a Biology degree and are interviewed for a lucrative job as a snake handler in a circus. As part of the audition you must pick up two rattlesnakes from a pit containing eight rattlesnakes, three of which have been defanged (so they cannot bite you) but the other five are still dangerous (they can bite you). You skipped the herpetology course while in school so you cannot tell the defanged snakes from the others. Luckily for you, you did take a course on probability. As you thank your stars you bend down and pick up one snake with your left hand and another with your right hand.

  1. What is the probability that you picked up no dangerous snakes?
  2. Assume that any of the five dangerous snakes will, if picked up, bite with a probability of 0.8. What is the chance that in picking up two snakes you will be bitten at least once?
  3. If you picked up only one snake and it did not bite you, what is the probability that it is defanged?

Problem 3

Schrodinger’s cat lives under a constant threat of death from the random release of a deadly poisonous gas. The probability of a release of the poison on any given day is 0.01, and releases are independent across days.

  1. What is the probability that the cat will survive one day?
  2. What is the probability that the cat survives seven days?
  3. What is the probability that the cat survives a year (365 days)?
  4. What is the probability that the cat will die by the end of a year?

Problem 4

You are the health inspector for your city and tasked with improving airport washroom hygiene in the city’s lone airport. You know that the probability of a man washing his hands after using the washroom is 0.74 and that of a woman is 0.83. Assume there are 40 men and 60 women waiting to use the washroom. What is the probability that the next person to use the washroom will wash his/her hands?

Problem 5

Rapid HIV tests allow for quick diagnosis without expensive laboratory equipment but their accuracy is in doubt. In a population of 1517 tested individuals in a country, 4 had HIV but tested negative (false negative) with the rapid test, 166 had HIV and tested positive, 129 were HIV free but tested positive (false positive), and 1218 were HIV free and tested negative.

  1. What is the probability of a false positive?
  2. What was the false negative rate?
  3. If a randomly sampled individual from this population tests positive on a rapid test, what is the probability that he/she has HIV?

Problem 6

Taking a group photograph is tricky because someone or the other usually seems to blink just as the photograph is taken. We know the probability of an individual blinking during a photo is 0.04.

  1. If you photograph one person, what is the probability that he/she will not blink?
  2. If you are photographing a group of 10 individuals, what is the probability that at least one person will blink?

Problem 7

In Vancouver (British Columbia), the probability of rain on any given winter’s day is 0.58, on a spring day it is 0.38, on a summer day it is 0.25, and in the fall it is 0.53.

  1. What is the probability of rain on any given randomly chosen day in a year (365 days)?
  2. If I tell you that it rained on a particular day (but I don’t tell you the date), what is the probability that this was a winter’s day?

Problem 8

A standard deck of cards has 52 cards, 13 in each of four suits: Spades, Diamonds, Hearts, and Clubs. Each suit has 13 cards: an Ace, a King, a Queen, a Jack, and nine numbered cards 2 through 10. The face cards are the King, Queen, and Jack.

  1. If you draw one card, what is the probability that the card is a King?
  2. What is the probability of drawing a face card that is also a Spade?
  3. What is the probability of drawing a card without a number on it?
  4. What is the probability of drawing a red card? What is the probability of drawing an ace? What is the probability of drawing a red Ace? Are these events (“Ace” and “red”) mutually exclusive? Are they independent?
  5. List two events that are mutually exclusive for a single draw from a standard deck of cards.
  6. What is the probability of drawing a red King? What is the probability of drawing a face card in hearts? Are these two events mutually exclusive? Are they independent?

Problem 9

A boy mentions that none of the 21 kids in his third-grade class has had a birthday since school began 56 days ago. Assume that in the population the probability of having a birthday on any given day is the same for every day of the year (365 days). What is the probability that 21 kids in such a class would not yet have a birthday in 56 days?

Problem 10

The probability of getting a heads or tails on a single flip of a fair coin is 0.5, each.

  1. If you flip a coin twice, what is the probability of getting no heads?
  2. If you flip a coin twice, what is the probability of getting exactly two heads?
  3. If you flip a coin twice, what is the probability of getting exactly one tail?
  4. If you flip a coin twice, what is the probability of getting no tails?

Problem 11

Two dice are rolled, find the probability that the sum of the two faces is

  1. equal to 1
  2. equal to 4
  3. less than 13
  4. greater than 7

Problem 12

The blood groups of 200 people are distributed as follows: 50 have type A blood, 65 have type B, 70 have type O, and 15 have type AB. If a person from this group is selected at random, what is the probability that this person has type O blood?

Problem 13

A box is filled with candies in different colors: 40 white, 24 green, 12 red, 24 yellow, and 20 blue. If we select one candy from the box without peeking into it, what is the probability of getting a green or a red candy?

Problem 14

You go to see the doctor about an ingrowing toenail. The doctor selects you at random to have a blood test for swine flu, which for the purposes of this exercise we will say is currently suspected to affect 1 in 10,000 people in Ohio. The test is 99% accurate, in the sense that the probability of a false positive is 1%. The probability of a false negative is zero. You test positive. What is the new probability that you have swine flu?

Now imagine that you went to a friend’s wedding in Mexico recently, and (for the purposes of this exercise) it is known that 1 in 200 people who visited Mexico recently come back with swine flu. Given the same test result as above, what should your revised estimate be for the probability you have the disease?

Problem 15

In a TV Game show, a contestant selects one of three doors; behind one of the doors there is a prize, and behind the other two there are no prizes. After the contestant selects a door, the game-show host opens one of the remaining doors, and reveals that there is no prize behind it. The host then asks the contestant whether they want to SWITCH their choice to the other unopened door, or STICK to their original choice. Is it probabilistically advantageous for the contestant to SWITCH doors, or is the probability of winning the prize the same whether they STICK or SWITCH? (Assume that the host selects a door to open, from those available, with equal probability).

Problem 16

A diagnostic test has a probability 0.95 of giving a positive result when applied to a person suffering from a certain disease, and a probability 0.10 of giving a (false) positive when applied to a non-sufferer. It is estimated that 0.5 % of the population are sufferers. Suppose that the test is now administered to a person about whom we have no relevant information relating to the disease (apart from the fact that he/she comes from this population). Calculate the following probabilities:

  1. that the test result will be positive
  2. that, given a positive result, the person is a sufferer
  3. that, given a negative result, the person is a non-sufferer
  4. that the person will be misclassified

Problem 17

On the first anniversary of the September 11, 2001 terrorist attack on the Twin Towers in New York City, the winning pick-3 number in the city was 911. This lottery involves choosing a three-digit sequence between 000 and 999, with the winning draw determined by numbered balls circulating in a machine.

  1. What is the probability of the winning number being 911 on September 11, 2002 if a single draw is held for the lottery?
  2. In reality there are two draws daily. What is the probability of the winning number being 911 on September 11, 2002 in two draws held for the lottery that day?

Note: in each of these cases we haven’t said what will happen, yet we have been able to list all the possible outcomes that might occur. This is important because if we omit any possible outcome our sample space will be incomplete.