Say you are flipping a fair coin twice. You know you can end up with Heads or Tails on any given toss. The sample space would then be \(S = \left\{ HH, HT, TH, TT \right\}\), i.e., there are \(n=4\) possible outcomes. The probability of getting exactly one Tails is \(\dfrac{2}{4} = 0.5\), and so is the probability of exactly one Heads. What if we flipped the coin ten times? What would be the probability of exactly three Tails in ten flips of the coin? We could spell out the full sample space and then calculate the needed probability, but that would be tedious. It is easier to use a formula that does this calculation for us.
\[ \begin{align} P\left[X \text{ successes}\right] &= \binom{n}{X}p^{X}\left(1 - p\right)^{n-X} \\ & \text{where } \binom{n}{X} = \dfrac{n!}{X!(n-X)!} \text{ and } \\ & n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1 \end{align} \]
Let us define Success as Tails resulting on a flip, so that we want \(X = 3\) successes (Tails) in ten flips of the coin (i.e., \(n=10\)). Let us also assume that it is a fair coin so that the probability of Tails on any single flip is \(p=0.5\). Plugging these values into the equation yields:
\[ \begin{align} P\left[3 \text{ successes}\right] &= \binom{10}{3}0.5^{3}\left(1 - 0.5 \right)^{10-3} \\ &= \dfrac{10 \times 9 \times 8 \times \cdots \times 1}{\left(3 \times 2 \times 1 \right) \left(7 \times 6 \times \cdots \times 1 \right)} \left(0.5 \right)^{3} \left(0.5 \right)^{7} \\ &= \dfrac{10 \times 9 \times 8 }{3 \times 2 \times 1} \left(0.5 \right)^{3} \left(0.5 \right)^{7} \\ &= \left(10 \times 3 \times 4 \right)\left(0.5 \right)^{3} \left(0.5 \right)^{7} \\ &= \left(120\right) \left(0.125 \right) \left(0.0078125 \right) \\ &= 0.1171875 \end{align} \]
Using R as a calculator we could solve the above formula as follows:
n = 10
x = 3
p = 0.5
p.3tails = choose(n, x) * (p)^(x) * (1 - p)^(n - x)
p.3tails
## [1] 0.1171875
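As a quick check, R’s built-in binomial probability function, dbinom(), does the same calculation in a single call:

dbinom(3, size = 10, prob = 0.5)

## [1] 0.1171875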
What about three or more Tails? We could calculate the probability of exactly three Tails, four Tails, and so on through ten Tails, add these up, and we would have our answer … all using the formula above. Alternatively, the R function rbinom() can approximate these probabilities for us quite easily by simulating a large number of ten-flip experiments. Below we do so for the example above.
set.seed(13579)
n = 1e+07
x = rbinom(n, 10, p = 0.5)
tab.x = table(x)
tab.x
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
9736 | 97255 | 438778 | 1169791 | 2051968 | 2462822 | 2050665 | 1171457 | 439391 | 98260 | 9877 |
ptab.x = (tab.x/n)
ptab.x
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
0.0009736 | 0.0097255 | 0.0438778 | 0.1169791 | 0.2051968 | 0.2462822 | 0.2050665 | 0.1171457 | 0.0439391 | 0.009826 | 0.0009877 |
plot(ptab.x, ylab = "Probability", xlab = "No. of Tails", xlim = c(0,
10), ylim = c(0, 0.25), col = "firebrick")
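Returning to the question posed earlier — three or more Tails — we do not actually need to simulate; the exact answer comes from summing the binomial probabilities or, equivalently, from the cumulative distribution function via pbinom():

sum(dbinom(3:10, 10, prob = 0.5))  # P(X >= 3), summed term by term

## [1] 0.9453125

pbinom(2, 10, prob = 0.5, lower.tail = FALSE)  # same answer via the CDF

## [1] 0.9453125

This agrees closely with the simulated proportions tabled above.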
Note: If it is a fair coin then half the flips should result in Tails … this is the most likely outcome. Say you flip the coin 10 times and you end up with only 1 Tails. Could this happen by chance? Sure, because we can see that this could happen with some non-zero probability, just as we could also end up with no Tails or even with ten Tails! Strange outcomes can result, even if they occur very rarely, by chance alone. As a result, we need rules that can guide us in determining when to suspect that something is wrong because of an unusual result and when to conclude that the result isn’t very unusual after all.
Standard Error: \(SE_{\hat{p}} = \sqrt{\dfrac{\hat{p}\left(1-\hat{p} \right)}{n-1}}\)
For the Agresti-Coull confidence interval:
First calculate \(p^{'} = \dfrac{X+2}{n+4}\)
CI is then given by: \(p^{'} - z\sqrt{ \dfrac{p^{'} \left(1-p^{'} \right) } {n+4} } < p < p^{'} + z\sqrt{ \dfrac{p^{'} \left(1-p^{'} \right) } {n+4} }\)
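As a sketch of how these formulas translate into R, using the counts from the movie-crew example below (\(X = 91\), \(n = 220\)). Note that the "+2/+4" rule is the common 95% shortcut; binom.confint() uses \(z^2 \approx 3.84\) where the shortcut uses 4, so the two versions differ slightly in the third decimal:

x = 91
n = 220
p.hat = x/n
se = sqrt(p.hat * (1 - p.hat)/(n - 1))  # standard error of p-hat
p.prime = (x + 2)/(n + 4)  # Agresti-Coull adjusted proportion
z = qnorm(0.975)  # z for a two-sided 95% CI
me = z * sqrt(p.prime * (1 - p.prime)/(n + 4))  # margin of error
c(p.prime - me, p.prime + me)  # close to binom.confint's 'ac' output below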
Alternatively, use the binom library as shown below:
library(binom)
# binom.test(x, n, p=?, alternative='?', conf.level=?)
# binom.confint(x, n, p=?, alternative='?', conf.level=?,
# methods = 'ac')
In 1955, John Wayne played Genghis Khan in The Conqueror. The movie was filmed downwind from a site where 11 above-ground nuclear bomb tests had been conducted. Of the 220 people who worked on the movie, 91 had been diagnosed with cancer by the early 1980s. Population data record that only about 14% of people in the same age group as the movie crew should, on average, have been diagnosed with cancer within the same time frame of exposure.
n = 220
x = 91
p.sample = x/n
p.sample
## [1] 0.4136364
The best estimate of the crew’s cancer rate is thus 0.4136364.
\(H_0:\) The cancer rate for the movie crew is the same as that of the population \((p = 0.14)\)
\(H_A:\) The cancer rate for the movie crew is NOT the same as that of the population \((p \neq 0.14)\)
library(binom)
binom.test(x, n, p = 0.14, alternative = "two.sided", conf.level = 0.95)
##
## Exact binomial test
##
## data: x and n
## number of successes = 91, number of trials = 220, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.14
## 95 percent confidence interval:
## 0.3478469 0.4817857
## sample estimates:
## probability of success
## 0.4136364
binom.confint(x, n, p = 0.14, alternative = "two.sided", conf.level = 0.95,
methods = "ac")
method | x | n | mean | lower | upper |
---|---|---|---|---|---|
agresti-coull | 91 | 220 | 0.4136364 | 0.3505683 | 0.4796687 |
The P-value is practically 0 so we can easily reject the null hypothesis (H0). The data suggest that the cancer rate for the movie crew was different from that of the population.
library(titanic)
data(titanic_train) # load data from titanic package
df = titanic_train # save it as df
names(df) # see variables in the data-set
## [1] "PassengerId" "Survived" "Pclass" "Name" "Sex"
## [6] "Age" "SibSp" "Parch" "Ticket" "Fare"
## [11] "Cabin" "Embarked"
VARIABLE DESCRIPTIONS:
Survived: Survival (0 = No; 1 = Yes)
Pclass: Passenger class (1 = 1st; 2 = 2nd; 3 = 3rd)
Name: Passenger name
Sex: Passenger sex
Age: Age in years
SibSp: Number of siblings/spouses aboard
Parch: Number of parents/children aboard
Ticket: Ticket number
Fare: Passenger fare
Cabin: Cabin number
Embarked: Port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
SPECIAL NOTES:
Pclass is a proxy for socio-economic status (SES) – 1st ~ Upper; 2nd ~ Middle; 3rd ~ Lower
Age is in Years – Fractional if Age less than One (1). If the Age is Estimated, it is in the form xx.5
With respect to the family relation variables (i.e. sibsp and parch) some relations were ignored. The following are the definitions used for sibsp and parch.
Sibling: Brother, Sister, Stepbrother, or Stepsister of Passenger Aboard Titanic
Spouse: Husband or Wife of Passenger Aboard Titanic (Mistresses and Fiances Ignored)
Parent: Mother or Father of Passenger Aboard Titanic
Child: Son, Daughter, Stepson, or Stepdaughter of Passenger Aboard Titanic
Other family relatives excluded from this study include cousins, nephews/nieces, aunts/uncles, and in-laws. Some children travelled only with a nanny, therefore parch=0 for them. As well, some travelled with very close friends or neighbors in a village, however, the definitions do not support such relations.
Let us clean it up a bit by labeling Survived, Pclass, and the port of embarkation (Embarked) as follows:
df$Survived = factor(df$Survived, levels = c(0, 1), labels = c("Died",
"Survived"))
df$Pclass = factor(df$Pclass, levels = c(1, 2, 3), labels = c("1st Class",
"2nd Class", "3rd Class"))
df$Embarked = factor(df$Embarked, labels = c(NA, "Cherbourg",
"Queenstown", "Southampton"))
Now let us ask ourselves: Ignoring all other information about a passenger (sex, age, etc.), what was the probability of survival of a passenger on the Titanic? A simple frequency table will show this to us.
tab.1 = table(df$Survived)
tab.1
Died | Survived |
---|---|
549 | 342 |
tab.2 = prop.table(tab.1)
tab.2
Died | Survived |
---|---|
0.6161616 | 0.3838384 |
The base probability of survival was 0.3838.
Let us assume we were told there was a 50:50 chance of survival on any passenger ship at the time the Titanic set sail. How would we test whether passengers on the Titanic had a statistically significantly better or worse survival rate than that for all passenger ships? Why, with a simple hypothesis test of course.
\(H_0:\) Probability of survival was 0.50
\(H_A:\) Probability of survival was NOT 0.50
x = 382 # number who survived
n = 891 # total sample size
library(binom)
binom.test(x, n, p = 0.5, alternative = "two.sided", conf.level = 0.95)
##
## Exact binomial test
##
## data: x and n
## number of successes = 382, number of trials = 891, p-value =
## 2.359e-05
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.3959570 0.4619793
## sample estimates:
## probability of success
## 0.4287318
binom.confint(x, n, p = 0.5, alternative = "two.sided", conf.level = 0.95,
methods = c("ac"))
method | x | n | mean | lower | upper |
---|---|---|---|---|---|
agresti-coull | 382 | 891 | 0.4287318 | 0.3966092 | 0.4614662 |
The p-value is 2.359e-05, i.e., 0.00002359, way smaller than \(\alpha=0.05\), and so we can easily reject the null hypothesis. The Agresti-Coull confidence interval indicates that we can be about 95% confident that the true probability of survival lies between 0.3966 and 0.4615.
The \(\chi^2\) distribution is used with multinomial data (i.e., when the categorical variable has more than two categories) to test whether observed frequency counts differ from expected frequency counts.
\(H_0\): Proportions are all the same
\(H_A\): Proportions are NOT all the same
\[\chi^{2} = \displaystyle\sum\limits_{i}\dfrac{\left(Observed_i - Expected_i \right)^2}{Expected_i}\]
The statistic is \(\chi^2\)-distributed with \(\left( \text{no. of categories}-1 \right)\) degrees of freedom \((\textit{df})\)
Reject \(H_0\) if \(P-value \leq \alpha\); Do not reject \(H_0\) otherwise
Were passengers equally likely to embark at each of the three ports? If the answer is yes, then we should see roughly one-third of the passengers embarking at each of the ports. This can be easily tested as follows:
\(H_0:\) Embarkation was equally distributed across the three ports
\(H_A:\) Embarkation was NOT equally distributed across the three ports
tab.E = table(df$Embarked)
tab.E
NA | Cherbourg | Queenstown | Southampton |
---|---|---|---|
2 | 168 | 77 | 644 |
Xsq.1 = chisq.test(tab.E)
Xsq.1
##
## Chi-squared test for given probabilities
##
## data: tab.E
## X-squared = 1124.2, df = 3, p-value < 2.2e-16
ls(Xsq.1)
## [1] "data.name" "expected" "method" "observed" "p.value" "parameter"
## [7] "residuals" "statistic" "stdres"
Xsq.1$expected
## <NA> Cherbourg Queenstown Southampton
## 222.75 222.75 222.75 222.75
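We can verify the reported statistic by plugging the stored observed and expected counts directly into the \(\chi^2\) formula:

sum((Xsq.1$observed - Xsq.1$expected)^2/Xsq.1$expected)  # matches X-squared = 1124.2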
The p-value is practically 0 so we can easily reject the null hypothesis; the data suggest that embarkation loads were not equally distributed across the three ports.
\(H_0:\) Survival was independent of a passenger’s sex
\(H_A:\) Survival was NOT independent of a passenger’s sex
tab.ss = table(df$Survived, df$Sex)
tab.ss
 | female | male |
---|---|---|
Died | 81 | 468 |
Survived | 233 | 109 |
Xsq.2 = chisq.test(tab.ss)
Xsq.2
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: tab.ss
## X-squared = 260.72, df = 1, p-value < 2.2e-16
ls(Xsq.2)
## [1] "data.name" "expected" "method" "observed" "p.value" "parameter"
## [7] "residuals" "statistic" "stdres"
Xsq.2$expected
 | female | male |
---|---|---|
Died | 193.4747 | 355.5253 |
Survived | 120.5253 | 221.4747 |
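With the expected counts in hand, the usual \(\chi^2\) rules of thumb are easy to check in code:

any(Xsq.2$expected < 1)  # is any expected count below 1? (FALSE here)
mean(Xsq.2$expected < 5) > 0.2  # are more than 20% below 5? (FALSE here)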
Recall that the data must square with the assumptions of the \(\chi^2\) test (the rules of thumb we just checked). If any of these is violated then we can either collapse some of the categories (if doing so doesn’t inject noise into the distribution) or opt for another test. These days, though, your best bet would be to ditch the chi-square test altogether and instead just go with Fisher’s Exact test, which can now be run with as large a sample as you may end up with.
fisher.test(tab.ss)
##
## Fisher's Exact Test for Count Data
##
## data: tab.ss
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.0575310 0.1138011
## sample estimates:
## odds ratio
## 0.08128333
If we are uncomfortable with ignoring violations of the normality assumption, we have no choice but to transform the data. Several transformations are available, each with the goal of reducing the skew in the sample (i.e., rendering it more “normally” distributed).
The log transformation is most useful when:

  * the data are skewed to the right,
  * the measurements are ratios or products of variables, or
  * the group with the larger mean also has the larger standard deviation.

Note that the log requires all values to be positive; if zeros are present, a common workaround is to use \(X^{'} = \ln\left(X + 1\right)\). A small simulated sketch of the effect follows.
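Here is what the log does to right-skewed data; the lognormal sample below is purely hypothetical:

set.seed(42)
x.raw = rlnorm(1000, meanlog = 0, sdlog = 1)  # right-skewed by construction
par(mfrow = c(1, 2))
hist(x.raw, main = "Raw (right-skewed)", xlab = "X", col = "firebrick")
hist(log(x.raw), main = "Log-transformed", xlab = "log(X)", col = "firebrick")
par(mfrow = c(1, 1))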
Arcsine Transformation: If the data are proportions, the arcsine transformation \(\arcsin\left( \sqrt{p}\right)\) is suggested, but I would not do this readily … proportions essentially come from your Binomial or Poisson distributions, and in those cases we would rarely run a \(t-test\) or fit a linear regression anyway.
Other Transformations: The square \((X^{'} = X^{2})\) is used if the data are skewed left; if this fails, then the anti-log might work \((X^{'} = e^{X})\).
If data are skewed right, the reciprocal may also work \((X^{'} = \frac{1}{X})\).
For the square and the reciprocal you need to have all measurements be positive. If they are negative then you need to multiply each by \(-1\) and then carry out the transformation.
If we use a transformation, the estimated statistics have to be back-transformed to the original metric.
The Corresponding Back-transformations

Transformation | Back-transformation |
---|---|
\(X^{'} = \ln(X)\) | \(X = e^{X^{'}}\) |
\(X^{'} = \arcsin\left(\sqrt{p}\right)\) | \(p = \left(\sin X^{'}\right)^{2}\) |
\(X^{'} = X^{2}\) | \(X = \sqrt{X^{'}}\) |
\(X^{'} = e^{X}\) | \(X = \ln\left(X^{'}\right)\) |
\(X^{'} = \dfrac{1}{X}\) | \(X = \dfrac{1}{X^{'}}\) |
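For instance, a mean calculated on the log scale is back-transformed with exp(); a minimal sketch with hypothetical data:

set.seed(7)
y = rlnorm(50, meanlog = 2, sdlog = 0.5)  # hypothetical right-skewed sample
mean.log = mean(log(y))  # mean on the log scale
exp(mean.log)  # back-transformed mean (the geometric mean of y)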
Non-parametric tests are used (a) when transformations do not work, or (b) when the data represent ordinal categories (or are ranked data).
Assumption: Samples are drawn independently from a continuous distribution
The process by which a single species splits into two species is still not well understood. One proposal involves “sexual conflict” – a genetic arms race between males and females that arises from their different reproductive roles. For example, in a number of insect species, male seminal fluid contains chemicals that reduce the tendency of the female to mate again with other males. However, these chemicals also reduce female survival, so females have developed mechanisms to counter these chemicals. Sexual conflict can cause rapid genetic divergence between isolated populations of the same species, leading to the formation of new species. Sexual conflict is more pronounced in species in which females mate more than once, leading to the prediction that they should form new species at a more rapid rate.
To investigate this, Arnqvist et al. (2000) identified 25 insect taxa (groups) in which females mate multiple times, and they paired each of these groups with a closely related insect group in which females mate only once. Which type of insect tends to have more species? Note: These are treated as paired data because the two sets of groups are closely related.
taxa = read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e4SexualConflict.csv"))
head(taxa)
taxonPair | nSpeciesMultipleMating | nSpeciesSingleMating | difference |
---|---|---|---|
A | 53 | 10 | 43 |
B | 73 | 120 | -47 |
C | 228 | 74 | 154 |
D | 353 | 289 | 64 |
E | 157 | 30 | 127 |
F | 300 | 4 | 296 |
names(taxa)
## [1] "taxonPair" "nSpeciesMultipleMating"
## [3] "nSpeciesSingleMating" "difference"
summary(taxa)
taxonPair | nSpeciesMultipleMating | nSpeciesSingleMating | difference | |
---|---|---|---|---|
A : 1 | Min. : 7 | Min. : 4.0 | Min. : -980.0 | |
B : 1 | 1st Qu.: 34 | 1st Qu.: 10.0 | 1st Qu.: -3.0 | |
C : 1 | Median : 77 | Median : 30.0 | Median : 16.0 | |
D : 1 | Mean : 1134 | Mean : 263.5 | Mean : 870.5 | |
E : 1 | 3rd Qu.: 228 | 3rd Qu.: 115.0 | 3rd Qu.: 79.0 | |
F : 1 | Max. :21000 | Max. :3500.0 | Max. :20940.0 | |
(Other):19 | NA | NA | NA |
par(mfrow = c(1, 2))
hist(taxa$difference, breaks = 9, col = "firebrick", main = "Histogram of Difference in Species Number",
xlab = "", cex.main = 0.9)
qqnorm(taxa$difference, col = "blue")
qqline(taxa$difference, col = "red")
par(mfrow = c(1, 1))
shapiro.test(taxa$difference)
##
## Shapiro-Wilk normality test
##
## data: taxa$difference
## W = 0.25473, p-value = 2.911e-10
The difference variable includes negative values, so we cannot take logs directly. What do we do? We can try shifting the values up before taking logs:
log.diff = log(taxa$difference + (2 - min(taxa$difference)))  # shift so the smallest value becomes 2
summary(log.diff)
boxplot(log.diff, horizontal = TRUE)
shapiro.test(log.diff)
That didn’t work. We could try some R packages designed to help transform the data but that would be pretty useless. Consequently, we might as well opt for the Sign Test.
\(H_0\): Median difference in number of species is \(= 0\)
\(H_A\): Median difference in number of species is \(\neq 0\)
Let us use the BSDA library, which has the test built in. The command below assumes the data come in as two matched columns (one per group) rather than as a single column of differences for the same units. That is in fact how our data are laid out, so we are on the right track.
library(BSDA)
SIGN.test(taxa$nSpeciesMultipleMating, taxa$nSpeciesSingleMating,
md = 0, alternative = "two.sided", conf.level = 0.95)
##
## Dependent-samples Sign-Test
##
## data: taxa$nSpeciesMultipleMating and taxa$nSpeciesSingleMating
## S = 18, p-value = 0.04329
## alternative hypothesis: true median difference is not equal to 0
## 95 percent confidence interval:
## 1.00000 77.16674
## sample estimates:
## median of x-y
## 16
 | Conf.Level | L.E.pt | U.E.pt |
---|---|---|---|
Lower Achieved CI | 0.8922 | 1 | 70.0000 |
Interpolated CI | 0.9500 | 1 | 77.1667 |
Upper Achieved CI | 0.9567 | 1 | 78.0000 |
Given the p-value we can reject \(H_0\); the Median difference is not \(=0\). The data suggest that groups of insects whose females mate multiple times have more species than groups whose females mate singly, consistent with the sexual-conflict hypothesis.
If we had the same units measured twice we would have run the command slightly differently.
SIGN.test(taxa$difference, md = 0, alternative = "two.sided",
conf.level = 0.95)
##
## One-sample Sign-Test
##
## data: taxa$difference
## s = 18, p-value = 0.04329
## alternative hypothesis: true median is not equal to 0
## 95 percent confidence interval:
## 1.00000 77.16674
## sample estimates:
## median of x
## 16
 | Conf.Level | L.E.pt | U.E.pt |
---|---|---|---|
Lower Achieved CI | 0.8922 | 1 | 70.0000 |
Interpolated CI | 0.9500 | 1 | 77.1667 |
Upper Achieved CI | 0.9567 | 1 | 78.0000 |
Let us also see how to run the Sign test manually. The first thing we will do is count how many cases are \(+\), how many are \(-\), and how many are equal to the hypothesized median of \(0\):
taxa$sign = "equal to median"
taxa$sign[taxa$difference > 0] = "plus"
taxa$sign[taxa$difference < 0] = "minus"
taxa$sign = factor(taxa$sign, levels = c("minus", "equal to median",
"plus"))
library(knitr)  # for kable()
kable(taxa)
taxonPair | nSpeciesMultipleMating | nSpeciesSingleMating | difference | sign |
---|---|---|---|---|
A | 53 | 10 | 43 | plus |
B | 73 | 120 | -47 | minus |
C | 228 | 74 | 154 | plus |
D | 353 | 289 | 64 | plus |
E | 157 | 30 | 127 | plus |
F | 300 | 4 | 296 | plus |
G | 34 | 18 | 16 | plus |
H | 3400 | 3500 | -100 | minus |
I | 20 | 1000 | -980 | minus |
J | 196 | 486 | -290 | minus |
K | 1750 | 660 | 1090 | plus |
L | 55 | 63 | -8 | minus |
M | 37 | 115 | -78 | minus |
N | 100 | 30 | 70 | plus |
O | 21000 | 60 | 20940 | plus |
P | 37 | 40 | -3 | minus |
Q | 7 | 5 | 2 | plus |
R | 15 | 7 | 8 | plus |
S | 18 | 6 | 12 | plus |
T | 240 | 13 | 227 | plus |
U | 15 | 14 | 1 | plus |
V | 77 | 16 | 61 | plus |
W | 15 | 14 | 1 | plus |
X | 85 | 6 | 79 | plus |
Y | 86 | 8 | 78 | plus |
Here is a simple frequency table of the pluses and minuses:
table(taxa$sign)
minus | equal to median | plus |
---|---|---|
7 | 0 | 18 |
So we have 7 minuses and 18 pluses. If the true median difference were \(0\) we should have seen roughly 50% of the cases fall below it and 50% above it. This is not what we see, but is this chance or something systematic? Let us apply the binomial test and see.
binom.test(7, 25, p = 0.5)
##
## Exact binomial test
##
## data: 7 and 25
## number of successes = 7, number of trials = 25, p-value = 0.04329
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.1207167 0.4938768
## sample estimates:
## probability of success
## 0.28
Note that the p-value is identical to the one reported by SIGN.test. We can reject \(H_0\); the Median difference is not \(=0\).
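In fact, the two-sided p-value is just twice the binomial probability of seeing 7 or fewer successes in 25 fair-coin trials:

2 * pbinom(7, 25, prob = 0.5)  # = 0.04329, the sign test p-value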
The Sign Test is weak in that it does not take into account the magnitudes of how much above/below the median an observation falls. The Wilcoxon Signed-Rank Test is a slight improvement because it ranks observations on the basis of how far below/above they fall from the hypothesized median.
Let us run the test on the Sexual Conflict data:
wilcox.test(taxa$nSpeciesMultipleMating, taxa$nSpeciesSingleMating,
paired = TRUE)
## Warning in wilcox.test.default(taxa$nSpeciesMultipleMating, taxa
## $nSpeciesSingleMating, : cannot compute exact p-value with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: taxa$nSpeciesMultipleMating and taxa$nSpeciesSingleMating
## V = 230, p-value = 0.07139
## alternative hypothesis: true location shift is not equal to 0
Do not reject \(H_0\); the distributions appear to be the same (i.e., there is no difference in the number of species between the polyandrous and the monandrous groups).
If we have a two-sample (i.e., unpaired) design, then the Mann-Whitney U-Test can be used assuming we couldn’t transform the data to run a \(t-test\).
The sage cricket, Cyphoderris strepitans, has an unusual form of mating. During mating, the male offers his fleshy hind wings to the female to eat. The wounds are not fatal but a male with already nibbled wings is less likely to be chosen by females he meets later. Females get some nutrition from feeding on the wings, which raises the question, “Are females more likely to mate if they are hungry?”
Johnson et al. (1999) answered this question by randomly dividing 24 females into two groups: one group of 11 females was starved for at least two days, and another group of 13 females was fed during the same period. Each female was then put separately into a cage with a single (new) male, and the waiting time to mating was recorded.
\(H_0\): Time to mating is the same for female crickets that were starved as for those who were fed
\(H_A\): Time to mating is not the same for female crickets that were starved as for those who were fed
cric = read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e5SagebrushCrickets.csv"))
head(cric)
feedingStatus | timeToMating |
---|---|
starved | 1.9 |
starved | 2.1 |
starved | 3.8 |
starved | 9.0 |
starved | 9.6 |
starved | 13.0 |
names(cric)
## [1] "feedingStatus" "timeToMating"
wilcox.test(cric$timeToMating ~ cric$feedingStatus, paired = FALSE)
##
## Wilcoxon rank sum test
##
## data: cric$timeToMating by cric$feedingStatus
## W = 88, p-value = 0.3607
## alternative hypothesis: true location shift is not equal to 0
Do not reject \(H_0\); time to mating does not appear to be different for starved versus fed female crickets.
What about ties? If there are any ties, the default Wilcoxon test cannot compute an exact p-value (R will warn us about this), and we need to switch to a version of the Wilcoxon test designed to handle ties. I’ll load the Blind Gestures data first.
gestures = read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/review2/rev2q13BlindGestures.csv"))
head(gestures)
sightedness | numberOfGestures |
---|---|
Blind | 0 |
Blind | 1 |
Blind | 1 |
Blind | 2 |
Blind | 1 |
Blind | 1 |
names(gestures)
## [1] "sightedness" "numberOfGestures"
wilcox.test(gestures$numberOfGestures ~ gestures$sightedness,
paired = FALSE)
## Warning in wilcox.test.default(x = c(0L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: gestures$numberOfGestures by gestures$sightedness
## W = 62.5, p-value = 0.5755
## alternative hypothesis: true location shift is not equal to 0
library(exactRankTests)
wilcox.exact(gestures$numberOfGestures ~ gestures$sightedness,
paired = FALSE)
##
## Exact Wilcoxon rank sum test
##
## data: gestures$numberOfGestures by gestures$sightedness
## W = 62.5, p-value = 0.6361
## alternative hypothesis: true mu is not equal to 0
This (rather weak) test compares the distributions of two groups by computing the empirical cumulative distribution function (ECDF) of each group and finding the greatest absolute vertical distance between the two ECDFs.
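To see the mechanics, here is a small simulated sketch that computes the K-S statistic by hand and checks it against ks.test(); the two normal samples are purely hypothetical:

set.seed(123)
g1 = rnorm(30)  # hypothetical group 1
g2 = rnorm(30, mean = 0.5)  # hypothetical group 2
pooled = sort(c(g1, g2))
D = max(abs(ecdf(g1)(pooled) - ecdf(g2)(pooled)))  # largest vertical ECDF gap
D
ks.test(g1, g2)$statistic  # the same value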
The stalk-eyed fly, Cyrtodiopsis dalmanni, is a bizarre-looking insect from the jungles of Malaysia. Its eyes are at the ends of long stalks that emerge from its head, making the fly look like something from the cantina scene in Star Wars. These eye stalks are present in both sexes, but they are particularly impressive in males. The span of the eye stalk in males enhances their attractiveness to females as well as their success in battles against other males. The span, in millimeters, from one eye to the other, was measured in a random sample of 45 male stalk-eyed flies raised on one of two diets (corn or cotton). Does diet have an effect on male eye span?
Let us plot the distribution and run the formal tests to see whether we can stick with the usual \(t-test\):
eyes = read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter12/chap12q18StalkieEyespan.csv"))
head(eyes)
food | eyeSpan |
---|---|
Corn | 2.15 |
Corn | 2.14 |
Corn | 2.13 |
Corn | 2.13 |
Corn | 2.12 |
Corn | 2.11 |
names(eyes)
## [1] "food" "eyeSpan"
boxplot(eyes$eyeSpan ~ eyes$food, horizontal = TRUE)
var.test(eyes$eyeSpan ~ eyes$food)
##
## F test to compare two variances
##
## data: eyes$eyeSpan by eyes$food
## F = 0.068748, num df = 20, denom df = 23, p-value = 9.646e-08
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.0291719 0.1663771
## sample estimates:
## ratio of variances
## 0.068748
library(car)
leveneTest(eyes$eyeSpan, eyes$food, center = "mean")
 | Df | F value | Pr(>F) |
---|---|---|---|
group | 1 | 14.76323 | 0.0003965 |
 | 43 | NA | NA |
leveneTest(eyes$eyeSpan, eyes$food, center = "median")
 | Df | F value | Pr(>F) |
---|---|---|---|
group | 1 | 14.7773 | 0.0003943 |
 | 43 | NA | NA |
The box-plots and the formal tests (at least Levene’s test for homogeneity of variances) suggest we should reject the null of equal variances. We could try to normalize the data via the natural logarithm.
eyes$log.span = log(eyes$eyeSpan)
boxplot(eyes$log.span ~ eyes$food, horizontal = TRUE)
var.test(eyes$log.span ~ eyes$food)
##
## F test to compare two variances
##
## data: eyes$log.span by eyes$food
## F = 0.040201, num df = 20, denom df = 23, p-value = 7.154e-10
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.01705872 0.09729156
## sample estimates:
## ratio of variances
## 0.04020145
leveneTest(eyes$log.span, group = eyes$food, center = "mean")
 | Df | F value | Pr(>F) |
---|---|---|---|
group | 1 | 19.99886 | 5.58e-05 |
 | 43 | NA | NA |
leveneTest(eyes$log.span, group = eyes$food, center = "median")
 | Df | F value | Pr(>F) |
---|---|---|---|
group | 1 | 18.63644 | 9.13e-05 |
 | 43 | NA | NA |
Now we have a quandary: both the mean-centered and the median-centered versions of the test tell us we must reject \(H_0\). A safe approach would be to concede that we cannot normalize the data, in which case the Kolmogorov-Smirnov test may be called for.
ks.test(eyes$food, eyes$eyeSpan, alternative = "two.sided")
## Warning in ks.test(eyes$food, eyes$eyeSpan, alternative = "two.sided"):
## cannot compute exact p-value with ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: eyes$food and eyes$eyeSpan
## D = 0.46667, p-value = 0.0001109
## alternative hypothesis: two-sided
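(An aside: ks.test() expects the two numeric samples themselves, so the comparison is better written by splitting eyeSpan by diet, as sketched below; this corrected form trips over the same ties warning.)

ks.test(eyes$eyeSpan[eyes$food == "Corn"], eyes$eyeSpan[eyes$food ==
    "Cotton"], alternative = "two.sided")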
Uh oh! We have ties, so this won’t work. Technically, a permutation test would be useful here, but we won’t look at its mechanics until later in the semester. For now, though, we can run the test to close the loop.
library(exactRankTests)
perm.test(eyes$eyeSpan ~ eyes$food, paired = FALSE, alternative = "two.sided",
exact = TRUE)
##
## 2-sample Permutation Test (scores mapped into 1:(m+n) using
## rounded scores)
##
## data: eyes$eyeSpan by eyes$food
## T = 855, p-value = 9.363e-09
## alternative hypothesis: true mu is not equal to 0
Well, we might as well try the coin library then.
library(coin)
independence_test(eyes$eyeSpan ~ eyes$food, alternative = "two.sided")
##
## Asymptotic General Independence Test
##
## data: eyes$eyeSpan by eyes$food (Corn, Cotton)
## Z = 5.095, p-value = 3.487e-07
## alternative hypothesis: two.sided
The test tells us we can easily reject \(H_0\); diet appears to have an effect on eye spans.
There are several ways to plot this.
library(Hmisc)
Ecdf(~eyeSpan | food, data = eyes)
The plot above was generated via the Hmisc library, but you could also whip one up by hand. See below for an example.
corn = subset(eyes, food == "Corn")  # flies raised on corn
cotton = subset(eyes, food == "Cotton")  # flies raised on cotton
plot(ecdf(corn$eyeSpan), do.points = FALSE, verticals = TRUE,
    xlim = c(1, 2.2), pch = 46, col = "cornflowerblue", xlab = "Eye Span",
    ylab = "Proportion", main = "ECDF of Eye Span (Corn vs. Cotton)")
lines(ecdf(cotton$eyeSpan), do.points = FALSE, verticals = TRUE,
    pch = 46, col = "salmon")
abline(v = 1.77, col = "gray")
legend(1, 0.8, legend = c("Corn", "Cotton"), col = c("cornflowerblue",
"salmon"), lwd = 1, cex = 0.75)