1 Sampling Distributions

Let us see what it means to draw a random sample from the population and estimate the mean and standard deviation, and then compare this to the population mean \((\mu)\) and population standard deviation \((\sigma)\). We’ll use the hs1 data and assume that the 200 students represent the population.

  1. Calculate the mean and standard deviation of this population of 200 students. Write these two values down.
  2. Now draw a 2% sample, calculate the sample mean \((\bar{x})\) and sample standard deviation \((s)\), and write them down. Also write down the sample size.
  3. Now draw a 5% sample and repeat the previous exercise.
  4. Now draw a 10% sample and repeat the previous exercise.

What is happening to the distance between the sample mean and sample standard deviation on the one hand and the population mean and population standard deviation on the other?

As your sample size grows, you have a better chance of your sample mean and sample standard deviation landing close to the population mean and the population standard deviation. This is a tendency, not a guarantee: any single sample, small or large, can still miss.
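The exercise above can be sketched in a few lines of Python. Since the hs1 file is not reproduced here, the sketch assumes a made-up population of 200 normally distributed scores; the names `population` and `results` and the mean of 52 and standard deviation of 10 are illustration choices, not values from the data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical stand-in for the hs1 data: 200 scores treated as the population.
population = [random.gauss(52, 10) for _ in range(200)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population SD (divides by N)
print(f"population: N={len(population)}  mu={mu:.2f}  sigma={sigma:.2f}")

results = {}
for fraction in (0.02, 0.05, 0.10):
    n = round(fraction * len(population))   # sample sizes 4, 10, and 20
    sample = random.sample(population, n)   # simple random sample, no replacement
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)            # sample SD (divides by n - 1)
    results[n] = (x_bar, s)
    print(f"n={n:2d}  x_bar={x_bar:.2f}  s={s:.2f}  "
          f"|x_bar - mu|={abs(x_bar - mu):.2f}  |s - sigma|={abs(s - sigma):.2f}")
```

Changing the seed and rerunning shows that a larger sample is usually, but not always, closer to the population values.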

2 The Standard Error

The standard error of the sample mean is given by \(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\). When we need to work with z-scores for a sample mean, we first calculate the standard error, then calculate the z-score as \(z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}\), and then calculate the area(s) as needed.
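A minimal sketch of this two-step calculation, using hypothetical numbers (a population mean of 50, a population standard deviation of 10, a sample of 25, and a sample mean of 53) that were picked only to keep the arithmetic easy:

```python
import math
from statistics import NormalDist

# Hypothetical values, chosen only for illustration.
mu = 50.0      # population mean
sigma = 10.0   # population standard deviation
n = 25         # sample size
x_bar = 53.0   # observed sample mean

standard_error = sigma / math.sqrt(n)   # 10 / 5 = 2.0
z = (x_bar - mu) / standard_error       # (53 - 50) / 2 = 1.5
upper_area = 1 - NormalDist().cdf(z)    # area to the right of z

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
print(f"P(Z >= z) = {upper_area:.4f}")
```

`NormalDist().cdf(z)` returns the area to the left of `z`, so subtracting from 1 gives the upper tail; use the left area directly when the question asks for "less than".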

3 Interval Estimation: The Confidence Interval

Now assume that the hs1 data represent a random sample of students. When you use Explore to calculate the mean for a particular variable, you see another row in the table that shows the 95% confidence interval. Say you do this for the reading variable. What you see are two values, the lower bound (50.8003) and the upper bound (53.6597). Together these form the 95% confidence interval. The interpretation is that if we drew all possible samples of \(n=200\) from the population, calculated the mean of reading scores in each, and then built a 95% confidence interval around each sample mean, 95% of the resulting confidence intervals would trap the population mean. In everyday usage we tend to calculate just one sample mean and the confidence interval around it, and then proclaim “we can be about 95% certain that the population mean (the TRUE mean) lies in the interval given by the lower bound and the upper bound”. Note the difference between theory and practice: in practice we never have the luxury of drawing all possible samples of \(n=200\) and then calculating the mean and 95% confidence interval for each.
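The "95% of intervals trap the population mean" idea can be checked by simulation. The sketch below assumes a made-up population of 20,000 scores (not the hs1 data), draws 1,000 samples of \(n=200\) rather than all possible samples, and uses the normal critical value of about 1.96, which is close to the \(t\) value at this sample size.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population; in practice we never get to see this.
population = [random.gauss(52, 10) for _ in range(20_000)]
mu = statistics.fmean(population)

n = 200
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
trials = 1_000
hits = 0

for _ in range(trials):
    sample = random.sample(population, n)
    x_bar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error
    lower, upper = x_bar - z_crit * se, x_bar + z_crit * se
    if lower <= mu <= upper:                      # did this interval trap mu?
        hits += 1

coverage = hits / trials
print(f"{hits} of {trials} intervals contain mu (coverage = {coverage:.3f})")
```

The coverage should land near 0.95 but will not equal it exactly, because only a finite number of samples is drawn.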

4 The \(t\) Distribution

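Now, the degrees of freedom \((df)\) come into effect. In brief, \(df = (n - 1)\). Given a sample of size \(n\) where the population standard deviation is unknown and has to be estimated from the sample, we should use the \(t\) distribution with \(n - 1\) degrees of freedom instead of the \(z\).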

We can also calculate the t-value that leaves …

Once you calculate the \(t\), you can also calculate the confidence interval:
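A sketch of the usual form of this interval, assuming the population standard deviation is unknown and estimated by the sample standard deviation \(s\) (the symbol \(t_{\frac{\alpha}{2}, df}\) for the critical value is notation assumed here, with \(\alpha = 0.05\) for a 95% interval):

\[\bar{x} \pm t_{\frac{\alpha}{2}, df} \times \dfrac{s}{\sqrt{n}}, \qquad df = n - 1\]

For the reading interval reported earlier, the midpoint \(\dfrac{50.8003 + 53.6597}{2} = 52.23\) is the sample mean, and the half-width \(\dfrac{53.6597 - 50.8003}{2} = 1.4297\) is the quantity \(t_{\frac{\alpha}{2}, df} \times \dfrac{s}{\sqrt{n}}\).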

4.1 Problem 10.3

Last year, sanitation engineer crews in Buffalo (NY) collected 124 tons of trash per day. This year, larger, more efficient trucks were purchased. A sample of 100 truck-days shows that a mean of 130 tons of trash was collected, with a standard deviation of 30 tons. What is the probability of a sample mean of this size being drawn from a population with a mean of 124?

\[\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{30}{\sqrt{100}} = \dfrac{30}{10} = 3\] \[df = n - 1 = 100 - 1 = 99\] \[t = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{130 - 124}{3} = \dfrac{6}{3}=2\]

What is \(P(t \geq 2)\) with \(df = 99\)?

So the probability of a sample mean this far or farther is 2.41%.
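One way to check the 2.41% figure without a \(t\) table is to integrate the \(t\) density numerically. The sketch below uses only the Python standard library; the helper names `t_pdf` and `t_upper_tail` are made up for this illustration, and Simpson's rule is an assumption standing in for whatever routine your statistical software uses.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T >= t), via Simpson's rule on the interval from 0 to t (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3   # signed area between 0 and t
    return 0.5 - area_0_to_t      # half the distribution lies above 0

# Problem 10.3: values taken from the problem statement.
n, x_bar, mu, sd = 100, 130.0, 124.0, 30.0
standard_error = sd / math.sqrt(n)     # 30 / 10 = 3
t_value = (x_bar - mu) / standard_error  # 6 / 3 = 2
df = n - 1                             # 99
p = t_upper_tail(t_value, df)

print(f"t = {t_value:.2f}, df = {df}, P(t >= {t_value:.0f}) = {p:.4f}")
```

If SciPy is installed, `scipy.stats.t.sf(2, 99)` should return the same tail area.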

4.2 Problem 10.5

If the absenteeism rate for a school district rises above 10%, the state reduces funding to the school district. Sommerville School District takes a sample of five schools in the district and finds the mean absenteeism rate to be 6.96% and the standard deviation to be 2.11. What is your best estimate of the absenteeism rate in the entire Sommerville District? What is the probability that the absenteeism rate is greater than 10%?

4.3 Problem 10.11

The Department of Health and Human Services wants to know the average income of people who receive federal assistance. A sample of 60 recipients of financial assistance shows the mean to be 17,400 dollars with a standard deviation of 3,150 dollars. What is the probability that the income of those who receive federal financial assistance could be greater than 18,500 dollars? What is the probability that the income of those who receive federal financial assistance could be less than 17,000 dollars?

5 Calculating the Needed Sample Size

The margin of error \((E)\) is given by \(E = z_{\frac{\alpha}{2}} \times \sigma_{\bar{x}} = z_{\frac{\alpha}{2}} \times \dfrac{\sigma}{\sqrt{n}}\). So once we have a margin of error in mind, we can figure out the sample size we need to achieve this margin of error so long as we ….

\[E = z_{\frac{\alpha}{2}} \times \dfrac{\sigma}{\sqrt{n}}\] \[E \times \sqrt{n} = z_{\frac{\alpha}{2}} \times \sigma\] \[\sqrt{n} = \dfrac{(z_{\frac{\alpha}{2}} \times \sigma)}{E}\] \[n = \dfrac{(z_{\frac{\alpha}{2}} \times \sigma)^2}{E^2}\] \[n = \dfrac{(z_{\frac{\alpha}{2}})^2 \times \sigma^2}{E^2}\]

5.1 An Example of Necessary Sample Size

Wedding costs are estimated to have an average of 19,000 dollars with a standard deviation of 9,400 dollars. An analyst wants to be 95% certain and within 1,000 dollars of the true average. How large a sample does this analyst need to meet these requirements?

\[n = \dfrac{(1.96)^2 \times (9400)^2}{(1000)^2} = 339.4437 \approx 340\]

  • How large a sample would this analyst need to be within 500 of the population mean?
  • How large a sample would this analyst need to be within 100 of the population mean?
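The sample size formula is easy to wrap in a small function. This is a sketch under the same assumptions as the worked example (\(z = 1.96\) for 95% confidence and a known \(\sigma\)); the function name `required_sample_size` and its parameter names are made up for illustration.

```python
import math

def required_sample_size(z, sigma, margin):
    """Return (exact n, n rounded up) so that z * sigma / sqrt(n) <= margin."""
    n_exact = (z ** 2) * (sigma ** 2) / (margin ** 2)
    return n_exact, math.ceil(n_exact)  # always round up to a whole observation

# Wedding cost example: 95% confidence, sigma = 9,400 dollars, E = 1,000 dollars.
n_exact, n_needed = required_sample_size(z=1.96, sigma=9400, margin=1000)
print(f"exact n = {n_exact:.4f}, sample needed = {n_needed}")  # rounds up to 340
```

Because \(E\) is squared in the denominator, halving the margin of error roughly quadruples the required sample; calling the function with a different `margin` answers the follow-up questions.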

6 One-sample t-test

6.1 Practice 11.1

Last year Normal (IL) had all car maintenance done by the city motor pool. The cost was 364 dollars per car. The motor pool lost all its employees and this year the city is using a local motor repair shop. They want to know if this privatization is saving them money. A random sample of 36 cars shows average repair costs to be 330 dollars with a standard deviation of 120 dollars. What can you tell the city officials?
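The mechanics of a one-sample \(t\)-test from summary statistics can be sketched as follows, using the numbers in this problem. The sketch assumes a one-sided alternative (costs went down, so \(H_1: \mu < 364\)); the helpers `t_pdf` and `t_upper_tail` are made-up names that integrate the \(t\) density with Simpson's rule, standing in for a \(t\) table or SPSS output.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T >= t), via Simpson's rule on the interval from 0 to t (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the problem statement.
mu_0 = 364.0   # last year's cost per car (null hypothesis value)
x_bar = 330.0  # sample mean this year
s = 120.0      # sample standard deviation
n = 36         # sample size

standard_error = s / math.sqrt(n)         # 120 / 6 = 20
t_stat = (x_bar - mu_0) / standard_error  # -34 / 20 = -1.7
df = n - 1                                # 35

# By symmetry, P(T <= t_stat) equals P(T >= -t_stat).
p_one_sided = t_upper_tail(-t_stat, df)
p_two_sided = 2 * t_upper_tail(abs(t_stat), df)

print(f"t = {t_stat:.2f}, df = {df}")
print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

Compare the p-value that matches your alternative hypothesis with your chosen \(\alpha\); note that the one-sided and two-sided versions can lead to different conclusions when \(t\) is close to the critical value.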

6.2 Problem 11.5

The police chief of Kramer (TX) reads that police clear 46.2 percent of all burglaries that occur in Kramer. She wants to know how good Kramer’s rate is compared to other similarly sized cities in Texas. She gathers burglary clearance rates for a random sample of 10 cities and finds the mean to be 40.34 percent and the standard deviation to be 7.68 percent. Is Kramer’s clearance rate significantly different from the average rate of other similarly sized Texas cities?

6.3 with SPSS

  1. Most people think that some 3% of workers commute to work by bicycle. Set up the null and alternative hypotheses and test this belief with \(\alpha = 0.05\). State your conclusion, as well as the corresponding confidence interval.

  2. Repeat the preceding test with \(\alpha = 0.01\) and state your conclusion, as well as the corresponding confidence interval.

  3. People also believe that some 5% walk to work every day. Test this belief, state your conclusion, and the corresponding confidence interval.

7 The Binomial test

  1. Test whether at least one-half of the population declares itself as a vegetarian. State your conclusion.

  2. Test whether at least one-third of the population believes in life after death. State your conclusion.
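An exact binomial test for claims like these can be sketched with the standard library alone. The counts below (18 vegetarians in a sample of 50) are hypothetical, since the data set is not reproduced here, and the helper name `binom_lower_tail` is made up. With \(H_0: p \geq 0.5\) against \(H_1: p < 0.5\), the p-value is the probability of seeing 18 or fewer successes when \(p = 0.5\).

```python
from math import comb

def binom_lower_tail(k, n, p0):
    """Exact P(X <= k) when X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k + 1))

# Hypothetical sample, for illustration only.
n = 50           # respondents
successes = 18   # respondents who say they are vegetarian
p0 = 0.5         # proportion claimed under the null hypothesis
alpha = 0.05

p_value = binom_lower_tail(successes, n, p0)
reject_null = p_value < alpha

print(f"observed proportion = {successes / n:.2f}")
print(f"P(X <= {successes} | n = {n}, p = {p0}) = {p_value:.4f}")
print("reject H0" if reject_null else "fail to reject H0")
```

For the one-third claim, the same function applies with `p0 = 1/3` and the counts from your own data.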