Let us see what it means to draw a random sample from the population, estimate the mean and standard deviation, and then compare these estimates to the population mean \((\mu)\) and population standard deviation \((\sigma)\). We’ll use the hs1 data and assume that the 200 students represent the population.
What is happening to the distance between the sample mean and sample standard deviation versus the population mean and population standard deviation?
As your sample size grows, your sample mean and sample standard deviation tend to fall closer to the population mean and the population standard deviation.
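To see this pattern in numbers, here is a minimal simulation sketch. Since the hs1 data are not reproduced here, it assumes a synthetic population of 200 scores (the mean of 52 and standard deviation of 10 used to generate them are illustrative choices), and the helper `sampling_gap` is a hypothetical name introduced only for this example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# ASSUMPTION: a synthetic "population" of 200 scores stands in for hs1.
population = [random.gauss(52, 10) for _ in range(200)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation


def sampling_gap(n, reps=500):
    """Average absolute gap between sample and population values for samples of size n."""
    mean_gap = 0.0
    sd_gap = 0.0
    for _ in range(reps):
        sample = random.sample(population, n)  # draw without replacement
        mean_gap += abs(statistics.fmean(sample) - mu)
        sd_gap += abs(statistics.stdev(sample) - sigma)
    return mean_gap / reps, sd_gap / reps


for n in (5, 25, 100, 200):
    mean_gap, sd_gap = sampling_gap(n)
    print(f"n={n:3d}  mean gap={mean_gap:.3f}  sd gap={sd_gap:.3f}")
```

Both gaps should shrink as \(n\) increases. At \(n = 200\) the sample is the whole population, so the gap in the mean vanishes.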
The standard error is given by \(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\). When we need to work with z-scores for a sample mean, we first calculate the standard error, then calculate the z-score as \(z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}\), and then calculate the area(s) as needed.
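The three steps (standard error, z-score, area) can be sketched as follows. The numbers are hypothetical and chosen only to keep the arithmetic easy to follow; they do not come from hs1.

```python
import math
from statistics import NormalDist

# ASSUMPTION: illustrative numbers, not taken from hs1.
mu = 50.0     # population mean
sigma = 10.0  # population standard deviation
n = 25        # sample size
x_bar = 53.0  # observed sample mean

standard_error = sigma / math.sqrt(n)  # 10 / 5 = 2.0
z = (x_bar - mu) / standard_error      # (53 - 50) / 2 = 1.5
upper_tail = 1 - NormalDist().cdf(z)   # area to the right of z

print(f"SE = {standard_error:.2f}, z = {z:.2f}, P(Z >= z) = {upper_tail:.4f}")
# prints: SE = 2.00, z = 1.50, P(Z >= z) = 0.0668
```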
Now assume that the hs1 data represent a random sample of students. When you use Explore to calculate the mean for a particular variable, you see another row in the table that shows the 95% confidence interval. Say you do this for the reading variable. What you see are two values for the lower and upper bound, 50.8003 and 53.6597, respectively. Together these form the 95% confidence interval. The interpretation is that if we drew all possible samples of \(n=200\) from the population and calculated the mean of reading scores and the 95% confidence interval around each sample mean, 95% of the resulting confidence intervals would trap the population mean. In everyday usage we tend to calculate just one sample mean and the confidence interval around it and then proclaim “we can be about 95% certain that the population mean (the TRUE mean) lies in the interval given by the lower bound and the upper bound”. Note the difference between theory and practice: in practice we never have the luxury of drawing all possible samples of \(n=200\) and then calculating the mean and the 95% confidence interval for each.
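We cannot draw all possible samples, but we can imitate the idea with a simulation. The sketch below assumes a normal population with a mean of 52 and a standard deviation of 10 (illustrative values, not the hs1 data) and uses the tabled critical value of about 1.972 for 199 degrees of freedom. The function names are hypothetical.

```python
import math
import random
import statistics

random.seed(1)

# ASSUMPTIONS: a normal population with these parameters, and the critical
# value t(0.025, df=199) of about 1.972 taken from a t table.
MU, SIGMA, N, T_CRIT = 52.0, 10.0, 200, 1.972


def interval(sample):
    """Return the (lower, upper) 95% confidence bounds for the sample mean."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - T_CRIT * se, mean + T_CRIT * se


def coverage(reps=2000):
    """Share of simulated intervals that contain the population mean."""
    hits = 0
    for _ in range(reps):
        sample = [random.gauss(MU, SIGMA) for _ in range(N)]
        lower, upper = interval(sample)
        hits += lower <= MU <= upper  # True counts as 1
    return hits / reps


print(f"coverage = {coverage():.3f}")
```

The printed share should land close to 0.95, though it will wobble a little from run to run because only 2,000 samples are drawn.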
Now the degrees of freedom \((df)\) come into play. In brief, \(df = n - 1\). Because we estimate the population standard deviation from the sample, we should use the \(t\) distribution with \(n - 1\) degrees of freedom instead of the \(z\).
We can also calculate the t-value that leaves …
Once you calculate the \(t\), you can also calculate the confidence interval:
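A standard form of the interval, assuming the sample standard deviation \(s\) is used to estimate the standard error and \(t_{\frac{\alpha}{2}}\) is the critical value with \(df = n - 1\), is:

\[\bar{x} \pm t_{\frac{\alpha}{2}} \times \dfrac{s}{\sqrt{n}}\]

For the reading interval quoted earlier, the midpoint is \(\dfrac{50.8003 + 53.6597}{2} = 52.23\) and the half-width is \(53.6597 - 52.23 = 1.4297\).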
Last year, sanitation engineer crews in Buffalo (NY) collected 124 tons of trash per day. This year, larger, more efficient trucks were purchased. A sample of 100 truck-days shows that a mean of 130 tons of trash was collected per day, with a standard deviation of 30 tons. What is the probability of a sample mean of this size being drawn from a population with a mean of 124?
\[\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{30}{\sqrt{100}} = \dfrac{30}{10} = 3\] \[df = n - 1 = 100 - 1 = 99\] \[t = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{130 - 124}{3} = \dfrac{6}{3}=2\]
What is \(P(t \geq 2)\) with \(df = 99\)?
So the probability of drawing a sample mean of 130 or more from a population with a mean of 124 is about 2.41%.
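If you want to check the 2.41% without a t table, the sketch below approximates the upper-tail area numerically. It is one way to do it, not necessarily how your software does it: it integrates the \(t\) density with Simpson's rule using only the Python standard library, and the helper names `t_pdf` and `t_upper_tail` are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # half the area lies to the right of zero


# Numbers from the trash-collection example.
mean_last_year, sample_mean, sd, n = 124, 130, 30, 100
se = sd / math.sqrt(n)                   # 3.0
t = (sample_mean - mean_last_year) / se  # 2.0
p = t_upper_tail(t, n - 1)
print(f"t = {t:.2f}, df = {n - 1}, P(t >= {t:.0f}) = {p:.4f}")
```

The printed probability should be close to the 2.41% reported above.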
If the absenteeism rate for a school district rises above 10%, the state reduces funding to the school district. Sommerville School District takes a sample of five schools in the district and finds the mean absenteeism rate to be 6.96% and the standard deviation to be 2.11. What is your best estimate of the absenteeism rate in the entire Sommerville District? What is the probability that the absenteeism rate is greater than 10%?
The Department of Health and Human Services wants to know the average income of people who receive federal assistance. A sample of 60 recipients of financial assistance shows the mean to be 17,400 dollars with a standard deviation of 3,150 dollars. What is the probability that the income of those who receive federal financial assistance could be greater than 18,500 dollars? What is the probability that the income of those who receive federal financial assistance could be less than 17,000 dollars?
The margin of error \((E)\) is given by \(E = z_{\frac{\alpha}{2}} \times \sigma_{\bar{x}} = z_{\frac{\alpha}{2}} \times \dfrac{\sigma}{\sqrt{n}}\). So once we have a margin of error in mind, we can figure out the sample size we need to achieve this margin of error so long as we ….
\[E = z_{\frac{\alpha}{2}} \times \dfrac{\sigma}{\sqrt{n}}\] \[E \times \sqrt{n} = z_{\frac{\alpha}{2}} \times \sigma\] \[\sqrt{n} = \dfrac{(z_{\frac{\alpha}{2}} \times \sigma)}{E}\] \[n = \dfrac{(z_{\frac{\alpha}{2}} \times \sigma)^2}{E^2}\] \[n = \dfrac{(z_{\frac{\alpha}{2}})^2 \times \sigma^2}{E^2}\]
Wedding costs are estimated to have an average of 19,000 dollars with a standard deviation of 9,400 dollars. An analyst wants to be 95% certain and within 1,000 dollars of the true average. How large a sample does this analyst need to meet these requirements?
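\[n = \dfrac{(1.96)^2 \times (9400)^2}{(1000)^2} = 339.4438 \approx 340\]
We round up to 340 because the sample size must be a whole number large enough to meet the margin of error.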
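The same calculation can be sketched in a few lines. The function name `required_sample_size` is hypothetical, and the sketch looks up the exact normal quantile rather than the rounded 1.96, so the value before rounding up differs slightly from the hand calculation.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n whose margin of error is at most `margin`."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    return math.ceil((z * sigma / margin) ** 2)  # always round up


# Wedding-cost example: sigma = 9,400 dollars, margin of error = 1,000 dollars.
print(required_sample_size(sigma=9400, margin=1000))  # prints 340
```

Halving the margin of error to 500 dollars roughly quadruples the required sample, since \(E\) enters the formula squared.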
Last year Normal (IL) had all car maintenance done by the city motor pool. The cost was 364 dollars per car. The motor pool lost all its employees and this year the city is using a local motor repair shop. They want to know if this privatization is saving them money. A random sample of 36 cars shows average repair costs to be 330 dollars with a standard deviation of 120 dollars. What can you tell the city officials?
The police chief of Kramer (TX) reads that police clear 46.2 percent of all burglaries that occur in Kramer. She wants to know how good Kramer’s rate is compared to other similarly sized cities in Texas. She gathers burglary clearance rates for a random sample of 10 cities and finds the mean to be 40.34 percent and the standard deviation to be 7.68 percent. Is Kramer’s clearance rate significantly different from the average rate of other similarly sized Texas cities?
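One way to set this problem up is as a two-tailed one-sample \(t\) test, with Kramer's own rate as the hypothesized mean. The sketch below assumes \(\alpha = 0.05\) (the problem does not state a level) and uses the tabled critical value of 2.262 for \(df = 9\).

```python
import math

# Numbers from the Kramer (TX) problem.
kramer_rate = 46.2   # hypothesized mean (Kramer's own clearance rate)
sample_mean = 40.34  # mean clearance rate of the 10 sampled cities
sample_sd = 7.68
n = 10

# ASSUMPTION: two-tailed test at alpha = 0.05; 2.262 is the tabled
# critical value of t for df = 9.
T_CRIT = 2.262

se = sample_sd / math.sqrt(n)         # standard error of the mean
t = (sample_mean - kramer_rate) / se  # test statistic
reject_null = abs(t) > T_CRIT

print(f"SE = {se:.3f}, t = {t:.3f}, df = {n - 1}, reject H0: {reject_null}")
```

Under these assumptions, \(|t|\) exceeds the critical value, so the difference would be judged statistically significant at the 0.05 level.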
Most people think that some 3% of workers commute to work by bicycle. Set up the null and alternative hypotheses and test this belief with \(\alpha = 0.05\). State your conclusion, as well as the corresponding confidence interval.
Repeat the preceding test with \(\alpha = 0.01\) and state your conclusion, as well as the corresponding confidence interval.
People also believe that some 5% walk to work every day. Test this belief and state your conclusion, as well as the corresponding confidence interval.
Test whether at least one-half of the population declares itself vegetarian. State your conclusion.
Test whether at least one-third of the population believes in life after death. State your conclusion.