z-scores are essentially transformations of a numeric variable, with the transformation defined by \(z = \dfrac{x - \mu}{\sigma}\). In SPSS you can save a z-score version of a numeric variable by activating the Save standardized values as variables option when calculating Descriptive Statistics via Descriptives.
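If you prefer syntax, the same result can be obtained with the DESCRIPTIVES command and its /SAVE subcommand, which writes the z-scores to a new variable prefixed with Z. A minimal sketch, assuming a hypothetical numeric variable named miles:

```
* Save z-scores of the (hypothetical) variable miles.
* The /SAVE subcommand creates a new standardized variable named Zmiles.
DESCRIPTIVES VARIABLES=miles
  /SAVE
  /STATISTICS=MEAN STDDEV MIN MAX.
```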
You can easily draw a random sample in SPSS. Using hs1, for example, if we wanted to draw a sample made up of 2% of the data, we could go to Data, choose Select cases…, click on Random sample of cases, and then enter the sampling rate we want. In the images below I have chosen a 2% sample. If you click through the usual execution sequence you will see that 98% of the data have been filtered out. Whatever calculation or graphic you now run will use only the 2% of randomly selected cases.
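If you use Paste rather than OK in the dialog, SPSS generates syntax along these lines. This is a sketch of what the Select cases… dialog typically produces for an approximate 2% random sample; the filter variable name is whatever SPSS assigns (commonly filter_$):

```
* Select an approximately 2% random sample of cases.
USE ALL.
COMPUTE filter_$=(uniform(1) <= .02).
VARIABLE LABELS filter_$ '2% of the cases (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FILTER BY filter_$.
EXECUTE.
```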
Once you have identified the variable(s) you wish to test and specified the null and alternative hypotheses, write these down. Now, using the Analyze menu, select Compare Means and then One-Sample T Test… as shown below. You will have to enter the null value, what SPSS calls the Test Value, and set \(\alpha\) by specifying the confidence level under Options…. Click OK and you will see the test results in the output window.
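The equivalent syntax is the T-TEST command. A minimal sketch, assuming the variable dh and the test value of 50 used in the example that follows:

```
* One-sample t-test of dh against a null (test) value of 50,
* with a 95% confidence interval for the mean difference.
T-TEST
  /TESTVAL=50
  /VARIABLES=dh
  /CRITERIA=CI(.95).
```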
Focus on the SPSS output. The calculated t-statistic is 5.445 and has a p-value of practically 0; SPSS reports p-values as Sig. (2-tailed). The mean difference is 1232.20 - 50 = 1182.20, and the associated 95% confidence interval for this mean difference runs from 747.76 to 1616.64. Adding the test value of 50 back to each bound, we can interpret this interval as suggesting that we can be about 95% confident that the true population mean value of dh lies between 797.76 and 1666.64 miles.
You can use Explore to identify the observations associated with these extreme values, and to conduct normality tests.
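In syntax this corresponds to the EXAMINE command; a sketch, again assuming the variable dh:

```
* Show the 5 highest and 5 lowest values (with case numbers)
* and request normality tests (Kolmogorov-Smirnov and Shapiro-Wilk).
EXAMINE VARIABLES=dh
  /PLOT NPPLOT
  /STATISTICS DESCRIPTIVES EXTREME.
```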
If you look at the output, notice that the highest 5 and lowest 5 values are shown along with their observation numbers. You also see the results of the normality tests. The test you want to focus on is the Shapiro-Wilk. Note that the null hypothesis is one of normality: these sample data are drawn from a normally distributed population. Because the t-test relies upon normality, we would like to fail to reject this null hypothesis. The actual value of the test statistic here is 0.697 and the p-value is practically 0. Because the p-value is \(\leq 0.05\) we reject the null hypothesis. Typically this is taken as “proof” that the sample data do not come from a normally distributed population, and hence that the one-sample t-test should not be used.
For any categorical variable that has precisely two categories (for example, the student’s sex) we can test whether the observed proportion of a given category is significantly different from that expected by chance. For example, we know that approximately 51% of all births occurring on any given day will result in a male child, so the probability of the next child born being male is 0.51. Given this, we can test whether the proportion of male students differs from this population proportion of 0.51. This can be tested via the following options in SPSS.
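In syntax the binomial test is available through NPAR TESTS. A minimal sketch, assuming sex is coded so that male is the first category:

```
* Binomial test of the proportion of males against p = 0.51.
* SPSS tests the proportion in the first category of sex.
NPAR TESTS
  /BINOMIAL (0.51)=sex.
```

Note that when the test proportion is something other than 0.5, SPSS reports a one-tailed significance, which is why the hypotheses discussed below are one-sided.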
If you look at the output, note that the p-value is quite high (0.388). Consequently we fail to reject the null hypothesis that the proportion of males in the sample is greater than or equal to the population proportion of 0.51. Note also that, the way SPSS runs the test, it is conducting a one-tailed hypothesis test where the null and alternative are as follows:
\[H_0: \text{Male proportion is} \geq 0.51\] \[H_A: \text{Male proportion is} < 0.51\]
If you had wanted to test the following instead, you would have to rely on specially written syntax to do so:
\[H_0: \text{Male proportion is} = 0.51\] \[H_A: \text{Male proportion is} \neq 0.51\]
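One way to write such syntax, purely as a sketch, is to compute the two-tailed p-value directly from the binomial distribution using CDF.BINOM, doubling the smaller tail probability. The counts below (say, 87 males out of 200 students) are hypothetical placeholders:

```
* Hypothetical counts: k males observed out of n students.
DATA LIST FREE / k n.
BEGIN DATA
87 200
END DATA.
* Lower tail: P(X <= k); upper tail: P(X >= k) = 1 - P(X <= k-1).
COMPUTE p_lower = CDF.BINOM(k, n, 0.51).
COMPUTE p_upper = 1 - CDF.BINOM(k - 1, n, 0.51).
* Two-tailed p-value: twice the smaller tail, capped at 1.
COMPUTE p_two = MIN(1, 2 * MIN(p_lower, p_upper)).
EXECUTE.
LIST.
```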