The first thing we will do is launch SPSS. The university machines have SPSS 24. Find the relevant SPSS icon and click (or double-click) to launch SPSS. If you see a popup window that says something about “Unicode encoding, blah blah” go ahead and click Use Unicode Encoding.
Data saved in SPSS format have their own file extensions: .sav and .por. If you have an SPSS data file you can simultaneously open it and launch SPSS by double-clicking the data file. Let's do it now with hs1.sav.
Go to File > Open > Data… and in the Open Data dialog box, select the Files of type that you want to open, select the file you want, and then click Open. If you see a follow-up dialogue box that asks about variable names, etc., and your variable names are in the first row of data, select the Read variable names from the first row of data check box.
So long as your csv, txt, or dat file has column delimiters (tabs, commas, spaces, etc.) that indicate where one variable finishes and another variable starts, you should have little trouble. If your file lacks this information you will have to manually split the variables, and this is a tedious process. Let's assume we have a clean csv/txt file to work with.
Click File > Read Text Data and you will see the Open Data window. By default SPSS will have selected “Text (*.txt, *.dat, *.csv)”. Locate your file, click the file to select it, and then click OK.
Follow the prompts as they appear in the Text Import Wizard dialogue box. If all goes well and you see the data in SPSS, be sure to save the file as an SPSS file before you end your session. Otherwise you will have to start from scratch the next time around.
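If it helps to see what the wizard is doing with a delimited file, here is a minimal sketch in Python (not SPSS). The sample data and the function name `read_delimited` are made up for illustration; the only assumptions are that the first row holds the variable names and that a single delimiter separates the columns.

```python
import csv
import io

# Hypothetical sample standing in for a clean comma-delimited file
# whose first row holds the variable names.
SAMPLE_CSV = "id,gender,age\n1,f,23\n2,m,31\n3,f,45\n"


def read_delimited(text, delimiter=","):
    """Return one dict per data row, keyed by the names in the first row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


rows = read_delimited(SAMPLE_CSV)
print(rows[0])    # first case, with every value still stored as text
print(len(rows))  # number of cases read
```

Note that every value comes in as text, which is why you still need to check each variable's type after an import.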
You will see two tabs in the SPSS data window: one is the Data View and the other is the Variable View.
The Variable View is the one you will work with when cleaning your data.
Click (or double-click depending upon your computer system) on a variable’s name and you can edit this name.
Click on Type and you can specify whether this variable should be treated as a numeric variable or a string variable. Other variable types are listed as well but you will rarely use these. Note that you can also specify how “wide” the variable is, and how many decimal places should be displayed.
In the Label column you can enter a description of the variable.
Click Values and you will see a dialogue box pop up. Here you can map a numeric value to its corresponding label. For example, say I have saved a student’s status (Freshman, Sophomore, etc.) as 1 if Freshman, 2 if Sophomore, and so on; I can now enter the value labels. Once I do this, every table or chart I create will show the actual labels rather than cryptic numeric values of 1, 2, etc.
If you have missing data, the Missing column allows you to tell SPSS how to distinguish between valid observations and observations that should be treated as having missing information on this particular variable. The most common missing value you will see will be a dot (.), but some organizations tend to use the numbers 9, 999, 9999, -9, -99, -9999, etc. to flag missing values.
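To see why flagging these codes matters, consider this small Python sketch (not SPSS). The set of codes in `MISSING_CODES` and the sample ages are assumptions made up for illustration; your own data will have its own codes.

```python
# Hypothetical codes an organization might use to flag missing values.
MISSING_CODES = {".", "", "-9", "999"}


def to_number_or_missing(raw, missing_codes=MISSING_CODES):
    """Convert a raw text value to a float, or None if it is flagged as missing."""
    value = raw.strip()
    if value in missing_codes:
        return None
    return float(value)


raw_ages = ["23", ".", "31", "999", "45", "-9"]  # made-up values
ages = [to_number_or_missing(v) for v in raw_ages]
valid_ages = [a for a in ages if a is not None]

print(ages)
print(sum(valid_ages) / len(valid_ages))  # mean of the valid ages only
```

If 999 and -9 were left in as ordinary numbers, the average age would be badly distorted, which is exactly the problem that declaring missing values avoids.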
The Measure column gives you a drop-down box that will allow you to specify the measurement level for each variable.
The above options will be the ones you use most often so become familiar with them.
The Transform menu will allow you to perform several operations. The ones you will use most often will be recoding some variable into another, binning (i.e., grouping) a numeric variable, or computing some value for a variable.
If you look at the gender variable in the Agresti and Finlay data, you will see values of f for females and m for males. Ideally we would save this information with numeric codes that are then labeled. For example, we would like to have 1 = Male and 2 = Female. Let us create a new variable, called sex, with this mapping.
To do so, select Recode into Different Variables and choose the input variable (the one whose values you will use to create the new variable). Now set a name and label for the output variable. Then click Old and New Values….
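The logic behind a recode into a different variable can be sketched in a few lines of Python (not SPSS). The sample values and the names `OLD_TO_NEW`, `SEX_LABELS`, and `recode` are made up for illustration, and treating any unmapped value as missing is an assumption of this sketch.

```python
# Old string codes mapped to the new numeric codes (1 = Male, 2 = Female).
OLD_TO_NEW = {"m": 1, "f": 2}

# Labels for the new numeric codes.
SEX_LABELS = {1: "Male", 2: "Female"}


def recode(values, mapping):
    """Recode each value through `mapping`; unmapped values become None (assumption)."""
    return [mapping.get(v) for v in values]


gender = ["f", "m", "f", "f", "m"]  # made-up values of the original variable
sex = recode(gender, OLD_TO_NEW)    # the new variable; gender is left untouched

print(sex)
print([SEX_LABELS[code] for code in sex])
```

The original variable stays as it was and the numeric codes live in a new variable, so you can always check the recode against the source.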
In the Variable View, click Values and now map 1 to Male and 2 to Female.
If you need to create groups out of a numeric variable, age for example, Visual Binning will do this for you quite easily. Let us group age into specific categories. Start by clicking on Visual Binning and then select the age variable. You will see various attributes of age, including a histogram. The youngest person is 22 and the oldest is 71. Let us see what happens if we create age groups that run as follows: 20 - 30, 30 - 40, 40 - 50, 50+.
Click on Make Cutpoints… and specify where you want the first cutpoint, the number of groups you want, and then the width you want. Once you do this and click OK you’ll see how the original variable will be grouped. If you like the result, go ahead and save this new variable as grouped_age and click OK. Check the Variable View and you have your new variable. Now go in and create the labels for grouped_age.
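Binning with cutpoints can be sketched in Python (not SPSS) as follows. The cutpoints 30, 40, and 50 match the groups above. Whether a person aged exactly 30 falls in the first or second group depends on how the endpoints are treated; this sketch assumes each cutpoint belongs to the lower group. The sample ages, the labels, and the function name `bin_age` are made up for illustration.

```python
from bisect import bisect_left

# First cutpoint 30, width 10, three cutpoints, hence four groups.
CUTPOINTS = [30, 40, 50]

# Labels that spell out the endpoint assumption (cutpoint goes to the lower group).
GROUP_LABELS = {1: "30 or younger", 2: "31 - 40", 3: "41 - 50", 4: "51 or older"}


def bin_age(age, cutpoints=CUTPOINTS):
    """Return the group number (1, 2, ...) for an age, upper endpoints included."""
    return bisect_left(cutpoints, age) + 1


ages = [22, 30, 31, 47, 50, 71]  # made-up values
grouped_age = [bin_age(a) for a in ages]

print(grouped_age)
print([GROUP_LABELS[g] for g in grouped_age])
```

Writing the labels out this way makes it obvious which group a boundary value such as 30 or 50 belongs to.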
SPSS will allow you to create graphs in different ways. A good starting point is to use the Chart Builder under the Graphs menu. The first thing you will see is a warning message telling you to be sure to have set the measurement levels correctly for your data. Measurement levels determine what sort of graphic can be used for a variable; hence this warning. If all is well with your data, click OK.
The resulting dialogue box has two panes, and we’ll start with the lower pane. Here you see the chart Gallery that allows you to select the type of graph you want to build.
Select Bar and then the type of bar chart you want. For now we’ll go with the default bar (the first one you see). Drag the selected bar chart into the upper pane. Now drag the new sex variable you created to the x-axis. You have your basic bar chart.
Note that the y-axis uses the frequency counts by default. You can change this to percentages since they are a lot easier for folks to interpret and make it clear which group dominates, which one has the smallest presence, etc. You can customize the chart by double-clicking the resulting graph to open various edit functions.
A similar sequence will apply to pie charts, and you can customize these as well.
Select the Histogram instead and use the high school GPA variable. The second dialogue box, Element Properties, that opens up on the left of the Chart Builder box will allow you to superimpose the Normal curve, change the bars to whiskers, etc. At minimum, superimpose the Normal curve. Close this secondary dialogue box and then click OK.
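To get a feel for what a histogram with a Normal curve is comparing, here is a rough Python sketch (not SPSS). It assumes the curve is based on the sample mean and standard deviation, and the GPA values, bin settings, and function names are all made up for illustration.

```python
from statistics import NormalDist, mean, stdev

gpa = [2.1, 2.8, 3.0, 3.2, 3.3, 3.5, 3.6, 3.9]  # made-up GPA values
BIN_START, BIN_WIDTH, N_BINS = 2.0, 0.5, 4       # bins: 2.0-2.5, ..., 3.5-4.0


def histogram(values, start, width, n_bins):
    """Count the values in each bin; assumes no value falls below `start`."""
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - start) / width), n_bins - 1)
        counts[index] += 1
    return counts


def expected_counts(values, start, width, n_bins):
    """Counts a Normal curve with the sample mean and SD would put in each bin."""
    curve = NormalDist(mean(values), stdev(values))
    n = len(values)
    return [
        n * (curve.cdf(start + (i + 1) * width) - curve.cdf(start + i * width))
        for i in range(n_bins)
    ]


observed = histogram(gpa, BIN_START, BIN_WIDTH, N_BINS)
expected = expected_counts(gpa, BIN_START, BIN_WIDTH, N_BINS)

for i, (obs, exp) in enumerate(zip(observed, expected)):
    low = BIN_START + i * BIN_WIDTH
    print(f"{low:.1f} - {low + BIN_WIDTH:.1f}  observed {obs}  expected {exp:.2f}")
```

Bars that sit well above or below the curve are the places where the data depart from a Normal shape.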
If you choose Scatter/Dot you can build scatterplots with one numeric variable on the x-axis and the second numeric variable on the y-axis. Several customization options are available here as well.
Boxplots can be built with a single numeric variable or by seeing how a numeric variable’s distribution differs between groups flagged by another variable. These graphs can be customized as well.
You will typically have three basic types of tables: (a) frequency tables for a single variable, (b) cross-tabulations where you have two (and rarely three) variables, and (c) tables of summary statistics (mean, median, variance, etc.). In SPSS, you will find options for tables under the Analyze menu.
Go to Analyze, select Descriptive Statistics, and then Frequencies. Select the sex variable we created. On the right-hand side of the dialogue box you will see various sub-options. The only ones we want to tweak for now will be Charts… and Format…. If you want to generate a bar chart along with the frequency table, you can do so here. Likewise, you can organize the rows of the table by ascending/descending values of the variable, or by ascending/descending frequency counts.
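A frequency table is just a count and a percentage for each value. The Python sketch below (not SPSS) shows the idea, including the two ways of ordering the rows; the sample values and the function name `frequency_table` are made up for illustration.

```python
from collections import Counter


def frequency_table(values, by_count=False):
    """Return (value, count, percent) rows, ordered by value or by descending count."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = [(value, n, 100.0 * n / total) for value, n in counts.items()]
    if by_count:
        rows.sort(key=lambda row: row[1], reverse=True)
    else:
        rows.sort(key=lambda row: row[0])
    return rows


sex = ["Female", "Male", "Female", "Female", "Male"]  # made-up values

for value, count, percent in frequency_table(sex, by_count=True):
    print(f"{value:<8}{count:>4}{percent:>8.1f}%")
```

The percent column is the same information you get when you switch a bar chart's y-axis from counts to percentages.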
These tables are useful if you are using a grouped version of a numeric variable (grouped_age, for example) and can be constructed similarly to how the preceding frequency tables were constructed.
Here we have two nominal/ordinal variables as in, for example, grouped_age by the student’s sex. These can be generated via the Crosstabs… option under Descriptive Statistics. Select sex as the Row(s): and grouped_age as the Column(s):. Make sure you select Display clustered bar charts before you click OK.
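A cross-tabulation simply counts how many cases fall into each combination of the row and column variables. Here is a minimal Python sketch (not SPSS); the sample values and the function name `crosstab` are made up for illustration.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Count the cases in each (row, column) cell; empty cells are reported as 0."""
    cells = Counter(zip(row_values, col_values))
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    return {r: {c: cells[(r, c)] for c in col_levels} for r in row_levels}


# Made-up values: one entry per student, in the same order in both lists.
sex = ["Female", "Male", "Female", "Female", "Male", "Male"]
grouped_age = [1, 1, 2, 3, 2, 1]

table = crosstab(sex, grouped_age)
for row_label, cells in table.items():
    print(row_label, cells)
```

Each row of the result corresponds to one cluster of bars in the clustered bar chart, with one bar per age group.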