Outliers in statistical analyses are extreme values that do not seem to fit with the majority of a data set. If not removed, these extreme values can have a large effect on any conclusions that might be drawn from the data in question, because they can skew correlation coefficients and lines of best fit in the wrong direction. SPSS is one of a number of statistical analysis software programs that can be used to interpret a data set and identify and remove outlying values.
Click on "Analyze." Select "Descriptive Statistics" followed by "Explore."
Drag and drop the columns containing the dependent variable data into the box labelled "Dependent List." Click "OK."
Remove any outliers identified by SPSS in the stem-and-leaf plots or box plots by deleting the individual data points. Alternatively, you can set up a filter to exclude these data points.
Select "Data" and then "Select Cases" and click on a condition that has outliers you wish to exclude. Determine a value for this condition that excludes only the outliers and none of the non-outlying data points.
Choose "If Condition is Satisfied" in the "Select" box and then click the "If" button just below it. In the box at the upper right, enter the rule you determined in the previous step. The rule describes the cases you want to keep, so if you were excluding measurements above 74.5 inches from the condition "height," you would enter "height <= 74.5." Click "Continue" and "OK" to activate the filter.
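The logic of the filter rule can be sketched outside SPSS. The short Python example below is only an illustration: the sample cases and the helper name select_cases are made up, and only the variable "height" and the 74.5 cut-off come from the example above. As with a Select Cases filter, the excluded cases are left in the original data rather than deleted.

```python
# Hypothetical sample data; only "height" and the 74.5 cut-off come from the text.
cases = [
    {"id": 1, "height": 68.0},
    {"id": 2, "height": 74.5},
    {"id": 3, "height": 79.2},
    {"id": 4, "height": 61.3},
]

CUTOFF = 74.5  # inches


def select_cases(rows, cutoff):
    """Return the rows that satisfy the rule height <= cutoff."""
    return [row for row in rows if row["height"] <= cutoff]


kept = select_cases(cases, CUTOFF)
print([row["id"] for row in kept])  # IDs of the cases that stay in the analysis
```

A case measuring exactly 74.5 inches is kept, because the rule uses "<=" rather than "<".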
In the "Analyze" menu, select "Regression" and then "Linear." Select the dependent and independent variables you want to analyse.
Click "Save" and then select "Cook's Distance." The values calculated for Cook's distance will be saved in your data file as a new variable named "COO_1."
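SPSS calculates Cook's distance for you, but it can help to see what the saved values measure: how much the fitted line would shift if a single case were left out. The sketch below uses the standard formula for a regression with one independent variable. The function name and the sample data are invented for illustration, and this is not a description of how SPSS implements the calculation.

```python
from statistics import mean


def cooks_distances(x, y):
    """Cook's distance for each case in a simple linear regression of y on x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    # Least-squares slope and intercept.
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)

    # Leverage of each case: how far its x value sits from the mean of x.
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    return [
        (e ** 2 / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: the last case lies well above the trend of the others.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.0, 9.9, 20.0]

for case_number, d in enumerate(cooks_distances(x, y), start=1):
    print(case_number, round(d, 3))
```

In this made-up data set the sixth case has by far the largest Cook's distance, which is the kind of case the boxplot is meant to expose.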
Run a boxplot by selecting "Graphs" followed by "Boxplot" (in newer versions of SPSS, "Boxplot" is found under "Legacy Dialogs" in the "Graphs" menu). Click on "Simple" and select "Summaries of Separate Variables." Enter "COO_1" into the box labelled "Boxes Represent," and then enter an ID or name by which to identify the cases in the "Label Cases By" box.
Enlarge the boxplot in the output file by double-clicking it. Make a note of cases that lie beyond the black lines; these are your outliers. You may choose to remove all of the outliers or only the extreme outliers, which are marked by a star (*).
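The difference between an outlier and an extreme outlier comes down to distance from the box. SPSS boxplots treat values more than 1.5 box lengths from the edge of the box as outliers and values more than 3 box lengths away as extreme. The Python sketch below applies that rule to a made-up list of values. The function name and data are assumptions, and the quartiles here come from Python's statistics module, so the cut-offs may differ slightly from the hinges SPSS calculates.

```python
from statistics import quantiles


def classify_outliers(values):
    """Split values into boxplot outliers and extreme outliers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1  # the interquartile range

    result = {"outliers": [], "extremes": []}
    for v in values:
        # Distance beyond the nearer edge of the box (negative if inside the box).
        distance = max(q1 - v, v - q3)
        if distance > 3 * box_length:
            result["extremes"].append(v)
        elif distance > 1.5 * box_length:
            result["outliers"].append(v)
    return result


# Hypothetical Cook's distance values for nine cases.
values = [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.10, 0.90]
print(classify_outliers(values))
```

With these values, 0.10 falls between 1.5 and 3 box lengths above the box and 0.90 lies more than 3 box lengths above it, so only 0.90 would carry a star.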
Go back into the data file and locate the cases that need to be erased. Working from the bottom up, highlight the number at the extreme left, in the grey column, so the entire row is selected. Click on "Edit" and select "Clear." Repeat this step for each outlier you have identified from the boxplot.
WARNING
When erasing cases in the final step, always work from the bottom of the data file moving up, because erasing a case changes the row numbers of every case below it. If you work from the top down, you will end up erasing the wrong cases.
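The reason for working upwards can be shown with a small sketch. In the Python example below, the row labels and function names are invented, and rows are numbered from 1 to match the grey column in the data file. Deleting rows 2 and 5 from the bottom up removes the intended cases, while deleting them from the top down removes the wrong one, because the rows shift after the first deletion.

```python
def delete_bottom_up(rows, row_numbers):
    """Delete the given 1-based row numbers, highest first."""
    remaining = list(rows)
    for number in sorted(row_numbers, reverse=True):
        del remaining[number - 1]
    return remaining


def delete_top_down(rows, row_numbers):
    """Same deletions, lowest first; shown only to demonstrate the mistake."""
    remaining = list(rows)
    for number in sorted(row_numbers):
        del remaining[number - 1]
    return remaining


rows = ["A", "B", "C", "D", "E", "F"]  # hypothetical cases in rows 1 to 6
to_delete = [2, 5]                     # the outliers are cases B and E

print(delete_bottom_up(rows, to_delete))  # B and E are removed
print(delete_top_down(rows, to_delete))   # B and F are removed; E survives
```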