
Principal Components Analysis with SPSS
Karl L. Wuensch
Dept of Psychology
East Carolina University
When to Use PCA
• You have a set of p continuous variables.
• You want to repackage their variance into
m components.
• You will usually want m to be < p, but not
always.
Components and Variables
• Each component is a weighted linear
combination of the variables
$C_i = W_{i1}X_1 + W_{i2}X_2 + \cdots + W_{ip}X_p$
• Each variable is a weighted linear
combination of the components.
$X_j = A_{1j}C_1 + A_{2j}C_2 + \cdots + A_{mj}C_m$
Factors and Variables
• In Factor Analysis, we exclude from the
solution any variance that is unique, not
shared by the variables.
$X_j = A_{1j}F_1 + A_{2j}F_2 + \cdots + A_{mj}F_m + U_j$
• $U_j$ is the unique variance for $X_j$.
Goals of PCA and FA
• Data reduction.
• Discover and summarize the pattern of intercorrelations among variables.
• Test a theory about the latent variables underlying a set of measurement variables.
• Construct a test instrument.
• There are many other uses of PCA and FA.
Data Reduction
• Ossenkopp and Mazmanian (Physiology
and Behavior, 34: 935-941).
• 19 behavioral and physiological variables.
• A single criterion variable: physiological response to four hours of cold-restraint.
• Extracted five factors.
• Used multiple regression to develop a
model for predicting the criterion from the
five factors.
Exploratory Factor Analysis
• Want to discover the pattern of intercorrelations among variables.
• Wilt et al., 2005 (thesis).
• Variables are items on the SOIS at ECU.
• Found two factors, one evaluative, one on
difficulty of course.
• Compared FTF students to DE students on structure and means.
Confirmatory Factor Analysis
• Have a theory regarding the factor
structure for a set of variables.
• Want to confirm that the theory describes
the observed intercorrelations well.
• Thurstone: Intelligence consists of seven
independent factors rather than one global
factor.
• Often done with SEM software.
Construct A Test Instrument
• Write a large set of items designed to test
the constructs of interest.
• Administer the survey to a sample of
persons from the target population.
• Use FA to help select those items that will
be used to measure each of the constructs
of interest.
• Use Cronbach alpha to check reliability of
resulting scales.
An Unusual Use of PCA
• Poulson, Braithwaite, Brondino, and Wuensch
(1997, Journal of Social Behavior and
Personality, 12, 743-758).
• Simulated jury trial, seemingly insane
defendant killed a man.
• Criterion variable = recommended verdict
– Guilty
– Guilty But Mentally Ill
– Not Guilty By Reason of Insanity.
• Predictor variables = jurors’ scores on 8
scales.
• Discriminant function analysis.
• Problem with multicollinearity.
• Used PCA to extract eight orthogonal
components.
• Predicted recommended verdict from
these 8 components.
• Transformed results back to the original
scales.
A Simple, Contrived Example
• Consumers rate importance of seven
characteristics of beer.
– low Cost
– high Size of bottle
– high Alcohol content
– Reputation of brand
– Color
– Aroma
– Taste
• FACTBEER.SAV at http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-Data.htm
• Analyze, Data Reduction, Factor.
• Scoot the beer variables into the box.
• Click Descriptives and then check Initial
Solution, Coefficients, KMO and Bartlett’s
Test of Sphericity, and Anti-image. Click
Continue.
• Click Extraction and then select Principal
Components, Correlation Matrix,
Unrotated Factor Solution, Scree Plot, and
Eigenvalues Over 1. Click Continue.
• Click Rotation. Select Varimax and
Rotated Solution. Click Continue.
• Click Options. Select Exclude Cases
Listwise and Sorted By Size. Click
Continue.
• Click OK, and SPSS completes the
Principal Components Analysis.
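If you want to verify SPSS's arithmetic outside of SPSS, the following Python sketch carries out the same extraction, a PCA of the correlation matrix. It is a minimal illustration, not SPSS's own code; the placeholder data array stands in for the ratings in FACTBEER.SAV.

```python
import numpy as np

# Placeholder for the (n cases x 7 variables) ratings in FACTBEER.SAV.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 7))

# PCA on the correlation matrix, as selected in the Extraction dialog.
R = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending order
order = np.argsort(eigvals)[::-1]           # reverse to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Retain components with eigenvalues over 1, per the dialog setting.
m = int(np.sum(eigvals > 1))
loadings = eigvecs[:, :m] * np.sqrt(eigvals[:m])  # unrotated loadings
print("Eigenvalues:", eigvals.round(3))
print("Unrotated loadings:\n", loadings.round(3))
```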
Checking for Unique Variables 1
• Check the correlation matrix.
• If any variables are not well correlated with some of the others, you might as well delete them.
Checking for Unique Variables 2
Correlation Matrix

          cost    size  alcohol reputat  color   aroma   taste
cost      1.00    .832    .767   -.406    .018   -.046   -.064
size       .832  1.00     .904   -.392    .179    .098    .026
alcohol    .767   .904   1.00    -.463    .072    .044    .012
reputat   -.406  -.392   -.463   1.00    -.372   -.443   -.443
color      .018   .179    .072   -.372   1.00     .909    .903
aroma     -.046   .098    .044   -.443    .909   1.00     .870
taste     -.064   .026    .012   -.443    .903    .870   1.00
Checking for Unique Variables 3
• Bartlett’s test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix, but it does not help identify individual variables that are not well correlated with the others.
KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy     .665
Bartlett's Test of Sphericity  Approx. Chi-Square   1637.9
                               df                   21
                               Sig.                 .000
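For the curious, Bartlett's statistic is easy to compute by hand. Here is a sketch, assuming the standard formula $\chi^2 = -(n-1-(2p+5)/6)\ln|R|$ with $p(p-1)/2$ degrees of freedom; the function name is mine, not SPSS's.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Test the null that correlation matrix R (p x p, from n cases)
    is an identity matrix."""
    p = R.shape[0]
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, df, chi2.sf(statistic, df)  # chi-square, df, p value
```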
Checking for Unique Variables 4
• For each variable, check the R² between it and the remaining variables.
• SPSS reports these as the initial communalities when you do a principal axis factor analysis.
• Delete any variable with a low R² (a computational sketch follows).
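A sketch of that computation: the squared multiple correlation of each variable with the rest can be read off the diagonal of the inverted correlation matrix, without running the regressions one at a time. The function name is mine.

```python
import numpy as np

def initial_communalities(R):
    """R-squared of each variable with all the others: the 'initial
    communalities' SPSS reports for a principal axis factor analysis."""
    return 1 - 1 / np.diag(np.linalg.inv(R))
```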
Checking for Unique Correlations
• Look at partial correlations – pairs of
variables with large partial correlations
share variance with one another but not
with the remaining variables – this is
problematic.
• Kaiser’s MSA will tell you, for each
variable, how much of this problem exists.
• The smaller the MSA, the greater the
problem.
Checking for Unique Correlations 2
• An MSA of .9 is marvelous; .5 is miserable.
• Variables with small MSAs should be
deleted
• Or additional variables added that will
share variance with the troublesome
variables.
Checking for Unique Correlations 3
Anti-image Correlation Matrix

          cost    size  alcohol reputat  color   aroma   taste
cost      .779a  -.543    .105    .256    .100    .135   -.105
size     -.543    .550a  -.806   -.109   -.495    .061    .435
alcohol   .105   -.806    .630a   .226    .381   -.060   -.310
reputat   .256   -.109    .226    .763a  -.231    .287    .257
color     .100   -.495    .381   -.231    .590a  -.574   -.693
aroma     .135    .061   -.060    .287   -.574    .801a  -.087
taste    -.105    .435   -.310    .257   -.693   -.087    .676a

a. Measures of Sampling Adequacy (MSA) on the main diagonal; off-diagonal entries are partial correlations × -1.
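Kaiser's MSA can also be computed directly. The sketch below follows the usual definition, ordinary squared correlations divided by ordinary plus partial squared correlations, with partial correlations obtained from the inverted correlation matrix; the function name is mine, not SPSS's.

```python
import numpy as np

def kaiser_msa(R):
    """Per-variable MSA and overall KMO for correlation matrix R."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)  # partial correlations (anti-image x -1)
    np.fill_diagonal(partial, 0.0)
    r2 = R**2
    np.fill_diagonal(r2, 0.0)
    p2 = partial**2
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    kmo = r2.sum() / (r2.sum() + p2.sum())
    return msa, kmo
```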
Extracting Principal Components 1
• From p variables we can extract p components.
• Each of p eigenvalues represents the amount of
standardized variance that has been captured
by one component.
• The first component accounts for the largest
possible amount of variance.
• The second captures as much as possible of
what is left over, and so on.
• Each is orthogonal to the others.
Extracting Principal Components 2
• Each variable has standardized variance =
1.
• The total standardized variance in the p
variables = p.
• The sum of the m = p eigenvalues = p.
• All of the variance is extracted.
• For each component, the proportion of
variance extracted = eigenvalue / p.
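The arithmetic on this slide is easily checked; here is a minimal sketch (function name mine):

```python
import numpy as np

def explained_variance(R):
    """Eigenvalues of correlation matrix R and the percent of the
    total standardized variance (p) captured by each component."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    pct = 100 * eigvals / R.shape[0]
    return eigvals, pct, np.cumsum(pct)
```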
Extracting Principal Components 3
• For our beer data, here are the
eigenvalues and proportions of variance
for the seven components:
Initial Eigenvalues

Component   Total   % of Variance   Cumulative %
1           3.313   47.327           47.327
2           2.616   37.369           84.696
3            .575    8.209           92.905
4            .240    3.427           96.332
5            .134    1.921           98.252
6            .09     1.221           99.473
7            .04      .527          100.000

Extraction Method: Principal Component Analysis.
How Many Components to Retain
• From p variables we can extract p
components.
• We probably want fewer than p.
• Simple rule: Keep as many as have eigenvalues ≥ 1.
• A component with eigenvalue < 1 captured
less than one variable’s worth of variance.
• Visual Aid: Use a Scree Plot
• Scree is rubble at base of cliff.
• For our beer data:

[Scree plot: eigenvalues (0.0 to 3.5) plotted against component number (1 to 7).]
• Only the first two components have
eigenvalues greater than 1.
• Big drop in eigenvalue between
component 2 and component 3.
• Components 3-7 are scree.
• Try a 2 component solution.
• Should also look at solution with one fewer
and with one more component.
Less Subjective Methods
• Parallel Analysis and Velicer’s MAP test.
• SAS, SPSS, and Matlab scripts available at https://people.ok.ubc.ca/brioconn/nfactors/nfactors.html
Parallel Analysis
• How many components account for more
variance than do components derived from
random data?
• Create 1,000 or more sets of random data.
• Each with same number of cases and
variables as your data set.
• For each set, find the eigenvalues.
• For the eigenvalues from the random sets,
find the 95th percentile for each
component.
• Retain as many components as have eigenvalues from your data exceeding the 95th percentile from the random data sets (see the sketch below).
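The scripts at the URL above implement this; a minimal Python sketch of the same idea (my own naming, normal random data assumed) looks like this:

```python
import numpy as np

def parallel_analysis(n_cases, n_vars, n_sets=1000, pct=95, seed=0):
    """Percentile eigenvalues from correlation matrices of random
    normal data sets shaped like your data."""
    rng = np.random.default_rng(seed)
    eigs = np.empty((n_sets, n_vars))
    for i in range(n_sets):
        x = rng.normal(size=(n_cases, n_vars))
        R = np.corrcoef(x, rowvar=False)
        eigs[i] = np.sort(np.linalg.eigvalsh(R))[::-1]
    return np.percentile(eigs, pct, axis=0)

# Retain components whose observed eigenvalues exceed these values.
```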
Random Data Eigenvalues
Root   95th Percentile
1      1.344920
2      1.207526
3      1.118462
4      1.038794
5       .973311
6       .907173
7       .830506
• Our data yielded eigenvalues of 3.313,
2.616, and 0.575.
• Retain two components.
Velicer’s MAP Test
• Step by step, extract increasing numbers
of components.
• At each step, determine how much
common variance is left in the residuals.
• Retain all steps up to and including that
producing the smallest residual common
variance.
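A compact sketch of the MAP procedure, assuming the usual formulation (partial out the first m principal components and average the squared residual correlations); the function name is mine:

```python
import numpy as np

def map_test(R):
    """Velicer's MAP: average squared partial correlation after
    removing the first m components, for m = 0 .. p-2."""
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    avg_sq = []
    for m in range(p - 1):
        C = R if m == 0 else R - loadings[:, :m] @ loadings[:, :m].T
        d = np.sqrt(np.diag(C))          # rescale residuals to correlations
        partial = C / np.outer(d, d)
        off = partial[~np.eye(p, dtype=bool)]
        avg_sq.append(np.mean(off**2))
    return int(np.argmin(avg_sq)), avg_sq  # components to retain, criteria
```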
Velicer's Minimum Average Partial (MAP) Test:
Velicer's Average Squared Correlations
0    .266624
1    .440869
2    .129252
3    .170272
4    .331686
5    .486046
6   1.000000

The smallest average squared correlation is .129252.
The number of components is 2.
Which Test to Use?
• Parallel analysis tends to overextract.
• MAP tends to underextract.
• If they disagree, increase the number of random sets in the parallel analysis,
• and inspect carefully the two smallest values from the MAP test.
• You may need to apply the meaningfulness criterion.
Loadings, Unrotated and Rotated
• loading matrix = factor pattern matrix =
component matrix.
• Each loading is the Pearson r between one
variable and one component.
• Since the components are orthogonal, each loading is also the β weight for predicting X from the components.
• Here are the unrotated loadings for our 2
component solution:
Component Matrix(a)

           Component
             1      2
COLOR      .760  -.576
AROMA      .736  -.614
REPUTAT   -.735  -.071
TASTE      .710  -.646
COST       .550   .734
ALCOHOL    .632   .699
SIZE       .667   .675

Extraction Method: Principal Component Analysis.
a. 2 components extracted.
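To see that a loading really is the Pearson r between a variable and a component, one can compute component scores and correlate. A toy check with simulated standardized data (a placeholder, not the beer ratings):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 7))
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standardize

R = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = z @ eigvecs                   # component scores
loadings = eigvecs * np.sqrt(eigvals)  # loading matrix

# r between variable 0 and component 0 equals loadings[0, 0].
print(np.corrcoef(z[:, 0], scores[:, 0])[0, 1], loadings[0, 0])
```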
• All variables load well on first component,
economy and quality vs. reputation.
• Second component is more interesting,
economy versus quality.
• Rotate these axes so that the two dimensions pass more nearly through the two major clusters (COST, SIZE, ALCOHOL and COLOR, AROMA, TASTE).
• The number of degrees by which I rotate
the axes is the angle PSI. For these data,
rotating the axes -40.63 degrees has the
desired effect.
• Component 1 = Quality versus reputation.
• Component 2 = Economy (or cheap drunk)
versus reputation.
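The rotation itself is just a matrix multiplication. The sketch below applies the slide's psi of -40.63 degrees to the unrotated loadings shown earlier and approximately reproduces the rotated matrix that follows (small discrepancies reflect rounding and SPSS's Kaiser normalization):

```python
import numpy as np

def rotate(loadings, psi_degrees):
    """Rotate a two-component loading matrix by psi degrees."""
    psi = np.radians(psi_degrees)
    T = np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
    return loadings @ T

unrotated = np.array([[ .760, -.576],   # COLOR
                      [ .736, -.614],   # AROMA
                      [-.735, -.071],   # REPUTAT
                      [ .710, -.646],   # TASTE
                      [ .550,  .734],   # COST
                      [ .632,  .699],   # ALCOHOL
                      [ .667,  .675]])  # SIZE
print(rotate(unrotated, -40.63).round(3))
```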
Rotated Component Matrix(a)

           Component
             1      2
TASTE      .960  -.028
AROMA      .958   .010
COLOR      .952   .060
SIZE       .070   .947
ALCOHOL    .020   .942
COST      -.061   .916
REPUTAT   -.512  -.533

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
Number of Components in the
Rotated Solution
• Try extracting one fewer component, try one
more component.
• Which produces the more sensible solution?
• Error = difference in obtained structure and true
structure.
• Overextraction (too many components)
produces less error than underextraction.
• If there is only one true factor and no unique
variables, can get “factor splitting.”
• In this case, the first unrotated factor ≈ the true factor.
• But rotation splits the factor, producing an
imaginary second factor and corrupting the
first.
• Can avoid this problem by including a
garbage variable that will be removed prior
to the final solution.
Explained Variance
• Square the loadings and then sum them across
variables.
• Get, for each component, the amount of
variance explained.
• Prior to rotation, these are eigenvalues.
• Here are the SSL for our data, after rotation:
Total Variance Explained

            Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %
1           3.017   43.101          43.101
2           2.912   41.595          84.696

Extraction Method: Principal Component Analysis.
• After rotation the two components together
account for (3.02 + 2.91) / 7 = 85% of the
total variance.
• If the last component has a small SSL,
one should consider dropping it.
• If SSL = 1, the component has extracted
one variable’s worth of variance.
• If only one variable loads well on a
component, the component is not well
defined.
• If only two load well, it may be reliable, if
the two variables are highly correlated with
one another but not with other variables.
Naming Components
• For each component, look at how it is
correlated with the variables.
• Try to name the construct represented by
that factor.
• If you cannot, perhaps you should try a
different solution.
• I have named our components “aesthetic
quality” and “cheap drunk.”
Communalities
• For each variable, sum the squared
loadings across components.
• This gives you the R² for predicting the variable from the components,
• which is the proportion of the variable’s variance that has been extracted by the components.
• Here are the communalities for our beer
data. “Initial” is with all 7 components,
“Extraction” is for our 2 component
solution.
Communalities

          Initial   Extraction
COST      1.000     .842
SIZE      1.000     .901
ALCOHOL   1.000     .889
REPUTAT   1.000     .546
COLOR     1.000     .910
AROMA     1.000     .918
TASTE     1.000     .922

Extraction Method: Principal Component Analysis.
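Here is a sketch that recovers both the rotated SSL from the Explained Variance slide and these extraction communalities from the rotated loadings (values copied from the Rotated Component Matrix above):

```python
import numpy as np

# Rotated loadings from the Rotated Component Matrix.
rotated = np.array([[ .960, -.028],   # TASTE
                    [ .958,  .010],   # AROMA
                    [ .952,  .060],   # COLOR
                    [ .070,  .947],   # SIZE
                    [ .020,  .942],   # ALCOHOL
                    [-.061,  .916],   # COST
                    [-.512, -.533]])  # REPUTAT

ssl = (rotated**2).sum(axis=0)            # per component: 3.017, 2.912
communalities = (rotated**2).sum(axis=1)  # per variable: .922, .918, ...
print("SSL:", ssl.round(3))
print("Communalities:", communalities.round(3))
print("Proportion of variance:", (ssl / 7).round(3))
```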
Orthogonal Rotations
• Varimax -- minimize the complexity of the
components by making the large loadings
larger and the small loadings smaller
within each component.
• Quartimax -- makes large loadings larger
and small loadings smaller within each
variable.
• Equamax – a compromise between these two.
Oblique Rotations
• Axes drawn through the two clusters in the
upper right quadrant would not be
perpendicular.
• May better fit the data with axes that are
not perpendicular, but at the cost of having
components that are correlated with one
another.
• More on this later.