What is applied psychometrics?
Tim Croudace [email protected]
Department of Psychiatry
John Rust [email protected]
The Psychometrics Centre
University of Cambridge
What is applied psychometrics?
Professor John Rust
http://www.ppsis.psychometrics.cam.ac.uk
Overview
• About the Centre
• What is psychometrics?
• Psychometrics today
• What we are doing now
• What we are going to do
The Psychometric Centre
• Educational and diagnostic eg Wechsler
• Organisational eg Watson-Glaser, Orpheus
• Statistical, IRT and AI techniques
• Computer languages eg Mplus, Stata, R
• Web based assessment
• BPS Level A and B courses
• Seminars, workshops and summer schools
• PhDs in psychometrics or related areas
• Tutorial materials on website
– www.psychometrics.ppsis.cam.ac.uk
Current activities
• Who we are (people)
• Announcement about summer schools
• Announcement about forthcoming workshops
What is psychometrics?
• “The science of psychological assessment”
• Much assessment is “high stakes”
• Questionnaires and social surveys
• Recruitment and staff development
• Licensing and chartering (eg Accountants, Surgeons)
• School and University examinations
• Psychiatric and ‘special needs’ diagnosis
• Credit ratings
• Career guidance
• Social awareness
Types of assessment
• First impressions
• Application forms and references
• Objective tests (on or off line)
• Projective tests
• Interviews
• Essays and examinations
• Research questionnaires and semi-structured interviews
The Psychometric Principles
Maximizing the quality of assessment
• Reliability (freedom from error)
• Validity (‘... what it says on the tin’)
• Standardisation (compared with what?)
• Equivalence (is it biased?)
• Rust, J. & Golombok, S. (2009) Modern Psychometrics (3rd Edition). Taylor and Francis: London
Can everything be measured?
• “If anything exists it must exist in some quantity and can therefore be measured”. (Lord Kelvin, 1824–1907)
• In 1900, Lord Kelvin claimed "There is nothing new to be discovered in physics now. All that remains is more and more precise measurement."
The theory of true scores
• Whatever precautions have been taken to secure
unity of standard, there will occur a certain divergence
between the verdicts of competent examiners.
• If we tabulate the marks given by the different
examiners they will tend to be disposed after the
fashion of a gendarme’s hat.
• I think it is intelligible to speak of the mean judgment
of competent critics as the true judgment; and
deviations from that mean as errors.
• This central figure which is, or may be supposed to
be, assigned by the greatest number of equally
competent judges, is to be regarded as the true value
..., just as the true weight of a body is determined by
taking the mean of several discrepant measurements.
•Edgeworth, F.Y. (1888). The statistics of examinations.
Journal of the Royal Statistical Society, LI, 599-635.
The evolution of the Latent Trait
• Edgeworth, F.Y. (1888). The statistics of examinations.
Journal of the Royal Statistical Society, LI, 599-635. With
two measures of the same characteristic we can estimate
true values.
• Frederic Lord and Melvin Novick (1968), “Statistical theories of mental test scores”, use Classical Test Theory to derive Latent Trait Theory. Allan Birnbaum, in his supplement, established Item Response Theory, of which Rasch Scaling is a special case.
• Today Latent Variable Analysis (LVA) is an integral part of statistical modelling in Psychometrics, Econometrics and Statistics.
psycho·met·rics (sī′kō me′triks), psychometry
Etymologically (from the Greek), psychometry means measuring the mind
P. Kline (1979) “The meaning of psychometrics” p1
Definitions
• Collins English Dictionary
Psychometrics definition: psychometrics n
1. the branch of psychology concerned with the design and use of psychological tests
2. application of statistical & mathematical techniques to psychological testing
• dictionary.reverso.net/englishdefinition/psychometrics
What is psychometrics?
The Science of Psychological Assessment
“the branch of psychology dealing with
measurable factors”
Modern Psychometrics.
by J. Rust & S. Golombok. Routledge. P 4
Even Wikipedia has something to say …
it doesn’t begin too promisingly!!!
[From Wikipedia, the free encyclopedia]
Psychometrics –
Not to be confused with psychrometrics, the measurement of the heat
and water vapor properties of air. For other uses of this term and
similar terms, see (disambiguation).
Psychometry [Redirected from Psychometry (disambiguation)] may refer to:
Psychometry (paranormal) a form of extrasensory perception
Psychometrics a discipline of psychology and education (getting
warmer!!)
And finally it begins to make sense …
– Psychometrics is the field of study concerned with the theory and
technique of educational and psychological measurement, which
includes the measurement of knowledge, abilities, attitudes, and
personality traits. The field is primarily concerned with the
construction and validation of measurement instruments, such as
questionnaires, tests, and personality assessments.
What is [Psychometric] Test Theory?
• Psychometric Test Theory … is essentially a collection of mathematical concepts that formalize and clarify certain questions about constructing and using tests [and scales] and then provide methods for answering them
R.P. McDonald (1999) Test Theory: a unified treatment. LEA. P 9
What is psychometrics?
Item Response Theory (IRT)
Item Response Modelling (IRM)
IRT refers to a set of mathematical models
that describe, in probabilistic terms, the
relationship between a person’s response
to a survey question/test item and his or
her level of the ‘latent variable’ being
measured by the scale
Fayers and Hays p55
– Assessing Quality of Life in Clinical Trials. Oxford Univ Press:
– Chapter on Applying IRT for evaluating questionnaire item and scale
properties.
Psychometric (Measurement) Theory: 2 main schools, old & new
Classical Test Theory
• Associated with use of traditional (old) psychometric methods
– linear factor analysis
– Cronbach’s alpha (internal consistency)
– summing items and simple sum scores
Item response theory
• Modern test theory
• A set or family of mathematical / probability models that describe the relationship between a person’s [response / answer] to a [questionnaire survey / test item] and his or her level of the latent variable being measured
Classical Test Theory
Reliability estimation

Reliability coefficient  | Major error source       | Data-gathering procedure | Statistical data analysis
-------------------------+--------------------------+--------------------------+------------------------------
1. Stability coefficient | Changes over time        | Test-retest              | Product-moment correlation
2. Equivalence           | Item sampling: from test | Given form j, form k     | Product-moment correlation
   coefficient           | form to test form        |                          |
3. Internal consistency  | Item sampling: test      | A single administration  | a) Split-half correlation /
   coefficient           | heterogeneity            |                          |    Spearman-Brown correction,
                         |                          |                          | b) coefficient alpha
                         |                          |                          | c) Factor loadings
                         |                          |                          | d) Other

Table 4.1 p26 Dato N.M. de Gruijter and Leo J.Th. van der Kamp (2008)
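The split-half route to an internal consistency coefficient (row 3a of the table) can be sketched in a few lines. This is a minimal illustration, assuming the two half-test scores are already available as lists; the function names `pearson` and `spearman_brown` are hypothetical and not taken from any package mentioned in these slides.

```python
import math

def pearson(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half, factor=2.0):
    """Step a half-test correlation up to the reliability of a test
    `factor` times as long (factor=2 gives the usual split-half correction)."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)

# Illustrative (made-up) half-test scores for six respondents.
odd_half = [3, 5, 2, 6, 4, 1]
even_half = [2, 5, 3, 6, 3, 2]
r = pearson(odd_half, even_half)
print(round(r, 3), round(spearman_brown(r), 3))
```

The correction is applied because each half is only half as long as the full test, so the raw half-to-half correlation understates the reliability of the full-length scale.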
Reliability coefficients
STATA alpha and cialpha commands
Continuous outcomes: Guttman-Cronbach alpha
Test scale = mean(unstandardized items)
Average interitem covariance:     .0921364
Number of items in the scale:            8
Scale reliability coefficient:      0.7942

Cronbach's alpha one-sided confidence interval
---------------------------------------------------
   Items |      alpha        [95% Conf.Interval]
---------+-----------------------------------------
    Test |  .79423639     >=     .7348227
---------------------------------------------------
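For readers who want to see what the scale reliability coefficient is made of, the following is a minimal sketch of coefficient alpha computed from raw item scores. It uses the textbook variance form of the formula and made-up data; the function name `cronbach_alpha` is hypothetical, and no claim is made that this reproduces the internals of the Stata alpha command.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, all for the same respondents
    in the same order. Returns coefficient alpha."""
    k = len(items)
    # Total score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1.0 - item_var_sum / variance(totals))

# Illustrative (made-up) data: three items answered by five respondents.
example = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 5, 4],
    [2, 2, 4, 5, 5],
]
print(round(cronbach_alpha(example), 4))
```

Alpha rises when the items covary strongly relative to their individual variances, which is why it is read as an index of internal consistency.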
Exploratory Factor Analysis (ML):
STATA factor command
factor v1-v8, factors(2) ml
Factor analysis/correlation                 Number of obs    =      87
Method: maximum likelihood                  Retained factors =       2
Rotation: (unrotated)                       Number of params =      15
                                            Schwarz's BIC    = 95.9898
Log likelihood = -14.5006                   (Akaike's) AIC   = 59.0012
--------------------------------------------------------------------------
      Factor |   Eigenvalue   Difference   Proportion   Cumulative
-------------+------------------------------------------------------------
     Factor1 |      2.84462      1.43839       0.6692       0.6692
     Factor2 |      1.40624            .       0.3308       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(28) = 261.31 Prob>chi2 = 0.0000
LR test:   2 factors vs. saturated: chi2(13) =  27.39 Prob>chi2 = 0.0110
Factor loadings (pattern matrix) and unique variances
-------------------------------------------------
    Variable |  Factor1   Factor2 |   Uniqueness
-------------+--------------------+--------------
          v1 |   0.6652   -0.2760 |      0.4814
          v2 |   0.8126   -0.2484 |      0.2780
          v3 |   0.7071   -0.3337 |      0.3886
          v4 |   0.7123   -0.0119 |      0.4925
          v5 |   0.4729    0.4383 |      0.5842
          v6 |   0.3554    0.6141 |      0.4966
          v7 |   0.3969    0.5332 |      0.5581
          v8 |   0.4764    0.5507 |      0.4698
-------------------------------------------------
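The Uniqueness column can be checked by hand: for an orthogonal (here unrotated) solution, each uniqueness is one minus the sum of the squared loadings in that row. The sketch below only re-derives the printed values from the printed loadings; the variable names are illustrative.

```python
# Unrotated loadings (Factor1, Factor2) and uniquenesses as printed above.
loadings = {
    "v1": (0.6652, -0.2760), "v2": (0.8126, -0.2484),
    "v3": (0.7071, -0.3337), "v4": (0.7123, -0.0119),
    "v5": (0.4729, 0.4383),  "v6": (0.3554, 0.6141),
    "v7": (0.3969, 0.5332),  "v8": (0.4764, 0.5507),
}
reported_uniqueness = {
    "v1": 0.4814, "v2": 0.2780, "v3": 0.3886, "v4": 0.4925,
    "v5": 0.5842, "v6": 0.4966, "v7": 0.5581, "v8": 0.4698,
}

def uniqueness(row):
    """1 - communality, where communality is the sum of squared loadings."""
    return 1.0 - sum(l * l for l in row)

for name, row in loadings.items():
    print(name, round(uniqueness(row), 4), reported_uniqueness[name])
```

Small differences in the fourth decimal place come from the loadings themselves being rounded in the output.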
(2) Exploratory Factor Analysis (ML):
STATA rotate command
. rotate, bentler bl(.35)
Rotated factor loadings (pattern matrix) and unique variances
-------------------------------------------------
    Variable |  Factor1   Factor2 |   Uniqueness
-------------+--------------------+--------------
          v1 |   0.7188           |      0.4814
          v2 |   0.8392           |      0.2780
          v3 |   0.7819           |      0.3886
          v4 |   0.6452           |      0.4925
          v5 |             0.6015 |      0.5842
          v6 |             0.7078 |      0.4966
          v7 |             0.6533 |      0.5581
          v8 |             0.7039 |      0.4698
-------------------------------------------------
(blanks represent abs(loading)<.35)
Factor rotation matrix
--------------------------------
             | Factor1  Factor2
-------------+------------------
     Factor1 |  0.8985   0.4390
     Factor2 | -0.4390   0.8985
--------------------------------
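The rotated loadings are the unrotated loadings post-multiplied by the factor rotation matrix. A minimal sketch that re-derives the rotated table from the two printed matrices follows; the helper name `rotate_row` is hypothetical, and the numbers are copied from the output above.

```python
# Unrotated loadings from the factor command output.
unrotated = {
    "v1": (0.6652, -0.2760), "v2": (0.8126, -0.2484),
    "v3": (0.7071, -0.3337), "v4": (0.7123, -0.0119),
    "v5": (0.4729, 0.4383),  "v6": (0.3554, 0.6141),
    "v7": (0.3969, 0.5332),  "v8": (0.4764, 0.5507),
}
# Factor rotation matrix as printed by rotate (rows: old factors).
T = [[0.8985, 0.4390],
     [-0.4390, 0.8985]]

def rotate_row(row, t):
    """One row of the matrix product L * T."""
    return tuple(sum(row[i] * t[i][j] for i in range(2)) for j in range(2))

for name, row in unrotated.items():
    f1, f2 = rotate_row(row, T)
    # Mimic the blanks option: hide loadings below .35 in absolute value.
    shown = [f"{v:7.4f}" if abs(v) >= 0.35 else "       " for v in (f1, f2)]
    print(name, shown[0], shown[1])
```

After rotation v1 to v4 load on the first factor only and v5 to v8 on the second, which is the simple structure the later two-factor confirmatory model assumes.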
Confirmatory Factor Analysis (ML):
STATA cfa1 command
Log likelihood = -457.31642                          Number of obs = 87
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Lambda       |
          v1 |          1          .       .       .            .           .
          v2 |   1.146607   .1706831    6.72   0.000     .8120748     1.48114
          v3 |   1.077999   .1776428    6.07   0.000      .729825    1.426172
          v4 |   1.128529   .1988093    5.68   0.000     .7388694    1.518188
          v5 |   .6362603   .2008189    3.17   0.002     .2426624    1.029858
          v6 |   .4119255   .2019811    2.04   0.041     .0160498    .8078011
          v7 |   .5417541   .2211306    2.45   0.014     .1083461     .975162
          v8 |   .6653727   .2206966    3.01   0.003     .2328152     1.09793
-------------+----------------------------------------------------------------
Var[error]   |
          v1 |   .1172731   .0215309    5.45   0.000     .0750732     .159473
          v2 |   .0669433   .0176594    3.79   0.000     .0323315    .1015551
          v3 |   .1085488   .0212332    5.11   0.000     .0669325    .1501651
          v4 |   .1349088   .0264226    5.11   0.000     .0831214    .1866963
          v5 |    .240713    .038299    6.29   0.000     .1656483    .3157778
          v6 |   .2753728   .0426118    6.46   0.000     .1918553    .3588903
          v7 |   .3244316   .0504165    6.44   0.000      .225617    .4232461
          v8 |   .2991244   .0473675    6.31   0.000     .2062859     .391963
-------------+----------------------------------------------------------------
Var[latent]  |
        phi1 |   .1107746   .0320436    3.46   0.001     .0479702     .173579
------------------------------------------------------------------------------
Goodness of fit test: LR = 109.116 ; Prob[chi2(20) > LR] = 0.0000
Test vs independence: LR = 163.149 ; Prob[chi2( 8) > LR] = 0.0000
Single factor model (ML):
STATA confa commands
. confa (f: v1-v8), from(2SLS)
log likelihood = -457.31642                          Number of obs = 87
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Loadings     |
f            |
          v1 |          1          .       .       .            .           .
          v2 |   1.146608   .1706831    6.72   0.000     .8120749     1.48114
          v3 |   1.077998   .1776429    6.07   0.000     .7298248    1.426172
          v4 |   1.128529   .1988093    5.68   0.000     .7388694    1.518188
          v5 |   .6362603   .2008189    3.17   0.002     .2426625    1.029858
          v6 |   .4119255   .2019811    2.04   0.041     .0160499    .8078012
          v7 |   .5417541   .2211306    2.45   0.014     .1083461    .9751621
          v8 |   .6653728   .2206967    3.01   0.003     .2328153     1.09793
-------------+----------------------------------------------------------------
Var[error]   |
          v1 |   .1172731   .0215309    5.45   0.000     .0750732    .1594729
          v2 |   .0669433   .0176594    3.79   0.000     .0323315    .1015551
          v3 |   .1085489   .0212332    5.11   0.000     .0669326    .1501652
          v4 |   .1349088   .0264226    5.11   0.000     .0831214    .1866962
          v5 |   .2407129    .038299    6.29   0.000     .1656482    .3157776
          v6 |   .2753727   .0426117    6.46   0.000     .1918553    .3588902
          v7 |   .3244316   .0504165    6.44   0.000     .2256171    .4232462
          v8 |   .2991244   .0473675    6.31   0.000     .2062858    .3919629
------------------------------------------------------------------------------
Goodness of fit test: LR = 109.116 ; Prob[chi2(20) > LR] = 0.0000
Test vs independence: LR = 163.149 ; Prob[chi2( 8) > LR] = 0.0000
Confirmatory Factor Analysis (ML):
STATA estat fitindices commands
Fit indices
RMSEA = 0.2276, 90% CI= (0.1868, 0.2703)
RMSR  = 0.0724
TLI   = 0.7702
CFI   = 0.2967
AIC   = 946.633
BIC   = 986.087
Multidimensional factor model (ML):
STATA confa command (2 factors)
confa (f1: v1-v4) (f2: v5-v8), from(2SLS)
log likelihood = -422.79486                          Number of obs = 87
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Means        |
          v1 |   1.592161    .051198   31.10   0.000     1.491814    1.692507
          v2 |    1.48841   .0494312   30.11   0.000     1.391526    1.585293
          v3 |   1.568607   .0522239   30.04   0.000      1.46625    1.670964
          v4 |   1.509285    .056323   26.80   0.000     1.398894    1.619677
          v5 |   1.582903   .0572911   27.63   0.000     1.470614    1.695191
          v6 |   1.511862   .0581486   26.00   0.000     1.397893    1.625831
          v7 |   1.500861   .0640531   23.43   0.000      1.37532    1.626403
          v8 |   1.456359   .0632607   23.02   0.000     1.332371    1.580348
-------------+----------------------------------------------------------------
Loadings     |
          v1 |          1          .       .       .            .           .
          v2 |   1.129181   .1617634    6.98   0.000      .812131    1.446232
          v3 |   1.085591   .1685842    6.44   0.000     .7551719     1.41601
          v4 |   1.037635   .1794024    5.78   0.000     .6860131    1.389258
          v5 |          1          .       .       .            .           .
          v6 |   1.132231   .2299847    4.92   0.000     .6814688    1.582992
          v7 |   1.194321   .2745619    4.35   0.000     .6561897    1.732453
          v8 |    1.26779   .2739953    4.63   0.000     .7307694    1.804811
-------------+----------------------------------------------------------------
Factor cov.  |
       f1-f1 |   .1190851   .0326402    3.65   0.000     .0551115    .1830586
       f2-f2 |   .1128016   .0399112    2.83   0.005     .0345771     .191026
       f1-f2 |    .040931    .017838    2.29   0.022     .0059692    .0758928
------------------------------------------------------------------------------
Goodness of fit test: LR = 40.073 ; Prob[chi2(19) > LR] = 0.0032
Test vs independence: LR = 232.192 ; Prob[chi2( 9) > LR] = 0.0000
Two-factor model (ML):
STATA confa commands
. estat fitindices
Fit indices
RMSEA = 0.1136, 90% CI= (0.0637, 0.1627)
RMSR  = 0.0299
TLI   = 0.9553
CFI   = 0.8205
AIC   = 879.590
BIC   = 921.510
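Two of the fit indices can be recovered from the likelihood-ratio statistics printed with each model. The sketch below uses the standard RMSEA and Tucker-Lewis formulas and takes the test statistics and degrees of freedom exactly as they appear in the confa output (including the baseline df shown there). The function names are hypothetical, and the CFI is deliberately left out because its printed value does not follow from the textbook formula with these inputs.

```python
import math

def rmsea(chi2, df, n_obs):
    """Root mean square error of approximation from an LR chi-square."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))

def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index: model vs. independence (baseline) model."""
    base_ratio = chi2_base / df_base
    return (base_ratio - chi2 / df) / (base_ratio - 1.0)

N = 87
# (model LR, model df, baseline LR, baseline df) as printed in the output.
models = {
    "single factor": (109.116, 20, 163.149, 8),
    "two factors": (40.073, 19, 232.192, 9),
}
for label, (lr, df, lr0, df0) in models.items():
    print(label, round(rmsea(lr, df, N), 4), round(tli(lr, df, lr0, df0), 4))
```

Both indices improve markedly when v1 to v4 and v5 to v8 are allowed their own factors, although an RMSEA above 0.1 would still usually be read as a poor fit.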
Reliability coefficients
STATA kr20 command
Kuder-Richardson KR20
Kuder-Richardson coefficient of reliability (KR-20)
Number of items in the scale = 12
Number of complete observations = 6299
                    Item        Item      Item-rest
Item     |  Obs   difficulty  variance   correlation
---------+------------------------------------------
GHQ1     | 6299     0.1846     0.1505      0.4834
GHQ2     | 6299     0.1640     0.1371      0.3865
GHQ3     | 6299     0.1872     0.1521      0.1954
GHQ4     | 6299     0.1029     0.0923      0.4652
GHQ5     | 6299     0.1691     0.1405      0.4432
GHQ6     | 6299     0.0489     0.0465      0.3846
GHQ7     | 6299     0.1208     0.1062      0.5549
GHQ8     | 6299     0.1103     0.0982      0.5289
GHQ9     | 6299     0.0749     0.0693      0.3143
GHQ10    | 6299     0.0608     0.0571      0.3838
GHQ11    | 6299     0.1218     0.1069      0.4053
GHQ12    | 6299     0.1580     0.1330      0.5043
---------+------------------------------------------
Test     |          0.1253                 0.4208
KR20 = 0.7760
Reliability coefficients
STATA kr20 command
Computes the reliability coefficient of a set of dichotomous items,
[Cronbach's alpha is used for multipoint scales]
In addition, kr20 computes:
- the item difficulty (proportion of 'right' answers),
- the average value of item difficulty,
- the item variance,
- the corrected item-test point-biserial correlation coefficients,
- the average value of corrected item-test correlation
coefficients.
The items must be coded as:
- '0' for a wrong answer (unexpected answer),
- '1' for a right answer (expected answer).
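As a rough illustration of what the command reports, here is a minimal KR-20 sketch for 0/1 coded items. It assumes the population (divide-by-n) variance for the total score so that it matches the p(1-p) item variances; whether the Stata kr20 command makes the same choice is not stated in these slides. The function name and the tiny data set are made up for illustration.

```python
def kr20(responses):
    """responses: one list of 0/1 item scores per respondent."""
    n = len(responses)
    k = len(responses[0])
    # Item difficulty = proportion of '1' answers per item.
    p = [sum(person[j] for person in responses) / n for j in range(k)]
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_item_var = sum(pj * (1.0 - pj) for pj in p)
    return k / (k - 1) * (1.0 - sum_item_var / var_total)

# Four respondents, three items, forming a perfect Guttman pattern.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))

# The item variance column above is p(1-p), e.g. for GHQ1:
print(round(0.1846 * (1 - 0.1846), 4))
```

KR-20 is the special case of coefficient alpha for dichotomous items, which is why the slide notes that Cronbach's alpha is used for multipoint scales.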
Message
TRI IRT
Latent Trait Modelling
Note: IRT = IRM = LTM = CDFA*
• Latent trait modelling = factor analysis of categorical
(binary/ordinal/nominal) data
• Unidimensional LTM is widely used to measure variables/constructs such as
– Personality Dimensions and Intelligence
– Ability: Mathematical / Verbal / Spatial
– Social and political attitudes
– Consumer preferences
– Health, Quality of life, Severity of disorder or symptoms, e.g. in depression, back pain, fatigue etc…
• Multidimensional IRT is statistically developed but is less widely used presently
Here the criteria 1 – 4 are binary but the latent variable (x-axis) is continuous (gaussian norm
From Muthen, B.O (1991). Latent variable epidemiology.
Alcohol Research World. 42 139-167.
8 IRT models you might see …
Rasch model (logistic mixed model)
(1 random effect (individual differences – x – axis))
12 fixed effects – item thresholds (location of s-shapes along x)
[Stata raschtest mixed effects logistic regression [inc gllamm]
Item Discriminations
GHQ1     1.095
GHQ4     1.095
GHQ5     1.095
GHQ6     1.095
GHQ9     1.095
GHQ10    1.095
GHQ11    1.095
GHQ12    1.095
GHQ20    1.095
GHQ26    1.095
Item Difficulties
GHQ1$1
GHQ5$1
GHQ12$1
GHQ11$1
GHQ26$1
GHQ4$1
GHQ20$1
GHQ9$1
GHQ10$1
GHQ6$1
0.021
0.021
0.021
0.021
0.021
0.021
0.021
0.021
0.021
0.021
1.226
1.306
1.364
1.598
1.601
0.028
0.029
0.030
0.033
0.033
1.855
1.986
2.146
2.283
0.039
0.039
0.045
0.048
IRT in the Stata Journal
SJ-7-3 st0129
Est. dichotomous & ordinal item response models with gllamm
By X. Zheng and S. Rabe-Hesketh
Q3/07 SJ 7(3):313--333
describes the one- and two-parameter logit models for dichotomous items, the partial-credit and rating scale models for ordinal items, and an extension of these models where the latent variable is regressed on explanatory variables
SJ-7-1 st0119
Rasch analysis: Estimation and tests with raschtest
By J. Hardouin
Q1/07 SJ 7(1):22--44
command for estimating the Rasch model, the best known item response
theory model for binary responses
Running Commercial IRT
software from Stata runparscale
runparscale: runparscale brings the IRT analysis framework of PARSCALE into the Stata environment. While runparscale does little more than data reformat and ascii file creation, it removes a lot of the hassle of estimating IRT models.
Authors: runparscale was written by Laura Gibbons, PhD and Richard Jones,
ScD, under the direction of Paul Crane, MD MPH. We appreciate the assistance
of Tom Koepsell, MD MPH.
Please see runparscale.ado for UW License information.
Laura Gibbons, PhD [email protected]
Richard N Jones, ScD [email protected]
Running Commercial IRT
software from Stata runparscale
PARSCALE ITEM PARAMETERS
      item      slope (se)        location (se)
--------------------------------------------------
 1    GHQ1      1.001 (0.091)     -0.252 (0.063)
 2    GHQ2      0.433 (0.060)      0.170 (0.124)
 3    GHQ3      0.260 (0.056)      1.027 (0.287)
 4    GHQ4      0.988 (0.091)      0.323 (0.064)
 5    GHQ5      0.934 (0.087)      0.005 (0.065)
 6    GHQ6      1.004 (0.100)      0.909 (0.081)
 7    GHQ7      1.599 (0.139)     -0.055 (0.044)
 8    GHQ8      1.403 (0.122)      0.035 (0.048)
 9    GHQ9      0.598 (0.075)      1.286 (0.156)
10    GHQ10     1.035 (0.101)      0.842 (0.077)
11    GHQ11     0.935 (0.088)      0.393 (0.068)
12    GHQ12     1.436 (0.124)     -0.152 (0.048)
--------------------------------------------------
parscale
ITEM FIT STATISTICS
[not to be trusted for short tests, illustrative only]
| BLOCK | ITEM | CHI-SQUARE | D.F. | PROB. |
----------------------------------------------
| GHQ1  | 0001 |   19.56213 |   7. | 0.007 |
| GHQ2  | 0002 |   13.82273 |   9. | 0.128 |
| GHQ3  | 0003 |    5.89128 |  10. | 0.825 |
| GHQ4  | 0004 |    8.73722 |   8. | 0.365 |
| GHQ5  | 0005 |   13.46327 |   8. | 0.096 |
| GHQ6  | 0006 |   12.87186 |   9. | 0.168 |
| GHQ7  | 0007 |   14.25497 |   7. | 0.047 |
| GHQ8  | 0008 |    9.20264 |   7. | 0.238 |
| GHQ9  | 0009 |   27.44038 |  10. | 0.002 |
| GHQ10 | 0010 |   21.55337 |   9. | 0.011 |
| GHQ11 | 0011 |   10.44335 |   8. | 0.235 |
| GHQ12 | 0012 |   20.04176 |   7. | 0.006 |
| TOTAL |      |  177.28497 |  99. | 0.000 |
[Figure: conditional standard error of measurement (y-axis, ticks .3 to .6) plotted against thetaGHQparscale (x-axis, -2 to 2)]
X-axis: Latent Trait value (IRT thresholds zero centred)
Y-axis: conditional standard error of measurement (s.e.m. varies with score value under Item Response Theory). Lower s.e.m. = greater precision of measurement
Non-parametric IRT Mokken Analysis
STATA loevH command
. loevH GHQ1-GHQ12
                             Observed   Expected                                      Number
                 Easyness    Guttman    Guttman    Loevinger              H0: Hj<=0   of NS
Item      Obs    P(Xj=1)     errors     errors     H coeff     z-stat.    p-value     Hjk
--------------------------------------------------------------------------------------------
GHQ1      548    0.5712        628      1057.50    0.40615     23.2388    0.00000       0
GHQ2      548    0.4708        902      1183.11    0.23760     15.0931    0.00000       0
GHQ3      548    0.3923        954      1140.05    0.16320     10.1904    0.00000       1
GHQ4      548    0.4088        741      1155.62    0.35879     22.5701    0.00000       0
GHQ5      548    0.4982        775      1176.57    0.34131     21.5282    0.00000       0
GHQ6      548    0.2573        538       868.24    0.38036     20.0185    0.00000       1
GHQ7      548    0.5201        675      1151.94    0.41403     25.5869    0.00000       0
GHQ8      548    0.4891        730      1181.99    0.38240     24.2362    0.00000       0
GHQ9      548    0.2500        598       846.50    0.29356     15.1966    0.00000       0
GHQ10     548    0.2701        529       899.44    0.41185     22.1342    0.00000       0
GHQ11     548    0.3923        741      1140.05    0.35003     21.8568    0.00000       0
GHQ12     548    0.5511        629      1100.94    0.42867     25.4203    0.00000       0
--------------------------------------------------------------------------------------------
Scale     548                 4220      6450.98    0.34584     50.5208    0.00000
loevH by [email protected] [Websites AnaQol and FreeIRT] allows
verifying the fit of data to the Monotonely Homogeneous Mokken Model or to the Doubly
Monotone Mokken Model. It computes the Loevinger H scalability coefficients, and several
indexes in the field of the Non parametric Item Response Theory.
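The Loevinger H column follows directly from the two Guttman error columns: H is one minus the ratio of observed to expected errors. The sketch below shows that relation and, for a single pair of binary items, how the two error counts arise. The function names are hypothetical and this is not the loevH implementation.

```python
def loevinger_h(observed_errors, expected_errors):
    """Scalability coefficient: 1 - observed / expected Guttman errors."""
    return 1.0 - observed_errors / expected_errors

def pair_h(x, y):
    """Hjk for two 0/1 items scored on the same respondents.
    A Guttman error is endorsing the harder item while failing the easier;
    the expected count assumes the two items are independent."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    easy, hard = (x, y) if px >= py else (y, x)
    p_easy, p_hard = max(px, py), min(px, py)
    observed = sum(1 for e, h in zip(easy, hard) if e == 0 and h == 1)
    expected = n * (1.0 - p_easy) * p_hard
    return loevinger_h(observed, expected)

# Values from the loevH table: GHQ1 and the whole 12-item scale.
print(round(loevinger_h(628, 1057.50), 5))
print(round(loevinger_h(4220, 6450.98), 5))
```

A perfect Guttman pattern has no observed errors and gives H = 1, while independent items give H close to 0.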
(1) Non-parametric IRT Mokken Analysis
STATA msp command
. msp GHQ1-GHQ12, c(.4)
The two first items selected in the scale 1 are GHQ7 and GHQ8 (Hjk=0.7357)
The item GHQ6 is selected in the scale 1      Hj=0.5777    H=0.6534
The following items are excluded at this step: GHQ3
The item GHQ12 is selected in the scale 1     Hj=0.5025    H=0.5723
The item GHQ10 is selected in the scale 1     Hj=0.4431    H=0.5267
The item GHQ11 is selected in the scale 1     Hj=0.4538    H=0.5011
The item GHQ1 is selected in the scale 1      Hj=0.4338    H=0.4811
The item GHQ4 is selected in the scale 1      Hj=0.4083    H=0.4616
The item GHQ5 is selected in the scale 1      Hj=0.4095    H=0.4489
None new item can be selected in the scale 1 because all the Hj are lesser than .4 or none new item has all the related Hjk coefficients significantly greater than 0
                             Observed   Expected                                      Number
                 Easyness    Guttman    Guttman    Loevinger              H0: Hj<=0   of NS
Item      Obs    P(Xj=1)     errors     errors     H coeff     z-stat.    p-value     Hjk
--------------------------------------------------------------------------------------------
GHQ5      548    0.4982        514       870.46    0.40951     22.3093    0.00000       0
GHQ4      548    0.4088        478       828.96    0.42338     22.2905    0.00000       0
GHQ1      548    0.5712        457       795.91    0.42582     21.4001    0.00000       0
GHQ11     548    0.3923        470       812.38    0.42145     21.8744    0.00000       0
GHQ10     548    0.2701        340       631.18    0.46133     20.2369    0.00000       0
GHQ12     548    0.5511        409       827.11    0.50550     26.2866    0.00000       0
GHQ6      548    0.2573        312       606.20    0.48532     20.7341    0.00000       0
GHQ7      548    0.5201        448       859.18    0.47857     25.7520    0.00000       0
GHQ8      548    0.4891        486       870.31    0.44158     24.0575    0.00000       0
--------------------------------------------------------------------------------------------
Scale     548                 1957      3550.85    0.44886     48.3819    0.00000
(2) Non-parametric IRT Mokken Analysis
STATA msp command
Scale: 2
----------
Significance level: 0.016667
The two first items selected in the scale 2 are GHQ2 and GHQ3 (Hjk=0.4111)
Significance level: 0.012500
None new item can be selected in the scale 2 because all the Hj are lesser than .4 or none new item has all the related Hjk coefficients significantly greater than 0.
                             Observed   Expected                                      Number
                 Easyness    Guttman    Guttman    Loevinger              H0: Hj<=0   of NS
Item      Obs    P(Xj=1)     errors     errors     H coeff     z-stat.    p-value     Hjk
--------------------------------------------------------------------------------------------
GHQ2      548    0.4708         67       113.78    0.41113      8.1914    0.00000       0
GHQ3      548    0.3923         67       113.78    0.41113      8.1914    0.00000       0
--------------------------------------------------------------------------------------------
Scale     548                   67       113.78    0.41113      8.1914    0.00000
There is only one item remaining (GHQ9).
(1) Rasch model in STATA
Estimation method: Conditional maximum likelihood (CML)
Number of items: 9
Number of groups: 10 (8 of them are used to compute the statistics of test)
Number of individuals: 548
Number of individuals with missing values: 0 (removed)
Number of individuals with nul or perfect score: 111
Conditional log-likelihood: -1467.1127
Log-likelihood: -2025.3536
          Difficulty                                     Standardized
Items     parameters   std Err.     R1c   df  p-value   Outfit    Infit       U
-----------------------------------------------------------------------------
GHQ1       -0.13173    0.15481   11.449    7   0.1202    2.338    1.713   1.799
GHQ4        0.90796    0.15455   11.601    7   0.1145    0.654    0.785   0.863
GHQ5        0.34003    0.15343    4.847    7   0.6787    1.192    1.098   1.658
GHQ6        1.94575    0.16456    8.730    7   0.2727    0.291    0.072   0.368
GHQ7        0.20031    0.15362   10.339    7   0.1702   -1.424   -2.433  -2.124
GHQ8        0.39799    0.15341   13.443    7   0.0620   -0.871   -0.545  -1.673
GHQ10       1.85021    0.16316   11.134    7   0.1329    0.416    0.267   1.077
GHQ11       1.01368    0.15510   13.131    7   0.0690    0.578    0.844   1.462
GHQ12*      0.00000          .    5.045    7   0.6545   -2.916   -2.624  -2.884
-----------------------------------------------------------------------------
R1c test               R1c=      95.782   56   0.0007
Andersen LR test         Z=      99.418   56   0.0003
-----------------------------------------------------------------------------
*: The difficulty parameter of this item had been fixed to 0
(2) Rasch model in STATA
raschtest
                 Ability                         Expected
Group   Score   parameters   std Err.   Freq.      Score          ll
--------------------------------------------------------------------
    0       0     -2.449      1.561       82       0.44
    1       1     -1.202      0.963       61       1.32    -117.4189
    2       2     -0.524      0.801       55       2.22    -186.8236
    3       3      0.002      0.734       48       3.12    -189.8916
    4       4      0.473      0.708       70       4.03    -281.8395
    5       5      0.933      0.712       54       4.95    -233.6392
    6       6      1.418      0.744       48       5.87    -171.5103
    7       7      1.971      0.817       53       6.79    -151.2446
    8       8      2.685      0.983       48       7.69     -85.0359
    9       9      3.974      1.591       29       8.57
--------------------------------------------------------------------
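The Expected Score column can be related back to the item difficulties: under the Rasch model the expected raw score at a given ability is the sum over items of the probability of a positive response. The sketch below checks this for the score group 3 row, using the nine CML difficulties from the previous slide. The function names are hypothetical and this is a hand check, not the raschtest code.

```python
import math

def rasch_prob(theta, difficulty):
    """P(X = 1 | theta) under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def expected_score(theta, difficulties):
    """Expected raw score = sum of item response probabilities."""
    return sum(rasch_prob(theta, b) for b in difficulties)

# CML difficulty estimates for GHQ1, 4, 5, 6, 7, 8, 10, 11, 12.
difficulties = [-0.13173, 0.90796, 0.34003, 1.94575, 0.20031,
                0.39799, 1.85021, 1.01368, 0.00000]

# Ability estimate reported for raw score 3.
print(round(expected_score(0.002, difficulties), 2))
```

A person whose ability equals an item's difficulty has a probability of exactly one half of endorsing that item, so items with difficulties far above the ability contribute little to the expected score.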
Running Mplus www.statmodel.com
from Stata runmplus
Runmplus [Author: Richard N Jones, ScD [email protected] ]
Builds an Mplus data file and command file, executes the command file and displays the Mplus log file (output) in the Stata results window.
Factor analysis syntax examples:
Exploratory factor analysis with continuous indicators
runmplus y1-y12, type(efa 1 4)
Exploratory factor analysis with categorical indicators
runmplus y1-y12, type(efa 1 4) categorical(all)
Exploratory factor analysis with a mixture of categorical and continuous indicators
runmplus y1-y12,type(efa 1 4) categorical(y1 y3 y5 y7 y9 y11)
Confirmatory factor analysis with continuous indicators
runmplus y1-y6, model(f1 by y1-y3; f2 by y4-y6;)
And finally … think
useR
IR : irtoys package
example plots (from manual)
Author: Ivailo Partchev <[email protected]>
Extract from
//cran.r-project.org/web/views/Psychometrics.html
Classical Test Theory (CTT)
• The CTT package can be used to perform a variety of tasks and analyses associated with classical test theory: score multiple-choice responses, perform reliability analyses, conduct item analyses, and transform scores onto different scales.
• The CMC package calculates and plots the step-by-step Cronbach-Mesbach curve, that is a method, based on the Cronbach alpha coefficient of reliability, for checking the unidimensionality of a measurement scale.
• The package psychometric contains functions useful for correlation theory, meta-analysis (validity-generalization), reliability, item analysis, inter-rater reliability, and classical utility. Cronbach alpha, kappa coefficients, and intra-class correlation coefficients (ICC) can be found in the psy package.
• A number of routines for scale construction and reliability analysis useful for personality and experimental psychology are contained in the packages psych and MiscPsycho.
• Additional measures for reliability and concordance can be computed with the concord package.
(2) Extract from
//cran.r-project.org/web/views/Psychometrics.html
Item Response Theory (IRT):
• The eRm package fits extended Rasch models, i.e. the ordinary Rasch model for dichotomous data (RM), the linear logistic test model (LLTM), the rating scale model (RSM) and its linear extension (LRSM), the partial credit model (PCM) and its linear extension (LPCM) using conditional ML estimation. Missing values are allowed.
• The package ltm also fits the simple RM. Additionally, functions for estimating Birnbaum's 2- and 3-parameter models based on a marginal ML approach are implemented as well as the graded response model for polytomous data, and the linear multidimensional logistic model.
• Item and ability parameters can be calibrated using the package plink. It provides unidimensional and multidimensional methods such as Mean/Mean, Mean/Sigma, Haebara, and Stocking-Lord methods for dichotomous (1PL, 2PL and 3PL) and/or polytomous (graded response, partial credit/generalized partial credit, nominal, and multiple-choice model) items. The multidimensional methods include the Reckase-Martineau method and extensions of the Haebara and Stocking-Lord method.
• The difR package contains several traditional methods to detect DIF in dichotomously scored items. Both uniform and non-uniform DIF effects can be detected, with methods relying upon item response models or not. Some methods deal with more than one focal group.
• The package lordif provides a logistic regression framework for detecting various types of differential item functioning (DIF).
• The package plRasch computes maximum likelihood estimates and pseudo-likelihood estimates of parameters of Rasch models for polytomous (or dichotomous) items and multiple (or single) latent traits. Robust standard errors for the pseudo-likelihood estimates are also computed.
• A multilevel Rasch model can be estimated using the package lme4 with functions for mixed-effects models with crossed or partially crossed random effects.
• Other packages of interest are: mokken to compute non-parametric item analysis, the RaschSampler allowing for the construction of exact Rasch model tests by generating random zero-one matrices with given marginals, mprobit fitting the multivariate binary probit model, and irtoys providing a simple interface to the estimation and plotting of IRT models. Simple Rasch computations such as simulating data and joint maximum likelihood are included in the MiscPsycho package.
• The irtProb package is designed to estimate multidimensional subject parameters (MLE and MAP) such as personal pseudo-guessing, personal fluctuation, personal inattention. These supplemental parameters can be used to assess person fit, to identify misfit type, to generate misfitting response patterns, or to make correction while estimating the proficiency level considering potential misfit at the same time.
• Gaussian ordination, related to logistic IRT and also approximated as maximum likelihood estimation through canonical correspondence analysis is implemented in various forms in the package VGAM.
• Two additional IRT packages (for Microsoft Windows only) are available and documented on the JSS site. The package mlirt computes multilevel IRT models, and cirt uses a joint hierarchically built up likelihood for estimating a two-parameter normal ogive model for responses and a log-normal model for response times.
• Bayesian approaches for estimating item and person parameters by means of Gibbs-Sampling are included in MCMCpack. In addition, the pscl package allows for Bayesian IRT and roll call analysis.
• The latdiag package produces commands to drive the dot program from graphviz to produce a graph useful in deciding whether a set of binary items might have a latent scale with non-crossing ICCs.
(3) Extract from
//cran.r-project.org/web/views/Psychometrics.html
Structural Equation Models, Factor Analysis, PCA:
• Ordinary factor analysis (FA) and principal component analysis (PCA) are in the package stats as functions factanal() and princomp().
Additional rotation methods for FA based on gradient projection algorithms can be found in the package GPArotation. The package nFactors
produces a non-graphical solution to the Cattell scree test. Some graphical PCA representations can be found in the psy package.
• The sem package fits general (i.e., latent-variable) SEMs by FIML, and structural equations in observed-variable models by 2SLS. Categorical
variables in SEMs can be accommodated via the polycor package. The systemfit package implements a wider variety of estimators for
observed-variables models, including nonlinear simultaneous-equations models. See also the pls package, for partial least-squares estimation,
the gR task view for graphical models and the SocialSciences task view for other related packages.
• The package lavaan can be used to estimate a large variety of multivariate statistical models, including path analysis, confirmatory factor
analysis, structural equation modeling and growth curve models. It includes the lavaan model syntax, which allows users to express their
models in a compact way, and allows for ML, GLS, WLS, robust ML using Satorra-Bentler corrections, and FIML for data with missing values. It
fully supports mean structures and multiple groups and reports standardized solutions, fit measures, modification indices and more as
output.
• SEMModComp conducts tests of difference in fit for mean and covariance structure models as in structural equation modeling (SEM).
• The package FAiR performs factor analysis based on a genetic algorithm for optimization. This makes it possible to impose a wide range of
restrictions on the factor analysis model, whether using exploratory factor analysis, confirmatory factor analysis, or a new estimator called
semi-exploratory factor analysis (SEFA).
• FA and PCA with supplementary individuals and supplementary quantitative/qualitative variables can be performed using the FactoMineR
package, whereas MCMCpack has some options for sampling from the posterior for ordinal and mixed factor models.
• The homals package provides nonlinear PCA and, by defining sets, nonlinear canonical correlation analysis (models of the Gifi-family).
• Independent component analysis (ICA) can be computed using fastICA. Independent factor analysis (IFA) with independent non-Gaussian
factors can be performed with the ifa package.
• A desired number of robust principal components can be computed with the pcaPP package.
• The package psych includes functions such as fa.parallel() and VSS() for estimating the appropriate number of factors/components as well as
ICLUST() for item clustering.
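As a rough illustration of what PCA extracts from a correlation matrix, the sketch below finds the first principal component by power iteration. It is written in Python purely for illustration and is not how the R functions named above are implemented; the function name and the small correlation matrix are made up.

```python
import math


def first_principal_component(corr, iterations=200):
    """Power iteration: returns (largest eigenvalue, unit-length eigenvector).

    Assumes a symmetric, positive semi-definite matrix whose leading
    eigenvector is not orthogonal to the uniform starting vector.
    """
    size = len(corr)
    vector = [1.0 / math.sqrt(size)] * size
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply the matrix by the current vector, then renormalise.
        product = [sum(row[j] * vector[j] for j in range(size)) for row in corr]
        eigenvalue = math.sqrt(sum(x * x for x in product))
        vector = [x / eigenvalue for x in product]
    return eigenvalue, vector


# Hypothetical correlations among three items.
corr = [
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]
eigenvalue, weights = first_principal_component(corr)
print("largest eigenvalue:", round(eigenvalue, 4))
print("proportion of variance:", round(eigenvalue / len(corr), 4))
print("component weights:", [round(w, 4) for w in weights])
```

For a correlation matrix the eigenvalues sum to the number of variables, so dividing the largest eigenvalue by that number gives the proportion of variance carried by the first component.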
Psychometrics in R
• Special volume of the Journal of Statistical Software
– www.jstatsoft.org
• Volume 20
– Multilevel Rasch
– Correspondence Analysis
– Rasch
– Multilevel IRT
– Multidimensional Rasch
– Extended Rasch
– Marginal Maximum Likelihood IRT
– Mokken scale analysis …
Free R software
• The program LTM is available for R from
– http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm.
– It is available as an R version and S-Plus version.
– ltm fits logit-probit models (normal latent trait; logistic link function)
with one [and two] factors.
– In a very recent (but complex) development it also allows for inclusion
of nonlinear terms (e.g., interaction and quadratic terms).
• Extra features:
– computation of factor scores using Multiple Imputation
– Rasch model
• for which Goodness of Fit is assessed using a parametric Bootstrap version
of the Pearson chi-squared.
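To make the Rasch model concrete, here is a minimal sketch in Python. It is not the estimation routine used by ltm: it assumes the item difficulties are already known and estimates a single person's ability by Newton-Raphson maximum likelihood. All names and numbers are hypothetical.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """P(correct) under the Rasch model: logistic function of (theta - b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, iterations=25):
    """Maximum-likelihood ability estimate by Newton-Raphson.

    responses: list of 0/1 item scores; difficulties: known item difficulties.
    The MLE is infinite for all-correct or all-wrong patterns, so those
    are rejected.
    """
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        raise ValueError("no finite MLE for all-correct or all-wrong patterns")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = raw_score - sum(probs)                 # first derivative of log-likelihood
        information = sum(p * (1.0 - p) for p in probs)   # minus the second derivative
        theta += gradient / information
    return theta


# Made-up example: five items of increasing difficulty, three answered correctly.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
responses = [1, 1, 1, 0, 0]
theta_hat = estimate_ability(responses, difficulties)
print("estimated ability:", round(theta_hat, 3))
```

At the estimate, the expected number of correct responses equals the observed raw score, which reflects the Rasch property that the raw score is a sufficient statistic for ability.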
Free software
• Factor/M-IRT
– Factor
• Urbano Lorenzo-Seva & Pere J. Ferrando
• http://psico.fcep.urv.es/utilitats/factor/
• MIRT
– NOHARM
FACTOR
//psico.fcep.urv.es/utilitats/factor/
Factor is a program developed to fit the Exploratory Factor Analysis model.
Below we describe the methods used.
Univariate and multivariate descriptives of variables:
Univariate mean, variance, skewness, and kurtosis
Multivariate skewness and kurtosis (Mardia, 1970)
Bar charts for ordinal variables
Dispersion matrices:
User-defined type of matrix
Covariance matrix
Pearson correlation matrix
Polychoric correlation matrix with optional Ridge estimates
Procedures for determining the number of factors/components to be
retained:
MAP: Minimum Average Partial Test (Velicer, 1976)
PA: Parallel Analysis (Horn, 1965)
PA - MBS. It is an extension of Parallel Analysis that generates random
correlation matrices using marginally bootstrapped samples (Lattin,
Carroll, & Green, 2003)
Factor and component analysis:
PCA: Principal Component Analysis
ULS: Unweighted Least Squares factor analysis (also MINRES and PAF)
EML: Exploratory Maximum Likelihood factor analysis
MRFA: Minimum Rank Factor Analysis (ten Berge & Kiers, 1991)
Schmid-Leiman second-order solution (1957)
Factor scores (ten Berge, Krijnen, Wansbeek, & Shapiro, 1999)
In ULS factor analysis, the Heywood case correction described in Mulaik
(1972, page 153) is included: when an update has sum of squares
larger than the observed variance of the variable, that row is
updated by constrained regression using the procedure proposed
by ten Berge and Nevels (1977).
Some of the rotation methods to obtain simplicity are:
Quartimax (Neuhaus & Wrigley, 1954)
Varimax (Kaiser, 1958)
Weighted Varimax (Cureton & Mulaik, 1975)
Orthomin (Bentler, 1977)
Direct Oblimin (Clarkson & Jennrich, 1988)
Weighted Oblimin (Lorenzo-Seva, 2000)
Promax (Hendrickson & White, 1964)
Promaj (Trendafilov, 1994)
Promin (Lorenzo-Seva, 1999)
Simplimax (Kiers, 1994)
Some of the indices used in the analysis are:
Test on the dispersion matrix: Determinant, Bartlett's test and Kaiser-Meyer-Olkin (KMO)
Goodness of fit statistics: Chi-Square; Non-Normed Fit Index (NNFI; Tucker
& Lewis); Comparative Fit Index (CFI); Goodness of Fit Index (GFI);
Adjusted Goodness of Fit Index (AGFI); Root Mean Square Error of
Approximation (RMSEA); and Estimated Non-Centrality Parameter
(NCP)
Reliabilities of rotated components (ten Berge & Hofstee, 1999)
Simplicity indices: Bentler’s Simplicity index (1977) and Loading Simplicity
index (Lorenzo-Seva, 2003)
Mean, variance and histogram of fitted and standardized residuals.
Automatic detection of large standardized residuals.
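Several of the fit indices listed above are simple functions of the model chi-square. The sketch below uses common textbook definitions in Python; it is not taken from the FACTOR program, which may compute these indices differently (for example, using N rather than N − 1 in RMSEA). Function names and input values are made up.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root Mean Square Error of Approximation (N - 1 in the denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def nnfi(chi2_model, df_model, chi2_null, df_null):
    """Non-Normed Fit Index (Tucker & Lewis): compares chi2/df ratios."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index based on estimated non-centrality (chi2 - df)."""
    ncp_model = max(chi2_model - df_model, 0.0)
    ncp_null = max(chi2_null - df_null, ncp_model, 0.0)
    return 1.0 if ncp_null == 0.0 else 1.0 - ncp_model / ncp_null


# Hypothetical results: fitted model versus the null (independence) model.
chi2_m, df_m = 36.0, 24
chi2_0, df_0 = 624.0, 36
sample_size = 301

print("NCP estimate:", max(chi2_m - df_m, 0.0))
print("RMSEA:", round(rmsea(chi2_m, df_m, sample_size), 4))
print("NNFI:", round(nnfi(chi2_m, df_m, chi2_0, df_0), 4))
print("CFI:", round(cfi(chi2_m, df_m, chi2_0, df_0), 4))
```

All three indices reward a model whose chi-square is close to its degrees of freedom; NNFI and CFI additionally scale the misfit against that of the null model.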
Interesting Journals …
• Psychological Assessment
• Psychological Methods
• Multivariate Behavioral Research
• Applied Psychological Measurement
• Journal of Educational and Behavioral Statistics
• Structural Equation Modeling
• Psychometrika
• Educational and Psychological Measurement
Running Mplus www.statmodel.com
from Stata runmplus
Excellent book chapter
(non-technical)
• Application oriented book
– Assessing Quality of Life in Clinical Trials: Methods and Practice
– Edition: 2nd
– Author(s): Peter Fayers; Ron Hays
– ISBN: 0198527691
• See chapter by Reeve and Fayers
– Applying item response theory modelling for evaluating questionnaire item and scale properties
– Download for free from www.oup.co.uk/pdf/0-19-852769-1.pdf
££££££££££££££££££££££
• And out there in commerce, money talks…
• As Test-Taking Grows, Test-Makers
Grow Rarer, May 5, 2006, NY Times.
Psychometrics, one of the most
obscure, esoteric and cerebral
professions in America ….
is now also one of the hottest