Predicting True Patterns of Cognitive Performance
from Noisy Data
W. Todd Maddox, University of Texas, Austin
W. K. Estes, Indiana University
In press at Psychonomic Bulletin and Review
Abstract
Starting from the premise that the purpose of cognitive modeling is to gain
information about cognitive processes of individuals, we develop a general theoretical
framework for assessment of models on the basis of tests of the models' ability to yield
information about true performance patterns of individual subjects and the processes
underlying them. To address the central problem that observed performance is a
composite of true performance and error, we present formal derivations concerning
inference from noisy data to true performance. Analyses of model fits to simulated data
illustrate the usefulness of our approach for coping with difficult issues of model
identifiability and testability.
Introduction
In nearly all cognitive research, analyses of data are conducted by
application of statistical methods deriving from the family of linear statistical
models (analysis of variance and covariance, multivariate analysis). The methods
all depend on the assumption that any performance score constitutes a mixture
of two components: One represents true effects of experimental variables
(henceforth, the "true score") and one represents experimental error. Our
premise in this article is that the same assumption holds when scientific models
are applied to the same data. The importance of the assumption stems from our
view that the purpose of cognitive modeling is not simply to describe data but to
gain information about the cognitive processes of individuals that underlie
performance.
In the following sections, we first formalize the problem of inferring true
performance patterns from fits of models to composite data, and derive some
general relationships among observed performance scores, true scores, and
predicted scores generated by model fits. Second, using simulated data for which
the true scores and error components are known, we address the problem of
assessing a model's capability for recovering true parameter values from model
fits and generating useful inferences about patterns of true performance.
A Framework for Model-Based Inference of True Performance
Basic Terms and Functions
The methods we propose apply to any experiment in which a cognitive model is used to
aid inferences about the output of the experimental subjects' cognitive processes.
The theoretical framework for our methodology follows Estes (2002). Four
aspects of cognitive performance are distinguished: observed performance,
performance predicted by a model, true performance, and error. We denote these by P,
Pr, T, and e, respectively. Each term refers to the performance of an individual subject
on a single experimental trial.
The term T represents the output of the individual's cognitive processes before
the output enters into generation of an observed response, P. The true response, T, is
assumed to have the form of some function of a set (or vector) of parameters, θ, that is,
$$T = f(\theta). \tag{1}$$
The function may be explicitly defined, or it may be implicit in a computer program that
generates predictions of performance in the situation. Observed performance is
represented by
$$P = T + e. \tag{2}$$
The error term e is assumed to be normally distributed with a mean of zero and a
standard deviation σ that is unknown a priori but can be estimated from data. Our
specific assumption about the form of the error term is not the only possibility, but it has
the advantage of conforming to long-standing usage in statistics and psychometrics.
An alternative expression for observed performance is
$$P = f(\theta) + e, \tag{3}$$
obtained by substitution from Equation 1.
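As a minimal sketch of Equations 1 through 3, assume for illustration that f is a power function with parameter vector θ = (a, b) (the article leaves f general) and that e is normally distributed. The helper names `true_score` and `observed_score` and the parameter values are hypothetical.

```python
import random

def true_score(theta, n):
    """Equation 1, T = f(theta), with a hypothetical power-function f: a * n**(-b)."""
    a, b = theta
    return a * n ** (-b)

def observed_score(theta, n, sigma, rng):
    """Equation 3, P = f(theta) + e, where e is drawn from Normal(0, sigma)."""
    return true_score(theta, n) + rng.gauss(0.0, sigma)

rng = random.Random(1)       # fixed seed so the example is reproducible
theta = (1.0, 0.3)           # illustrative values, not taken from the article
T = [true_score(theta, n) for n in range(1, 6)]
P = [observed_score(theta, n, 0.02, rng) for n in range(1, 6)]
e = [p - t for p, t in zip(P, T)]   # error component recovered via Equation 2
```

Setting sigma to zero makes observed and true scores coincide, which is a quick check that the error term is the only source of disparity.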
For models considered in this article, values of the parameters are not known a
priori, but upon application of the model to an experiment, the parameter values are
estimated from performance data. The estimated parameter vector is denoted θe, and
Pr, performance predicted by the model, is given by
$$\mathit{Pr} = f(\theta_e), \tag{4}$$
where f(θe) is the right-hand side of Equation 1 with θ replaced by its estimate.
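The article does not specify a fitting routine. One possible sketch assumes ordinary least squares and the hypothetical power-function f a·n^(−b). For a fixed exponent b, the least-squares value of a has a closed form, so θe can be found by a grid search over b and then plugged into Equation 4. The function `fit_power` and the grid are illustrative choices.

```python
import random

def f(theta, n):
    """Hypothetical model function: a * n**(-b)."""
    a, b = theta
    return a * n ** (-b)

def fit_power(ns, P, b_grid):
    """Least-squares estimate theta_e = (a, b) for P ~ a * n**(-b).

    For a fixed b the model is linear in a, so the best a is
    sum(P * x) / sum(x * x) with x = n**(-b); b is chosen by grid search.
    """
    best_sse, best_theta = float("inf"), None
    for b in b_grid:
        x = [n ** (-b) for n in ns]
        a = sum(p * xi for p, xi in zip(P, x)) / sum(xi * xi for xi in x)
        sse = sum((p - a * xi) ** 2 for p, xi in zip(P, x))
        if sse < best_sse:
            best_sse, best_theta = sse, (a, b)
    return best_theta

rng = random.Random(2)
ns = list(range(1, 101))
grid = [i / 1000 for i in range(1, 1001)]            # b from 0.001 to 1.000
P = [f((1.0, 0.3), n) + rng.gauss(0.0, 0.02) for n in ns]
theta_e = fit_power(ns, P, grid)
Pr = [f(theta_e, n) for n in ns]                     # Equation 4: Pr = f(theta_e)
```

A gradient-based optimizer would serve equally well. The grid search is used here only to keep the sketch dependency-free.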
With this conceptual machinery in hand, we are prepared to approach the
question of how outputs of model fits can yield information about properties of true
performance both at a quite general level and also as it arises in analyses of specific
models and experiments.
Derivation of General Relationships between Predicted and True Performance
To illustrate our approach at the more general level, we pose the question of
whether and how constant error, or bias, may arise if the disparities between observed
and predicted performance in the output of a model fit are used to infer corresponding
disparities between predicted and true performance.
We consider first the performance of a subject in a single episode under some
condition of an experiment, using the notation defined above but adding a subscript j (j =
1, 2, ... J) to P and the other terms in order to identify the condition referred to among a
set of J possibilities. Assuming that a model has been fitted to the subject's data, we
seek a relation between Pj - Prj and Tj - Prj. By appropriate application of Equations
1-4, we obtain the identity
$$P_j - \mathit{Pr}_j = T_j - \mathit{Pr}_j + e_j. \tag{5}$$
For any single episode, the observed difference, Pj - Prj, predicts too high or too low a
value for the true difference, Tj - Prj, depending on the sign of ej. If the episode could
be replicated a large number of times with no change in values of the model's
parameters, mean error would tend to zero and Pj - Prj would be a perfect predictor of
Tj - Prj.
In cognitive research, unlimited replication with constant parameters may not be
possible, but cumulation of Pj - Prj over the conditions of an experiment can inform us
about the corresponding cumulation for Tj - Prj. Assuming that ej is sampled from the
same normal distribution on all trials and averaging the two sides of Equation 5 over
conditions, we obtain
$$\frac{1}{J}\sum_{j=1}^{J}(P_j - \mathit{Pr}_j) = \frac{1}{J}\sum_{j=1}^{J}(T_j - \mathit{Pr}_j + e_j),$$
or
$$M(P - \mathit{Pr}) = M(T - \mathit{Pr}) + M(e). \tag{6}$$
If the means are computed for successively larger numbers of conditions (J = 1, 2, ...),
the sign of M(e) may fluctuate. Beyond some value of J, denoted Jc, further fluctuations
become highly unlikely, and for J > Jc the observed difference will nearly always imply a
value for the true difference that is consistently too large or consistently too small.¹ The
magnitude of this disparity will progressively decrease, of course, becoming vanishingly
small for sufficiently large J.
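Equation 6 can be checked numerically. The sketch below assumes the forgetting-curve setting used for Figure 1 (true scores n^(−0.3), σ = .04). It uses a fixed, hypothetical set of predictions Pr, because the identity holds for any predictions. It computes the cumulative means M(P − Pr), M(T − Pr), and M(e) for J = 1, 2, ..., 100.

```python
import random

def running_means(xs):
    """Cumulative means: element J-1 is the mean of the first J values."""
    out, total = [], 0.0
    for j, x in enumerate(xs, start=1):
        total += x
        out.append(total / j)
    return out

rng = random.Random(3)
J = 100
T = [n ** -0.3 for n in range(1, J + 1)]                 # true scores
e = [rng.gauss(0.0, 0.04) for _ in range(J)]             # error, sigma = .04
P = [t + ei for t, ei in zip(T, e)]                      # Equation 2
Pr = [0.98 * n ** -0.29 for n in range(1, J + 1)]        # hypothetical predictions

m_obs = running_means([p - pr for p, pr in zip(P, Pr)])   # M(P - Pr)
m_true = running_means([t - pr for t, pr in zip(T, Pr)])  # M(T - Pr)
m_err = running_means(e)                                  # M(e)

# Equation 6 holds at every J: the gap between the two curves is M(e).
gaps = [o - t for o, t in zip(m_obs, m_true)]
```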
{Figure 1 about here}
For a concrete illustration, Figure 1 presents plots of Equation 6 for
simulated data (curves of forgetting) of two hypothetical subjects whose true
performance is described by the power function $T_n = n^{-0.3}$, with n, corresponding
to j in the preceding derivation, denoting the number of unit time intervals.
Performance scores were obtained by adding to Tn values of e sampled from
error distributions with σ equal to .02 or .04. Although an n of 100 is not enough
for the functions to reach asymptote, stable relationships between Pj - Prj and Tj
- Prj appear to have emerged by a sequence length of about 60 for the lower
noise and 80 for the higher noise subject.
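The Figure 1 simulation can be sketched as follows. The article does not state how Pr was obtained for these curves. Purely for illustration, we assume that each hypothetical subject's noisy data were fitted by least squares with a power function a·n^(−b). The function names, seed, and grid are hypothetical. The J at which the curves stabilize depends on the random draws, so this sketch will not reproduce the values of 60 and 80 reported above.

```python
import random

def fit_power(ns, P, b_grid):
    """Least-squares (a, b) for P ~ a * n**(-b): closed-form a, grid search over b."""
    best_sse, best = float("inf"), None
    for b in b_grid:
        x = [n ** (-b) for n in ns]
        a = sum(p * xi for p, xi in zip(P, x)) / sum(xi * xi for xi in x)
        sse = sum((p - a * xi) ** 2 for p, xi in zip(P, x))
        if sse < best_sse:
            best_sse, best = sse, (a, b)
    return best

def figure1_curves(sigma, seed, J=100):
    """Cumulative means of ob - pr and tr - pr for one simulated subject."""
    rng = random.Random(seed)
    ns = list(range(1, J + 1))
    T = [n ** -0.3 for n in ns]                        # true curve T_n = n**(-0.3)
    P = [t + rng.gauss(0.0, sigma) for t in T]         # observed = true + error
    a, b = fit_power(ns, P, [i / 1000 for i in range(1, 1001)])
    Pr = [a * n ** (-b) for n in ns]                   # predicted (assumed fit)
    ob_pr, tr_pr, sum_ob, sum_tr = [], [], 0.0, 0.0
    for j in range(J):
        sum_ob += P[j] - Pr[j]
        sum_tr += T[j] - Pr[j]
        ob_pr.append(sum_ob / (j + 1))
        tr_pr.append(sum_tr / (j + 1))
    return ob_pr, tr_pr

def last_sign_change(ob_pr, tr_pr):
    """Last J at which M(e) = M(P - Pr) - M(T - Pr) changed sign (1 if it never did)."""
    gap = [o - t for o, t in zip(ob_pr, tr_pr)]
    last = 1
    for j in range(1, len(gap)):
        if (gap[j] > 0) != (gap[j - 1] > 0):
            last = j + 1
    return last

curves = {sigma: figure1_curves(sigma, seed=4) for sigma in (0.02, 0.04)}
```

Plotting each pair of lists against J gives curves of the kind shown in Figure 1. `last_sign_change` gives a rough empirical counterpart of Jc.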
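To treat mean squared disparities, we square both sides of Equation 5 and average over
conditions, which yields
$$\frac{1}{J}\sum_{j=1}^{J}(P_j - \mathit{Pr}_j)^2 = M[(T - \mathit{Pr})^2] + 2M[(T - \mathit{Pr})e] + M(e^2). \tag{7}$$
When the errors are uncorrelated with T - Pr over conditions, the cross-product term is
approximately 2M(T - Pr)M(e).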
For any finite J, the cross-product term on the right side of Equation 7 can be either
positive or negative, depending on the sign of M(e). As the number of conditions, J,
increases, however, M(e) tends toward zero, and Equation 7 is increasingly well approximated
by
$$M[(P - \mathit{Pr})^2] = M[(T - \mathit{Pr})^2] + M(e^2). \tag{8}$$
On the average, M(e²) is constant over J (its expected value is σ²), so except for small
values of J, we have approximately a linear relation between M[(P - Pr)²] and M[(T - Pr)²].
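Equations 7 and 8 can be checked numerically with hypothetical true scores, predictions, and error (σ = .04). Written with the cross term 2M[(T - Pr)e], the decomposition reproduces the left side up to rounding error. For large J, the offset between M[(P - Pr)²] and M[(T - Pr)²] is close to σ².

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(5)
J, sigma = 2000, 0.04
T = [n ** -0.3 for n in range(1, J + 1)]                # hypothetical true scores
Pr = [0.98 * n ** -0.29 for n in range(1, J + 1)]       # hypothetical predictions
e = [rng.gauss(0.0, sigma) for _ in range(J)]
P = [t + ei for t, ei in zip(T, e)]                     # Equation 2

d = [t - pr for t, pr in zip(T, Pr)]                    # T - Pr
lhs = mean([(p - pr) ** 2 for p, pr in zip(P, Pr)])     # M[(P - Pr)^2]
cross = 2 * mean([di * ei for di, ei in zip(d, e)])     # 2M[(T - Pr)e]
exact = mean([di * di for di in d]) + cross + mean([ei * ei for ei in e])
approx = mean([di * di for di in d]) + mean([ei * ei for ei in e])   # Equation 8
offset = lhs - mean([di * di for di in d])              # expected to be near sigma**2
```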
Demonstrations of how these general results can translate into informative
analyses in a specific research situation are given in the next section.
Analyses of a Specific Model in a Research Context
Owing to the ready availability of relevant data, a convenient choice of a
cognitive model for analysis is the array model of recognition memory (Estes, 2002;
Estes & Maddox, 1995; Estes & Maddox, 2002), described in the Appendix. Our
procedure consists of the following steps: (1) simulate an experiment in a computer
program; (2) generate artificial data representing the model's predictions of performance
in the experiment by a group of hypothetical subjects; (3) recode the artificial
performance scores in accord with our assumptions about their constituents as
expressed in Equation 2; (4) by application of a standard model-fitting program,
estimate the values of the model's parameters needed to predict the recoded data; (5)
analyze the observed and predicted performance scores generated in the simulation
exactly as we would do for the corresponding real experiment.
Constructing a simulated data set. Our simulation data are based on a fit of the
array model to performance data of 48 subjects in an experiment on study-test
recognition that we had conducted previously. The experimental conditions are factorial
combinations of shorter or longer study lists (4 vs. 8 items), weak or strong items (1 vs.
2 occurrences in the study list), and shorter or longer study time. The experiment,
henceforth termed the List-Strength experiment, is representative of a class that has had
intensive use in tests of models of recognition memory (Ratcliff, Clark, & Shiffrin, 1990;
Shiffrin, Ratcliff, & Clark, 1990).
Using the parameter values estimated in the model fit,² a computer program
embodying the model generated sets of performance scores (subjects' estimates of the
probability that a test item is old) for 48 hypothetical subjects in two replications of the
experiment, henceforth denoted R1 and R2, with the same subjects but different
random samples of stimuli from the same item category (nonword letter strings).
However, these scores could not be used as they stand to constitute the
"observed" data for the simulations because they were error-free "true" scores as
defined in Equation 1. Thus, we added to the score for each individual on each trial an
error variable, the quantity ei in Equation 2, assumed to come from a population with a
mean of 0 and variance σ². To obtain data at two levels of error, we drew values of ei
from this population with σ² set equal to .01 for one simulation and to .02 for a second
simulation in each replication.
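Here is a minimal sketch of this error-adding step for one subject. It assumes independent normal error per condition and uses the Subject 1 true scores from Table 1 only as example input. The variances .01 and .02 correspond to standard deviations of .1 and about .141. `add_error` is a hypothetical helper and, as in the text, no truncation to the 0 to 1 range is applied.

```python
import math
import random

# True Old ratings for Subject 1, in Table 1 order (LL4 w D1 ... LL8 s new).
TRUE_S1 = [0.74, 0.85, 0.10, 0.74, 0.85, 0.18, 0.89, 0.96, 0.12, 0.90, 0.96, 0.22]

def add_error(true_scores, variance, rng):
    """Equation 2 per condition: P = T + e with e ~ Normal(0, variance).

    random.gauss expects a standard deviation, so the variance is
    converted with math.sqrt. Scores are not truncated to [0, 1].
    """
    sd = math.sqrt(variance)
    return [t + rng.gauss(0.0, sd) for t in true_scores]

rng = random.Random(6)
observed = {var: add_error(TRUE_S1, var, rng) for var in (0.01, 0.02)}
```

The variance-to-SD conversion is the step most easily gotten wrong: passing .01 directly as the SD would simulate a variance of .0001.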
Analyzing individual subject data. To bring out relevant properties of the
model fits, the first step is to examine, for each subject and condition, the difference
between the true value of the performance measure, Tj, and the value predicted by the
model, as illustrated for Subject 1 of the simulation in Table 1. The conditions in Table
1 are the 8 combinations of list length (LL4 or LL8), study duration (D1 and D2, for
short and long, respectively), and study frequency (w and s, for 1 and 2 occurrences,
respectively) for old test items, together with the associated new test items. The first
column in Table 1, headed True, gives the true Old ratings (Tj) of test items for this
subject in each condition; the second column gives the corresponding values
(Prj) predicted by the model at error variance level .01. The most conspicuous feature of
this tabulation is the accuracy with which the predicted scores for the standard model
track the true scores.
{Table 1 about here}
With similar tabulations accomplished for all 48 simulated subjects at both error
variance levels, the next step is to collate the results in a way that does not lose track of
the individuals. It is not feasible to do this for individual conditions, so we use T and Pr
means over the 12 conditions. These could be compared in frequency histograms, but
the necessary use of class intervals would obscure the results for individual subjects.
Thus, we use scatter plots of these means (for replication R1) as shown in Figure 2.
The upper panel of this figure, where the model was fitted to data with error
variance .01, shows no sign of constant error: all plotted points fall on or near the
main diagonal. For the lower panel, where the model was fitted to data with error
variance .02, the picture is similar except that there is evidence of some constant
error, the predicted values tending to fall slightly below the true values.
{Figure 2 about here}
These results are compatible with expectation based on Equation 5 (with both
sides averaged over subjects). The sign of any constant error could not be predicted in
advance, but the magnitude should be greater in the variance .02 plot, as observed.
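The per-subject summary behind Figure 2 can be sketched as follows. Each subject contributes one point (M True, M Predicted), and the average of M True - M Predicted over subjects gives a rough index of constant error. The function names are hypothetical, and the example input is Subject 1's data from Table 1.

```python
def subject_means(true_by_subject, pred_by_subject):
    """Map each subject to (M True, M Predicted) over that subject's conditions."""
    points = {}
    for subject, t in true_by_subject.items():
        p = pred_by_subject[subject]
        points[subject] = (sum(t) / len(t), sum(p) / len(p))
    return points

def constant_error(points):
    """Mean of (M True - M Predicted) over subjects; positive values mean the
    predictions tend to fall below the true values (below the diagonal)."""
    diffs = [m_true - m_pred for m_true, m_pred in points.values()]
    return sum(diffs) / len(diffs)

true_s1 = [0.74, 0.85, 0.10, 0.74, 0.85, 0.18, 0.89, 0.96, 0.12, 0.90, 0.96, 0.22]
pred_s1 = [0.72, 0.85, 0.09, 0.72, 0.85, 0.17, 0.88, 0.96, 0.11, 0.89, 0.96, 0.20]
points = subject_means({1: true_s1}, {1: pred_s1})
```

For Subject 1 the two means round to .63 and .62, matching the M row of Table 1.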
In the means for true-predicted performance, large positive errors of prediction
can be balanced by large negative errors, but the same is not true for a measure of
dispersion. Thus, we computed for each subject the sample standard deviation (SD) of
T - Pr over the 12 conditions. This is essentially the square root of a mean squared error
and is therefore on the same scale as the scores. Because the SDs are unbounded,
scatterplots are not as informative as for means, and we use instead plots of
individual-subject SDs versus their rank order. These are shown in Figure 3. For all but
two of the 48 subjects, the plotted points fall below about .04 SD, indicating agreement
of Pr with T as good or nearly as good as that shown for Subject 1 in Table 1.
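The dispersion measure can be sketched with the standard library's `statistics.stdev`, which computes the sample SD with an n - 1 denominator. The second subject's SD below is a made-up value, included only to show the rank ordering used in Figure 3.

```python
import statistics

def sd_true_minus_pred(true_scores, pred_scores):
    """Sample SD of T - Pr over a subject's conditions."""
    return statistics.stdev([t - p for t, p in zip(true_scores, pred_scores)])

def rank_order(sds):
    """Subjects sorted by SD, smallest first, as (subject, SD) pairs."""
    return sorted(sds.items(), key=lambda item: item[1])

true_s1 = [0.74, 0.85, 0.10, 0.74, 0.85, 0.18, 0.89, 0.96, 0.12, 0.90, 0.96, 0.22]
pred_s1 = [0.72, 0.85, 0.09, 0.72, 0.85, 0.17, 0.88, 0.96, 0.11, 0.89, 0.96, 0.20]
sds = {
    1: sd_true_minus_pred(true_s1, pred_s1),
    2: 0.05,   # hypothetical SD for a second, poorly fitted subject
}
ranked = rank_order(sds)
```

For Subject 1 the SD comes out at about .008, well under the .04 level mentioned above.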
{Figure 3 about here}
Because of its common usage, we include an auxiliary analysis of the correlation
between T and Pr. Values of Pearson r are shown as rank-order plots in Figure 4. The r
values are nearly all quite large and all are significant at the .001 level.
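Here is a sketch of the correlation analysis for one subject. It computes Pearson r from its definition, together with the usual t statistic, t = r·sqrt(n - 2)/sqrt(1 - r²), which has n - 2 degrees of freedom. The article does not say which significance test was used. The t test is our assumption, and the critical value must be looked up separately.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

true_s1 = [0.74, 0.85, 0.10, 0.74, 0.85, 0.18, 0.89, 0.96, 0.12, 0.90, 0.96, 0.22]
pred_s1 = [0.72, 0.85, 0.09, 0.72, 0.85, 0.17, 0.88, 0.96, 0.11, 0.89, 0.96, 0.20]
r = pearson_r(true_s1, pred_s1)
t = t_for_r(r, len(true_s1))
```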
{Figure 4 about here}
Recovering True Values of a Model's Parameters
The next question to be addressed in this analysis is how well values of the true
parameters of a model are recovered when the model is fitted to simulated data that
contain error.
{ Table 2 about here}
For the fit of the array model to the simulation data, results in terms of mean
parameter values are summarized in Table 2. Rows of Table 2 correspond to the four
parameters of the array model, defined as in the Appendix. Columns give the true mean
values of the parameters and the estimates of them from the fits of the model to the
variance .01 and variance .02 simulation data. The estimates are not close to the true
means in absolute value, but in nearly every row the estimates move toward the true
value as error variance decreases. For both replications, the patterns of relative
magnitudes of the estimates at both variance levels mirror the patterns of true values.
To assess recovery of parameter values for individual subjects, we computed for
each parameter the absolute differences between true values and estimates,
summarized by their range and median over subjects. Illustrative results are given in
Table 3 for the R1 simulation with variance .01. The large ranges for the similarity
parameters are due to a few outliers, and the median differences signify close
agreement between true values and estimates for individual subjects. For the
storage-probability parameters, in contrast, the median differences are much larger.
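The summary statistics in Table 3 are the range and median of the absolute true-minus-estimate differences across subjects. They can be computed as sketched below. The three subjects and their parameter values are invented for illustration only.

```python
import statistics

def recovery_summary(true_params, est_params):
    """For each parameter: (min, max, median) of |true - estimate| across subjects."""
    summary = {}
    for name in true_params[0]:
        diffs = [abs(t[name] - e[name]) for t, e in zip(true_params, est_params)]
        summary[name] = (min(diffs), max(diffs), statistics.median(diffs))
    return summary

# Hypothetical true and estimated values for three subjects.
true_params = [{"sm": 0.02, "a1": 0.40}, {"sm": 0.03, "a1": 0.50}, {"sm": 0.01, "a1": 0.30}]
est_params = [{"sm": 0.03, "a1": 0.10}, {"sm": 0.03, "a1": 0.90}, {"sm": 0.50, "a1": 0.35}]
summary = recovery_summary(true_params, est_params)
```

As in Table 3, a single outlying subject (here the third, for sm) inflates the range while leaving the median small.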
{Table 3 about here}
Generalizability
For a final illustration of our approach, we consider an important, though
limited, form of generalizability of a model, termed "validation" by Busemeyer and
Wang (2000): The parameters of a model are estimated from data under one set
of experimental conditions and used to predict quantitative aspects of data
obtained under a different set of conditions. Within each of the two replications of
the List-Strength experiment, each subject yielded data under 12 conditions,
which were divided into two sets of 6 for this analysis.³ The parameters of the
array model were estimated from fits to the simulation data (replications R1 and
R2, both with error variance .01) for Sets A and B separately, and the estimates
from the fit of Set B were used to predict performance in Set A. The new feature
in our use of the validation procedure is this: Busemeyer and Wang (2000) were
concerned with how well the predicted scores for Set A corresponded to observed
Set A performance, whereas we focus on how well the Set A predictions correspond
to Set A true scores.
Table 4 shows mean true and predicted scores for Set A. Predictions of true
scores for Set A based on Set A parameters are extremely close, as anticipated,
providing a baseline for the validation test. Of more interest, the pattern of Set A true
values is well predicted by generalization from the model fits to Set B.
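The logic of this validation test can be sketched as a small harness. It fits on Set B, predicts Set A, and scores the predictions against Set A true scores rather than observed scores. The set definitions follow Footnote 3. The `toy_fit`/`toy_predict` model is a deliberately crude, hypothetical stand-in for the array model, with one mean for old items and one for new items. The true scores are Subject 1's values from Table 1, used as example input.

```python
import math

SET_A = ["4wD1", "4sD2", "4wnew", "8wD2", "8sD1", "8snew"]
SET_B = ["4wD2", "4sD1", "4snew", "8wD1", "8sD2", "8wnew"]

def rmsd(xs, ys):
    """Root mean squared deviation between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def validation_test(fit, predict, observed, true):
    """Estimate parameters from Set B observed data, then compare Set A
    predictions with Set A true scores."""
    params_b = fit({c: observed[c] for c in SET_B})
    pred_a = [predict(params_b, c) for c in SET_A]
    return rmsd(pred_a, [true[c] for c in SET_A])

def toy_fit(data):
    """Hypothetical model: one mean for old-item conditions, one for new."""
    old = [v for c, v in data.items() if not c.endswith("new")]
    new = [v for c, v in data.items() if c.endswith("new")]
    return sum(old) / len(old), sum(new) / len(new)

def toy_predict(params, condition):
    return params[1] if condition.endswith("new") else params[0]

true_s1 = {"4wD1": 0.74, "4wD2": 0.85, "4wnew": 0.10, "8wD1": 0.74,
           "8wD2": 0.85, "8wnew": 0.18, "4sD1": 0.89, "4sD2": 0.96,
           "4snew": 0.12, "8sD1": 0.90, "8sD2": 0.96, "8snew": 0.22}
score = validation_test(toy_fit, toy_predict, true_s1, true_s1)
```

For these noise-free inputs, the toy model's RMSD on Set A is about .075. In a real application, the fitted cognitive model would replace the toy model, and noisy observed scores would be passed as `observed`.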
{Table 4 about here}
Discussion
In this Discussion, we first consider in detail some issues that arose during
inspection of the results of applying our methods to simulated data, and, second,
address the issue of applicability of our findings to real research situations.
Accomplishments and Unfinished Business
Cognitive models have traditionally been selected and evaluated mainly
for capability to describe observed performance data. The main purpose of this
article is to point out the importance of also taking explicit account of models'
capabilities for extracting from data information about true performance and the
processes underlying it.
The first step in this effort was to present a provisional theoretical
framework for model-based inference from data to true performance. We say
"provisional," because, although the approach outlined seems promising, much
work remains to be done.
For reasons of tractability, we have assumed that the error component of
performance is localized in the translation of the output of cognitive processes
into observable performance. At present, we are not in a position to identify and
estimate other sources of error, for example, in the encoding stage of cognitive
processing, but the problem deserves attention in future work.
At a more technical level, our assumptions about the form of the error
distribution, adapted from statistical and psychometric models, may need
modification for some lines of application. In our theoretical expressions for
performance measures, starting with Equation 2, we assumed that error
components are sampled from the same normal distribution under all conditions
of an experiment, the distribution having a mean of zero and an unknown but
constant variance σ². Because the confidence rating score used in our
illustrative applications is bounded between 0 and 1, we may in future work need
to consider truncating the error distribution in order to keep scores for simulated
subjects within bounds. Also, for some applications it might be desirable to allow
the variance of the error distribution to vary over subjects or conditions. And if in
the research situation being simulated, subjects made discrete rather than
quantitatively graded responses, as by reporting judgments of "old" or "new" for
old or new test items in a recognition experiment, the assumption of normally
distributed error would be inappropriate, and it might be preferable to assume a
binomial distribution. All of the kinds of analyses we have reported could be
carried out with any of these alternative assumptions.
Analyses of the array model as applied to the list-strength experiment yielded
instructive insights that perhaps could not have been achieved in any other way. Some
of these involve recovery of true parameter values. Parameter recovery is an essential
aspect of the assessment of a model because of its bearing on issues of identifiability,
that is, the degree to which its parameters are individually and separately estimable
from data (Bamber & van Santen, 2000). In cognitive research, complete identifiability
has been achieved only for simple Markov-chain models (Wickens, 1982). In our work
with the array model, we have found that a 5-parameter version originally applied to the
list-length experiment is equivalent to the 4-parameter version described in the
Appendix of this article (Estes, 2002). Whether we can go on to produce a fully
identifiable array model by a reparameterization, we do not know. However, even partial
identifiability is basic to generalizability of a model and to inferences about underlying
processes (Bamber & van Santen, 2000; Li, Lewandowsky, & DeBrunner, 1996; Pitt,
Kim, & Myung, 2003).
For the array model, our relevant findings, summarized in Table 3, present a
mixed picture. Parameter recovery for individual subjects was reasonably good for
similarity parameters but much less so for storage-probability parameters. However,
patterns of relative values of estimated parameters mirrored the true patterns
reasonably well for a majority of subjects.
To interpret these results, we note that it is the estimate of the parameter vector,
θe, obtained from a model fit that mediates predictions of performance. Evidently, in the
running of the computer search program that seeks optimal parameter estimates,
values of different parameters trade off so that too high an estimate of one may be
compensated by too low an estimate of another, leaving the model's predictions, f(θe),
almost unaltered. Thus, properties of the configuration of parameter values may be a
stable characteristic of a subject even when individual parameter values are not.
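A simple case of such a trade-off appears in the simplified Appendix equations (α = 1, a single similarity parameter s). There, Pn(Old) = Ns/(Ns + B) depends on s and B only through their ratio. Scaling both by the same factor therefore leaves new-item predictions unchanged, whereas Po(Old) is not invariant in this way. This illustrates the general point only and is not an analysis of the four-parameter model used in the simulations.

```python
def p_new_old(N, s, B):
    """Appendix, alpha = 1: Pn(Old) = N*s / (N*s + B)."""
    return N * s / (N * s + B)

def p_old_old(N, k, s, B):
    """Appendix, alpha = 1: Po(Old) = [k + (N - k)s] / [k + (N - k)s + B]."""
    sim = k + (N - k) * s
    return sim / (sim + B)

# Doubling both s and B leaves new-item predictions unchanged ...
same_new = p_new_old(4, 0.02, 0.5), p_new_old(4, 0.04, 1.0)
# ... but changes old-item predictions, which is what constrains the pair.
diff_old = p_old_old(4, 1, 0.02, 0.5), p_old_old(4, 1, 0.04, 1.0)
```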
We suspect that alternative cognitive models we might have chosen for
illustrations, for example, GCM (Nosofsky, 1986), REM (Shiffrin & Steyvers, 1997), SAM
(Raaijmakers & Shiffrin, 1981), and TODAM (Murdock, 1982), contain nonlinearities of the
same kind that make the array model less than fully identifiable in its present form. A
practical import of our analyses is that caution is advisable in using extant cognitive
models as bases for testing hypotheses about specific underlying processes, even
though a less than fully identifiable model may be found capable of correctly
predicting true response patterns of individuals when evaluated by simulation results.
In this article, we have used the array model solely as a vehicle for illustrating a
proposed methodology. We assume, however, that further applications of the same
methodology can take us toward a full assessment. Success of the version of the model
described here (the "standard model") at predicting true performance should be
compared with that of alternative versions in which various assumptions are modified,
contributing evidence regarding necessity as well as sufficiency of the assumptions
(Estes, 2002). Among other items needing attention in this effort are possible
reparameterizations of the model that might yield full identifiability.
The Case for Applicability
The philosophy behind extrapolation of conclusions from our simulation results to
real research situations is the same as that pertaining to any kind of simulation testing,
from aeronautics to pharmacology. A method that cannot be tested directly in the real
situation is given a chance to fail in a simulation (in aeronautics, a wind tunnel; in
pharmacology, test of a medication on animal models of human physiology), and if the
method does fail, it is discarded or sent back to the bench for modification. A method
that passes the simulation test is judged tenable and passed on to more stringent
testing or to practical application.
No guarantees go with this approach, and its long-term viability in many domains
owes much to continuing efforts to improve the quality of simulations and to deepen
understanding of the relationships between simulations and the situations simulated.
References
Bamber, D., & van Santen, J. P. H. (2000). How to assess a model's testability and
identifiability. Journal of Mathematical Psychology, 44, 20-40.
Busemeyer, J. R., & Wang, Y-M. (2000). Model comparisons and model selections
based on generalization criterion methodology. Journal of Mathematical Psychology, 44,
171-189.
Estes, W. K. (1994). Classification and cognition. New York: Oxford University Press.
Estes, W. K. (2002). Traps in the route to models of memory and decision.
Psychonomic Bulletin & Review, 9, 3-25.
Estes, W. K., & Maddox, W. T. (1995). Interactions of similarity, base rate, and feedback
in recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition,
21, 1075-1095.
Estes, W. K., & Maddox, W. T. (2002). On the processes underlying stimulus-familiarity
effects in recognition of words and non-words. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 28, 1003-1018.
Feller, W. (1957). An introduction to probability theory and its applications. New York:
Wiley.
Li, S.-C., Lewandowsky, S., & DeBrunner, V. E. (1996). Using parameter sensitivity and
interdependence to predict model scope and falsifiability. Journal of Experimental
Psychology: General, 125, 360-369.
Murdock, B. B., Jr. (1982). A theory for the storage and retrieval of item and
associative information. Psychological Review, 89, 609-626.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization
relationship. Journal of Experimental Psychology: General, 115, 39-57.
Pitt, M. A., Kim, W., & Myung, I. J. (2003). Flexibility versus generalizability in model
selection. Psychonomic Bulletin & Review, 10, 29-44.
Raaijmakers, J. G. W., & Shiffrin, R. M. (1981). Search of associative memory.
Psychological Review, 88, 93-134.
Ratcliff, R., Clark, S. E., & Shiffrin, R. M. (1990). List-strength effect: I. Data and
discussion. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 16, 163-178.
Shiffrin, R. M., Ratcliff, R., & Clark, S. E. (1990). List-strength effect: II.
Theoretical mechanisms. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 16, 179-195.
Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM–retrieving
effectively from memory. Psychonomic Bulletin & Review, 4, 145-166.
Wickens, T. D. (1982). Models for behavior. San Francisco: Freeman.
Appendix
Summary of the Array Model
The array model assumes instance-based memory storage as in
categorization models from which it is derived (Estes, 1994). In application to the
present experiments, it is assumed that on each study trial, a representation of the item
presented is stored in a memory array with probability α(k-1), where α is a constant with
a value in the range 0 to 1 and k(=1,2,...) indexes the number of occurrences of the
item. The storage parameter is denoted in the text as a1 or a2, according as study
duration is short or long, a1 and a2 being equal to α and α2, respectively.
On each test trial of a recognition experiment, the similarity of the test item to
each element of the memory array is computed and the similarities are summed to
obtain a total similarity that we denote SimT(Old), where T indexes the type of test trial
(T is replaced by o if the test item is old, that is, came from the study list, and by n if
the test item is new). The probability that the learner judges an old test item to be "old"
is given by the expression
$$P_o(\text{Old}) = \frac{\mathrm{Sim}_o(\text{Old})}{\mathrm{Sim}_o(\text{Old}) + B}$$
and the corresponding probability for a new item by
$$P_n(\text{Old}) = \frac{\mathrm{Sim}_n(\text{Old})}{\mathrm{Sim}_n(\text{Old}) + B},$$
where B is a constant whose value reflects the learner's criterion, or bias, for making a
"new" judgment about an item regardless of its status in memory.⁴
Letting k denote the number of repetitions of an item and N the number of items
in the study list, and setting α equal to 1 for simplicity, these expressions become
$$P_o(\text{Old}) = \frac{k + (N - k)s}{k + (N - k)s + B},$$
and
$$P_n(\text{Old}) = \frac{Ns}{Ns + B}.$$
For its use in this article, the model was modified by definition of two similarity
parameters: One of these, denoted sm, applies when the study frequency of a test item
matches that of the memory element it is being compared to, and a second parameter,
snm, applies when there is no match. The equations given above can readily be revised
to accommodate this modification.
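The simplified equations above (α = 1, a single similarity parameter s, before the two-similarity modification) can be sketched as follows. They are evaluated for the list lengths (N = 4, 8) and study frequencies (k = 1, 2) of the List-Strength experiment. The values s = .02 and B = .72 are arbitrary illustrative choices, not estimates from the article.

```python
def p_old(N, k, s, B):
    """Appendix equation with alpha = 1: Po(Old) = [k + (N - k)s] / [k + (N - k)s + B]."""
    sim = k + (N - k) * s
    return sim / (sim + B)

def p_new(N, s, B):
    """Appendix equation with alpha = 1: Pn(Old) = Ns / (Ns + B)."""
    return N * s / (N * s + B)

s, B = 0.02, 0.72   # illustrative values only
table = {(N, k): p_old(N, k, s, B) for N in (4, 8) for k in (1, 2)}
false_alarms = {N: p_new(N, s, B) for N in (4, 8)}
```

With these values, increasing N raises the "old" rate for new items, and a second study occurrence (k = 2) raises Po(Old).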
Acknowledgements
The research reported was supported by NSF Grants SBR-996206 and BCS0130512 and NIH Grant MH 59196. Reprint requests or correspondence about this
article should be addressed to W. T. Maddox, Department of Psychology, Institute for
Neuroscience, University of Texas, 1 University Station A8000, Austin, TX, 78712-0187,
or to W. K. Estes, Department of Psychology, Indiana University, Bloomington, IN,
47405-7007. E-mail communications may be sent to [email protected] or to
[email protected].
Footnotes
1. This implication is based on the general theory of recurrent random variables (Feller,
1957).
2. This procedure was used in order to ensure that both the values of individual
parameters and the combinations of values would be representative of those
characterizing real subjects in the domain of application. The issue of
representativeness and our way of achieving it are discussed in a companion article,
Estes and Maddox, Coping with individual differences and error in applications of
cognitive models (submitted).
3. Using abbreviated forms of the condition codes in Tables 1 and 4, Set A is 4wD1, 4sD2,
4wnew, 8wD2, 8sD1, 8snew; and Set B is 4wD2, 4sD1, 4snew, 8wD1, 8sD2, 8wnew. This balancing was done in
order to minimize any variation in the error component of performance between sets A
and B.
4. It can be shown that B is a function of the other four parameters (Estes, 2002).
Table 1
Comparison of True and Predicted Performance Scores
for an Individual Subject

                Score Type
Condition       True    Predicted
LL4 w D1        0.74    0.72
LL4 w D2        0.85    0.85
LL4 w new       0.10    0.09
LL8 w D1        0.74    0.72
LL8 w D2        0.85    0.85
LL8 w new       0.18    0.17
LL4 s D1        0.89    0.88
LL4 s D2        0.96    0.96
LL4 s new       0.12    0.11
LL8 s D1        0.90    0.89
LL8 s D2        0.96    0.96
LL8 s new       0.22    0.20
M               0.63    0.62
Table 2
Mean Parameter Estimates for Simulation Data
by Replications and Variance Levels

                                        Estimated Value
Replication   Parameter   True Value   Var. .01   Var. .02
R1            sm          0.02         0.06       0.06
              snm         0.01         0.03       0.04
              a1          0.39         0.18       0.15
              a2          0.53         0.32       0.22
R2            sm          0.02         0.05       0.04
              snm         0.01         0.02       0.03
              a1          0.38         0.21       0.14
              a2          0.54         0.31       0.21
Note: Var. = Variance
Table 3
Absolute Differences between True and Estimated
Parameter Values for 48 Subjects in R1, Variance .01 Simulation

              Statistics of Differences
Parameter     Range           Median
sm            0 to .999       .019
snm           0 to .501       .003
a1            .001 to 1.00    .278
a2            .001 to .990    .442
Table 4
True Performance and Performance Predicted by Model
with Set A or Set B Parameter Estimates

                                       Score Type
                                       A Predicted   A Predicted
Replication   Condition    A True      by A Pars.    by B Pars.
R1            LL4 w D1     0.70        0.69          0.70
              LL4 s D2     0.91        0.91          0.90
              LL4 w new    0.17        0.18          0.17
              LL8 w D2     0.83        0.82          0.81
              LL8 s D1     0.85        0.85          0.85
              LL8 s new    0.23        0.24          0.26
R2            LL4 w D1     0.72        0.71          0.69
              LL4 s D2     0.88        0.88          0.87
              LL4 w new    0.19        0.19          0.20
              LL8 w D2     0.80        0.80          0.79
              LL8 s D1     0.84        0.84          0.82
              LL8 s new    0.27        0.26          0.30
Note. Pars. = Parameter estimates.
Conditions are coded as in Table 1.
Figure Captions
Figure 1. Cumulative mean disparities between observed and predicted performance
and between true and predicted performance over sequences of unit time intervals.
Figure 2. Means of predicted versus true performance by individuals in simulations of
the array model. Error variances equal to .01 and .02 are represented in the upper and
lower panels respectively. Each point represents the mean over 12 conditions of the
predicted and true scores for an individual subject.
Figure 3. Standard deviations (SDs) of individual subjects' true-predicted
performance scores obtained from fits of the array model, plotted in rank order of magnitude.
Figure 4. Correlations (Pearson rs) between true performance scores of individual
subjects and those predicted by the array model, plotted in rank order of magnitude.
Figures
[Figure 1. Two panels, Higher Noise and Lower Noise: Cumulative Mean (tr - pr and ob - pr) versus Sequence Length (0 to 100).]
[Figure 2. Two panels, Error Variance .01 and Error Variance .02: M Predicted versus M True, both axes 0.0 to 1.0.]
[Figure 3. Standard Deviation (0.00 to 0.12) versus Rank Order (0 to 48).]
[Figure 4. Correlation (0.95 to 1.00) versus Rank Order (0 to 48).]