Transformation learning and its effect on similarity
Steven Langsford, Daniel J. Navarro, Amy Perfors, Andrew T. Hendrickson
School of Psychology
University of Adelaide
Abstract
The transformational theory of similarity suggests that more similar items
are those which are easier to transform into each other. Although this theory has been quite influential, little is known about how transformations are
learned and to what extent learned transformations affect similarity judgments. This paper presents three experiments addressing these questions.
In all of the experiments, people were taught novel categories defined by an
arbitrary transformation. In Experiment 1, when the transformations were
directly visible, people had no trouble learning and were able to apply their
knowledge to both similarity and categorization judgments involving novel
items. In Experiment 2, the task required transformations to be inferred
rather than observed; this resulted in very poor learning overall. Experiment 3 had simplified stimuli but still required transformations to be inferred. People were able to learn in this case, but the effects on similarity
were weak or non-existent. Overall, this work suggests that transformation
learning (and generalizing to similarity) is possible but not automatic or easy.
Implications for the transformational theory of similarity are discussed.
Introduction
Similarity plays a central role in human cognition, serving an explanatory role within
many theories of categorization (Nosofsky, 1986), reasoning (Riesbeck & Schank, 2013;
Novick, 1988), and memory (Baddeley, 1966; Shulman, 1971). Because similarity is context
dependent (Barsalou, 1983) and cannot be defined on purely logical grounds (Goodman,
1972; Watanabe, 1985), cognitive psychologists have been interested in how people assess
similarity in its own right (see e.g., Goldstone, Day, & Son, 2010). Early work developed
simple set theoretic models that characterize similarity in terms of shared and distinctive
features (Tversky, 1977), as well as geometric models in which similarity is inversely related
to distance between items in a psychological space (Shepard, 1987). In many situations these
models are adequate for describing how people make judgments about very simple objects.
However, systematic failures emerge when the stimuli are more structured (Biederman,
1987; Wattenmaker, Nakamura, & Medin, 1988). In light of these failures, researchers have
proposed more complicated theories based on ideas such as structure mapping (Gentner,
1983) and stimulus transformation (Hahn, Chater, & Richardson, 2003; Hahn, 2014).
This paper focuses on the transformational view of stimulus similarity. The core idea
is that the harder it is to mentally “transform” one object into another, the less similar
those objects are to each other. As a simple example, consider the pair of alphanumeric
strings 111xxx1 and 000xxx0. To transform the first string into the second, all we need to
do is replace all the 1s with 0s. Turning 111xxx1 into 000yz0 would be a more complicated
operation, since we would also need to convert the xxx substring into a yz string.
Starting with early work by Imai (1977), there is now a considerable body of literature
arguing for and against the transformational view of similarity. Several papers outline the
theoretical foundations of the transformational approach (Chater & Vitányi, 2003; Chater
& Hahn, 1997; Bennett, Gács, Li, Vitányi, & Zurek, 1998), the empirical evidence for it
(Hahn et al., 2003; Hodgetts, Hahn, & Chater, 2009; Hahn, 2014) as well as the arguments
against it (Grimm, Rein, & Markman, 2012; Müller, van Rooij, & Wareham, 2009).
Setting aside the question of whether stimulus transformation provides a comprehensive view of how people perceive similarities, it seems unquestionable that the structure
underpinning at least some categories is naturally amenable to a transformational description. This occurs when the categories themselves are characterized in terms of a set of
admissible transformational operations. For example, people tend to take transformations due
to aging into account when identifying faces (Mark, Todd, & Shaw, 1981), and track object
identities by inferring motion from a sequence of still images (Freyd, 1983).
A question that follows naturally from these considerations is: where do these transformations come from? Put another way, how do people realize what the set of transformational primitives is? For many of the simple stimuli used in the experimental literature,
there are well-defined sets of transformations that have been consistently applied across
studies and have considerable empirical justification (Hodgetts et al., 2009). In other cases
there are simple physical transformations like “rigid rotation” that might be provided naturally by the perceptual system. However, it is less obvious what transformations might
underpin the comparison between pairs of faces of different ages, plants photographed in
different seasons, or even a Rubik’s cube in different states. In many cases, not only is it unclear what transformations researchers should use to specify a transformational theory (e.g.
Grimm et al., 2012), it is also unclear how people might learn the relevant transformations.
It is also unclear whether learned transformations can have the same effects on similarity as transformations that are given by the perceptual system or are obvious on the
basis of the simple stimuli involved (Hahn et al., 2003; Hahn, 2014). Learned transformations might have more fragile representations or be less cognitively accessible compared to
more perceptually obvious ones. If this is the case, it has important implications for the
generality of the transformational view of similarity, suggesting that transformations only
underlie similarity for specific kinds of categories or representations. Conversely, if learned
transformations easily or automatically change similarity judgments, this implies that the
transformational view of similarity may apply to a range of natural categories in the world.
Relatively little research has addressed either of these issues. To what extent are
transformations learnable, and to what extent do these learned transformations impact
similarity? The most relevant empirical work comes from Hahn, Close, and Graf (2009),
who found that people who were shown morphs from A to B rated similarity higher in the
observed morph direction than the reverse direction. This result hints that people are able
to learn what classes of transformations (in this case morph directions) are relevant to a
particular context, and shows that there may be some impact of this learned transformation on similarity. There are similar results in which movement features – which might be
characterized as a kind of transformation – can be learned and then used to drive categorical
perception effects in Andrews, Livingston, Auerbach, Altiero, and Neumeyer (2014). However, this research measures similarity as a means to infer whether the transformations were
learned; it does not explore whether the transformations were learned directly. It also does
not investigate in what circumstances transformational learning is possible, nor to what
extent the depth or effectiveness of learning impacts similarity judgments.
This paper addresses these issues through a series of three experiments. In Experiment
1, people were taught a transformation applying to 3x3 grids of colors. The stimuli and the
learning task were both very simple, and in this situation people were able to easily learn the
transformations as well as apply them when making similarity judgments. Experiment 2 was
designed to explore the extent to which these outcomes emerged because of the simplicity
of the learning task. In this experiment, the training format no longer made it possible for
people to directly observe transformations in operation; instead, participants had to infer
them from the category members. This resulted in a large increase in task difficulty and the
majority of people failing to learn. Experiment 3 retained the training format but made the
stimuli even simpler, and showed that people were capable of learning the transformations
in that case. However, the learned transformations had weak or non-existent effects on similarity
judgments. These experiments taken together suggest that arbitrary transformations are
learnable and can impact similarity judgments, but that there are limitations on both the
learning and the automaticity with which learned transformations affect similarity.
Experiment 1
The goal of the first experiment was to investigate whether people can learn arbitrary
categories defined in terms of an abstract stimulus transformation – and, if so, whether
they can generalize this learning sensibly to new items and new categories. Since our initial
question was whether transformation learning is possible at all, we designed the learning
task so that the relevant transformation was as salient as possible. This also establishes a
baseline level of performance against which to compare subsequent, more complex experiments.
The overall structure of this experiment (and subsequent ones) is as follows. First,
during the training phase, people were presented with a series of category learning tasks.
The categories were defined by a single relevant transformation, and learning proceeded
until a criterion was reached. At this point participants proceeded to a test phase in which
they saw stimuli belonging to novel categories and were asked to make either similarity
or categorization judgments about them. Our question was whether people’s responses
at test would reflect the transformations that were highlighted during training. If the
transformation is truly being learned – and if it truly affects similarity – one would expect
this to occur; conversely, if people reached criterion in the training phase through exemplar
memorization one would expect poor or non-existent transfer.

Figure 1. The two transformations used during the training phase of Experiment 1. In the movement training condition people learned a non-rigid clockwise rotation transformation (top row), whereas in the color training condition they learned a color swapping rule (bottom row). For both transformations, the image on the left shows how that transformation was defined, and the image on the right gives an example of how it operates on a particular stimulus. In this figure we use textures to display the four different possibilities for each cell. However the actual stimuli were presented in color, with the four possible values being red, green, yellow and blue.
Method
Participants. 444 adults (62% male) were recruited via Amazon Mechanical Turk.
47 were excluded: 12 for self-reported color-blindness and 35 for failing to pass check questions during the test phase of the experiment. Ages ranged from 18 to 67 with a mean of
33.25. 311 participants were from the USA, 120 were from India, and 13 were from other
countries. Participants were paid US$0.75 for the task, which took about 10 minutes to
complete.
Procedure: Training Phase. Stimuli were 3x3 grids of colored cells, where each
cell could be one of four colors (red, yellow, blue or green). The training phase of the
experiment was designed to teach people a novel transformation defined over these objects.
Three between-subject training conditions were used: movement, color, and identity
training. In the identity training condition (used as a control) a “null” transformation
was used: the stimulus did not change at all when the transformation was applied. In the
color training condition, the transformation was a color-swapping rule in which red and
green cells switched colors, as did the blue and yellow ones. Finally, in the movement
training condition, the correct stimuli were generated by applying a non-rigid clockwise
rotation. These are illustrated in Figure 1.
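For concreteness, both trained transformations are easy to state in code. The sketch below is purely illustrative: it assumes grids are 3x3 integer arrays with colors coded 0 = red, 1 = green, 2 = yellow, 3 = blue, and it reads the non-rigid rotation as the eight perimeter cells advancing one position clockwise around a fixed center, which is our interpretation of Figure 1 rather than a detail stated in the text.

```python
import numpy as np

def color_swap(grid):
    """color training transformation: red<->green and blue<->yellow."""
    lut = np.array([1, 0, 3, 2])  # index = old color code, value = new code
    return lut[grid]

def movement_step(grid):
    """movement training transformation (assumed reading of Figure 1):
    the eight perimeter cells move one position clockwise while the
    center cell stays in place, making the rotation non-rigid."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2),
            (2, 2), (2, 1), (2, 0), (1, 0)]  # clockwise perimeter order
    out = grid.copy()
    for src, dst in zip(ring, ring[1:] + ring[:1]):
        out[dst] = grid[src]  # each perimeter value advances one cell
    return out
```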
The training phase took the form of a series of categorization tasks. On any given
trial participants were shown a single ‘base’ stimulus and told that it belonged to a novel
category (e.g., wugs). Two alternative items were displayed underneath and people were asked to guess which of these two also belonged to the category. This setup is illustrated
in Figure 2.
Figure 2. Presentation format in the training phase in Experiment 1. Items were presented in a two-alternative forced choice format in which people had to select which of the two items on the bottom were in the same category as the target object (top). In this experiment, in order to make the transformations more salient and learnable, the target object stayed in place and people viewed the transformation being applied after they answered the question.

The critical feature of this categorization task is that category members were generated by taking an initial stimulus and repeatedly applying the novel transformation. This is
illustrated in Figure 3, which shows how the same transformation can be used to generate a
category of wugs and a category of philbixes. When learning the wug category, for instance,
the base item would always be one of the wug category members, and one of the two alternatives would always be the particular wug that is produced when the transformation
is applied to the base item (i.e., the next step in the chain shown in Figure 3). The other
stimulus was a foil item generated by taking the same base item and applying a transformation similar to the correct one. The Appendix lists details of the transformations used
to generate foils.
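To make the category structure concrete, the following sketch (reusing the hypothetical grid encoding above) builds a category as a transformation chain and assembles one training trial; the foil step is simplified here, with the real generation scheme described in the Appendix.

```python
import random

def make_category(seed_grid, transform, n_members=8):
    """Category members form a chain: each item is the previous one
    with the defining transformation applied (cf. Figure 3)."""
    members = [seed_grid]
    for _ in range(n_members - 1):
        members.append(transform(members[-1]))
    return members

def make_trial(members, step, propose_foil):
    """One training trial: a base item, the correct option (the next
    item in the chain), and a foil derived from the same base item."""
    base, correct = members[step], members[step + 1]
    options = [correct, propose_foil(base)]
    random.shuffle(options)  # randomize the two answer positions
    return base, options, correct
```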
After each guess participants were given feedback. On an incorrect choice, the message
“Sorry, try again” appeared on screen and participants had to click on the other option in
order to proceed. After the correct response was given, the message “Correct” appeared
on screen, and participants watched as the base stimulus was morphed into the correct
one. The next trial would then begin with the newly transformed item as the new target
stimulus. By presenting this animation, the experiment made the transformation directly
observable to people on every trial.
For any one category (e.g., wugs) this process continued until either the participant
made four correct choices in a row or 40 trials had elapsed, at which point the experiment
moved on to the next category (e.g., philbixes). This continued across a sequence of six
categories, at which point the experiment moved into the test phase.
Procedure: Test Phase. In the test phase, participants were asked to make judgments about novel stimuli, all of which were constructed using color patterns that had never
appeared during the training phase. Participants were assigned at random to either a similarity condition or a categorization condition. In the similarity condition every test trial presented two novel items and asked people to rate their similarity on a 7-point scale.
In the categorization condition the question asked people how likely the two items were to “have the same name”. In both cases the low end of the scale was labeled “Not at all” and the high end was labeled “Extremely”.

Table 1
Structure of the test phase items, and their relationship to the different training conditions. The critical prediction is that when test items are related by a trained transformation (e.g., oldColor) or by a similar one (e.g., newColor), participants who had learned the relevant transformation (e.g., color training) should rate these items as more similar or more likely to belong to the same category than participants in the other training condition (e.g., movement training).

transformation relating    interpretation:           interpretation:
the test item pairs        color training            movement training
oldColor                   trained transformation    unrelated to training
newColor                   similar to training       unrelated to training
oldMovement                unrelated to training     trained transformation
newMovement                unrelated to training     similar to training
identity                   control trial             control trial
arbitrary                  control trial             control trial
There were six qualitatively distinct kinds of test trials, listed in Table 1. On an
identity trial the two items were identical, and on an arbitrary trial the items had no
relationship at all. Neither of these are of theoretical interest: the identity trials were
included as part of the exclusion criteria (see below) and the arbitrary trials were included
to assist participants with calibration by showing examples of very dissimilar items.
The other four kinds of test trial were all related to the learned transformations in
some fashion. In the oldColor trials the two items were related via the color transformation used in the training phase for the color training condition. The oldMovement
trials were related to the movement training condition in the same way.
For the newColor trials, the two items were related via a color swapping rule, but
the specific transformation differed from the one used in the training phase. Instead of
swapping red with green and blue with yellow, the transformation that related stimuli in
the newColor condition swapped red with blue and green with yellow. Similarly, the
transformation in the newMovement condition also used a non-rigid movement of the
cells, but instead of rotating the cells, each row shifted downwards by one position, with
the bottom row moving to the top.
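Under the same hypothetical encoding as the earlier sketch, the two novel test transformations are one-liners; the exact color codes are our assumption.

```python
import numpy as np

def new_color(grid):
    """newColor test transformation: swap red<->blue and green<->yellow,
    a colour rule never shown during training."""
    lut = np.array([3, 2, 1, 0])  # red<->blue, green<->yellow
    return lut[grid]

def new_movement(grid):
    """newMovement test transformation: every row shifts down one
    position, with the bottom row wrapping around to the top."""
    return np.roll(grid, 1, axis=0)
```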
All participants were shown the same test trials, irrespective of the training condition
or judgment type they were assigned to. There were four trials each for the oldColor,
oldMovement, newColor and newMovement conditions, and two trials for the identity and arbitrary types. Order of presentation was randomized.
Exclusion criteria. The experiment used two different pre-defined exclusion criteria, one based on training phase responses and one based on test phase responses. For the training phase, if any participant took more than 40 trials to learn any category, that participant's data would be excluded. No participants were excluded on this basis. For the test phase, any participant who gave an average similarity/categorization rating of less than 6 (out of 7) to the identical stimuli was excluded; 35 people were removed on this basis.

Figure 3. An illustration of how the same transformation defines multiple categories. The various wugs are all related to each other by the movement transformation defined in Figure 1, as are all the philbixes. However, due to the different configurations used, the wugs are clearly distinguishable from the philbixes.
Results
Training phase. The categories in all conditions were easy to learn, which is perhaps unsurprising given how salient we made the relevant transformations. People in the
movement training condition reached the criterion of four correct responses in a row in
6.3 trials on average, compared to 5.8 in color training. In the identity training
condition, participants “learned” the categories in very nearly the minimum possible time,
taking an average of 4.1 trials to reach the “4 correct in a row” criterion. The average accuracy over all training trials was 85% in movement training, 88% in color training,
and 98% in identity training.
Test phase. The raw responses for test items are listed in Table 2. The critical comparison, for any given test item, is whether the average ratings in the color training condition differ from the corresponding ratings in the movement training condition. (Comparisons to the identity training condition are omitted, primarily because the main thing people learned in that condition is that categories showed no variability at all.)
These differences are plotted in Figure 4, along with 95% credible intervals computed using
the BEST package for R (Kruschke & Meredith, 2014). Regardless of whether participants
were asked to rate stimulus similarity (left panel) or to judge whether the stimuli belonged
to the same category (right panel) the same pattern emerges. When the novel stimuli were
related via the trained transformation, people rated them as more similar and more likely
to belong to the same category, relative to participants trained on the other transformation.
When the novel stimuli were related via a similar transformation, an attenuated version of
the effect appeared. In other words, participants learned more than just one specific color
(or movement) transformation when trained on that transformation; they also inferred that
other color (or movement) transformations would be more relevant to these categories.
To quantify the strength of evidence for these effects we ran Bayesian t-tests using
the BayesFactor package in R (Morey, Rouder, & Jamil, 2014). The results are listed on the
right hand side of Table 2. Although in some cases the evidence is equivocal, the general pattern is exactly as one would expect on the basis of Figure 4: most Bayes factors show moderate to strong evidence of an effect.

Figure 4. Differences in mean ratings given to novel test items, by participants in the color training and movement training conditions in Experiment 1. Results are broken down by test trial type and judgment type. Positive values indicate that the items received higher ratings in the color training condition. Error bars depict 95% Bayesian credible intervals for the differences. In all cases people gave stronger similarity judgments (left) and categorization ratings (right) when the test trials involved transformations that either matched or were similar to the trained transformation. The effect is stronger for categorization judgments than similarity judgments, and stronger when the tested transformation is identical to the trained one (i.e., oldColor, oldMovement) than when it is similar to the trained one (i.e., newColor, newMovement).

Table 2
Descriptive statistics and hypothesis tests for all theoretically relevant test items in Experiment 1.

Similarity judgments
Tested transformation    Color training    Movement training    Difference    Bayes factor
                         Mean (SD)         Mean (SD)
oldColor                 4.37 (1.40)       2.86 (2.17)           1.50         > 1000
newColor                 3.72 (1.54)       2.93 (1.94)           0.78         7.16
oldMovement              2.04 (1.67)       3.01 (1.70)          -0.97         303
newMovement              2.15 (1.80)       2.68 (1.70)          -0.52         1.03

Categorization judgments
Tested transformation    Color training    Movement training    Difference    Bayes factor
                         Mean (SD)         Mean (SD)
oldColor                 4.84 (1.56)       1.69 (1.99)           3.15         > 1000
newColor                 3.03 (2.27)       2.11 (2.01)           0.92         3.86
oldMovement              1.26 (1.65)       3.61 (1.76)          -2.35         > 1000
newMovement              1.21 (1.70)       3.25 (1.97)          -2.04         > 1000
Discussion
Taken together, the results from Experiment 1 suggest that people are capable of
learning a novel transformation, recognizing that this transformation is relevant to determining category memberships, and applying this learned transformation to assess the
similarity between items that belong to novel categories. This basic finding is consistent
with the learning effect seen in Hahn et al. (2009), but extends previous results in showing systematic generalization across related transformations. That is, to the extent that
the test item effects generalized beyond the trained transformation and also encompassed
similar transformations (albeit in an attenuated fashion), people seem to be able to make
generalizations about the applicability of entire classes of transformations (e.g., all color
transformations) to entire classes of categories (e.g., all categories of stimuli defined over these 3x3 grids).
Experiment 2
Experiment 1 provides “in principle” evidence that people are capable of learning
very rich knowledge about classes of stimulus transformations and classes of categories to
which they are applicable. However, the structure of the task was deliberately designed to
make the learning problem as easy as possible, and it is not clear how generally the results
hold. In particular, during the training phase in Experiment 1 participants were shown the
actual stimulus transformation in operation at the end of every trial. For some categories
this “direct exposure” is not entirely unrealistic: people learn to perceive the relationships
between different Rubik’s cube configurations after direct manipulation of the object that
reveals the structure of the admissible transformations.
However, this kind of situation seems likely to be the exception rather than the rule.
When considering the transformations underpinning plant growth, the aging of human faces,
and so on, the relevant transformations need to be inferred from more indirect evidence.
When learning new categories in real life it is more typical for people to encounter a variety
of (possibly labelled) category exemplars. For that reason category learning experiments
tend to use a supervised or unsupervised classification task as the proxy for real world
learning. When learning the transformations involved in the aging of human faces, for
example, people observe many faces both young and old, but do not directly observe the
aging process in action. At best, people might see photographs of the same face taken a
few years apart.
Are the underlying transformations learnable, and do they affect similarity, in this more general case? We addressed this question in Experiment 2 by changing the presentation format of the examples so that the transformation relating category members was no longer directly accessible and had to be learned from the exemplars that comprised the category.

Figure 5. Presentation format for stimuli in Experiment 2, consisting of a same category/different category judgment.
Method
Participants. 272 adults (55% male) were recruited via Amazon Mechanical Turk.
Of these, 170 were retained in the final analysis: 6 failed to complete the task, 2 reported
color-blindness, 33 were excluded for giving low ratings to the attention-check test items, and
61 were excluded for not reaching criterion performance during the training phase (described
below). Ages ranged between 19 and 69 (mean: 36.6). 207 were from the USA, 60 were
from India, and the rest from other countries. People were paid US$1 for the 10-15 minute
experiment.
Procedure: Training Phase. The training stimuli and procedure were almost
identical to Experiment 1. There were two major changes. First, the feedback procedure
was modified: people were told whether they were correct or not, but they were not shown
the transformation in action. Second, instead of asking people to compare two items to
a target (as in Figure 2) people were shown two items and asked to judge if they both
belonged to the same category, as shown in Figure 5.
The trial structure during training was as follows. On trials where the correct answer
was “yes”, the stimuli were related by a single application of the training transformation.
On trials where the correct answer was “no”, the stimuli were related by an application of
the same foil-producing transformations used in Experiment 1. Within a category, trials
were blocked in groups of six, with three ‘accept’ and three ‘reject’ trials in each block.
Trials were randomized within blocks.
In addition to the two major changes to the design we made a few minor ones. First,
the trained color transformation was defined slightly differently. Instead of simultaneously
applying a red-green swap and a blue-yellow swap and counting that as a single application
of the transformation, only one of these two swaps was applied. That is, only one color-pair
was affected at each application of the transformation: either green swapped with red or
blue swapped with yellow, but not both. This was done in order to expand the number
of items within a category: if one transformation involves two color swaps, then there are
only two possible members per category, which creates a problem in a task like this one –
all “correct” test trials would involve the same two items.
Second, anticipating that people would find the task somewhat more complicated,
we changed the criterion for when the experiment moved from one category to the next
during the training phase. In Experiment 1 people had to get four trials correct in a row.
In Experiment 2 the criterion was altered to be 8 of the last 10.
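A minimal sketch of the two advancement rules, assuming responses are logged as booleans; whether Experiment 2's rule could fire before ten trials had elapsed is not specified, so this version waits for a full window.

```python
def criterion_exp1(history):
    """Experiment 1: advance after four consecutive correct responses."""
    return len(history) >= 4 and all(history[-4:])

def criterion_exp2(history):
    """Experiment 2: advance once 8 of the last 10 responses are correct
    (assumes a full ten-trial window is required before checking)."""
    window = history[-10:]
    return len(window) == 10 and sum(window) >= 8
```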
Procedure: Test Phase. Test items were constructed in the same way as in Experiment 1, but were not precisely the same items because the definition of the color transformation had changed slightly. We removed the arbitrary test trials, but retained the other five types shown in Table 1.
Tests were presented in four blocks of eight, with each of the four theoretically interesting types of test item appearing twice in each block: once with the source grid on the left and the transformed grid on the right, and once with the transformed grid on the left and the source grid on the right. Order was randomized within blocks; the first and third blocks each contained one example of an identity trial.
Exclusion criteria. The exclusion criteria were unchanged from Experiment 1.
Results
Training phase. The key result from Experiment 2 is the training phase data,
which reveals that people largely failed to learn the transformations. In the identity
training condition people had few difficulties: 89% of all participants reached criterion
without running afoul of the 40-trial exclusion criterion, again “learning” the categories in
close to the minimum possible time, taking 8.74 trials on average to reach criterion and
classifying items with an overall accuracy of 94%. However, for the two transformation
learning conditions, the categories were much more difficult. In the color training
condition only 60% of participants reached criterion. This subset of participants needed
an average of 14.6 trials to learn each category and had an average overall accuracy of 68%
(where 50% is chance). For the movement training the results were even worse, with
a mere 35% of participants reaching criterion (19.6 trials per category, 64% accuracy). As
such, the test phase data reflect only a minority of participants.
Test phase. Given how few people reached criterion, there is little to be gained
from analyzing the test trial data in much detail. For the sake of completeness the raw
responses for all conditions are included in the Appendix along with the relevant hypothesis
tests, but for the current purposes it suffices to note that there was no credible evidence for
any effects on the similarity judgments: all Bayes factors were in the “negligible evidence”
range. For the categorization judgments there was some suggestion that the movement
training may have had the desired effect, insofar as one of the Bayes factors was quite large,
but given that 65% of participants were excluded from the data in this condition, this
result is mostly uninterpretable.
Discussion
The important finding from Experiment 2 is that participants found it very difficult to
learn transformation-defined categories when the transformation was not directly observable
during training. In one sense it is not surprising that the learning problem should be
harder, but the magnitude of the effect is quite noticeable. Without directly seeing the
transformation in action, people do not find it easy to extract that information from a
standard categorization task. This is despite the fact that our stimuli were still quite
simple relative to real-world stimuli, and the statistical information provided by the stimuli
was not noticeably different from Experiment 1 to Experiment 2.
Experiment 3
Taken together, the first two experiments pose a puzzle. Arguably the only substantive difference between them is the fact that Experiment 1 actively highlighted the
transformation in action while Experiment 2 obscured it. Experiment 1 suggests that people should quite readily learn how different members of a category are related by inferring
the set of admissible transformations. It also suggests that this learning should be reflected
in their similarity judgments and that they will readily generalize their learning to similar
but untaught transformations. By contrast, Experiment 2 suggests that this result might
be sharply limited in applicability to those cases in which the transformation is directly
observable. This possibility seems at odds with the way that transformational similarity
is generally conceived (Chater & Hahn, 1997), because abstract operations such as “create feature” and “apply feature” are not themselves observable. It also implies that the
transformational view of similarity may not apply to most real-world categories, for which
transformations are never directly observed.
There is an intermediate possibility, however. Perhaps directly observing the transformation is not a prerequisite for learning transformational categories, but merely makes it
significantly easier. If that is the case, then if the rest of the task were simplified or training
were more extensive, people might be able to learn successfully and even generalize in a
manner similar to what was observed in Experiment 1. We investigated this possibility in
Experiment 3 by presenting people with a situation in which the transformation was not
directly observable (as in Experiment 2) but the stimuli involved were far simpler. In this
situation, learning was possible (though weaker) and generalization was more limited.
Method
Participants. 251 adults (60% male) were recruited via Amazon Mechanical Turk.
Of these, 198 were retained in the final analysis: 12 did not complete the task, 2 reported color-blindness, 27 were excluded for giving low ratings to the attention-check test items, and 12 were excluded for failing to meet the training criterion (described below), which required demonstrating learning on the last categories within a set number of presentations. Ages ranged
between 19 and 67 (mean: 34.7). 247 were from the USA, with the remainder from India,
South America, and the UK. People were paid US$1 for the 10-15 minute experiment.
Procedure. Experiment 3 was identical to Experiment 2 except in one respect:
the stimuli were simplified. This time the 3x3 grids were constrained so that six of the
cells were always the same color. This greatly reduced the visual complexity of the stimuli,
making it easier for people to see how two transformed stimuli are related. This is illustrated in Figure 6, which shows how the same transformations are more apparent when applied to these simplified objects.
Figure 6. A comparison of the "complex" stimuli used in Experiments 1 and 2, and the
simplified ones used in Experiment 3. The underlying transformations in Experiment 3
were identical to those used in Experiment 2, but the stimuli to which they were applied
were simpler objects.
Exclusion criteria. The test-phase exclusion criterion was unchanged from Experiments 1 and 2. The training-phase criterion was modified slightly: whereas in Experiments 1 and 2 participants were excluded if they ever reached a cap of 40 presentations of a single category, in Experiment 3 we lowered the cap to 30 presentations but excluded only participants who reached this cap twice in the last three categories. The new criterion was more lenient overall, but required better performance on the final categories.
Results
Training phase. Performance in the training phase improved relative to Experiment 2. The proportion of people who met the relevant inclusion criterion was much higher
and fairly homogeneous across conditions: 83% did so in the identity training condition, similar to the 74% of participants who did so during color training and the 80%
of participants in the movement training condition. For identity training the average number of trials was 8.5 and the overall accuracy was 92%; for color training the corresponding numbers were 14.12 trials and 71% accuracy, and for movement training 13.8 trials and 73% accuracy.
Test phase. Table 3 shows the mean judgments to the different test items broken down by the type of training participants received and Figure 7 shows the differences
graphically. (As before, the corresponding data for the identity training condition are not of much interest, but are reproduced in the Appendix.) For the categorization judgments, the pattern of results is very similar to Experiment 1 and the Bayes factors are large for all relevant comparisons. People endorsed novel items as more likely to belong to the same category if they were related by the same transformation that they had been trained upon, compared to people trained on the other kind of transformation. In addition, as before, people were willing to generalize their categorization judgments to test items with related transformations (e.g., people in the color
training condition endorsing the newColor items). The only oddity is the fact that the
newColor items actually had a larger effect than the oldColor items, but given the
large size of the credible intervals in question that difference does not seem notable.
Table 3
Descriptive statistics and hypothesis tests for all test items in Experiment 3.

Similarity judgments
Tested transformation    Color training    Movement training    Difference    Bayes factor
                         Mean (SD)         Mean (SD)
oldColor                 4.04 (1.46)       3.34 (1.82)           0.70         7.5
newColor                 3.26 (1.53)       2.23 (1.70)           1.03         13.9
newMovement              2.39 (1.53)       2.09 (1.32)           0.30         0.89
oldMovement              2.60 (1.60)       3.10 (1.61)          -0.50         0.22

Categorization judgments
Tested transformation    Color training    Movement training    Difference    Bayes factor
                         Mean (SD)         Mean (SD)
oldColor                 4.29 (1.67)       3.21 (2.13)           1.07         620
newColor                 3.59 (1.88)       1.80 (2.08)           1.80         > 1000
newMovement              0.93 (1.42)       2.29 (1.86)          -1.37         > 1000
oldMovement              1.04 (1.54)       3.15 (1.86)          -2.11         > 1000

In contrast to the categorization judgments, there is much less effect of training on
the similarity judgment. In the movement training condition there is no evidence of
any effect on similarity: there was no difference between groups on the oldMovement
transformation (Bayes factor = .22), nor was there evidence of a difference on the newMovement transformation (Bayes factor = .89). There was more of an effect in the color
training condition with respect to the two color transformations (Bayes factors of 7.5 and
13.9 respectively), but these effects were modest relative to those in Experiment 1. Overall,
we found that the learned transformations were able to influence people’s notions of similarity in at least some cases to at least some extent – but at best the effect on similarity
judgments is substantially weaker than the effect on categorization judgments.
Discussion
The simplified stimulus design in Experiment 3 seemed to have resulted in improved
learning. The underlying transformations that people had to learn were essentially identical
to ones that were easily learned in Experiment 1 and very difficult to learn in Experiment
2. Taken together, these results suggest that it is possible for these transformation-defined
categories to be learned via standard supervised learning methods. Moreover, the fact that
people generalized sensibly on the categorization items during the test phase suggests that
people did in fact learn the relevant transformation (as opposed to following simple heuristics
or exemplar memorization strategies). Nevertheless, the fact that generalization on the
similarity judgment test trials showed a somewhat attenuated effect relative to Experiment 1
suggests that when people do not directly experience the transformations that leave category
memberships invariant – or when learning is less thorough or more fragile – they are less
willing to use those transformations when assessing similarity.
Figure 7. Differences in mean ratings given to novel test items in Experiment 3 (see Figure 4 caption for details). For the categorization judgments (right panel) the pattern of
results mirrors the effect seen in Experiment 1: people give higher ratings when the tested
transformation matches or is similar to the trained transformation. However, there far
less evidence for any corresponding effect among participants in the similarity judgment
condition: there is a modest effect for the color transformations, and none at all for the
movement transformations.
General Discussion
The aim of this research was to examine how readily arbitrary transformations are
learned in a simple artificial setting, and the extent to which such learning can influence
similarity and categorization judgments. The results were somewhat mixed. Although
participants were capable of quickly learning a new transformation if it was made obvious
enough, the application of this learning to similarity was inconsistent.
Our results are broadly consistent with research finding that people can learn simple
transformations over short time-frames (Hahn et al., 2009; Andrews et al., 2014), but extend previous work in two ways. First, our results show that people can generalize learned
transformations to similar ones (e.g., from one color transformation to other untrained color
transformations). Second, they suggest that at least in some cases it is possible for people
to learn a transformation and understand how to apply it to novel categories without any
corresponding effect on similarity (i.e., Experiment 3).
Regarding the first of these points, in both Experiments 1 and 3 people were able
to do more than just learn the relevant transformations. In both cases people learned
a general class of transformations. Relative to participants in the movement training
condition, people in the color training condition reported that two completely novel
items were more closely related even when the transformation that related them was itself
novel, as long as it was also a color-swapping transformation of some kind. This effect
was somewhat smaller than the effect observed when the novel items were related by the
actually-trained transformation, which suggests the possibility that this behavior represents
actual generalization rather than a confusion between the different transformations.
Our findings also suggest that learning a transformation and applying it to similarity judgments are dissociable in at least some cases. In Experiment 1, when the trained
transformation was directly observable, we observed effects on both the categorization and
similarity of novel stimuli. In Experiment 3, when no direct observations were available,
categorization and similarity judgments at test dissociated somewhat: there was only modest evidence for an effect on similarity, and even then only for the color transformations.
Arguably this result does not necessarily pose any great difficulty for the transformational
view: it may be that only well-learned or obviously relevant transformations have the
power to influence similarity. If that is true, then it is not surprising that the direct observation in Experiment 1 had exactly the effect of making the transformation appear relevant
and easy to learn. Even so, the key point is that merely learning that a particular transformation is critical to a particular category does not necessarily imply that it will influence
perceived similarity among stimuli. In addition, if transformations have to be well-learned
in order to affect similarity, it suggests some limitations on the generality of the theory of
transformational similarity.
These conclusions are subject to some limitations. It may be that learning in such
short time-frames is qualitatively different from learning over weeks, months, or years
(though there are also real world scenarios that require similarity judgments on the basis of
very limited experience, for example when starting a new job or cooking in an unfamiliar
style). In particular, it may be the case that the dissociation observed between categorization judgments and similarity judgments is simply one of degree, and that with enough
training transfer to similarity ratings could be found even for the rotation transformation
test items under the training used in Experiment 3. Such an interpretation would still leave
open the question of why there should be such a pronounced relative difference between the
test phase results in Experiments 1 and 3, with the transformations learned from indirect
evidence (Experiment 3) generalizing less readily than those learned from direct evidence
(Experiment 1), even though both are successfully learned.
Category-preserving transformations are common features of many natural categories,
like aging, flowering, or allowing a Rubik’s-cube-style partial rotation. This work shows
that people can learn which transformations (and classes of transformations) leave category
membership intact, and use this knowledge to guide categorization and similarity judgments
about novel items. However, these generalizations appear to depend on the psychological
availability of the transformation, which we found to vary as a function of complexity of
the stimulus set and the degree to which the transformation itself was directly observed.
References
Andrews, J., Livingston, K., Auerbach, J., Altiero, E., & Neumeyer, K. (2014). Does learning to categorize visual stimuli based on motion features produce learned categorical
perception effects? In Proceedings of the 36th annual conference of the cognitive
science society (pp. 3170–3170).
Baddeley, A. D. (1966). Short-term memory for word sequences as a function of acoustic,
semantic and formal similarity. The Quarterly Journal of Experimental Psychology,
18 (4), 362–365.
Barsalou, L. W. (1983). Ad hoc categories. Memory & cognition, 11 (3), 211–227.
Bennett, C. H., Gács, P., Li, M., Vitányi, P. M., & Zurek, W. H. (1998). Information
distance. Information Theory, IEEE Transactions on, 44 (4), 1407–1423.
Biederman, I. (1987). Recognition-by-components: a theory of human image understanding.
Psychological review, 94 (2), 115.
Chater, N., & Hahn, U. (1997). Representational distortion, similarity and the universal
law of generalization. In Simcat97: Proceedings of the interdisciplinary workshop on
similarity and categorization.
Chater, N., & Vitányi, P. (2003). The generalized universal law of generalization. Journal
of Mathematical Psychology, 47 (3), 346–369.
Freyd, J. J. (1983). The mental representation of movement when static stimuli are viewed.
Perception & Psychophysics, 33 (6), 575–581.
Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive
science, 7 (2), 155–170.
Goldstone, R. L., Day, S., & Son, J. Y. (2010). Comparison. In Towards a theory of thinking
(pp. 103–121). Springer.
Goodman, N. (1972). Problems and projects. Bobbs Merrill.
Grimm, L. R., Rein, J. R., & Markman, A. B. (2012). Determining transformation distance
in similarity: Considerations for assessing representational changes a priori. Thinking
& Reasoning, 18 (1), 59–80.
Hahn, U. (2014). Similarity. Wiley Interdisciplinary Reviews: Cognitive Science, 5 (3),
271–280.
Hahn, U., Chater, N., & Richardson, L. B. (2003). Similarity as transformation. Cognition,
87 (1), 1–32.
Hahn, U., Close, J., & Graf, M. (2009). Transformation direction influences shape-similarity
judgments. Psychological Science, 20 (4), 447–454.
Hodgetts, C. J., Hahn, U., & Chater, N. (2009). Transformation and alignment in similarity.
Cognition, 113 (1), 62–79.
Imai, S. (1977). Pattern similarity and cognitive transformations. Acta Psychologica, 41 (6),
433–447.
Kruschke, J. K., & Meredith, M. (2014). BEST: Bayesian estimation supersedes the t-test [Computer software manual]. Retrieved from http://CRAN.R-project.org/package=BEST (R package version 0.2.2)
Mark, L. S., Todd, J. T., & Shaw, R. E. (1981). Perception of growth: A geometric
analysis of how different styles of change are distinguished. Journal of Experimental
Psychology: Human Perception and Performance, 7 (4), 855.
Morey, R. D., Rouder, J. N., & Jamil, T. (2014). BayesFactor: Computation of Bayes factors for common designs [Computer software manual]. Retrieved from http://CRAN.R-project.org/package=BayesFactor (R package version 0.9.8)
Müller, M., van Rooij, I., & Wareham, T. (2009). Similarity as tractable transformation. In
Proceedings of the 31st annual conference of the cognitive science society (pp. 50–55).
Nosofsky, R. M. (1986). Attention, similarity, and the identification–categorization relationship. Journal of experimental psychology: General, 115 (1), 39.
Novick, L. R. (1988). Analogical transfer, problem similarity, and expertise. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 14 (3), 510.
Riesbeck, C. K., & Schank, R. C. (2013). Inside case-based reasoning. Psychology Press.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science.
Science, 237 (4820), 1317–1323.
Shulman, H. G. (1971). Similarity effects in short-term memory. Psychological Bulletin,
75 (6), 399.
Tversky, A. (1977). Features of similarity. Psychological review, 84 (4), 327.
Watanabe, S. (1985). Pattern recognition: human and mechanical. John Wiley & Sons,
Inc.
Wattenmaker, W. D., Nakamura, G. V., & Medin, D. L. (1988). Relationships between
similarity-based and explanation-based categorization.
Appendix
Tables of results
Individual participants saw a number of different examples of each test item type: 4 instances in Experiment 1 and 8 in Experiments 2-3. A participant's ‘rating score’ for each
test type was considered to be the average of all ratings given for that type of test item.
The means and standard deviations reported here are of rating scores across participants,
rather than raw ratings across items.
The Bayes factors reported here were calculated using the ttestBF function from the BayesFactor package for R (Morey et al., 2014) under the default settings. The test specifies different priors over effect size for the null and alternative hypotheses, and reports the ratio of the marginal likelihoods of the observed data under each hypothesis. Under the default settings, the null posits that the true effect size is zero, and the alternative uses a Cauchy distribution to cover a range of possible non-zero effect sizes, with values in the range (−0.7071, 0.7071) considered most likely.
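For readers who want to check the computation, this default (JZS) two-sample test can be approximated directly: average the likelihood of the observed t statistic over effect sizes drawn from the Cauchy prior, and divide by its likelihood under the null. The sketch below is ours rather than part of the original analysis; the function name, the Monte Carlo approach, and the fixed seed are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def jzs_bf10(t, n1, n2, r=0.7071, n_samples=200_000, seed=1):
    """Monte Carlo approximation to the JZS Bayes factor (BF10) for a
    two-sample t-test, mirroring ttestBF's default Cauchy(0, r) prior
    on standardized effect size (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    df = n1 + n2 - 2
    n_eff = n1 * n2 / (n1 + n2)  # effective sample size for two groups
    # Alternative: average the noncentral-t likelihood of the observed
    # t over effect sizes delta drawn from the Cauchy prior.
    delta = stats.cauchy.rvs(scale=r, size=n_samples, random_state=rng)
    like_h1 = stats.nct.pdf(t, df, delta * np.sqrt(n_eff)).mean()
    # Null: central-t likelihood of the observed t at effect size zero.
    like_h0 = stats.t.pdf(t, df)
    return like_h1 / like_h0
```

Given an observed t statistic and the two group sizes, this should approximate the value that ttestBF reports under its defaults.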
Foil generation
Candidate foil patterns in the movement training were created by first applying
the target transformation to produce the correct target, and then swapping the colours in
two randomly selected cells.
Candidate foil patterns in the color training were created by applying a colour-swapping rule randomly selected from all reversible color-changing rules using the four possible colors. These could change either one color into another randomly selected one, or two colors into arbitrarily selected complements.
In the identity training condition, the foil generation procedure was randomly
selected from either of the above schemes on each trial.
Before displaying a candidate foil item, a check was run to ensure the result was not reachable by repeated applications of any of the transformations considered here, including those used in the test items. If a candidate foil violated this constraint, a new one was generated until an acceptable foil was found.
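As an illustration of that constraint, foil generation can be phrased as rejection sampling. The helper names below are ours, and the depth cap is an assumption; the paper does not say how reachability was bounded.

```python
import numpy as np

def reachable(start, goal, transform, max_depth=8):
    """True if `goal` appears within `max_depth` repeated applications
    of `transform` to `start` (the depth cap is an assumption)."""
    current = start
    for _ in range(max_depth):
        current = transform(current)
        if np.array_equal(current, goal):
            return True
    return False

def sample_foil(base, propose, banned_transforms):
    """Draw candidate foils (e.g., via a randomly chosen colour swap)
    until one is found that is NOT reachable from the base item by any
    of the banned transformations, including those used at test."""
    while True:
        candidate = propose(base)
        if not any(reachable(base, candidate, t) for t in banned_transforms):
            return candidate
```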
Table 4
Descriptive statistics and hypothesis tests for all test items in Experiment 2.

Categorization judgments
Tested transformation    Color training    Movement training    Difference    Bayes factor
                         Mean (SD)         Mean (SD)
oldColor                 4.20 (1.61)       3.33 (2.24)           0.88         7.99
newColor                 3.20 (2.03)       2.83 (2.24)           0.38         0.84
newMovement              1.21 (1.55)       3.01 (2.00)          -1.80         > 1000
oldMovement              1.17 (1.55)       3.34 (1.97)          -2.17         > 1000

Similarity judgments
Tested transformation    Color training    Movement training    Difference    Bayes factor
                         Mean (SD)         Mean (SD)
oldColor                 3.88 (1.57)       3.58 (1.69)           0.30         0.35
newColor                 3.06 (1.94)       2.93 (1.90)           0.13         0.32
newMovement              1.52 (1.49)       2.53 (1.81)          -1.01         2.36
oldMovement              1.59 (1.57)       2.38 (1.76)          -0.78         1.34
Table 5
Responses after training on identity categories.

Similarity judgments
Tested transformation    Study 1        Study 2        Study 3
                         Mean (SD)      Mean (SD)      Mean (SD)
oldColor                 2.77 (1.77)    2.93 (1.69)    3.46 (1.71)
newColor                 2.98 (1.57)    2.22 (1.74)    2.24 (1.63)
newMovement              1.94 (1.62)    1.66 (1.50)    1.82 (1.43)
oldMovement              2.03 (1.68)    1.66 (1.55)    2.36 (1.59)

Categorization judgments
Tested transformation    Study 1        Study 2        Study 3
                         Mean (SD)      Mean (SD)      Mean (SD)
oldColor                 1.23 (1.66)    0.92 (1.33)    0.78 (1.30)
newColor                 1.34 (1.69)    0.65 (1.18)    0.36 (0.86)
newMovement              0.79 (1.29)    0.58 (1.10)    0.35 (0.68)
oldMovement              0.89 (1.38)    0.66 (1.10)    0.35 (0.69)