Transformation learning and its effect on similarity

Steven Langsford, Daniel J. Navarro, Amy Perfors, Andrew T. Hendrickson
School of Psychology, University of Adelaide

Abstract

The transformational theory of similarity suggests that more similar items are those which are easier to transform into each other. Although this theory has been quite influential, little is known about how transformations are learned and to what extent learned transformations affect similarity judgments. This paper presents three experiments addressing these questions. In all of the experiments, people were taught novel categories defined by an arbitrary transformation. In Experiment 1, when the transformations were directly visible, people had no trouble learning and were able to apply their knowledge to both similarity and categorization judgments involving novel items. In Experiment 2, the task required transformations to be inferred rather than observed; this resulted in very poor learning overall. Experiment 3 had simplified stimuli but still required transformations to be inferred. People were able to learn in this case, but the effects on similarity were weak or non-existent. Overall, this work suggests that transformation learning (and generalizing to similarity) is possible but not automatic or easy. Implications for the transformational theory of similarity are discussed.

Introduction

Similarity plays a central role in human cognition, serving an explanatory role within many theories of categorization (Nosofsky, 1986), reasoning (Riesbeck & Schank, 2013; Novick, 1988), and memory (Baddeley, 1966; Shulman, 1971). Because similarity is context dependent (Barsalou, 1983) and cannot be defined on purely logical grounds (Goodman, 1972; Watanabe, 1985), cognitive psychologists have been interested in how people assess similarity in its own right (see e.g., Goldstone, Day, & Son, 2010). Early work developed simple set theoretic models that characterize similarity in terms of shared and distinctive features (Tversky, 1977), as well as geometric models in which similarity is inversely related to distance between items in a psychological space (Shepard, 1987). In many situations these models are adequate for describing how people make judgments about very simple objects. However, systematic failures emerge when the stimuli are more structured (Biederman, 1987; Wattenmaker, Nakamura, & Medin, 1988). In light of these failures, researchers have proposed more complicated theories based on ideas such as structure mapping (Gentner, 1983) and stimulus transformation (Hahn, Chater, & Richardson, 2003; Hahn, 2014).

This paper focuses on the transformational view of stimulus similarity. The core idea is that the harder it is to mentally “transform” one object into another, the less similar those objects are to each other. As a simple example, consider the pair of alphanumeric strings 111xxx1 and 000xxx0. To transform the first string into the second, all we need to do is replace all the 1s with 0s. Turning 111xxx1 into 000yz0 would be a more complicated operation, since we would also need to convert the xxx substring into a yz string. Starting with early work by Imai (1977), there is now a considerable body of literature arguing for and against the transformational view of similarity.
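To make this intuition concrete, the following is a minimal sketch of a transformation-distance computation, written in R since the paper's analyses cite R packages. It is our illustration rather than anything from the original materials: the primitive operation set (whole-string symbol substitutions), the symbol inventory, and the search depth are all illustrative assumptions.

```r
# A minimal sketch of transformation distance: count the fewest primitive
# operations needed to turn one string into another, where each primitive
# replaces every occurrence of one symbol with another. The operation set,
# symbol inventory, and depth cap are illustrative assumptions.
transform_distance <- function(from, to,
                               symbols = c("0", "1", "x", "y", "z"),
                               max_depth = 3) {
  frontier <- from
  for (depth in 0:max_depth) {
    if (to %in% frontier) return(depth)
    # Expand the frontier: apply every substitution to every current string.
    frontier <- unique(unlist(lapply(frontier, function(s) {
      unlist(lapply(symbols, function(a) {
        lapply(setdiff(symbols, a), function(b) gsub(a, b, s, fixed = TRUE))
      }))
    })))
  }
  Inf  # not reachable with these primitives within max_depth steps
}

transform_distance("111xxx1", "000xxx0")  # 1: replace every 1 with 0
# "000yz0" is unreachable by substitutions alone (it also requires shortening
# the string), so it needs a richer operation set and is accordingly less
# similar under the transformational view.
transform_distance("111xxx1", "000yz0")   # Inf under this primitive set
```

A fuller account would allow primitives such as insertions and deletions, at which point the distance for the second pair becomes finite but still larger than for the first.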
Several papers outline the theoretical foundations of the transformational approach (Chater & Vitányi, 2003; Chater & Hahn, 1997; Bennett, Gács, Li, Vitányi, & Zurek, 1998), the empirical evidence for it (Hahn et al., 2003; Hodgetts, Hahn, & Chater, 2009; Hahn, 2014), as well as the arguments against it (Grimm, Rein, & Markman, 2012; Müller, van Rooij, & Wareham, 2009).

Setting aside the question of whether stimulus transformation provides a comprehensive view of how people perceive similarities, it seems unquestionable that the structure underpinning at least some categories is naturally amenable to a transformational description. This occurs when the categories themselves are characterized in terms of a set of admissible transformational operations. For example, people tend to take transformations due to aging into account when identifying faces (Mark, Todd, & Shaw, 1981), and track object identities by inferring motion from a sequence of still images (Freyd, 1983).

A question that follows naturally from these considerations is: where do these transformations come from? Put another way, how do people realize what the set of transformational primitives is? For many of the simple stimuli used in the experimental literature, there are well-defined sets of transformations that have been consistently applied across studies and have considerable empirical justification (Hodgetts et al., 2009). In other cases there are simple physical transformations like “rigid rotation” that might be provided naturally by the perceptual system. However, it is less obvious what transformations might underpin the comparison between pairs of faces of different ages, plants photographed in different seasons, or even a Rubik’s cube in different states. In many cases, not only is it unclear what transformations researchers should use to specify a transformational theory (e.g., Grimm et al., 2012), it is also unclear how people might learn the relevant transformations.

It is also unclear whether learned transformations can have the same effects on similarity as transformations that are given by the perceptual system or are obvious on the basis of the simple stimuli involved (Hahn et al., 2003; Hahn, 2014). Learned transformations might have more fragile representations or be less cognitively accessible compared to more perceptually obvious ones. If this is the case, it has important implications for the generality of the transformational view of similarity, suggesting that transformations only underlie similarity for specific kinds of categories or representations. Conversely, if learned transformations easily or automatically change similarity judgments, this implies that the transformational view of similarity may apply to a range of natural categories in the world.

Relatively little research has addressed either of these issues. To what extent are transformations learnable, and to what extent do these learned transformations impact similarity? The most relevant empirical work comes from Hahn, Close, and Graf (2009), who found that people who were shown morphs from A to B rated similarity higher in the observed morph direction than in the reverse direction. This result hints that people are able to learn what classes of transformations (in this case morph directions) are relevant to a particular context, and shows that there may be some impact of this learned transformation on similarity.
There are similar results showing that movement features – which might be characterized as a kind of transformation – can be learned and then used to drive categorical perception effects (Andrews, Livingston, Auerbach, Altiero, & Neumeyer, 2014). However, this research measures similarity as a means to infer whether the transformations were learned; it does not measure the learning of the transformations directly. It also does not investigate in what circumstances transformational learning is possible, nor to what extent the depth or effectiveness of learning impacts similarity judgments.

This paper addresses these issues through a series of three experiments. In Experiment 1, people were taught a transformation applying to 3x3 grids of colors. The stimuli and the learning task were both very simple, and in this situation people were able to easily learn the transformations as well as apply them when making similarity judgments. Experiment 2 was designed to explore the extent to which these outcomes emerged because of the simplicity of the learning task. In this experiment, the training format no longer made it possible for people to directly observe transformations in operation; instead, participants had to infer them from the category members. This resulted in a large increase in task difficulty, and the majority of people failed to learn. Experiment 3 retained the training format but made the stimuli even simpler, and showed that people were capable of learning the transformations under those conditions. However, the learned transformations had weak or non-existent effects on similarity judgments. Taken together, these experiments suggest that arbitrary transformations are learnable and can impact similarity judgments, but that there are limitations on both the learning and the automaticity with which learned transformations affect similarity.

Experiment 1

The goal of the first experiment was to investigate whether people can learn arbitrary categories defined in terms of an abstract stimulus transformation – and, if so, whether they can generalize this learning sensibly to new items and new categories. Since our initial question was whether transformation learning is possible at all, we designed the learning task so that the relevant transformation was as salient as possible. This also establishes a baseline level of performance against which the more complex experiments can be compared.

The overall structure of this experiment (and subsequent ones) is as follows. First, during the training phase, people were presented with a series of category learning tasks. The categories were defined by a single relevant transformation, and learning proceeded until a criterion was reached. At this point participants proceeded to a test phase in which they saw stimuli belonging to novel categories and were asked to make either similarity or categorization judgments about them. Our question was whether people’s responses at test would reflect the transformations that were highlighted during training. If the transformation is truly being learned – and if it truly affects similarity – one would expect this to occur; conversely, if people reached criterion in the training phase through exemplar memorization, one would expect poor or non-existent transfer.

Figure 1. The two transformations used during the training phase of Experiment 1. In the movement training condition people learned a non-rigid clockwise rotation transformation (top row), whereas in the color training condition they learned a color swapping rule (bottom row).
For both transformations, the image on the left shows how that transformation was defined, and the image on the right gives an example of how it operates on a particular stimulus. In this figure we use textures to display the four different possibilities for each cell; however, the actual stimuli were presented in color, with the four possible values being red, green, yellow and blue.

Method

Participants. 444 adults (62% male) were recruited via Amazon Mechanical Turk. Of these, 47 were excluded: 12 for self-reported color-blindness and 35 for failing to pass check questions during the test phase of the experiment. Ages ranged from 18 to 67 with a mean of 33.25. 311 participants were from the USA, 120 were from India, and 13 were from other countries. Participants were paid US$0.75 for the task, which took about 10 minutes to complete.

Procedure: Training Phase. Stimuli were 3x3 grids of colored cells, where each cell could take one of four colors (red, yellow, blue or green). The training phase of the experiment was designed to teach people a novel transformation defined over these objects. Three between-subject training conditions were used: movement, color, and identity training. In the identity training condition (used as a control) a “null” transformation was used: the stimulus did not change at all when the transformation was applied. In the color training condition, the transformation was a color-swapping rule in which red and green cells switched colors, as did the blue and yellow ones. Finally, in the movement training condition, the correct stimuli were generated by applying a non-rigid clockwise rotation. These transformations are illustrated in Figure 1.

The training phase took the form of a series of categorization tasks. On any given trial participants were shown a single ‘base’ stimulus and told that it belonged to a novel category (e.g., wugs). Two alternative items were displayed underneath and people were asked to guess which of these two also belonged to the category. This setup is illustrated in Figure 2.

Figure 2. Presentation format in the training phase in Experiment 1. Items were presented in a two-alternative forced choice format in which people had to select which of the two items on the bottom was in the same category as the target object (top). In this experiment, in order to make the transformations more salient and learnable, the target object stayed in place and people viewed the transformation being applied after they answered the question.

The critical feature of this categorization task is that category members were generated by taking an initial stimulus and repeatedly applying the novel transformation. This is illustrated in Figure 3, which shows how the same transformation can be used to generate a category of wugs and a category of philbixes. When learning the wug category, for instance, the base item would always be one of the wug category members, and one of the two alternatives would always be the particular wug that is produced when the transformation is applied to the base item (i.e., the next step in the chain shown in Figure 3). The other stimulus was a foil item generated by taking the same base item and applying a transformation similar to the correct one. The Appendix lists details of the transformations used to generate foils. After each guess participants were given feedback.
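As a concrete illustration of the stimulus design, the sketch below (our reconstruction, not the authors' materials) implements the two trained transformations and the repeated-application scheme used to generate categories. The exact cell-movement pattern of the “non-rigid clockwise rotation” is an assumption on our part: we read Figure 1 as shifting the eight outer cells one step clockwise while the center cell stays put.

```r
# Sketch of the Experiment 1 stimuli and transformations. The color swap is
# as described in the text (red<->green and blue<->yellow simultaneously);
# the rotation below (outer ring shifts one step clockwise, center fixed) is
# our assumed reading of Figure 1.
colors <- c("red", "green", "yellow", "blue")

swap_colors <- function(grid) {
  map <- c(red = "green", green = "red", blue = "yellow", yellow = "blue")
  matrix(map[grid], nrow = 3)
}

rotate_cells <- function(grid) {
  # Linear indices of the outer ring in clockwise order (column-major:
  # cell [row, col] has index (col - 1) * 3 + row).
  ring <- c(1, 4, 7, 8, 9, 6, 3, 2)
  out <- grid
  out[ring[c(2:8, 1)]] <- grid[ring]  # each ring cell moves one step clockwise
  out
}

# Category members are generated by repeatedly applying the transformation
# to a base stimulus (cf. Figure 3).
make_category <- function(base, transform, n_members = 6) {
  members <- vector("list", n_members)
  members[[1]] <- base
  for (i in 2:n_members) members[[i]] <- transform(members[[i - 1]])
  members
}

set.seed(1)
base <- matrix(sample(colors, 9, replace = TRUE), nrow = 3)
wugs <- make_category(base, rotate_cells)  # a movement-defined category
wug_pair <- list(base, swap_colors(base))  # one color-transformation step
```

Running `make_category` with a different base grid yields a second, visually distinct category (the philbixes of Figure 3) governed by the same transformation.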
On an incorrect choice, the message “Sorry, try again” appeared on screen and participants had to click on the other option in order to proceed. After the correct response was given, the message “Correct” appeared on screen, and participants watched as the base stimulus was morphed into the correct one. The next trial would then begin with the newly transformed item as the new target stimulus. By presenting this animation, the experiment made the transformation directly observable to people on every trial. For any one category (e.g., wugs) this process continued until either the participant made four correct choices in a row or 40 trials had elapsed, at which point the experiment moved on to the next category (e.g., philbixes). This continued across a sequence of six categories, at which point the experiment moved into the test phase.

Procedure: Test Phase. In the test phase, participants were asked to make judgments about novel stimuli, all of which were constructed using color patterns that had never appeared during the training phase. Participants were randomly assigned to either a similarity condition or a categorization condition. In the similarity condition every test trial presented two novel items and asked people to rate their similarity on a 7-point scale. In the categorization condition the question asked people how likely the two items were to “have the same name”. In both cases the low end of the scale was labeled “Not at all” and the high end was labeled “Extremely”.

There were six qualitatively distinct kinds of test trials, listed in Table 1. On an identity trial the two items were identical, and on an arbitrary trial the items had no relationship at all. Neither of these is of theoretical interest: the identity trials were included as part of the exclusion criteria (see below), and the arbitrary trials were included to assist participants with calibration by showing examples of very dissimilar items.

Table 1
Structure of the test phase items, and their relationship to the different training conditions. The critical prediction is that when test items are related by a trained transformation (e.g., oldColor) or by a similar one (e.g., newColor), participants who had learned the relevant transformation (e.g., color training) should rate those items as more similar or more likely to belong to the same category than participants in the other training condition (e.g., movement training).

Transformation relating    Interpretation under       Interpretation under
the test item pair         color training             movement training
oldColor                   trained transformation     unrelated to training
newColor                   similar to training        unrelated to training
oldMovement                unrelated to training      trained transformation
newMovement                unrelated to training      similar to training
identity                   control trial              control trial
arbitrary                  control trial              control trial

The other four kinds of test trial were all related to the learned transformations in some fashion. In the oldColor trials the two items were related via the color transformation used in the training phase for the color training condition. The oldMovement trials were related to the movement training condition in the same way. For the newColor trials, the two items were related via a color swapping rule, but the specific transformation differed from the one used in the training phase: instead of swapping red with green and blue with yellow, the transformation that related stimuli in the newColor trials swapped red with blue and green with yellow.
Similarly, the transformation in the newMovement trials also used a non-rigid movement of the cells, but instead of rotating the cells, each row shifted downwards by one position, with the bottom row moving to the top. All participants were shown the same test trials, irrespective of the training condition or judgment type they were assigned to. There were four trials each for the oldColor, oldMovement, newColor and newMovement types, and two trials each for the identity and arbitrary types. Order of presentation was randomized.

Exclusion criteria. The experiment used two different pre-defined exclusion criteria, one based on training phase responses and one based on test phase responses. For the training phase, if any participant took more than 40 trials to learn any category, that participant’s data would be excluded. No participants were excluded on this basis. For the test phase, any participant who gave an average similarity/categorization rating of less than 6 (out of 7) to the identical stimuli was excluded: 35 people were removed on this basis.

Figure 3. An illustration of how the same transformation defines multiple categories. The various wugs are all related to each other by the movement transformation defined in Figure 1, as are all the philbixes. However, due to the different configurations used, the wugs are clearly distinguishable from the philbixes.

Results

Training phase. The categories in all conditions were easy to learn, which is perhaps unsurprising given how salient we made the relevant transformations. People in the movement training condition reached the criterion of four correct responses in a row in 6.3 trials on average, compared to 5.8 in color training. In the identity training condition, participants “learned” the categories in very nearly the minimum possible time, taking an average of 4.1 trials to reach the “4 correct in a row” criterion. The average accuracy over all training trials was 85% in movement training, 88% in color training, and 98% in identity training.

Test phase. The raw responses for test items are listed in Table 2. The critical comparison, for any given test item, is whether the average ratings in the color training condition differ from the corresponding ratings in the movement training condition. (Comparisons to the identity training condition are omitted, primarily because the main thing people learned in that condition is that its categories showed no variability at all.) These differences are plotted in Figure 4, along with 95% credible intervals computed using the BEST package for R (Kruschke & Meredith, 2014). Regardless of whether participants were asked to rate stimulus similarity (left panel) or to judge whether the stimuli belonged to the same category (right panel), the same pattern emerges. When the novel stimuli were related via the trained transformation, people rated them as more similar and more likely to belong to the same category, relative to participants trained on the other transformation. When the novel stimuli were related via a similar transformation, an attenuated version of the effect appeared. In other words, participants learned more than just one specific color (or movement) transformation when trained on that transformation; they also inferred that other color (or movement) transformations would be more relevant to these categories. To quantify the strength of evidence for these effects we ran Bayesian t-tests using the BayesFactor package in R (Morey, Rouder, & Jamil, 2014). The results are listed on the right hand side of Table 2.
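For readers wanting to reproduce this style of analysis, the sketch below shows how such a comparison could be set up with the two cited packages. It is our illustration rather than the authors' analysis script, and the data objects (`color_ratings`, `movement_ratings`) are hypothetical placeholders.

```r
# Illustrative sketch of the Figure 4 / Table 2 analyses using the packages
# cited in the text. color_ratings and movement_ratings are hypothetical
# vectors of per-participant mean ratings for one test type (e.g., the
# oldColor similarity trials) in each training condition.
library(BEST)         # Kruschke & Meredith (2014)
library(BayesFactor)  # Morey, Rouder, & Jamil (2014)

# Posterior for the difference in group means, with a 95% credible interval.
fit <- BESTmcmc(color_ratings, movement_ratings)
summary(fit)

# Bayes factor comparing a nonzero group difference against the point null,
# under the package's default Cauchy prior on effect size.
bf <- ttestBF(x = color_ratings, y = movement_ratings)
bf
```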
Although in some cases the evidence is equivocal, the general pattern is exactly as one would expect on the basis of Figure 4: most Bayes factors show moderate to strong evidence of an effect.

Figure 4. Differences in mean ratings given to novel test items by participants in the color training and movement training conditions in Experiment 1. Results are broken down by test trial type and judgment type. Positive values indicate that the items received higher ratings in the color training condition. Error bars depict 95% Bayesian credible intervals for the differences. In all cases people gave stronger similarity judgments (left) and categorization ratings (right) when the test trials involved transformations that either matched or were similar to the trained transformation. The effect is stronger for categorization judgments than similarity judgments, and stronger when the tested transformation is identical to the trained one (i.e., oldColor, oldMovement) than when it is similar to the trained one (i.e., newColor, newMovement).

Table 2
Descriptive statistics and hypothesis tests for all theoretically relevant test items in Experiment 1.

Similarity judgments
Tested           Color training  Movement training
transformation   Mean (SD)       Mean (SD)          Difference  Bayes factor
oldColor         4.37 (1.40)     2.86 (2.17)        1.50        > 1000
newColor         3.72 (1.54)     2.93 (1.94)        0.78        7.16
oldMovement      2.04 (1.67)     3.01 (1.70)        -0.97       303
newMovement      2.15 (1.80)     2.68 (1.70)        -0.52       1.03

Categorization judgments
Tested           Color training  Movement training
transformation   Mean (SD)       Mean (SD)          Difference  Bayes factor
oldColor         4.84 (1.56)     1.69 (1.99)        3.15        > 1000
newColor         3.03 (2.27)     2.11 (2.01)        0.92        3.86
oldMovement      1.26 (1.65)     3.61 (1.76)        -2.35       > 1000
newMovement      1.21 (1.70)     3.25 (1.97)        -2.04       > 1000

Discussion

Taken together, the results from Experiment 1 suggest that people are capable of learning a novel transformation, recognizing that this transformation is relevant to determining category memberships, and applying this learned transformation to assess the similarity between items that belong to novel categories. This basic finding is consistent with the learning effect seen in Hahn et al. (2009), but extends previous results in showing systematic generalization across related transformations. That is, to the extent that the test item effects generalized beyond the trained transformation and also encompassed similar transformations (albeit in an attenuated fashion), people seem to be able to make generalizations about the applicability of entire classes of transformations (e.g., all color transformations) to entire classes of categories (e.g., all categories of stimuli defined over these 3x3 grids).

Experiment 2

Experiment 1 provides “in principle” evidence that people are capable of learning very rich knowledge about classes of stimulus transformations and classes of categories to which they are applicable. However, the structure of the task was deliberately designed to make the learning problem as easy as possible, and it is not clear how generally the results hold. In particular, during the training phase in Experiment 1 participants were shown the actual stimulus transformation in operation at the end of every trial.
For some categories this “direct exposure” is not entirely unrealistic: people learn to perceive the relationships between different Rubik’s cube configurations after direct manipulation of the object that reveals the structure of the admissible transformations. However, this kind of situation seems likely to be the exception rather than the rule. When considering the transformations underpinning plant growth, the aging of human faces, and so on, the relevant transformations need to be inferred from more indirect evidence. When learning new categories in real life it is more typical for people to encounter a variety of (possibly labelled) category exemplars; for that reason category learning experiments tend to use a supervised or unsupervised classification task as the proxy for real world learning. When learning the transformations involved in the aging of human faces, for example, people observe many faces both young and old, but do not directly observe the aging process in action. At best, people might see photographs of the same face taken a few years apart.

Are the underlying transformations learnable, and do they affect similarity, in this more general case? We addressed this question in Experiment 2 by changing the presentation format of the examples so that the transformation relating category members was no longer directly accessible and had to be learned from the exemplars that comprised the category.

Method

Participants. 272 adults (55% male) were recruited via Amazon Mechanical Turk. Of these, 170 were retained in the final analysis: 6 failed to complete the task, 2 reported color-blindness, 33 were excluded for giving low ratings to the attention-check test items, and 61 were excluded for not reaching criterion performance during the training phase (described below). Ages ranged between 19 and 69 (mean: 36.6). 207 were from the USA, 60 were from India, and the rest were from other countries. People were paid US$1 for the 10-15 minute experiment.

Procedure: Training Phase. The training stimuli and procedure were almost identical to Experiment 1, with two major changes. First, the feedback procedure was modified: people were told whether they were correct or not, but they were not shown the transformation in action. Second, instead of asking people to compare two items to a target (as in Figure 2), people were shown two items and asked to judge whether they both belonged to the same category, as shown in Figure 5.

Figure 5. Presentation format for stimuli in Experiment 2, consisting of a same category/different category judgment.

The trial structure during training was as follows. On trials where the correct answer was “yes”, the stimuli were related by a single application of the training transformation. On trials where the correct answer was “no”, the stimuli were related by an application of the same foil-producing transformations used in Experiment 1. Within a category, trials were blocked in groups of six, with three ‘accept’ and three ‘reject’ trials in each block. Trials were randomized within blocks.

In addition to the two major changes to the design we made a few minor ones. First, the trained color transformation was defined slightly differently. Instead of simultaneously applying a red-green swap and a blue-yellow swap and counting that as a single application of the transformation, only one of these two swaps was applied. That is, only one color pair was affected by each application of the transformation: either green swapped with red or blue swapped with yellow, but not both. This was done in order to expand the number of items within a category: if one transformation involves two color swaps, then there are only two possible members per category, which creates a problem in a task like this one because all “correct” test trials would involve the same two items.
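To see why this matters, the sketch below (ours, reusing the grid representation from the earlier sketch) applies single color-pair swaps and enumerates the distinct grids reachable from a base stimulus: with the Experiment 1 double swap a category can contain only two distinct members, whereas with single swaps it can contain up to four.

```r
# Sketch: the Experiment 2 color rule swaps only one color pair per
# application. Enumerating the grids reachable by repeated single swaps
# shows the category can contain up to four members, versus two under the
# double swap. Reuses `base` and `colors` from the earlier sketch.
swap_one <- function(grid, pair) {
  map <- setNames(rev(pair), pair)  # e.g., red -> green, green -> red
  hit <- grid %in% pair
  grid[hit] <- map[grid[hit]]
  grid
}

reachable <- function(base) {
  key <- function(g) paste(g, collapse = "|")
  seen <- setNames(list(base), key(base))
  frontier <- list(base)
  while (length(frontier) > 0) {
    nxt <- list()
    for (g in frontier) {
      for (cand in list(swap_one(g, c("red", "green")),
                        swap_one(g, c("blue", "yellow")))) {
        k <- key(cand)
        if (is.null(seen[[k]])) {
          seen[[k]] <- cand
          nxt <- c(nxt, list(cand))
        }
      }
    }
    frontier <- nxt
  }
  seen
}

length(reachable(base))  # 4 if the base grid uses colors from both pairs
```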
Second, anticipating that people would find the task somewhat more complicated, we changed the criterion for when the experiment moved from one category to the next during the training phase. In Experiment 1 people had to get four trials correct in a row; in Experiment 2 the criterion was altered to 8 correct out of the last 10.

Procedure: Test Phase. Test items were constructed in the same way as in Experiment 1, but were not precisely the same items because we had slightly changed the definition of the color transformation. We removed the arbitrary test trials but retained the other five types shown in Table 1. Tests were presented in four blocks of eight, with each of the four theoretically interesting types of test item appearing twice in each block: once with the source grid on the left and the transformed grid on the right, and once with the transformed grid on the left and the source grid on the right. Order was randomized within blocks, and the first and third blocks each contained one example of an identity trial.

Exclusion criteria. The exclusion criteria were unchanged from Experiment 1.

Results

Training phase. The key result from Experiment 2 is the training phase data, which reveal that people largely failed to learn the transformations. In the identity training condition people had few difficulties: 89% of all participants reached criterion without running afoul of the 40-trial exclusion criterion, again “learning” the categories in close to the minimum possible time, taking 8.74 trials on average to reach criterion and classifying items with an overall accuracy of 94%. However, for the two transformation learning conditions, the categories were much more difficult. In the color training condition only 60% of participants reached criterion. This subset of participants needed an average of 14.6 trials to learn each category and had an average overall accuracy of 68% (where 50% is chance). For the movement training condition the results were even worse, with a mere 35% of participants reaching criterion (19.6 trials per category, 64% accuracy). As such, the test phase data reflect only a minority of participants.

Test phase. Given how few people reached criterion, there is little to be gained from analyzing the test trial data in much detail. For the sake of completeness the raw responses for all conditions are included in the Appendix along with the relevant hypothesis tests, but for the current purposes it suffices to note that there was no credible evidence for any effects on the similarity judgments: all Bayes factors were in the “negligible evidence” range. For the categorization judgments there was some suggestion that the movement training may have had the desired effect, insofar as one of the Bayes factors was quite large; but given that 65% of participants were excluded from the data in this condition, this result is mostly uninterpretable.
Discussion

The important finding from Experiment 2 is that participants found it very difficult to learn transformation-defined categories when the transformation was not directly observable during training. In one sense it is not surprising that the learning problem should be harder, but the magnitude of the effect is quite noticeable. Without directly seeing the transformation in action, people do not find it easy to extract that information from a standard categorization task. This is despite the fact that our stimuli were still quite simple relative to real-world stimuli, and the statistical information provided by the stimuli was not noticeably different from Experiment 1 to Experiment 2.

Experiment 3

Taken together, the first two experiments pose a puzzle. Arguably the only substantive difference between them is the fact that Experiment 1 actively highlighted the transformation in action while Experiment 2 obscured it. Experiment 1 suggests that people should quite readily learn how different members of a category are related by inferring the set of admissible transformations. It also suggests that this learning should be reflected in their similarity judgments and that they will readily generalize their learning to similar but untaught transformations. By contrast, Experiment 2 suggests that this result might be sharply limited in applicability to those cases in which the transformation is directly observable. This possibility seems at odds with the way that transformational similarity is generally conceived (Chater & Hahn, 1997), because abstract operations such as “create feature” and “apply feature” are not themselves observable. It also implies that the transformational view of similarity may not apply to most real-world categories, for which transformations are never directly observed.

There is an intermediate possibility, however. Perhaps directly observing the transformation is not a prerequisite for learning transformational categories, but merely makes it significantly easier. If that is the case, then if the rest of the task were simplified or training were more extensive, people might be able to learn successfully and even generalize in a manner similar to what was observed in Experiment 1. We investigated this possibility in Experiment 3 by presenting people with a situation in which the transformation was not directly observable (as in Experiment 2) but the stimuli involved were far simpler. In this situation, learning was possible (though weaker) and generalization was more limited.

Method

Participants. 251 adults (60% male) were recruited via Amazon Mechanical Turk. Of these, 198 were retained in the final analysis: 12 did not complete the task, 2 reported color-blindness, 27 were excluded for giving low ratings to the attention-check test items, and 12 were excluded for not reaching criterion performance (described below), that is, for failing to demonstrate learning on the last categories within the required number of presentations. Ages ranged between 19 and 67 (mean: 34.7). 247 were from the USA, with the remainder from India, South America, and the UK. People were paid US$1 for the 10-15 minute experiment.

Procedure. Experiment 3 was identical to Experiment 2 except in one respect: the stimuli were simplified. This time the 3x3 grids were constrained so that six of the cells were always the same color. This greatly reduced the visual complexity of the stimuli, making it easier for people to see how two transformed stimuli are related. This is illustrated in Figure 6, which shows how the same transformations are more apparent when applied to these simplified objects.
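The sketch below (again ours) shows one way to generate stimuli satisfying this constraint; treating the six matching cells as a uniform background with three freely colored cells is our reading of the design rather than a detail given in the text.

```r
# Sketch of the Experiment 3 simplified stimuli: six of the nine cells share
# a single background color. Treating the remaining three cells as freely
# colored non-background cells is our assumption about the design.
make_simple_grid <- function() {
  bg <- sample(colors, 1)                    # background color
  g <- matrix(bg, nrow = 3, ncol = 3)
  marked <- sample(9, 3)                     # three non-background cells
  g[marked] <- sample(setdiff(colors, bg), 3, replace = TRUE)
  g
}

simple_base <- make_simple_grid()
rotate_cells(simple_base)  # the same transformations apply unchanged
```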
Figure 6. A comparison of the “complex” stimuli used in Experiments 1 and 2 and the simplified ones used in Experiment 3. The underlying transformations in Experiment 3 were identical to those used in Experiment 2, but the stimuli to which they were applied were simpler objects.

Exclusion criteria. The test-phase exclusion criterion was unchanged from Experiments 1 and 2. The training-phase criterion was modified slightly: whereas in Experiments 1 and 2 participants were excluded if they ever reached a cap of 40 presentations of a single category, in Experiment 3 we lowered the cap to 30 presentations but excluded only participants who reached this cap twice in the last three categories. The new criterion was more lenient overall, but required better performance on the final categories.

Results

Training phase. Performance in the training phase improved relative to Experiment 2. The proportion of people who met the relevant inclusion criterion was much higher and fairly homogeneous across conditions: 83% did so in the identity training condition, similar to the 74% of participants who did so during color training and the 80% of participants in the movement training condition. For identity training the average number of trials was 8.5 and the overall accuracy was 92%; for color training the relevant numbers are 14.12 trials and 71% accuracy, and for movement training they are 13.8 trials and 73% accuracy.

Test phase. Table 3 shows the mean judgments for the different test items, broken down by the type of training participants received, and Figure 7 shows the differences graphically. (As before, the corresponding data for the identity training are not of much interest, but are reproduced in the Appendix.) For the categorization judgments, the pattern of results is very similar to Experiment 1 and the Bayes factors are large for all relevant comparisons. People rated novel items as more likely to belong to the same category if they were related by the same transformation that they had been trained upon, compared to people trained on the other kind of transformation. In addition, as before, people were willing to generalize their categorization judgments to test items with related transformations (e.g., people in the color training condition endorsing the newColor items). The only oddity is the fact that the newColor items actually showed a larger effect than the oldColor items, but given the large size of the credible intervals in question that difference does not seem notable.

Table 3
Descriptive statistics and hypothesis tests for all test items in Experiment 3.

Similarity judgments
Tested           Color training  Movement training
transformation   Mean (SD)       Mean (SD)          Difference  Bayes factor
oldColor         4.04 (1.46)     3.34 (1.82)        0.70        7.5
newColor         3.26 (1.53)     2.23 (1.70)        1.03        13.9
newMovement      2.39 (1.53)     2.09 (1.32)        0.30        0.89
oldMovement      2.60 (1.60)     3.10 (1.61)        -0.50       0.22

Categorization judgments
Tested           Color training  Movement training
transformation   Mean (SD)       Mean (SD)          Difference  Bayes factor
oldColor         4.29 (1.67)     3.21 (2.13)        1.07        620
newColor         3.59 (1.88)     1.80 (2.08)        1.80        > 1000
newMovement      0.93 (1.42)     2.29 (1.86)        -1.37       > 1000
oldMovement      1.04 (1.54)     3.15 (1.86)        -2.11       > 1000

In contrast to the categorization judgments, there is much less effect of training on the similarity judgments.
In the movement training condition there is no evidence of any effect on similarity: there was no difference between groups on the oldMovement transformation (Bayes factor = 0.22), nor was there evidence of a difference on the newMovement transformation (Bayes factor = 0.89). There was more of an effect in the color training condition with respect to the two color transformations (Bayes factors of 7.5 and 13.9 respectively), but these effects were modest relative to those in Experiment 1. Overall, we found that the learned transformations were able to influence people’s notions of similarity in at least some cases and to at least some extent, but at best the effect on similarity judgments is substantially weaker than the effect on categorization judgments.

Figure 7. Differences in mean ratings given to novel test items in Experiment 3 (see the Figure 4 caption for details). For the categorization judgments (right panel) the pattern of results mirrors the effect seen in Experiment 1: people give higher ratings when the tested transformation matches or is similar to the trained transformation. However, there is far less evidence for any corresponding effect among participants in the similarity judgment condition: there is a modest effect for the color transformations, and none at all for the movement transformations.

Discussion

The simplified stimulus design in Experiment 3 seems to have resulted in improved learning. The underlying transformations that people had to learn were essentially identical to ones that were easily learned in Experiment 1 and very difficult to learn in Experiment 2. Taken together, these results suggest that it is possible for these transformation-defined categories to be learned via standard supervised learning methods. Moreover, the fact that people generalized sensibly on the categorization items during the test phase suggests that people did in fact learn the relevant transformation (as opposed to following simple heuristics or exemplar memorization strategies). Nevertheless, the fact that generalization on the similarity judgment test trials showed a somewhat attenuated effect relative to Experiment 1 suggests that when people do not directly experience the transformations that leave category memberships invariant – or when learning is less thorough or more fragile – they are less willing to use those transformations when assessing similarity.

General Discussion

The aim of this research was to examine how readily arbitrary transformations are learned in a simple artificial setting, and the extent to which such learning can influence similarity and categorization judgments. The results were somewhat mixed. Although participants were capable of quickly learning a new transformation if it was made obvious enough, the application of this learning to similarity was inconsistent. Our results are broadly consistent with research finding that people can learn simple transformations over short time-frames (Hahn et al., 2009; Andrews et al., 2014), but extend previous work in two ways. First, our results show that people can generalize learned transformations to similar ones (e.g., from one color transformation to other untrained color transformations). Second, they suggest that at least in some cases it is possible for people to learn a transformation and understand how to apply it to novel categories without any corresponding effect on similarity (i.e., Experiment 3).
Regarding the first of these points, in both Experiments 1 and 3 people were able to do more than just learn the relevant transformations: in both cases people learned a general class of transformations. Relative to participants in the movement training condition, people in the color training condition reported that two completely novel items were more closely related even when the transformation that related them was itself novel, as long as it was also a color-swapping transformation of some kind. This effect was somewhat smaller than the effect observed when the novel items were related by the actually-trained transformation, which suggests that this behavior represents genuine generalization rather than a confusion between the different transformations.

Our findings also suggest that learning a transformation and applying it to similarity judgments are dissociable in at least some cases. In Experiment 1, when the trained transformation was directly observable, we observed effects on both the categorization and similarity of novel stimuli. In Experiment 3, when no direct observations were available, categorization and similarity judgments at test dissociated somewhat: there was only modest evidence for an effect on similarity, and even then only for the color transformations. Arguably this result does not necessarily pose any great difficulty for the transformational view: it may be the case that only well-learned or obviously relevant transformations have the power to influence similarity. If that is true, then it is not surprising that the direct observation in Experiment 1 had exactly the effect of making the transformation appear relevant and easy to learn. Even so, the key point is that merely learning that a particular transformation is critical to a particular category does not necessarily imply that it will influence perceived similarity among stimuli. In addition, if transformations have to be well-learned in order to affect similarity, it suggests some limitations on the generality of the theory of transformational similarity.

These conclusions are subject to some limitations. It may be that learning in such short time-frames is qualitatively different from learning over weeks, months, or years (though there are also real world scenarios that require similarity judgments on the basis of very limited experience, for example when starting a new job or cooking in an unfamiliar style). In particular, it may be the case that the dissociation observed between categorization judgments and similarity judgments is simply one of degree, and that with enough training, transfer to similarity ratings could be found even for the rotation transformation test items under the training used in Experiment 3. Such an interpretation would still leave open the question of why there should be such a pronounced relative difference between the test phase results in Experiments 1 and 3, with the transformations learned from indirect evidence (Experiment 3) generalizing less readily than those learned from direct evidence (Experiment 1), even though both are successfully learned.

Category-preserving transformations are common features of many natural categories: consider aging, flowering, or a Rubik’s-cube-style partial rotation. This work shows that people can learn which transformations (and classes of transformations) leave category membership intact, and can use this knowledge to guide categorization and similarity judgments about novel items.
However, these generalizations appear to depend on the psychological availability of the transformation, which we found to vary as a function of the complexity of the stimulus set and the degree to which the transformation itself was directly observed.

References

Andrews, J., Livingston, K., Auerbach, J., Altiero, E., & Neumeyer, K. (2014). Does learning to categorize visual stimuli based on motion features produce learned categorical perception effects? In Proceedings of the 36th annual conference of the cognitive science society (pp. 3170–3170).

Baddeley, A. D. (1966). Short-term memory for word sequences as a function of acoustic, semantic and formal similarity. The Quarterly Journal of Experimental Psychology, 18(4), 362–365.

Barsalou, L. W. (1983). Ad hoc categories. Memory & Cognition, 11(3), 211–227.

Bennett, C. H., Gács, P., Li, M., Vitányi, P. M., & Zurek, W. H. (1998). Information distance. IEEE Transactions on Information Theory, 44(4), 1407–1423.

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115.

Chater, N., & Hahn, U. (1997). Representational distortion, similarity and the universal law of generalization. In SimCat97: Proceedings of the interdisciplinary workshop on similarity and categorization.

Chater, N., & Vitányi, P. (2003). The generalized universal law of generalization. Journal of Mathematical Psychology, 47(3), 346–369.

Freyd, J. J. (1983). The mental representation of movement when static stimuli are viewed. Perception & Psychophysics, 33(6), 575–581.

Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2), 155–170.

Goldstone, R. L., Day, S., & Son, J. Y. (2010). Comparison. In Towards a theory of thinking (pp. 103–121). Springer.

Goodman, N. (1972). Problems and projects. Bobbs-Merrill.

Grimm, L. R., Rein, J. R., & Markman, A. B. (2012). Determining transformation distance in similarity: Considerations for assessing representational changes a priori. Thinking & Reasoning, 18(1), 59–80.

Hahn, U. (2014). Similarity. Wiley Interdisciplinary Reviews: Cognitive Science, 5(3), 271–280.

Hahn, U., Chater, N., & Richardson, L. B. (2003). Similarity as transformation. Cognition, 87(1), 1–32.

Hahn, U., Close, J., & Graf, M. (2009). Transformation direction influences shape-similarity judgments. Psychological Science, 20(4), 447–454.

Hodgetts, C. J., Hahn, U., & Chater, N. (2009). Transformation and alignment in similarity. Cognition, 113(1), 62–79.

Imai, S. (1977). Pattern similarity and cognitive transformations. Acta Psychologica, 41(6), 433–447.

Kruschke, J. K., & Meredith, M. (2014). BEST: Bayesian estimation supersedes the t-test [Computer software manual]. Retrieved from http://CRAN.R-project.org/package=BEST (R package version 0.2.2)

Mark, L. S., Todd, J. T., & Shaw, R. E. (1981). Perception of growth: A geometric analysis of how different styles of change are distinguished. Journal of Experimental Psychology: Human Perception and Performance, 7(4), 855.

Morey, R. D., Rouder, J. N., & Jamil, T. (2014). BayesFactor: Computation of Bayes factors for common designs [Computer software manual]. Retrieved from http://CRAN.R-project.org/package=BayesFactor (R package version 0.9.8)

Müller, M., van Rooij, I., & Wareham, T. (2009). Similarity as tractable transformation. In Proceedings of the 31st annual conference of the cognitive science society (pp. 50–55).
Nosofsky, R. M. (1986). Attention, similarity, and the identification–categorization relationship. Journal of Experimental Psychology: General, 115(1), 39.

Novick, L. R. (1988). Analogical transfer, problem similarity, and expertise. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(3), 510.

Riesbeck, C. K., & Schank, R. C. (2013). Inside case-based reasoning. Psychology Press.

Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237(4820), 1317–1323.

Shulman, H. G. (1971). Similarity effects in short-term memory. Psychological Bulletin, 75(6), 399.

Tversky, A. (1977). Features of similarity. Psychological Review, 84(4), 327.

Watanabe, S. (1985). Pattern recognition: Human and mechanical. John Wiley & Sons.

Wattenmaker, W. D., Nakamura, G. V., & Medin, D. L. (1988). Relationships between similarity-based and explanation-based categorization.

Appendix

Tables of results

Individual participants saw a number of different examples of each test item type: 4 instances in Experiment 1 and 8 in Experiments 2-3. A participant’s ‘rating score’ for each test type was taken to be the average of all ratings given for that type of test item. The means and standard deviations reported here are of rating scores across participants, rather than of raw ratings across items.

The Bayes factors reported here were calculated using the ttestBF function from the BayesFactor package for R (Morey et al., 2014) under the default settings. The test specifies different priors over effect size for the null and alternative hypotheses, and reports the ratio of the likelihoods of the observed data under each hypothesis. Under the default settings, the null posits that the true effect size is zero, and the alternative uses a Cauchy distribution to cover a range of possible non-zero effect sizes, with values in the range (−0.7071, 0.7071) considered most likely.

Table 4
Descriptive statistics and hypothesis tests for all test items in Experiment 2.

Categorization judgments
Tested           Color training  Movement training
transformation   Mean (SD)       Mean (SD)          Difference  Bayes factor
oldColor         4.20 (1.61)     3.33 (2.24)        0.88        7.99
newColor         3.20 (2.03)     2.83 (2.24)        0.38        0.84
newMovement      1.21 (1.55)     3.01 (2.00)        -1.80       > 1000
oldMovement      1.17 (1.55)     3.34 (1.97)        -2.17       > 1000

Similarity judgments
Tested           Color training  Movement training
transformation   Mean (SD)       Mean (SD)          Difference  Bayes factor
oldColor         3.88 (1.57)     3.58 (1.69)        0.30        0.35
newColor         3.06 (1.94)     2.93 (1.90)        0.13        0.32
newMovement      1.52 (1.49)     2.53 (1.81)        -1.01       2.36
oldMovement      1.59 (1.57)     2.38 (1.76)        -0.78       1.34

Table 5
Responses after training on identity categories.

Similarity judgments
Tested           Study 1       Study 2       Study 3
transformation   Mean (SD)     Mean (SD)     Mean (SD)
oldColor         2.77 (1.77)   2.93 (1.69)   3.46 (1.71)
newColor         2.98 (1.57)   2.22 (1.74)   2.24 (1.63)
newMovement      1.94 (1.62)   1.66 (1.50)   1.82 (1.43)
oldMovement      2.03 (1.68)   1.66 (1.55)   2.36 (1.59)

Categorization judgments
Tested           Study 1       Study 2       Study 3
transformation   Mean (SD)     Mean (SD)     Mean (SD)
oldColor         1.23 (1.66)   0.92 (1.33)   0.78 (1.30)
newColor         1.34 (1.69)   0.65 (1.18)   0.36 (0.86)
newMovement      0.79 (1.29)   0.58 (1.10)   0.35 (0.68)
oldMovement      0.89 (1.38)   0.066 (1.10)  0.35 (0.69)

Foil generation

Candidate foil patterns in the movement training were created by first applying the target transformation to produce the correct item, and then swapping the colors of two randomly selected cells. Candidate foil patterns in the color training were created by applying a color-swapping rule randomly selected from all reversible color-changing rules over the four possible colors. These rules could change either one color into another randomly selected one, or two colors into arbitrarily selected complements. In the identity training condition, the foil generation procedure was randomly selected from either of the above schemes on each trial. Before displaying a candidate foil item, a check was run to ensure the result was not reachable by repeated applications of any of the transformations considered here, including those used in the test items. If a candidate foil violated this constraint, a new one was generated until an acceptable foil was found.
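As a final illustration, the sketch below (our reconstruction, reusing functions from the earlier sketches) captures the gist of this foil-generation procedure. Where the description is ambiguous (for example, exactly how two cells “swap colors”, or which reversible rule is drawn), the specific choices shown are our assumptions.

```r
# Sketch of the foil-generation procedure described above, reusing
# rotate_cells, swap_one, colors, and base from the earlier sketches.
make_movement_foil <- function(base) {
  foil <- rotate_cells(base)       # start from the correct next item
  cells <- sample(9, 2)
  foil[cells] <- foil[rev(cells)]  # exchange the colors of two random cells
  foil
}

make_color_foil <- function(base) {
  # One reversible rule from the family described in the text: exchange a
  # randomly chosen pair of colors (a simplification of the full rule set).
  swap_one(base, sample(colors, 2))
}

# Rejection check: discard foils reachable from the base by repeated
# application of a relevant transformation (shown for the movement case).
is_reachable <- function(base, foil, transform, max_steps = 8) {
  g <- base
  for (i in seq_len(max_steps)) {
    g <- transform(g)
    if (identical(g, foil)) return(TRUE)
  }
  FALSE
}

foil <- make_movement_foil(base)
while (is_reachable(base, foil, rotate_cells)) foil <- make_movement_foil(base)
```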