
Bayesian cognitive science, under-considered alternatives, and the value of specialization
October 16, 2014
Matteo Colombo, Tilburg Center for Logic and Philosophy of Science (TiLPS), Tilburg University, The Netherlands
Rogier De Langhe, Complex Systems Institute, Ghent University, Belgium
Abstract
A widely held assumption in cognitive science is that the Bayesian framework should be chosen for discovering and assessing explanations of cognitive phenomena whose production involves uncertainty. However, it is controversial that the Bayesian framework enjoys special epistemic virtues over available but under-considered alternatives for representing uncertainty. A better justification for adopting the Bayesian framework in cognitive science is that it currently comprises a richer body of tools that can be opportunistically exploited so as to foster specialization. As the case of Bayesian cognitive science illustrates, while the value of specialization trades off with the value of innovation, specialization is often the best way to achieve scientific progress.
Introduction
Bayesian decision theory is an increasingly prominent theoretical framework in the cognitive and brain sciences.1 Driven by mathematical advances in statistics and computer science, as well as by engineering successes in fields such as machine learning and artificial intelligence, Bayesian models have been proposed for many phenomena of perception, motor control, learning, decision-making and reasoning (Chater, Tenenbaum, and Yuille (2006), Doya, Ishii, Pouget, and Rao (2007), D. C. Knill and Richards (1996), Körding (2007), Maloney (2002), Rao, Olshausen, and Lewicki (2002), Tenenbaum, Kemp, Griffiths, and Goodman (2011)).
One of the most common arguments for favoring the Bayesian approach in cognitive science is based on the fact that uncertainty is an ineliminable feature of cognitive systems' interactions with the world. A cognitive system's interaction with the world would require the system to infer the values of unknown parameters from the input data it receives. Because input data are sparse, ambiguous and corrupted by noise, which in turn may result in behavioural variability, cognitive systems constantly face problems of inference under uncertainty. Unless these problems are effectively solved, cognitive systems would not be able to interact adaptively with the world by producing reliable actions and accurate perceptions, and by learning about the surrounding environment.
1 The label "Bayesian" here is a placeholder for a set of interrelated principles, methods, tools and problem-solving procedures whose hard core is the Bayesian rule of conditionalization, which prescribes how the probability of a hypothesis should be updated based on new evidence.
If uncertainty is an ineliminable feature of cognitive systems' interactions with the world, so the argument continues, then the explanatory framework cognitive scientists use to understand adaptive behaviour should be able to account for how cognitive systems can effectively deal with uncertainty. The framework should allow scientists to account for how cognitive systems make sound inferences under uncertainty. Because the Bayesian framework is the best one for understanding how systems can effectively deal with uncertainty and make sound inferences, this framework should be chosen for understanding at least some aspects of phenomena whose production requires cognitive systems to solve problems of inference under uncertainty.
The argument just canvassed can be called the argument from uncertainty for Bayesian cognitive science. The first aim of this paper is to clearly reconstruct it (in Section 1). This reconstruction will help to clarify the conceptual foundations of the Bayesian approach, which is a pressing task given recent controversy about the nature, aims and scope of Bayesian modelling in cognitive science (e.g., Bowers and Davis (2012), Chater et al. (2011), Griffiths, Chater, Norris, and Pouget (2012), Jones and Love (2011); for recent philosophical discussions of Bayesian modelling in cognitive science see Eberhardt and Danks (2011), Colombo and Seriès (2012), Colombo and Hartmann (under review)).
The second aim is to assess the extent to which the argument from uncertainty justifies
many cognitive scientists’ decision to work within the Bayesian framework. In assessing this
argument, some currently underexplored alternatives to the Bayesian approach should be
taken into account as frameworks for representing and dealing with uncertainty. Once such
alternatives are taken into account, the argument from uncertainty loses much of its bite,
since it is controversial that the Bayesian framework enjoys special epistemic virtues (Section
2).
However, even if it is not taken for granted that the Bayesian approach enjoys special epistemic virtues in comparison to alternatives, there is reason to favour it in cognitive science, or so we shall argue in the light of the results of a simple agent-based model of the distribution of cognitive labour in science (De Langhe 2014; Section 3). Compared to alternatives, the Bayesian approach currently provides cognitive scientists with a richer body of knowledge and tools, which have been developed in the fields of machine learning, artificial intelligence and statistics for tackling various classes of problems of inference under uncertainty. If cognitive systems tackle similar problems, then it can be rational for cognitive scientists to exploit well-trodden tools and knowledge available from neighbouring fields, rather than to explore novel or underdeveloped alternatives, in order to understand how cognitive systems handle uncertainty when they produce certain cognitive phenomena.
Progress in cognitive science, as in other scientific fields, is often guided not by the pursuit of theories of the mind and brain that enjoy special epistemic virtues, or that we have reason to believe probably or approximately true. Rather, progress is often guided by the pursuit of theories that can opportunistically exploit available tools and knowledge from neighbouring fields. This is as it should be, as shown by the results of the agent-based model we consider. Sometimes, it is epistemically more valuable for agents to respond to the trade-off between exploration and exploitation by exploiting currently available knowledge and tools instead of exploring novel paths of research. This general argument, we believe, provides a more telling justification than the argument from uncertainty for currently adopting the Bayesian framework in cognitive science.
1 From Uncertainty to Bayesian Brains
The argument from uncertainty seeks to establish that the Bayesian framework should be
privileged for explaining many cognitive phenomena whose production requires a cognitive
system to handle uncertainty. The argument includes two steps. The first step aims to
substantiate the claim that adaptive cognitive systems must effectively deal with uncertainty.
The second step aims to establish that the Bayesian framework is the best one for explaining
how a system can effectively deal with uncertainty and make sound inferences.
1.1 Uncertainty. Underdetermination and noise
The first step in the argument from uncertainty is as follows:
P1 Cognitive systems interact adaptively with the world.
P2 If a cognitive system interacts adaptively with the world, then it must effectively deal
with uncertainty and control behavioural variability.
C Cognitive systems must effectively deal with uncertainty and control behavioural variability.
The important premise is P2, which, in one formulation or another, motivates much of
the current research in cognitive science carried out within the Bayesian framework. Here is
an overview of relevant claims in the literature.
D. Knill and Pouget (2004, 712) begin by claiming that "humans and other animals operate in a world of sensory uncertainty". Ma, Beck, Latham, and Pouget (2006, 1432) motivate their study by saying that "virtually all computations performed by the nervous system are subject to uncertainty". Pouget, Beck, Ma, and Latham (2013, 1170) echo them by writing that "uncertainty is an intrinsic part of neural computation, whether for sensory processing, motor control or cognitive reasoning." Orban and Wolpert (2011, 1) explain that "uncertainty is ubiquitous in our sensorimotor interactions, arising from factors such as sensory and motor noise and ambiguity about the environment". Tenenbaum et al. (2011, 1279) point out that "we build rich causal models, make strong generalizations, and construct powerful abstractions, whereas the input data are sparse, noisy, and ambiguous - in every way far too limited". Finally, Vilares and Körding (2011, 22) hold that "uncertainty is relevant in most situations in which humans need to make decisions and will thus affect the problems to be solved by the brain".
In the field, the term "uncertainty" is generally used broadly, to refer to the fact that a cognitive system facing some problem lacks some relevant piece of information. This lack of information may be due to noise, that is, to random disturbances corrupting the sensory signals and processes of the system, or to the underdetermination of percepts, as well as of other cognitive states, by input data.
Whether uncertainty is caused by noise or by the problem of underdetermination, it bears emphasis that uncertainty goes hand in hand with behavioural variability. For example, if you reach for an object in the darkness, your visual and motor systems will lack relevant information about the location of the object. Your uncertainty about its location will be reflected in a lack of accuracy in any one reaching trial. If you try to reach for that object over and over again, large behavioural variability should be expected across trials. Even when a stimulus is held as constant as possible over a number of trials, our perceptions of the stimulus will vary from trial to trial. For a system to have accurate perceptions and to display reliable motor behaviour, it must find some way to tame such variability.
The uncertainty of a cognitive system may depend on the problem of underdetermination that it must constantly solve. Cognitive agents like humans can access the world only through their senses, which can be viewed as sources of information about the state obtaining in the world at any given time. If we frame this situation by using concepts from statistics, we may refer to the states obtaining in the world with the terms "environmental parameters", "hidden states" or "models", and to the sensory information received by cognitive agents with the terms "sensory data" or "evidence". The values of the environmental parameters that the system must infer are underdetermined by the sensory data available to the system. This means that, at any given time, for any sensory input to our cognitive system, there are multiple states in the world that can fit the sensory input. Because the same sensory input can be fit by many different environmental states, processing the sensory input alone is not sufficient to determine which state in the world caused it. Hence, sensory inputs underdetermine their environmental causes.
For instance, the sensory input generated by a convex object under normal lighting circumstances underdetermines its external cause. There are at least two possibilities: the object
in the world that caused the input is convex and the light comes from overhead; or the object is concave and the light comes from below. In order to perceive, and to have accurate
perceptions, our cognitive system must find some strategy to solve this underdetermination
problem.
The uncertainty of a cognitive system may be due to noise too, whose source can be internal or
external to the system. In general, noise amounts to data received but unwanted by a system.
As a noisy signal contains more data than the original signal by itself, noise modifies the signal
and extends the cognitive system’s freedom of choice in decoding it. This is an undesirable
freedom to the extent that the adaptive behaviour the system can produce requires a sufficient
degree of fidelity between original and decoded signals.
In biological agents, “noise permeates every level of the nervous system, from the perception
of sensory signals to the generation of motor responses” (Faisal, Selen, & Wolpert 2008, 292).
Specifically, three sources of noise are characteristic of biological agents. The first source of noise lies in the thermodynamic or quantal transduction of the energy comprised by sensory signals into electrical signals. "For example, all forms of chemical sensing (including smell and gustation) are affected by thermodynamic noise because molecules arrive at the receptor at random rates owing to diffusion and because receptor proteins are limited in their ability to accurately count the number of signalling molecules" (D. C. Knill & Richards 1996, 4). The second source of noise lies in certain biophysical features of ion channels, of synaptic transmission, of network interactions and of the random processes governing neural activations. These biophysical features introduce noise at the level of cellular signalling. A third source of noise characteristic of biological agents lies in the transduction of signals carried by motor neurons into mechanical forces in muscle fibers. This transduction introduces noise in the signals underlying motor control, and can make motor behaviour highly variable even in the same types of circumstances and when the same goal is pursued. In order to perform motor commands accurately, and to display reliable actions, biological agents must find some strategy to handle the noise introduced at different levels of neural processing.
If cognitive systems implement algorithms that can solve the problem of underdetermination and mitigate the detrimental effects of noise, then they can effectively deal with sensory and motor uncertainty so as to generate accurate perceptions and reliable actions. Since the Bayesian framework provides cognitive scientists with a suite of algorithms and methods for representing and dealing with uncertainty, which can solve problems of underdetermination and mitigate the detrimental effects of noise, this framework is justifiably chosen to explain at least some central aspects of cognition.
1.2 Bayes and Uncertainty. A Natural Marriage?
The second step in the argument from uncertainty seeks to establish that the Bayesian framework is the best one for explaining how a system can effectively solve problems of inference under uncertainty. This second step has the following form:
Given feature F, which is necessarily involved in the production of explananda P1, ..., Pn, and given candidate explanatory frameworks X1, ..., Xm for explaining P1, ..., Pn, infer the explanatory superiority with respect to P1, ..., Pn of that Xi which is best for treating F.
An argument with this form would have us infer the explanatory superiority of one framework Xi among several others with respect to explananda P1, ..., Pn. The basis for drawing this conclusion is that Xi is the best of the available competing frameworks for characterizing and treating some feature F that is necessarily involved in the production of P1, ..., Pn.
Here, F is the uncertainty that a cognitive system must handle when it produces adaptive behaviour and cognitive phenomena P1, ..., Pn. F is necessarily involved in the production of explananda P1, ..., Pn because, as a matter of fact, unless the system effectively deals with uncertainty, P1, ..., Pn cannot be produced. The explanatory framework Xi would be the best for treating uncertainty in the sense that it would afford the best way to characterize, represent and deal with uncertainty. By adopting framework Xi, cognitive scientists can most fruitfully, most simply, most adequately or most generally explain how cognitive systems can make successful inferences under uncertainty so as to solve the problem of underdetermination and handle the detrimental effects of internal noise.
Compared to alternative frameworks, if framework Xi is the best with respect to F, in the sense that it possesses more epistemic virtues (or epistemic virtues to a sufficiently higher degree) that bear on explanations of F-involving phenomena, then it should be concluded that Xi is superior to alternatives for explaining those phenomena.
Indeed, Bayesian decision theory has been characterized in cognitive science as the most "effective," "congenial," "natural" or "rational" framework to represent and deal with uncertainty (cf. Chater et al. (2006, 287), Doya et al. (2007, xi), D. Knill and Pouget (2004, 712), Maloney (2002, 145), Mamassian, Landy, and Maloney (2002, 13), Orban and Wolpert (2011, 1), Rescorla (in press, Section 2), Fiser, Berkes, Orban, and Lengyel (2010, 120)).
To understand in which sense Bayesian decision theory is the most "congenial" framework for representing uncertainty, its basic tenets should be brought into focus. Within Bayesian decision theory, uncertainty is represented by probability distributions; the Bayesian rule of conditionalization specifies how a probability distribution should be updated in the light of new information. Within this framework, a cognitive system is seen as entertaining "beliefs" drawn from a hypothesis space H. "Beliefs" are about what in the world could have caused the current input e to the system. Each "belief" is associated with a prior probability Prob(h), which represents the weight borne by the belief that h on the processes carried out by the system. At any given time, the system's "beliefs" satisfy the axioms of the probability calculus. Probabilities are also assigned to (e, h) pairs, in the form of a generative model that specifies a joint probability distribution over inputs and hypotheses about states in the world generating those inputs. Generative models include likelihoods, which represent how probable it is that the system would receive the current input e given a hypothesized state h in the world, viz. Prob(e|h). Given a generative model, current input e and the prior knowledge associated with Prob(h), the system computes the posterior conditional probability Prob(h|e), thereby reallocating probabilities across the hypothesis space in accord with the Bayesian rule of conditionalization.
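To make this updating scheme concrete, here is a minimal sketch in Python; the hypothesis labels, priors and likelihoods are hypothetical placeholders rather than values from any of the models cited above.

```python
# Minimal sketch of Bayesian conditionalization over a discrete
# hypothesis space. Hypotheses, priors and likelihoods are hypothetical.

def posterior(prior, likelihood, e):
    """Update Prob(h) to Prob(h|e) given likelihoods Prob(e|h)."""
    joint = {h: likelihood[h][e] * p for h, p in prior.items()}  # Prob(e|h)Prob(h)
    z = sum(joint.values())                  # Prob(e), the normalizing constant
    return {h: p / z for h, p in joint.items()}

prior = {"h1": 0.7, "h2": 0.3}                       # Prob(h)
likelihood = {"h1": {"e": 0.5}, "h2": {"e": 0.9}}    # Prob(e|h)
print(posterior(prior, likelihood, "e"))             # h1: ~0.56, h2: ~0.44
```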
Conditionalization governs how the system ought to update "beliefs" upon receiving new information, but it does not specify how the beliefs entertained by the system should be used to produce a decision, an action or some other phenomenon. Using the posterior to produce a decision, an action or some other phenomenon requires the definition of a loss function, which specifies the relative cost of making a certain decision based on a certain belief. To determine the best possible decision available at a given time, the system needs to compute the expected loss for any given decision and belief.
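Continuing the sketch, and again with hypothetical loss values, the decision step can be rendered as follows. With the common 0-1 loss assumed here, minimizing expected loss reduces to picking the hypothesis with the highest posterior.

```python
# Sketch of the decision step: choose the decision minimizing expected
# loss under the posterior. The decisions and loss values are hypothetical.

def best_decision(post, loss, decisions):
    """Return the d minimizing sum_h Prob(h|e) * loss[(d, h)]."""
    expected = {d: sum(post[h] * loss[(d, h)] for h in post) for d in decisions}
    return min(expected, key=expected.get), expected

post = {"h1": 0.565, "h2": 0.435}               # posterior from the step above
loss = {("d1", "h1"): 0.0, ("d1", "h2"): 1.0,   # 0-1 loss: cost 1 for a
        ("d2", "h1"): 1.0, ("d2", "h2"): 0.0}   # mismatch, 0 otherwise
choice, expected = best_decision(post, loss, ["d1", "d2"])
print(choice, expected)   # 'd1': with 0-1 loss, pick the most probable h
```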
With this outline of the basics of Bayesian decision theory in place, let us now re-examine the problem of underdetermination by considering the case of visual perception. The input data to the visual system consist of the image that arrives at the retina. The "beliefs" (or hypotheses) entertained by the perceptual system are about states in the world that could have given rise to that image. Based solely on input data, the system cannot determine which state gave rise to the retinal image, as any patch of retinal stimulation could correspond to an object of any size and almost any shape. However, if the system deploys knowledge about which size and shape are more likely a priori, it can determine which state or object would be most likely to produce the retinal input data. By applying the rule of conditionalization so as to combine prior knowledge with the likelihood of the state in the world giving rise to input data, the system can find a solution to the problem of underdetermination.
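For illustration, with hypothetical numbers: suppose the prior encodes the regularity that light usually comes from overhead, so that Prob(convex) = 0.8 and Prob(concave) = 0.2, while the retinal image e is equally likely under both hypotheses, Prob(e|convex) = Prob(e|concave). Conditionalization then gives

Prob(convex|e) = Prob(e|convex) Prob(convex) / [Prob(e|convex) Prob(convex) + Prob(e|concave) Prob(concave)] = 0.8,

and the prior alone breaks a tie that the input data leave open.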
What about noise? Prior information embodied in neurons' receptive fields (viz. in the portion of sensory space that can elicit neural responses when stimulated) can be used and processed in a Bayesian fashion to handle the effects of noise too. The basic strategy is simple: "If the structure of the signal and/or noise is known it can be used to distinguish signal from noise" (Faisal et al. 2008, 298). And distinguishing signal from noise is essential to producing reliable cognitive and behavioural phenomena. In other words, neurons' prior knowledge about the expected statistical structure or noise of a signal for any given source of information allows the cognitive system to compensate for noise and to give more weight to more reliable (less noisy) signals in its processes. So, the human cognitive system can rely on prior knowledge about the expected structure of its inputs and combine it with incoming data by applying Bayesian conditionalization in order to deal with noise and produce reliable perceptions and motor behaviour.
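A standard way to illustrate this reliability weighting (our sketch, not a model from the cited papers) is the combination of two independent Gaussian cues, where the posterior mean weights each cue by its relative precision:

```python
# Sketch of precision weighting for two independent Gaussian cues about
# the same quantity (flat prior assumed). Cue values are hypothetical.

def combine(mu1, var1, mu2, var2):
    """Posterior mean and variance for two noisy Gaussian cues."""
    w1 = (1 / var1) / (1 / var1 + 1 / var2)  # weight = relative precision
    mean = w1 * mu1 + (1 - w1) * mu2         # less noisy cue weighs more
    var = 1 / (1 / var1 + 1 / var2)          # combined estimate is less noisy
    return mean, var

# A reliable cue (variance 1.0) and a noisier cue (variance 4.0):
print(combine(10.0, 1.0, 14.0, 4.0))         # (10.8, 0.8)
```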
While there is no doubt that the Bayesian framework can be adopted to explain how cognitive systems can deal with uncertainty in producing certain phenomena, it remains unclear why this framework should be privileged over alternative frameworks. In current cognitive science, it is often assumed that the Bayesian framework should be privileged because it obviously enjoys more epistemic virtues (or epistemic virtues to a larger degree) than alternatives. It would provide the most unified, precise, fruitful and "rational" explanatory framework for many sets of empirical data about cognitive phenomena.
Bayesian decision theory would be unifying, as it offers a common, encompassing and flexible mathematical framework for studying a range of diverse phenomena (cf. D. Knill and Pouget (2004), Griffiths, Chater, Kemp, Perfors, and Tenenbaum (2010); see also Colombo and Hartmann (under review)). It is precise insofar as it provides scientists with a rigorous and quantitative formalism that can be used for "precisely relating what one set of information tells us about another" (Chater et al. 2006, 287). The Bayesian framework would be fruitful insofar as it helps cognitive scientists to discover which algorithms can be tractably implemented by the mechanisms producing those phenomena (Griffiths et al. 2012, 417). The exploration of such algorithms, in turn, may have underappreciated consequences for our understanding of the nature and relation of perception and action (cf. Clark (2013)). Finally, Bayesian decision theory would be the most rational framework, as it sets a normative standard concerning how rational agents should combine and weigh different beliefs, how they should update their beliefs upon receiving novel information and how they should make decisions under uncertainty (cf. Doya et al. (2007), Griffiths et al. (2010); see also Bovens and Hartmann (2003)). The normative force of such a standard relies on the idea that an agent's degrees of belief should at least obey the probability calculus, which in turn is typically justified by appealing to (diachronic and synchronic) Dutch book arguments or to Cox (1946)'s theorem.
While Dutch book arguments purport to establish that it is epistemically irrational for an agent to have degrees of belief that violate the rules of the probability calculus (cf. Pouget et al. (2013), Vineberg (2014)), Cox's theorem would show that any rational measure of belief is isomorphic to a probability measure.
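A toy illustration of the Dutch book idea (ours, not drawn from the cited sources): suppose an agent's degrees of belief violate additivity, with Prob(A) = 0.6 and Prob(¬A) = 0.6. A bookie can sell the agent a bet paying 1 if A occurs for a price of 0.6, and a bet paying 1 if ¬A occurs for 0.6. The agent pays 1.2 in total but collects exactly 1 however A turns out, for a guaranteed loss of 0.2.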
Leveraging these alleged virtues, many cognitive scientists would then be justified in embracing Bayesian decision theory as the "most effective," "congenial" or "natural" framework for explaining how cognitive systems represent and traffic with uncertainty.
2 Trafficking with Uncertainty. A zoo of approaches
There are two troubles for the cognitive scientists who justify their choice to work within the Bayesian framework by merely appealing to the argument from uncertainty. First trouble: currently, there are several overlooked alternative frameworks whereby one can represent uncertainty and solve problems of inference under uncertainty. Second trouble: it is not obvious that Bayesian decision theory offers the best framework for representing and dealing with uncertainty.
If some explanatory framework alternative to the Bayesian one is currently available but overlooked by cognitive scientists, then it cannot be simply claimed that cognitive systems and
the uncertainty-involving phenomena they produce are best explained within the Bayesian
framework. The mere existence of underappreciated alternatives weakens the strength of the
argument from uncertainty for Bayesian cognitive science. Furthermore, it is not obvious
that Bayesian decision theory provides cognitive scientists with the best framework for finding explanations of phenomena that involve uncertainty; and no cognitive scientist has put
forward an argument that this is the case. The Bayesian approach in cognitive science might
have gained unjustified plausibility by shielding itself from relevant, but under-considered,
alternative frameworks.
Here we point to four frameworks for representing uncertainty alternative to Bayesian decision
theory, namely: Dempster-Shafer theory, possibility theory, ranking theory and quantum
probability theory.2 Before sketching the basic tenets of these four theories, it should be
emphasized that we do not intend to offer a thorough comparative review (see e.g. Huber (2014) and Halpern (2003) for more extensive reviews). For our purposes, it suffices to make some remarks aimed at driving home the weaker point that it is not obvious that Bayesian decision theory offers cognitive scientists the most unified, precise, fruitful and rational approach to uncertainty-involving phenomena.
2.1 The Dempster-Shafer framework
The Dempster-Shafer approach to uncertainty can be considered as a generalization of the Bayesian approach, where probabilities are assigned to sets instead of to single events, and Dempster's rule is used for aggregating information instead of Bayesian conditionalization (Shafer (1992)). Three functions in this framework allow one to represent uncertainty: the basic probability assignment function, the belief function and the plausibility function. Given some set of states, the basic probability assignment (bpa) function defines a mapping of the power set to the interval between 0 and 1, where the bpa of the null set is 0 and the summation of the bpa's of all the subsets of the power set is 1. The belief function and the plausibility function define respectively a lower and an upper bound on intervals representing beliefs about (sets of) states. The lower bound, Belief, for a set of interest is defined as the sum of all the bpa's of the subsets of the set of interest. The upper bound, Plausibility, is the sum of all the bpa's of the sets that intersect the set of interest. The core rule for aggregating several bpa's associated with information from multiple, independent sources is Dempster's rule. This rule corresponds to a normalized conjunctive operation according to which information should be combined by favouring the agreement between the sources and ignoring all the conflicting evidence (see Dempster (1968) for details).
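A minimal sketch of these definitions in Python may help; the two-element frame and the masses are hypothetical, chosen only to exercise the functions:

```python
# Sketch of belief, plausibility, and Dempster's rule on a hypothetical
# two-element frame of discernment {a, b}. Masses are illustrative.

def belief(bpa, query):
    """Bel(query): total mass of the subsets of the query set."""
    return sum(m for s, m in bpa.items() if s <= query)

def plausibility(bpa, query):
    """Pl(query): total mass of the sets intersecting the query set."""
    return sum(m for s, m in bpa.items() if s & query)

def dempster(bpa1, bpa2):
    """Combine two independent bpa's, renormalizing away conflict."""
    combined = {}
    for s1, m1 in bpa1.items():
        for s2, m2 in bpa2.items():
            inter = s1 & s2
            if inter:                        # empty intersections = conflict
                combined[inter] = combined.get(inter, 0.0) + m1 * m2
    k = sum(combined.values())               # 1 minus the total conflict
    return {s: m / k for s, m in combined.items()}

A, B = frozenset("a"), frozenset("b")
bpa1 = {A: 0.6, A | B: 0.4}                  # mass on A|B = uncommitted belief
bpa2 = {B: 0.3, A | B: 0.7}
fused = dempster(bpa1, bpa2)
print(belief(fused, A), plausibility(fused, A))   # ~0.51 and ~0.85
```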
2 While each one of these frameworks provides explicit representations of uncertainty, there are implicit, non-probabilistic approaches to uncertainty too (see e.g. Simoncelli (2009), Drugowitsch and Pouget (2012)). By expanding the set of alternatives to the Bayesian framework, the case against the argument from uncertainty for Bayesian cognitive science should become even more persuasive.
2.2 The possibility framework
Within the possibility approach to uncertainty, possibility measures are based on ideas from fuzzy logic, and put into focus the notion of imprecision due to vagueness. In this framework, possibility distributions Poss represent "the knowledge of an agent (about the actual state of affairs) distinguishing what is plausible from what is less plausible, what is the normal course of things from what is not, what is surprising from what is expected" (Dubois and Prade (2007)). One way possibility theory differs from the Bayesian approach is by the use of a pair of dual set-functions (i.e., possibility and necessity measures) instead of only one, which makes it easier to capture partial ignorance.
Formally, one prominent difference between the Bayesian approach and the possibility one is that the former is characterized by an additivity property, while the latter is characterized by a "maxitivity" property. According to this property, if U and V are disjoint sets, then Poss(U ∪ V) = max(Poss(U), Poss(V)). When Poss(U) > 0 and the set V is non-empty, information can be aggregated as: Poss(V|U) = 1 if Poss(U ∩ V) = Poss(U), and Poss(V|U) = Poss(U ∩ V) otherwise. The difference between this rule and Bayesian conditionalization is that here the renormalisation via division is changed into a shift to 1 of the "possibility" values of the most possible elements in U (see Dubois and Prade (2007) for details).
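A small sketch of these properties, with a hypothetical possibility distribution over three states:

```python
# Sketch of a possibility distribution, illustrating the dual measures,
# maxitivity, and the conditioning rule above. Values are hypothetical.

poss = {"s1": 1.0, "s2": 0.7, "s3": 0.2}      # possibility of each state

def Poss(event):
    """Possibility of an event is the max over its states."""
    return max(poss[s] for s in event)

def Nec(event):
    """Necessity, the dual measure: Nec(U) = 1 - Poss(complement of U)."""
    complement = set(poss) - set(event)
    return 1 - Poss(complement) if complement else 1.0

def Poss_cond(V, U):
    """Conditioning (assumes Poss(U) > 0): shift the most possible
    elements of U to possibility 1."""
    pUV = Poss(set(U) & set(V))
    return 1.0 if pUV == Poss(U) else pUV

U, V = {"s2"}, {"s3"}                          # disjoint events
print(Poss(U | V) == max(Poss(U), Poss(V)))    # maxitivity: True
print(Poss({"s1"}), Nec({"s1"}))               # 1.0 vs 0.3: partial ignorance
print(Poss_cond({"s2"}, {"s2", "s3"}))         # 1.0, since Poss(U∩V) = Poss(U)
```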
2.3 The ranking framework
In ranking theory, a ranking function κ represents degrees of disbelief (or surprise) on an integer scale (Spohn 2009). A proposition A is disbelieved just in case its rank is positive, κ(A) > 0. Accordingly, tautological propositions should not be disbelieved. Propositions not disbelieved at all are ranked 0. Propositions ranked 0 are "unsurprising." But this does not mean that propositions ranked 0 are necessarily believed. A proposition A is believed just in case its negation is disbelieved, κ(¬A) > 0. Disbelieved propositions are ranked with greater and greater degrees, up to ∞. Thus, higher ranks correspond to higher degrees of surprise. Accordingly, contradictory propositions should be disbelieved to the highest degree. Conditional ranks are defined as differences of unconditional ranks: κ(A|B) = κ(A ∩ B) − κ(B). Using conditional ranks, the main rules for updating and aggregating ranks correspond to Bayesian conditionalization.
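A minimal sketch of these definitions, on a hypothetical three-world space with illustrative ranks:

```python
# Sketch of a ranking function over three possible worlds, with
# propositions as sets of worlds. Ranks are illustrative only.

kappa_world = {"w1": 0, "w2": 1, "w3": 3}    # degrees of disbelief per world

def kappa(A):
    """Rank of a proposition: minimum rank of the worlds in it."""
    return min(kappa_world[w] for w in A) if A else float("inf")

def kappa_cond(A, B):
    """Conditional rank, as defined above: k(A|B) = k(A & B) - k(B)."""
    return kappa(set(A) & set(B)) - kappa(B)

A = {"w2", "w3"}
not_A = set(kappa_world) - A
print(kappa(A))               # 1 > 0: A is disbelieved, so ¬A is believed
print(kappa(not_A))           # 0: ¬A is not disbelieved ("unsurprising")
print(kappa_cond({"w3"}, A))  # k({w3}|A) = 3 - 1 = 2
```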
The relation between ranking theory and Bayesian theory is complex and subtle. Perhaps the major difference is that ranking theory is focused on the everyday, categorical notion of belief that can be true or false, instead of the quantitative notion of degree of belief that is captured by the Bayesian framework. Thus ranking theory uses numbers, or ranks, to address several traditional philosophical puzzles centered around the everyday notion of belief. On these grounds it has been claimed that the ranking-theoretic approach has some advantages over probabilistic approaches: ranking theory would allow us to do almost everything that we can do with probabilistic measures and also to tackle traditional problems in epistemology (cf. Huber (2014, section 3.3), Spohn (2009, sections 3-4)).
2.4 The quantum probability framework
Quantum probability theory is a geometric approach to probability, where different outcomes are represented as subspaces of varying dimensionality in a multidimensional Hilbert space, which is a vector space used to represent all possible outcomes for questions we could ask about a system. Unit vectors correspond to possible states of the system, and embody current knowledge about the system under consideration. Probabilities of outcomes are determined by projecting the state vector onto different subspaces and computing the squared length of the projection. The determination of probabilities is context- and order-dependent, as individual states can be superposition states and composite systems can be entangled. Thus, while in the Bayesian framework Prob(A & B) = Prob(B & A), in quantum probability theory commutativity of conjunction does not always hold (see Rédei and Summers (2007) for an introduction to the theory).
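A small numerical sketch of this order dependence, using hypothetical one-dimensional subspaces of R^2:

```python
import numpy as np

# Sketch of order effects: projecting a state onto two non-commuting
# subspaces in different orders yields different sequential probabilities.
# The state vector and subspace angles are hypothetical.

def projector(angle):
    """Projector onto the 1-d subspace spanned by (cos a, sin a)."""
    v = np.array([np.cos(angle), np.sin(angle)])
    return np.outer(v, v)

psi = np.array([1.0, 0.0])        # unit-length state vector
A = projector(np.pi / 4)          # subspace for outcome A
B = projector(np.pi / 2)          # subspace for outcome B

# "A and then B": project onto A, then B, and square the resulting length.
p_a_then_b = np.linalg.norm(B @ A @ psi) ** 2
p_b_then_a = np.linalg.norm(A @ B @ psi) ** 2
print(p_a_then_b, p_b_then_a)     # 0.25 vs 0.0: conjunction order matters
```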
The motivation for adopting the quantum probability framework in cognitive science is that
core properties of this framework, such as incompatibility, superposition and entanglement,
would allow cognitive scientists to accurately account for many cognitive processes and experimental results that are not obviously captured within the Bayesian framework (Pothos &
Busemeyer 2013). For example, incompatibility in quantum probability theory entails that it
is impossible to concurrently assign a truth-value to two hypotheses. Psychologically, the two
hypotheses can be processed only serially, as processing of one hypothesis interferes with the
other. Given the hypotheses A and B, if A is true at a certain time, then B can be neither
true nor false at that time. Conjunctions between incompatible hypotheses are then defined
in a sequential way as “A and then B” (see Busemeyer and Bruza (2012) for details).
2.5 Which one is the best? Some cursory considerations
Each one of the approaches just sketched has both epistemic virtues and vices relative to the Bayesian one. Dempster-Shafer theory can be considered as a generalization of the Bayesian approach, where uncertainty deriving from ignorance is naturally represented by vacuous belief functions, and evidence is combined by Dempster's rule of combination without requiring strong independence assumptions. Partial and total ignorance can be represented without the need to specify a prior: your initial degrees of belief should be vacuous, viz. zero everywhere but for tautological propositions. At any later time, your degrees of belief should be the result of combining the vacuous belief function with your total evidence. Belief-states and evidence are represented by the same types of mathematical objects, viz. belief functions. The Dempster-Shafer approach might then be considered as a more unifying framework than the Bayesian one. However, inference within this framework is significantly less computationally efficient than Bayesian inference. This inefficiency stems from the fact that within the Dempster-Shafer framework evidence is represented by a belief function that is induced by a probability measure on the power set of possible outcomes of a question, instead of by a probability measure on the set of possible outcomes. Hence, the amount of computation required for the combination of evidence by Dempster's rule increases exponentially with the cardinality of the set of possible outcomes.
Possibility theory can be seen as a simpler methodology for inference under uncertainty, where uncertainty corresponds to imprecise or ambiguous information that is void of randomness. The possibility framework has a computational advantage over probability, as "maxitivity" makes possibility measures compositional: Poss(U ∪ V) is determined by Poss(U) and Poss(V), being the maximum of the two. Instead, all that can be said about Prob(U ∪ V) is that it is at least max(Prob(U), Prob(V)) and at most min(Prob(U) + Prob(V), 1). However, since fuzzy approaches to uncertainty such as possibility theory are not isomorphic to probability theory, it can be suggested that Cox's theorem rules out possibility theory as a rational means of quantifying uncertainty (Lindley (1982); but see Colyvan (2008)).
Ranking theory provides a link between quantitative (focused on degrees of belief) and qualitative (focused on the categorical notion of yes-or-no belief) approaches to the representation
of uncertainty and belief dynamics. It can be used ingeniously to address general questions of
philosophical interest. But it does not obviously have a bearing on actual scientific practice,
since it is far from being clear how one should use ranking functions with noisy sample data
for making sound inferences about any concrete system or process.
Finally, quantum probability theory rests on normatively dubious grounds, as it is based on a set of axioms that allows for an agent to be Dutch-booked. While quantum probability theory "is perhaps a framework for bounded rationality . . . and not as rational as in principle possible" (Pothos & Busemeyer 2013, 2), courtesy of its unique properties, including superposition, entanglement, incompatibility, and interference, it is claimed to accommodate empirical results related to order/context effects that are not easily captured within a Bayesian framework. If this is so, for some phenomena the Bayesian framework is less empirically adequate than the quantum probabilistic one.
While these observations suggest that it is problematic to hold that the Bayesian approach is the best for dealing with uncertainty, virtually all work in Bayesian cognitive science has proceeded by neglecting current alternative frameworks, taking for granted the superiority of the Bayesian one, or assuming that the Bayesian approach is the only game in town. If the argument from uncertainty is used to justify the Bayesian approach in cognitive science, then available alternatives should not be ignored. Unless the relative epistemic virtues of the Bayesian framework are actually probed against the virtues and disadvantages of these possible competitors in actual case studies, the argument from uncertainty alone does not justify many cognitive scientists' choice to work within the Bayesian framework. For it needs to be established that it is actually the "most effective," "most congenial" or "most rational" framework to understand many phenomena produced by cognitive systems that must handle uncertainty.
We believe there is a better argument to justify many cognitive scientists' decision to go Bayesian. This argument can be called the argument from specialization, and it can be established through the results of a simple agent-based model of the distribution of cognitive labour in science. The next section aims to make this argument clear.
3 An argument from specialization for Bayesian cognitive science
Currently, there is little doubt that the most common approach to representing and dealing with uncertainty is the Bayesian one (Halpern 2003, 4). The tools which a Bayesian cognitive scientist can currently use to address problems of uncertain inference are more sophisticated than those of alternatives, being routinely used in neighbouring fields like machine learning, artificial intelligence, and statistics. In comparison to Dempster-Shafer theory, possibility theory, ranking theory and quantum probability theory, the Bayesian approach is more widespread in each of a wide variety of fields ranging from statistics to machine learning and AI (Poirier 2006). And the popularity of Bayesian modelling has been growing in cognitive science too, as evidenced by an increase in the number of articles, conference papers and workshops dedicated to Bayesian modelling of cognition and its foundations (Kwisthout, Wareham, & van Rooij 2011, note 1).
Given this popularity, and given that it is not obvious that the Bayesian framework is the best one for representing uncertainty, it is plausible to explain many cognitive scientists' choice to carry out their research within the Bayesian framework in terms of non-epistemic, sociological factors. These sociological factors may have led more and more scientists to approach research questions within the Bayesian framework, while neglecting some of the alternative frameworks available for dealing with uncertainty.
As more and more cognitive scientists have addressed research questions within the Bayesian
framework, a division of cognitive labour has been fostered in the field. Sophisticated tools
have been developed, which have been exploited to approach problems at a higher level of
specialization. Under certain conditions, this extensive exploitation is the rational thing to
do for scientists, since specialization can promote scientific progress. If this is correct, then
the value of specialization within the social structure of current cognitive science offers more
solid grounds for the choice to currently work within the Bayesian framework.
What follows will substantiate this idea by using a simple but general agent-based model of
the distribution of cognitive labour. We believe that this model picks out only those features
that are essential to any scientific framework. If this is correct, then its results will be robust
and applicable to the case of cognitive science. With this model in hand, it will be shown
under what circumstances exploiting a well-developed existing framework, instead of exploring
under-considered alternatives, is the rational thing to do for scientists.
3.1 Trading-off specialization and innovation. An agent-based model
In introducing our model, the first thing to note is that a framework is not a statement that can be true or false, but a standard for generating and assessing such statements. For example, Bayesian decision theory provides cognitive scientists with a "unifying mathematical language for framing cognition as the solution to inductive problems" (Tenenbaum et al. 2011, 1285). This way of framing cognition allows scientists to generate and assess predictions and explanations of several cognitive phenomena and behaviours (cf. Colombo and Seriès (2012)).
What is particular about standards is that a standard enables coordination. As such, the value of standards depends not solely on their value when used by individuals in isolation (their intrinsic value), but also on their value when used by multiple individuals simultaneously. As a framework in cognitive science, the value of Bayesian decision theory does not lie only in its intrinsic virtues that allow scientists to represent and handle uncertainty "naturally" or "congenially". Its value depends also on its power to facilitate scientific coordination.
One way to illustrate this point is by considering that agents typically have strong preferences
for one standard over another even in the absence of differences in intrinsic epistemic value.
For example, right-hand and left-hand driving are alternative standards for traffic. Although
there is no difference in the intrinsic value of both options, society is not indifferent to the side
people drive their cars on. Rather, in the absence of differences in intrinsic value it is clear
that coordination is the remaining criterion to evaluate the desirability of both alternatives.
The same goes for the adoption of scientific frameworks such as the Bayesian one, viz.: their
value depends both on their intrinsic epistemic value and on their success at facilitating
coordination.
In science, successful coordination allows scientists to divide labour and specialize (Kuhn
(1970), Kitcher (1990), Wray (2011, chapter 7), De Langhe (2010)). Successful coordination
on a joint standard means scientists can spend less time developing the framework itself, leaving them more time to actually use it to gain knowledge and solve problems. This increased
productivity in the short term comes at a cost in the long term. Less time spent on critical
evaluation of the current framework and the formulation and exploration of novel frameworks
with potentially superior intrinsic epistemic values entails a reduced ability to adapt to newly
gathered knowledge, and a higher probability of lock-in to a suboptimal standard (cf. Arthur
(1989)). In sum, scientists adopting a framework face a trade-off between the conflicting
demands of specialization and innovation. For example, cognitive scientists adopting the
Bayesian framework instead of the quantum probabilistic approach would face a trade-off
between specialization and innovation, between exploiting well-trodden tools and knowledge,
and exploring less developed ones.
The source of this dilemma is that exploring new frameworks and exploiting a given framework
are two mutually exclusive activities because they take place at different levels: within and
between frameworks. Specialization corresponds to taking the framework for granted and
focusing on using it to achieve results. The amount of specialization that a framework allows
is a measure for comparison between frameworks and is a function of the number of adopters
(with whom labour can be divided and productivity can be increased) in comparison to other
frameworks. Innovation is a measure for comparison of contributions within a framework. The
innovativeness of a contribution to a framework depends on the number of other contributions
already made to that framework.
A contribution exploiting a popular framework might not be very innovative from the point
of view of making progress towards improving or replacing that framework, but it does contribute to its exploitation and as such allows other scientists to specialize in other aspects
of that framework’s exploitation. Conversely, a contribution to a novel framework might not
foster immediate specialization but pave the way for a future, more efficient standard for
specialization. As such we conceive of the utility of a scientific contribution as the product of
these two fundamentally different but essential factors.
To put this point formally, consider a community of N scientists (1, ..., N). At each time, each scientist makes a contribution from C = (c1, ..., cN) to a framework from S = (s1, ..., sM). Exploitation consists in making a contribution to an existing framework. The more scientists exploit the same framework, the higher the benefit of specialization becomes, because scientists can specialize in narrower sub-problems and specialized tools can be developed, which will foster scientific progress.3 As a consequence, a local proxy for the benefits of exploitation is the number of adopters of a framework. More precisely, the "adoption" A_s of a framework s is the number of scientists contributing to it at time t, where a_{i,s}(t) indicates whether scientist i contributes to s at t:

A_s(t) = Σ_i a_{i,s}(t).    (1)

3 The insight that division of labour increases productivity by fostering specialization is as old as Adam Smith (2003) and marked the birth of modern economics. For an application of this model to the literature on the division of cognitive labour, see De Langhe (2014).
Exploration consists in an allocation of scientific labour to a new framework. The less articulated a framework, the higher the innovative value of contributing to that framework. If it is assumed that each scientist makes one contribution at each time, then a local proxy for the benefits of exploration is the inverse of the number of contributions made to a framework up to that time. More precisely, the "production" P_s, the cumulative sum of contributions to a framework s, is the sum of its adopters through time t:

P_s(t) = ∫_0^t A_s(t′) dt′.    (2)
The utility U of a framework s is jointly determined by exploration and exploitation:

U_s(t) = A_s(t)^α / P_s(t).    (3)
The parameter α denotes the output elasticity of coordination, which is a function of the (for the purpose of this paper exogenous) state of the tools and epistemic technologies underlying a scientific community (e.g., textbooks, focused workshops, conferences, standard methodologies, well-understood formalisms, specialized tools for data analysis, etc.). U_s is backward-looking because it evaluates the utility of the last contribution to a framework. However, individual agents do not take into account the utility of the last contribution to a framework, but the utility that their own contribution would have if they made it to that
framework.4 Hence the utility of a contribution to a framework is:

U′_s(t) = (A_s(t) + 1)^α / (P_s(t) + 1).    (4)
The relation between adoption and production, as specified here dynamically, is an attempt
to capture the essence of the trade-off between exploitation and exploration. More exploitation means less exploration, and similarly the number of adopters of a framework increases
the specialization benefits from exploiting it but decreases the novelty of exploring it. The
result is a constant tension for each agent between exploitation and exploration, where more
exploration causes exploitation to become more attractive and vice versa. On the one hand,
convergence of scientists on a shared framework is a good thing because of specialization effects. This is modelled by letting the utility of a contribution to a framework increase as more
others adopt that framework. On the other hand, it is also important that new frameworks
can be developed. This is modelled by letting the utility of a contribution to a framework decrease as more contributions to that framework have been made. In sum, the utility of a contribution to a framework varies with adoption and varies inversely with production.
4 If scientists always chose to develop the framework which is best articulated, then no alternative frameworks would ever be developed, because any new framework would always be less articulated than the existing ones. Scientists can only be expected to develop new frameworks if their focus is not backward-looking, but forward-looking.
Although our model allows different frameworks to be assigned different intrinsic values, in line with our assessment of the argument from uncertainty above, we shall assume that the intrinsic value of each framework is the same. This is to isolate the effects of the essential dynamics sketched above and to provide a clean answer to the questions that concern us here: Under what conditions is exploitation of an existing framework like the Bayesian one the rational decision to make? And what ratio of exploration to exploitation is superior, ceteris paribus?
The dynamics of the model are as follows. At each time, each agent makes a contribution to a framework, with the probability of contributing to a specific framework proportional to the utility of the next contribution to that framework. This not only assigns a likelihood of adoption to each existing framework but also introduces a non-zero utility for the creation of a new framework. A new framework has no adoption (A = 0) and no production (P = 0), resulting in a fixed utility of 1 regardless of α. The probability that a scientist creates a new framework will therefore vary inversely with the sum of the utilities of all frameworks already being adopted. The lower the utility of the existing frameworks, the higher the probability that a new one is created, thus self-regulating the system into a balance between the exploitation of existing frameworks and the exploration of new frameworks. The number of frameworks in the model is not given, but is a function of the organization of labour in the model itself.
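To fix ideas, here is a minimal Python sketch of these dynamics. It is our own operationalization: details the text leaves open (e.g., whether adoption counts update within or only between time steps) are resolved by assumption, and the parameter values are illustrative.

```python
import random

# Sketch of the model's dynamics: each step, every agent contributes to a
# framework with probability proportional to U'_s = (A_s + 1)^alpha / (P_s + 1);
# a brand-new framework (A = P = 0) always has utility 1. Scheduling details
# beyond the text above are our assumptions.

def run(n_agents=1000, n_steps=100, alpha=2.5, seed=0):
    rng = random.Random(seed)
    adoption, production = [], []            # A_s and P_s for each framework
    total_utility = 0.0
    for _ in range(n_steps):
        next_adoption = [0] * len(adoption)  # adopters counted per step
        for _ in range(n_agents):
            # Utility of the next contribution to each existing framework,
            # plus the option of creating a new framework (utility 1).
            utils = [(a + 1) ** alpha / (p + 1)
                     for a, p in zip(adoption, production)] + [1.0]
            s = rng.choices(range(len(utils)), weights=utils)[0]
            if s == len(adoption):           # explore: found a new framework
                adoption.append(0)
                production.append(0)
                next_adoption.append(0)
            total_utility += utils[s]
            next_adoption[s] += 1            # exploit: adopt and contribute
            production[s] += 1
        adoption = next_adoption
    return total_utility, len(adoption)      # welfare and framework count

print(run(n_agents=100, n_steps=20))         # a small run for illustration
```

Varying α, or replacing the utility-proportional choice rule with a fixed exploration ratio, is then a matter of swapping the selection step inside the loop.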
Now that utilities have been assigned, it is possible to evaluate alternative strategies for trading off exploration and exploitation based on the total utility they produce for the community
as a whole. For example, it is possible to assess how the cognitive science community should
trade off the value of exploiting a well-articulated, widespread framework, and of exploring
some novel, under-considered alternative.
Fig. 1 shows, for α = 2.5, that the total utility in a system with an adaptive ratio is greater than the total utility created by any fixed proportion of explorers from 0 to 100%. A system without explorers (0%) does well initially but is unsustainable in the long term. In our case, if no cognitive scientist explores novel frameworks for representing uncertainty alternative to Bayesian decision theory, then the acquisition of valuable knowledge in cognitive science will be hindered in the long run. A system consisting only of explorers (100%) can keep on growing due to continuous innovation, but with as many frameworks as there are scientists it cannot reap the benefits of specialization. In our case, if the cognitive science community were very much fragmented, with individual scientists or labs working within different frameworks, then there would be no opportunity at all to advance the field through specialization.
Combinations of both do better, with the optimal fixed strategy around 30% exploration. Yet
the most important result is that an adaptive strategy, whereby the ratio of exploration and
exploitation varies with the utility of the available alternatives, is superior. This is because it
not only combines the benefits of specialization deriving from exploitation with the benefits
of innovation deriving from exploration, but has the ability to adaptively shift between them
as the circumstances require. In relation to our case, if the majority of cognitive scientists
worked within the Bayesian framework, while a sizable minority explored under-considered
alternatives, then advancements in scientific knowledge of at least some cognitive phenomena and behaviours would be most likely to result.

Figure 1: Comparison of total utility with an adaptive vs. fixed ratio of explorers for 1,000 agents over 100 steps with α = 2.5.

Figure 2: Total utility as a function of α after 1000 steps for 1,000 agents.
The superiority of the adaptive strategy is, however, not universal. Robustness analysis in Fig. 2 shows that the adaptive strategy is only superior if there are sufficient benefits to dividing labour, viz. if α is sufficiently large. If α is too small, communities derive greater benefit from full exploration than from an adaptive strategy. This would possibly explain why branches of cognitive science that have not (yet) been able to develop the epistemic technology and tools to make a division of labour possible or worthwhile (e.g., textbooks, workshops, conferences, standard methodologies, formalisms, specialized tools) have a substantially different disciplinary structure from those that have. For all systems where α is sufficiently high, this model shows that well-timed intermittent phases of exploitation and exploration are preferable over fixed strategies.
In conclusion, two morals relevant to the case of Bayesian cognitive science can be drawn from our model. First, the (successive) monopoly of a single framework is preferable over pluralism in situations where the intrinsic value of different frameworks is comparable or unknown. Second, a fixed exploration ratio of around 30% is superior to other fixed ratios, but inferior to a dynamic ratio whereby exploration of new frameworks increases as the monopolistic framework accumulates contributions and is gradually depleted. So, currently, the monopoly of the Bayesian approach to uncertainty in cognitive science may well be justified because of the value of specialization it promotes. However, in the longer run, progress will require a mixed strategy of continuing to exploit the Bayesian framework, while investing more time and attention in exploring new or underappreciated ones.
4 Conclusion
In several branches of current cognitive science, it is widely assumed that the Bayesian framework should be chosen for finding and assessing explanations of cognitive phenomena whose
production involves uncertainty. However, as we have argued in this paper, this assumption
is far from being unproblematic, since it is not obvious that the Bayesian framework enjoys
special epistemic virtues over available but under-considered alternatives for representing uncertainty. A better justification for adopting the Bayesian framework in cognitive science
is that currently it comprises a richer body of tools and epistemic technologies that can
be opportunistically exploited so as to foster specialization. While the value of specialization
trades off with the value of innovation, specialization is often the best way to achieve scientific
progress.
Thus, the present paper has made two contributions to existing literature in philosophy and
cognitive science. First, it has critically reconstructed the argument from uncertainty for
Bayesian cognitive science, arguing that it does not provide cognitive scientists with strong
reason to favour the Bayesian approach over alternatives. Second, by relying on a simple
model of the division of cognitive labour in science, the paper has put forward a novel argument based on the value of specialization in support of the Bayesian approach in current
cognitive science.
References
Arthur, B. (1989). Competing technologies, increasing returns, and lock-in by historical events. Economic Journal, 99(394), 116-131.
Bovens, L., & Hartmann, S. (2003). Bayesian epistemology. Oxford: Clarendon Press.
Bowers, J., & Davis, C. (2012). Bayesian just-so stories in psychology and neuroscience.
Psychological Bulletin, 138 , 389-414.
Busemeyer, J., & Bruza, P. (2012). Quantum models of cognition and decision. Cambridge:
Cambridge University Press.
Chater, N., Goodman, N. D., Griffiths, T. L., Kemp, C., Oaksford, M., & Tenenbaum, J. B. (2011). The imaginary fundamentalists: The unshocking truth about Bayesian cognitive science. Behavioral and Brain Sciences, 34, 194-196.
Chater, N., Tenenbaum, J. B., & Yuille, A. (2006). Probabilistic models of cognition: Conceptual foundations. Trends in Cognitive Sciences, 10(7), 287-291.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181-204.
Colombo, M., & Hartmann, S. (under review). Bayesian cognitive science, unification, and explanation. The British Journal for Philosophy of Science.
Colombo, M., & Seriès, P. (2012). Bayes in the brain. On Bayesian modelling in neuroscience. The British Journal for Philosophy of Science, 63(3), 697-723.
Colyvan, M. (2008). Is probability the only coherent approach to uncertainty? Risk Analysis,
28 (3), 645-652.
Cox, R. (1946). Probability, frequency, and reasonable expectation. American Journal of
Physics(14), 1-13.
De Langhe, R. (2010). The division of labour in science: the tradeoff between specialisation
and diversity. Journal of Economic Methodology, 17 (1), 37-51.
De Langhe, R. (2014). A unified model of the division of cognitive labor. Philosophy of
Science, 81 (3), 444-459.
Dempster, A. (1968). A generalization of Bayesian inference. Journal of the Royal Statistical Society, Series B (Methodological), 30, 205-247.
Doya, K., Ishii, S., Pouget, A., & Rao, R. (2007). Bayesian brain: probabilistic approaches
to neural coding. Cambridge, MA: MIT Press.
Drugowitsch, J., & Pouget, A. (2012). Probabilistic vs. non-probabilistic approaches to the
neurobiology of perceptual decision-making. Current opinion in neurobiology, 22 (6),
963-969.
Dubois, D., & Prade, H. (2007). Possibility theory. Scholarpedia, 2 (10), 2074.
Eberhardt, F., & Danks, D. (2011). Confirmation in the cognitive sciences: The problematic
case of bayesian models. Minds and Machines, 21 (3), 389-410.
Faisal, A. A., Selen, L. P. J., & Wolpert, D. M. (2008). Noise in the nervous system. Nature
Reviews Neuroscience, 9 , 292-303.
Fiser, J., Berkes, P., Orban, G., & Lengyel, M. (2010). Statistically optimal perception and
learning: from behavior to neural representations. Trends in Cognitive Sciences, 14 (3),
119-130.
Griffiths, T. L., Chater, N., Kemp, C., Perfors, A., & Tenenbaum, J. B. (2010). Probabilistic models of cognition: exploring representations and inductive biases. Trends in Cognitive Sciences, 14(8), 357-364.
Griffiths, T. L., Chater, N., Norris, D., & Pouget, A. (2012). How the Bayesians got their beliefs (and what those beliefs actually are). Psychological Bulletin, 138, 415-422.
Halpern, J. (2003). Reasoning about uncertainty. Cambridge, MA: MIT Press.
Huber, F. (2014). Formal representations of belief. The Stanford Encyclopedia of Philosophy (Spring 2014 Edition), http://plato.stanford.edu/archives/spr2014/entries/formal-belief/.
Jones, M., & Love, B. C. (2011). Bayesian fundamentalism or Enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34, 169-188.
Kitcher, P. (1990). The division of cognitive labor. Journal of Philosophy, 87(1), 5-22.
Knill, D., & Pouget, A. (2004). The Bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences, 27, 712-719.
Knill, D. C., & Richards, W. E. (1996). Perception as Bayesian inference. New York: Cambridge University Press.
Körding, K. (2007). Decision theory: what "should" the nervous system do? Science, 318, 606-610.
Kuhn, T. (1970). The structure of scientific revolutions, 2nd ed. Chicago: Chicago University
Press.
Kwisthout, J., Wareham, T., & van Rooij, I. (2011). Bayesian intractability is not an ailment
that approximation can cure. Cognitive Science, 35 (5), 779-784.
Lindley, D. V. (1982). Scoring rules and the inevitability of probability. International
Statistical Review , 50 , 1-26.
Ma, W., Beck, J., Latham, P., & Pouget, A. (2006). Bayesian inference with probabilistic
population codes. Nature Neuroscience, 9 , 1432-1438.
Maloney, L. T. (2002). Statistical decision theory and biological vision. In D. Heyer & R. Mausfeld (Eds.), Perception and the physical world: Psychological and philosophical issues in perception.
Mamassian, P., Landy, M. S., & Maloney, L. T. (2002). Bayesian modeling of visual perception. In R. Rao, M. Lewicki, & B. Olshausen (Eds.), Probabilistic models of the brain: Perception and neural function.
Orban, G., & Wolpert, D. (2011). Representations of uncertainty in sensorimotor control.
Current Opinions in Neurobiology, 21 , 1-7.
Poirier, D. J. (2006). The growth of bayesian methods in statistics and economics since 1970.
Bayesian Analysis, 1 , 969-980.
Pothos, E. M., & Busemeyer, J. R. (2013). Can quantum probability provide a new direction
for cognitive modeling? Behavioral and Brain Sciences, 36 , 255-327.
Pouget, A., Beck, J., Ma, W., & Latham, P. (2013). Probabilistic brains: knowns and
unknowns. Nature Neuroscience, 16 , 1170-1178.
Rao, R., Olshausen, B., & Lewicki, M. (2002). Probabilistic models of the brain: perception
and neural function. Cambridge, MA: MIT Press.
Rédei, M., & Summers, S. J. (2007). Quantum probability theory. Studies in the History and Philosophy of Modern Physics, 38, 390-417.
Rescorla, M. (in press). Bayesian perceptual psychology. In M. Matthen (Ed.), The Oxford handbook of the philosophy of perception.
Shafer, G. (1992). The Dempster-Shafer theory. In S. C. Shapiro (Ed.), Encyclopedia of artificial intelligence (2nd ed.).
Simoncelli, E. P. (2009). Optimal estimation in sensory systems. In M. S. Gazzaniga (Ed.), The cognitive neurosciences IV.
Smith, A. (2003). Wealth of nations. New York: Bantam Classics. ([1776])
Spohn, W. (2009). A survey of ranking theory. In F. Huber & C. Schmidt-Petri (Eds.),
Degrees of belief.
Tenenbaum, J., Kemp, C., Griffiths, T., & Goodman, N. (2011). How to grow a mind:
statistics, structure and abstraction. Science, 331 , 1279-1285.
Vilares, I., & Körding, K. (2011). Bayesian models: the structure of the world, uncertainty, behavior, and the brain. Annals of the New York Academy of Sciences, 1224, 22-39.
Vineberg, S. (2014). Dutch book arguments. The Stanford Encyclopedia of Philosophy,
http://plato.stanford.edu/archives/sum2011/entries/dutch-book.
Wray, K. B. (2011). Kuhn’s evolutionary social epistemology. Cambridge: Cambridge University Press.