How to build towers of arbitrary heights, and other hard problems

Debora Field1 and Allan Ramsay2
Abstract. This paper presents a reasoning-centred approach to
planning: an AI planner that can achieve blocks-world goals of the
form ‘above(X,Y)’ by knowing that ‘above’ is the transitive closure
of ‘on’, and using that knowledge to reason over the effects of actions. The fully implemented model can achieve goals that do not
match action effects, but that are rather entailed by them, which it
does by reasoning about how to act: state-space planning is interwoven with theorem proving in such a way that a theorem prover uses
the effects of actions as hypotheses. The planner was designed for
the purpose of planning communicative actions, whose effects are
famously unknowable and unobservable by the doer/speaker, and depend on the beliefs of and inferences made by the recipient/hearer.
1 INTRODUCTION
The approach of many of today’s top domain-independent planners is
to use heuristics, either to constrain the generation of a search space,
or to prune and guide the search through the state space for a solution,
or both [5, 6, 19, 25]. All such planners succeed by relying on the
static effects of actions—on the fact that you can tell by inspection
what the effects of an action will be in any situation—which limits
their scope in a particular way. In their Graphplan paper, Blum and
Furst make this observation [5, p. 299]:
One main limitation of Graphplan is that it applies only to
STRIPS-like domains. In particular, actions cannot create new
objects and the effect of performing an action must be something that can be determined statically. There are many kinds of
planning situations that violate these conditions. For instance,
if one of the actions allows the planner to dig a hole of an arbitrary integral depth, then there are potentially infinitely many
objects that can be created. . . The effect of this action cannot be
determined statically. . .
This limitation is also true of HSP [6], FF [19], REPOP [25], and
all the planners in the planning graph and POP traditions. The class
of problems that these planners do not attempt to solve—the ability
to dig a hole of an arbitrary depth (or perhaps to build a tower of
blocks of an arbitrary height)—was one that particularly interested
us.
1 Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK: [email protected]
2 School of Informatics, University of Manchester, PO Box 88, Manchester M60 1QD, UK: [email protected]
Our motivation for this research was the problem of planning communicative actions. The well-documented, considerable difficulties
involved in making a plan to communicate a message to another person include this: a key player in the ensuing evolution of the environment is that person (the ‘hearer’ of the utterance).3
Consider Rob the robot, who can stack blocks, but cannot communicate messages. Rob’s goal in each task is to attain a state in which
there is a certain configuration of blocks. Rob makes a plan to build
a configuration of blocks, on a strong assumption that at plan execution time, he will be the author of the positions of the blocks. When
Rob puts a block in a certain position, he knows that the block will
have no desires and opinions of its own concerning which position
it should take up—he, Rob, expects to be in reasonable control of
the situation (notwithstanding impeding concurrent events, possible
sensor or motor failures, and other hindrances).
If, however, human agent John’s goal is to get human agent Sally
to believe the proposition John is kind, in some respects, John has
a harder problem than Rob. John knows that Sally has desires and
opinions of her own, and, unlike Rob, John has no direct access to the
environment he wishes to affect. John cannot simply implant John is
kind into Sally’s belief state—it is more complicated than this. John
has to plan to do something that he considers will lead Sally to infer
John is kind. This means that when John is planning his action—
whether to give her some chocolate, pay her a compliment, lend her
his credit card—he has to consider the many different messages Sally
might infer from the one thing John chooses to say or do. There is
no STRIPS operator [13] John can choose that will have his desired
effect; he has to plan an action that he expects will entail the state he
desires.
With the ultimate aim of planning communicative actions (both
linguistic and non-linguistic), an agent was required that could make
plans for the purpose of communication, while knowing that any
single plan could have myriad different effects on a hearer’s belief state, depending on what the hearer chose to infer. ‘Reasoning-centred’ planning of actions that entailed goals was an approach we
considered would enable this difficult predicament to be managed.
By means of experimenting with an implementation, a planner was
designed that reasons about the effects of actions, and is able to plan
to achieve goals that do not match action effects, but that are entailed
by the final state.
3 The term ‘hearer’ is used loosely to mean the observer or recipient of some act of communication.
2 PLANNER DESIGN ISSUES
The planner’s general design was strongly influenced by its intended
application: to investigate the planning of communicative acts without using ‘speech acts’ [4, 28].4 We will now discuss some of the
early work we carried out on communicative action planning, in order to help explain the decisions that were taken concerning the planner’s design.
2.1 An early experiment in communicative action
Some tests were carried out to investigate the idea of informing.
‘Inform’ was simplistically interpreted as ‘to make know’, the idea
being that the effect of informing some agent a of a proposition p
is knows(a,p). A STRIPS operator was designed, having the name
inform(SPEAKER, HEARER, PROP).5 The operator had two preconditions:
pre([
knows(SPEAKER, PROP),
knows(SPEAKER, not(knows(HEARER, PROP)))
])
it had one positive effect:

add([
knows(mutual([SPEAKER, HEARER]), PROP)
])

and it had one negative effect:

delete([
knows(SPEAKER, not(knows(HEARER, PROP)))
])
The positive effect expressed the idea that after a speaker has uttered
a proposition (PROP) to his hearer, then the speaker and hearer mutually know PROP (i.e., each knows PROP, each knows the other knows
PROP, each knows the other knows they both know PROP, . . . ).6 The
operator’s negative effect matched one of the preconditions, and represented the idea that after informing HEARER of PROP, it is no longer
the case that SPEAKER knows that HEARER does not know PROP.
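Assembled into a single term, the operator might look roughly as follows. This is a sketch only, written to mirror the action/pre/add/delete shape shown for stack in Section 5; the model's actual term structure may differ.

% Sketch of the inform operator as one Prolog term (illustrative only).
action(inform(SPEAKER, HEARER, PROP),
       pre([knows(SPEAKER, PROP),
            knows(SPEAKER, not(knows(HEARER, PROP)))],
           add([knows(mutual([SPEAKER, HEARER]), PROP)]),
           delete([knows(SPEAKER, not(knows(HEARER, PROP)))]))).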
With the three actions inform(SPEAKER,HEARER,PROP),
stack(DOER,BLOCKX,BLOCKZ), unstack(DOER,BLOCKX,BLOCKZ),
the planner could make plans to communicate items of information
between agents, as well as to move blocks about. Tasks were
designed to test inform, including some in which a colour-blind
agent had to make blocks towers without putting two blocks of
the same colour next to each other.7 To achieve a task, a second
‘colour-seeing’ agent would have to inform the colour-blind agent of
the blocks’ colours. The different agents therefore had to have their
own distinct knowledge states, and so the theorem prover had to be
developed into an epistemic theorem prover before the tasks would
work. Here is one of the ‘colour-blind’ blocks-world tasks:
task52j(P):-
plan(
[% GOAL CONDITION
knows(john,above(b,a)), knows(john,above(c,a)),
knows(john,above(d,a)), knows(john,above(e,a))],
[% INITIAL STATE
% (1) JOHN’S KNOWLEDGE STATE
[knows(john,isblock(a)), knows(john,on(a,b)),
knows(john,clear(a)), knows(john,isblock(b)),
knows(john,on(b,c)), knows(john,isblock(c)),
knows(john,on(c,d)), knows(john,isblock(d)),
knows(john,on(d,e)), knows(john,isblock(e)),
knows(john,on(e,table))],
% (2) GEORGE’S KNOWLEDGE STATE
[knows(george,colour(a,blue)),
knows(george,not(knows(john,colour(a,blue)))),
knows(george,colour(b,green)),
knows(george,not(knows(john,colour(b,green)))),
knows(george,colour(c,blue)),
knows(george,not(knows(john,colour(c,blue)))),
knows(george,colour(d,green)),
knows(george,not(knows(john,colour(d,green)))),
knows(george,colour(e,blue)),
knows(george,not(knows(john,colour(e,blue))))],
% RULE: above_1
[knows(_,on(X,Z) ==> above(X,Z)),
% RULE: above_2
knows(_,(on(X,Y) and above(Y,Z)) ==> above(X,Z))
]],P).
and here is the first solution the early planner returned for it:

P = [unstack(john,a,b),unstack(john,b,c),
inform(george,john,colour(b,green)),
inform(george,john,colour(a,blue)),
stack(john,b,a),unstack(john,c,d),
inform(george,john,colour(c,blue)),
stack(john,c,b),unstack(john,d,e),
inform(george,john,colour(d,green)),
stack(john,d,c),
inform(george,john,colour(e,blue)),
stack(john,e,d)] ?

4 Although early work on the ‘speech acts with STRIPS’ approach was promising [7, 10, 2, 3], subsequent work has found many problems with this approach [17, 26, 8].
5 The example is lifted from the model, which is implemented in Prolog. Uppercase arguments are variables.
6 The model embodies a deduction model of belief [20], rather than a ‘possible worlds’ model [18, 21] (to be discussed shortly). Thus agents are not required to draw all logically possible inferences, and are therefore not required to infer an infinite number of propositions from a mutual belief.
7 In order to make this possible, colour constraints were added to the stack operator.
Being the first steps towards planning linguistic actions, the inform
operator, its accompanying tasks, and the planner’s new communicative action procedures were very crude. First, testing of the inform
operator at this early stage was minimal. There were many aspects
of informative utterances that inform did not attempt to account for.
These included: that people cannot know what other people do or do
not know; that you cannot assume that if you tell someone p they
will believe you; and that you may want to tell someone p in order to
inform them of a proposition q that is different from p [15].
Secondly, with regard to procedure, the planner made plans by
directly accessing both George’s and John’s knowledge states. The
planner was behaving like a third agent asking itself, ‘What would
be the ideal solution if George and John had this problem?’. A more
realistic approach would have been for the planner to plan from John’s point of view alone, using John’s knowledge about
the current block positions, and his beliefs about George and what
George might know, to ask George for information, in anticipation
of needed answers.
This early work highlights some of the peculiarities of planning
communicative action as opposed to physical action. To communicate, agents have to be able to reason with their own beliefs about
the belief states of other agents, and to use that reasoning in order to
decide how to act—so this planner needs to be able to do epistemic
reasoning about its current situation, and its goals. Communicating
agents also have to be able to deal with the difficulty that uttering
a proposition p may cause the hearer to believe p, or it may not, or
it may cause the hearer to believe other things as well as or instead
of p—in other words, that the effects of communicative actions are
unknowable during planning (not to mention unobservable after application), and that they depend largely on the intention of the hearer.
2.2 Development strategy
In order to develop a design which possessed the desired capabilities,
the following strategy was taken:
• A state-space search was implemented that searches backwards in
hypothetical time from the goal via STRIPS operators (based on
foundational work in classical planning [23, 24, 14, 22, 13]);
• A theorem prover for first order logic was implemented that can
constructively prove conjunctions, disjunctions, implications, and
negations, and that includes procedures for modus ponens and unit
resolution (a minimal sketch of the flavour of such a prover follows this list);
• State-space search and theorem proving are interwoven in such a
way that:
– not only can disjunctions, implications and negations be
proved true, they can also be achieved;8
– not only can a goal q be proved true by proving (p ⇒ q) ∧ p,
but q can also be achieved by proving p ⇒ q and achieving p;
– a goal can be achieved by using the theorem prover’s inference
rules to reason with recursive domain-specific rules, which
transform the ‘slot-machine’ effects of actions into powerful
non-deterministic unquantifiable effects.
• The theorem prover has been transformed into an epistemic theorem prover by incorporating a theory of knowledge and belief suitable for human reasoning about action, so that agents can make
plans according to their beliefs about the world, including their
beliefs about others’ beliefs.
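To give a feel for the second and third bullets above, here is a minimal, illustrative Prolog sketch of such a prover (Prolog being the language the model is implemented in). It is not the model's code: prove/3, imp/2 and and/2 are assumed names, only a few of the connectives are covered, and the epistemic layer is ignored. Implications are proved constructively by hypothesising their antecedents, and domain rules are applied by modus ponens.

:- use_module(library(lists)).         % member/2

% prove(+Goal, +Facts, +Rules): Facts is a list of atomic formulae;
% Rules is a list of imp(Antecedent, Consequent) domain rules.
prove(and(P, Q), Facts, Rules) :-      % prove a conjunction
    prove(P, Facts, Rules),
    prove(Q, Facts, Rules).
prove(imp(P, Q), Facts, Rules) :-      % constructive implication:
    prove(Q, [P|Facts], Rules).        % assume P, then prove Q
prove(P, Facts, _Rules) :-             % the goal is already a fact
    member(P, Facts).
prove(Q, Facts, Rules) :-              % modus ponens with a domain rule
    member(Rule, Rules),
    copy_term(Rule, imp(P, Q)),        % fresh variables for each use
    prove(P, Facts, Rules).

Later sections reuse this toy prove/3 in further sketches.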
2.3 Achieving entailed goals

The planner’s design is essentially an interweaving of epistemic reasoning and planning in such a way that the planner is not limited to analysing actions purely as ‘slot-machine’ type operators (if you apply this action, you get this static effect, and that’s the end of the matter). In the blocks-world context, for example, the planner is not limited to understanding the effect of stacking a block X onto another block Y as on(X,Y); it is able to reason that X will be above any block which happens to be beneath Y.

In order to achieve a goal above(X,Y), where above is the transitive closure of on, something different from an action with an above(_,_) expression in its add list is needed. Stacking e onto d, for example, might have the effect above(e,f), and it might not—it depends on the positions of d and f. The planner achieves an entailed goal (a goal which does not match any action effects) by achieving parts of what would otherwise be the constructive proof. It can thus achieve the goal above(e,f) (for example) by using rules which define the meaning of ‘above’ in terms that can be related to the available actions.

Since it is not possible to judge whether goals like above(e,f) are true by inspecting the current state (which contains no above(_,_) facts), reasoning is carried out to find out which goals are false. Similarly, since goals like above(e,f) do not match any action effects, reasoning is also done to find out which actions would make a goal true. To accomplish this, a theorem prover uses the effects of actions as hypotheses. A goal is proved by assuming the effect of some action is true, on the grounds that it would be true in the situation that resulted from performing that action. Hence, a set of actions is computed that might be useful for achieving a goal by carrying out hypothetical proofs, where the hypotheses are the actions whose effects have been exploited.

8 Proving a goal by assuming the effect of some action is true is referred to here as ‘achieving a goal’. The terms ‘proof’, ‘proving’ etc. are reserved for the proving of goals within states.
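To picture the hypothetical proofs concretely, the idea can be sketched as follows (assumed names again, building on the illustrative prove/3 above; only stack's add-list effect, which is given later in the paper, is listed):

% effect(?Action, ?Effect): an add-list effect of an action.  Other
% operators would be listed in the same way.
effect(stack(X, Z), on(X, Z)).

% achievable(+Goal, +State, +Rules, -Action): Goal is proved on the
% hypothesis that Action has been performed, i.e. with one of Action's
% effects temporarily assumed to be a fact of the current state.
achievable(Goal, State, Rules, Action) :-
    effect(Action, Effect),
    prove(Goal, [Effect|State], Rules).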
2.4 Search direction
One approach to reasoning-centred planning might have been to
grow a graph forwards from the initial state, using heuristics to constrain its growth, and then to search through it, employing a theorem
prover and some inference rules to determine whether the goal condition was entailed by the state that had been attained. This strategy
would do reasoning retrospectively—‘Now that I’ve arrived at this
state by applying some action/s, can I prove that my goal is entailed
by this state?’ A prerequisite of this research was, however, that the
agent should be able to reason from the goal.
This preference for a backwards search was motivated by a defining quality of the communication problem, as epitomised by utterance planning: there are too many applicable actions to make a forwards search feasible. People generally have the physical and mental
capabilities to say whatever they want at any moment. This means
that the answer to the question ‘What can I say in the current state?’
is something like ‘Anything, I just have to decide what I want to
say’. A backwards search is far more suitable than a forwards search
under conditions like these.
3 DEDUCTION MODEL OF BELIEF
The planner’s theorem prover embodies a deduction theory of belief
[20], chosen in preference to the ‘possible worlds’ model [18, 21].
The meaning of implication under possible worlds semantics does
not represent very well the way ‘implies’ is understood and used by
humans. Under possible worlds semantics, an agent draws all the
possible inferences from his knowledge (the well-documented problem of ‘logical omniscience’). This means that knowing q automatically follows from knowing p ∧ (p ⇒ q); there is no progressive ‘drawing’ of the conclusion q, and no awareness by the agent that the reason why he knows q is that he knows p ∧ (p ⇒ q). In contrast, humans reason with their knowledge; they make inferences,
which they would not have to do if they were logically omniscient.
To make their inferences, humans think about the content of an argument and the content relationships between the conclusions that
they draw, and the premises from which the conclusions are drawn.
Furthermore, humans are not ideal reasoners—their belief states are
incomplete, inaccurate, and inconsistent, and their inference strategies are imperfect.
A model of belief that would reflect human reasoning better than
the possible worlds model would be one in which an agent: (i) had a
belief set in which some beliefs were contradictory; (ii) could choose
certain beliefs from his belief set, while ignoring others; (iii) did not
have a complete set of inference rules; and (iv) was not forced to
infer every proposition that followed from his belief set. A model
which satisfies points (iii) and (iv), and therefore allows for points
(i) and (ii), is Konolige’s deduction model [20]. Konolige’s model
avoids the problem of logical omniscience, enabling the process of
inference to be modelled (how p is concluded, from which premises,
and using which rules).
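The contrast can be made concrete with a small sketch (assumed names throughout, not the model's code): an agent believes its base beliefs plus whatever its own, possibly incomplete, set of rules lets it derive, and nothing obliges it to close those beliefs under classical consequence.

believes(Agent, P) :-
    base_belief(Agent, P).
believes(Agent, Q) :-
    agent_rule(Agent, imp(P, Q)),   % one of the agent's OWN rules
    believes(Agent, P).

% John can draw the raining-to-cloudy inference because he has a rule for
% it; he cannot draw the classically valid contrapositive, because he has
% no rule for it, and nothing in the model forces the inference on him.
base_belief(john, raining).
agent_rule(john, imp(raining, cloudy)).
% ?- believes(john, cloudy).                          % succeeds
% ?- believes(john, imp(not(cloudy), not(raining))).  % fails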
4 CONSTRUCTIVE LOGIC
In addition to a deduction theory of belief, the planner’s theorem
prover also embodies a constructive/intuitionist logic, in preference
to classical logic. Classical logic can be thought of as investigation
into truth, to the exclusion of all else: the emphasis is truth, the goal is
truth, and the proofs are viewed as a means to the end of truth. Soundness and completeness are of paramount importance, while how the
conclusion p is reached is not. In contrast, constructive logic can be
thought of as an investigation into proofs, rather than into truth. The
emphasis in constructive logic is on what can be done, what are the
rules, what moves can be made, rather than on whether p is true or
not.
Classical logic can tell you that it’s true that there either is or there
isn’t a very valuable stamp in your attic, simply by invoking the law
of the excluded middle. p ∨ ¬p is always true. No-one has to look in
the attic in order to carry out the proof. In contrast, constructive logic
lacks the law of the excluded middle. This does not mean that in constructive logic p ∨ ¬p is false, but it does mean that in constructive
logic, if there is no construction which proves p ∨ ¬p, then p ∨ ¬p
is not provable. So to constructively prove that it’s true that there is
or there is not a very valuable stamp in your attic, someone will have
to look in the attic (i.e., construct a proof of p or construct a proof of
¬p).9
4.1 Why constructive logic?
The reason behind the preference for constructive logic is the difference between the meaning of the implication operator in constructive
logic, and its meaning in classical logic. Planning actions on the basis of reasoning about what one believes necessitates being able to
make inferences concerning what is true and what is false. But under
classical logic, an agent cannot infer ¬ p unless the formula ¬ p is
already in his belief set. So, for example, John could not infer that
Sally was not single, if he knew that Sally was married, and he knew
that a person cannot be both married and single at the same time.
The reason why under material implication it is unacceptable to infer a belief that ¬ p from a set of beliefs that does not include ¬ p is
as follows. Classically, p ⇒ q can be proved from ¬p, which means that B_john p ⇒ B_john q10 can be proved from ¬B_john p. If we then assert that B_john (p ⇒ q) is entailed by B_john p ⇒ B_john q, then (by transitivity of implication) we are saying that B_john (p ⇒ q) is entailed by ¬B_john p. In other words, that John is required to infer from not having the belief p that he has the belief p ⇒ q. This is an unacceptably huge inferencing leap to make. Since it is unacceptable to infer knowledge of the relation of implication from no knowledge of such a relation, it is therefore also unacceptable to infer knowledge of a negation from no knowledge of that negation (because the negation ¬p has the same meaning in classical logic as the implication p ⇒ ⊥).
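The objection can be summarised as a chain of entailments (a compact restatement of the argument above, with ⊢ marking provability):

\[
\neg B_{john}\, p \;\vdash\; (B_{john}\, p \Rightarrow B_{john}\, q) \qquad \text{(classical implication)}
\]
\[
(B_{john}\, p \Rightarrow B_{john}\, q) \;\vdash\; B_{john}(p \Rightarrow q) \qquad \text{(the asserted entailment)}
\]
\[
\neg B_{john}\, p \;\vdash\; B_{john}(p \Rightarrow q) \qquad \text{(chaining the two)}
\]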
Under constructive implication, however, John is not required to infer the belief p ⇒ q from not having the belief that p, because under constructive implication, B_john p ⇒ B_john q cannot be proved from ¬B_john p; it can only be proved if you can construct a proof of B_john q while pretending that B_john p is true. This means that it is acceptable to infer B_john (p ⇒ q) from B_john p ⇒ B_john q, because no unacceptably huge inferencing leaps are required by this. A small inferencing leap is made, however, and here follows our justification for it.

9 . . . and they will have to be a philately expert.
10 This formula is in an epistemic logic notation (standard since [18]). In the formula B_a p, a is the name of an agent, B_a is the epistemic operator for belief (which incorporates the agent’s name), and p is a proposition, and the scope of the epistemic operator. In English, B_a p means a believes p.
4.2 Argument from similarity
We make the assumption that the agent’s reasoning powers are the
same as ours, he is neither a more able reasoner, nor a less able reasoner than we humans are. We humans understand p ⇒ q to mean
that there is some connection between objects p and q. That ⇒ signifies a connection is allowed for and even emphasised in constructive
logic, but it is not acknowledged in classical logic. Since I understand ⇒ to mean some kind of connection between objects, then I
can infer that John will understand ⇒ in the same way—to be the
same kind of connection between objects—because John is just like
me. This means that I can argue as follows. I know that the following
is true:
John believes it’s raining ⇒ John believes it’s cloudy   (1)
in other words, I can see that there is a connection (shown by ⇒)
between John believing that it is raining, and John believing that it is
cloudy. I make the assumption that John’s reasoning capabilities are
just like mine. I can therefore infer that if (1) is true, John will himself
be able to see that there is a connection (⇒) between his believing
that it is raining, and his believing that it is cloudy—in other words,
I can infer that (1) entails (2):
John believes (it’s raining ⇒ it’s cloudy)   (2)

5 REASONING-CENTRED PLANNING
Now that the motivation for the planner design has been outlined,
the remainder of the paper will concentrate on the theme of achieving goals that do not match action effects, but that are rather entailed
by them. This section will present a summarised example to give an
overview of what is meant by using the effects of actions as hypotheses. As explained, the planner is an epistemic planner. All predicates
in the model therefore have an ‘epistemic context’ (after [30]). For
example, believes(john,(on(a,b))) appears in the model, instead of
simply on(a,b). However, for the purposes of this paper, whose focus is planning rather than communication, epistemic context will be
omitted from the remaining examples.
Figure 1 shows a pictorial representation of a blocks-world problem. The long thick horizontal line represents the table; where two
objects are touching, this indicates that the upper object is on the
lower object; and where an arc is drawn between two blocks, this indicates that the upper block is above the lower block, where above is
the transitive closure of on. Figure 1 is followed by the Prolog code
for the task:11
Figure 1. Task A (initial state and goal condition)
taskA(P):- plan(
[% GOAL CONDITION
above(c,b), on(a,b)],
[% INITIAL STATE
isblock(a), on(a,b),
clear(a),
isblock(b), on(b,table),
isblock(c), on(c,table), clear(c),
% RULE: above_1
[on(X,Z)] ==> above(X,Z),
% RULE: above_2
[on(X,Y),above(Y,Z)] ==> above(X,Z)], P).
Let us assume that we have two operators: stack and unstack. Here
is a shortened version of stack:
action(stack(X, Z),
pre([...],add([on(X, Z)]),delete([...]))).
The critical issue here is that although the goal condition of Task A
contains a predicate of the form above(X,Y), the add list of stack does
not contain any above predicates; it contains only a single on predicate.
Our planner is able to find stack actions to achieve above goals,
because it understands the relationship between ‘above’ and ‘on’ by
virtue of the two above inference rules in the initial state:12 one block
X is above another block Z in situations where X is on the top of a
tower of blocks which includes Z, with any number of blocks (including zero) intervening between X and Z.
To achieve Task A, the planning agent begins by establishing
whether or not the goal condition is entailed by the initial state. It
finds by inspection of the initial state that one of the goal condition
conjuncts is true (on(a,b)), and by using a theorem prover to reason
with the above rules it finds that the second goal condition conjunct is
false (above(c,b)). The agent then reasons to discover which actions
would achieve above(c,b) from the initial state.
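Purely for illustration, this is how the toy prove/3 and achievable/4 sketches from Section 2 would behave on Task A (taskA_facts/1 and above_rules/1 are assumed helpers, and the encoding of the rules differs from the model's):

taskA_facts([isblock(a), on(a,b), clear(a),
             isblock(b), on(b,table),
             isblock(c), on(c,table), clear(c)]).

above_rules([imp(on(X1,Z1), above(X1,Z1)),                    % above_1
             imp(and(on(X2,Y2), above(Y2,Z2)), above(X2,Z2))  % above_2
            ]).

% ?- taskA_facts(F), above_rules(R), prove(above(c,b), F, R).
%    fails: above(c,b) is not entailed by the initial state.
% ?- taskA_facts(F), above_rules(R), achievable(above(c,b), F, R, A).
%    A = stack(c,b) ;  A = stack(c,a) ;  ...
%    the hypothesised effects on(c,b) and on(c,a) each make above(c,b) provable.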
To accomplish this, theorem proving is carried out using the effects of actions as hypotheses. Using the first of its above rules (above_1), the agent reasons that above(c,b) would be entailed by a state in which on(c,b) was true:

[on(c, b)] ⇒ above(c, b)   (3)

and so, by state-space search, it considers planning to achieve on(c,b) by unstacking a from b, then stacking c onto b. This path fails, however, since on(a,b) is also a goal, and on(c,b) and on(a,b) cannot co-occur. So the agent invokes its second above rule (above_2):
[on(c, Y), above(Y, b)] ⇒ above(c, b)   (4)
Now by using both above rules, the agent reasons that above(c,b)
would be entailed by a state in which c was on some block Y, and Y
was on b, and so it makes a plan (by state-space search) to achieve the
conjunction (on(c,Y) ∧ on(Y,b)). It starts by addressing the first conjunct, on(c,Y), and finds that action stack(c,a) would achieve it, and
is applicable in the initial state. The antecedent of (4) now becomes
fully instantiated:
[on(c, a), above(a, b)] ⇒ above(c, b)   (5)
Now the second conjunct above(a,b) of (5) is addressed. Using rule above_1 only, the agent reasons that above(a,b) is entailed by the initial state, since it can see by inspection that on(a,b) is true in the initial state, and so no further actions are planned for the achievement of the antecedent of (5). Plan [stack(c,a)] is then applied to the initial state, a state is attained which entails the goal condition, and the planner returns solution [stack(c,a)].

11 A conjunction of items (p ∧ q) appears in the encoded task as the list [p, q].
12 The rules shown are a simplified version of the model’s above rules. In the model they include constraints ensuring that any attempt that is made to prove that an item is above itself, or to plan to put an item onto itself, will fail immediately.
6 ACHIEVING TRUE GOALS
A more detailed explanation will now be given of one of the features
that enable the planner to make plans by reasoning with the above
rules, its ‘LAN’ heuristic.
During testing, it was found that there were many tasks the planner could not achieve unless it had the facility to achieve a goal that
it could prove true in the current state. This sounds odd—why would
one want to achieve a goal that was already true? The problem arises
during the proof/achievement of a goal having a variable as an argument. The situations in which this occurs include when certain
domain-specific rules are invoked (including recursive rules such as
above_2, and absurdity rules), and when the preconditions of a desirable action are being addressed, as will now be discussed.
6.1 Recursive rules
Consider the procedure the planner follows to achieve the task shown
in Figure 2:
Figure 2. Task B (initial state and goal condition: above(e,c) ∧ above(e,d))
The planner initially tries to prove that both task goals are true in the initial state. To do this, it first invokes inference rule above_1 to prove the first task goal above(e,c) true in the initial state:

Step 1: Prove above(e,c) using rule above_1:
[on(e, c)] ⇒ above(e, c)

Now it tries to prove that on(e,c) (the single antecedent of this rule) is true, but it fails, since block e is not on block c in the initial state. Now the planner backtracks and invokes inference rule above_2 to try to prove the same goal above(e,c) true in the initial state:

Step 2: Prove above(e,c) using rule above_2:
[on(e, Y), above(Y, c)] ⇒ above(e, c)

Addressing the conjunctive antecedent of this rule, the planner now proves on(e,Y) from on(e,table) in the initial state, but then fails to prove above(table,c), and so the proof of above(e,c) from the initial state has failed.
Now that this proof has failed, the planner abandons the first task goal, and tries to prove the second one instead: above(e,d). This proof also fails, and so the planner sets about making a plan to achieve the first task goal above(e,c). To do this, as with the attempted proofs, the planner first invokes inference rule above_1:

Step 3: Achieve above(e,c) using rule above_1:
[on(e, c)] ⇒ above(e, c)

As before, the planner now tries to prove the rule’s antecedent on(e,c) true, but this time when the proof fails, it makes a plan to achieve on(e,c) (to stack e onto c). Having done this, the planner finds that it cannot achieve the second task goal above(e,d) without invoking a goal protection violation [29], and so it backtracks to try to achieve the first task goal above(e,c) by invoking the alternative inference rule above_2:

Step 4: Achieve above(e,c) using rule above_2:
[on(e, Y), above(Y, c)] ⇒ above(e, c)
Now the first conjunct on(e,Y) of the rule’s antecedent is addressed.
Since every block in this blocks world is always on something, either
on the table, or on another block, then in any state in which there is
a block called e, on(e,Y) will always be true—and this is where we
see that it is necessary to be able to achieve a goal that can be proved
true in the current state. A proof of on(e,Y) holds, but block e may
nevertheless have to be moved off Y in order to form a solution.
To enable the planner to achieve true goals without disabling it,
a heuristic was built in which constrains this activity to restricted
circumstances. The heuristic has been named ‘LAN’, which stands
for ‘look away now’. ‘Look away now’ is typically said by British
TV sports news readers who are about to show a new football match
result. It is said as a warning to viewers who don’t want to know the
score yet, perhaps because they have videoed the match. The name
‘LAN’ reflects the characteristic feature of the heuristic, which is
to look away from the Prolog database (the facts), and pretend that
something is not true, for the sake of future gain.
The LAN heuristic is restricted to working on goals that are not
fully instantiated (i.e., goals which include at least one variable),
which means it is only invoked in the following three contexts: when
certain preconditions are being achieved, when the antecedents of
rules are being achieved, and when there are variables in the task
goal. The planner is prohibited from achieving goals which are fully
instantiated and can be proved true in the current state, because there
does not appear to be any need for this, and allowing this dramatically increases the search space, hence slowing the planner down
enormously.
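As a one-clause illustration (the predicate name is an assumption, and the toy prove/3 from Section 2 stands in for the real prover), the restriction amounts to something like:

% The planner may plan to (re-)achieve Goal unless Goal is both fully
% instantiated and already provable in the current state.
may_plan_to_achieve(Goal, State, Rules) :-
    \+ (ground(Goal), prove(Goal, State, Rules)).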
6.1.1 Skipping proofs
At Step 4 in the procedure described above for Task B, although the
first conjunct on(e,Y) of the antecedent of above_2 can be proved in the initial state of Task B from on(e,table), at this point no attempt is made to prove on(e,Y); the planner goes straight to searching for
a plan to make on(e,Y) true. This skipping of a proof is only permitted: (i) when the first conjunct of the antecedent of a recursive rule
is being addressed; and (ii) when the goal condition is false in the
current state.
Being unable to do a proof at this point speeds the planner up considerably, as it cuts out much repetitive searching and backtracking. It does not break the planner, because the proof has already been tried earlier and failed. If on(e,Y) were already true in a way consonant with the solution, the planner would not have reached this point (which it can only do by backtracking). The planner has already tried proving on(e,Y) (via above_1). The proof may have failed, or it may have succeeded, in which case the path must have reached a dead end and the planner must have backtracked to try achieving on(e,Y) instead. Either way, the planner must have also tried achieving on(e,Y) by stacking e onto Y (also via above_1). This must, however, also have failed to lead to a solution, because the planner has now backtracked to try achieving on(e,Y) by using above_2.
6.2 Preconditions
When achieving the partially instantiated first conjunct of an antecedent, the planning agent does not even attempt a proof, because
it knows that a proof could not lead to a solution. However, when
a partially instantiated precondition of an action is being addressed,
the planning agent does attempt a proof of the precondition, but if a
successful proof fails to lead to a solution, the agent is permitted to
try to achieve the precondition, in spite of it being true in the current
state. Figure 3 shows a task which illustrates:
Figure 3. Task C (initial state and goal condition: on(e,q) ∧ above(e,f))
To achieve the goal condition conjunct above(e,f), the antecedent of above_2 gets instantiated as follows:

[on(e, Y), above(Y, f)] ⇒ above(e, f)

The preconditions of stack(e,Y) (for achieving on(e,Y)) include the following:

[on(e, table), clear(Y), clear(e)]
Block e is already on the table in the initial state, so the first condition is met. To get Y (some block) clear, nothing needs doing, because there is a clear block in the initial state that is not block e.13
To get e clear, the tower positioned on top of e in the initial state is
unstacked, leaving five blocks clear (Figure 4). Precondition clear(Y)
remains true, as there is still a clear block that is not e.
Figure 4. Task C mid-search

The difficulty is that to achieve this task, we do not at this point need any block Y to be clear; we need q to be made clear, so that the planner can make plans to stack e onto q, and q onto f—in other words, although clear(Y) is true, it is true ‘in the wrong way’, and the planner needs to make it true in a different way by planning some actions to achieve clear(Y), even though it is true. The LAN heuristic enables the planner to deal with this difficulty.

13 Operator constraints ensure that block Y cannot be the same block as block e.
6.2.1 Inverse action problems
There is a problem with allowing the planner to achieve preconditions that are already true. The agent is prone to making ‘undesirable’ plans like [stack(X,Z),unstack(X,Z)] which do not advance the
solution. For example, in order to get block a onto the table when a
is not clear, what we want to do is first unstack whatever happens
to be on top of a, in order to make a clear. If this path fails to lead
to a solution, we do not want the planner to backtrack and achieve
the precondition on(X,a) by planning to put something onto a, just
so that the same block can then be unstacked. This, however, is what
happens if we allow the planner to backtrack and achieve the precondition on(X,Z) of action unstack(X,Z) when it is true in the current
state.
One of the reasons for the generation of these ‘undesirable’ plans
is that operators stack(X,Z) and unstack(X,Z) are related to each other
in the sense that each operator is a kind of ‘inverse’ of the other. By
this is meant that the ‘net’ effects of each operator satisfy precisely
the preconditions of the other, nothing more happens than this, and
nothing less. ‘Net’ effects means here the union of the contents of an
operator’s add list, and those things which are true before the application of the operator and remain true after its application.
If this was not stopped, the fruitless time the planner would spend
making plans including undesirable ‘inverse’ pairs of actions would
be significant. However, if the planner is totally prohibited from trying to achieve true goals, many tasks fall outside its scope (Task C
is one of these). In order to cope with such cases, the planner carries out a search for action pairs where: (i) the net effect of applying
them in sequence is zero; and (ii) one of them has a precondition
which is an effect of the other. For one member of each such pair we
mark as ‘special’ the preconditions which match effects of the other
member of the pair (for instance, the precondition on(X,Z) of action
unstack(X,Z) might be marked). If a precondition has been proved
true in the current state, but the path has failed to lead to a solution,
the planner is permitted to backtrack to try achieving it, unless the
precondition has been marked; in this case, the agent may only attempt re-proofs, and may not try to achieve the precondition. (These restrictions only apply if the marked precondition can be proved in the current state: if it is not already provable then the planner is at liberty to try to achieve it.)
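Schematically, and again with assumed names (marked/1 stands for whatever bookkeeping records the 'special' preconditions), the resulting policy for a precondition is:

:- dynamic marked/1.   % marks would be asserted when inverse pairs are found

% A precondition may be planned for if it is not provable in the current
% state, or if it is provable but has not been marked; a marked, provable
% precondition may only be re-proved, never achieved.
may_achieve_precondition(Pre, State, Rules) :-
    \+ prove(Pre, State, Rules).
may_achieve_precondition(Pre, State, Rules) :-
    prove(Pre, State, Rules),
    \+ marked(Pre).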
7 EXPLOITING RECURSIVE RULES
This section will discuss in more detail how the antecedent of above_2 is used to achieve an above goal. Consider Task D (depicted in Figure 5). Let us assume the planning agent has reached a point at which rule above_2 is instantiated for the achievement of above(e,c):

[on(e, Y), above(Y, c)] ⇒ above(e, c)   (6)

Figure 5. Task D (initial state and goal condition: above(e,c) ∧ above(e,d) ∧ above(e,f))
The first important point here is that when the agent does a proof
of above(e,c), it is proving both conjuncts of the antecedent true at
the same time/in the same state. However, when the agent is trying to achieve above(e,c), the plan to achieve on(e,Y), and the plan
to achieve above(Y,c) apply to different times/states: the plan to
achieve on(e,Y) must (with these operators) chronologically follow the plan to achieve above(Y,c).
Secondly, the agent can infer from (6) that in order to achieve the
goal above(e,c), action stack(e,Y) will have to be included at some
point in the solution, and action stack(Z,c) will also have to be included in the solution. The agent does not yet know the identities
that Y and Z will take in the solution. Y may be identical to Z, or it
may not:
[. . . stack(Z, c), . . . stack(e, Y), . . .]
Thirdly, we observe that the antecedent of above_2 is conjunctive,
and we know that using a linear approach to achieve a conjunction
of goals presents the problem of goal interaction [29]. It may not be
possible to achieve on(e,Y) from the state which is attained by the
achievement of above(Y,c) without generating goal protection violations.
Bearing these points in mind, a linear approach to achieving the
antecedent of above_2 is unattractive. A linear approach would suffer from negative goal interactions, and even if it did not, there is no
guarantee that the plan made to achieve [on(e,Y),above(Y,c)] from the
current state could form part of the solution, because the consequent
of (6), above(e,c), is itself one of a conjunction of goals in the goal
condition. This planner therefore takes a slightly non-linear approach
to the achievement of the antecedents of recursive rules: actions are
planned within a linear framework, but a bit of plan cross-splicing is
done, by means of a heuristic that has been named ‘Think Ahead’, or
‘ThA’ for short.
7.1 Thinking ahead
ThA works by incorporating thinking about the preconditions of
chronologically later actions into the planning of earlier actions,
which it does by exploiting the chronological information in the antecedent of a recursive domain-specific inference rule. It takes the
plans made to achieve the separate conjuncts in the antecedent of a
recursive rule, and uses parts of them to construct a new ‘suggested’
plan. The approach is a mixture of total order (linear) and plan-space
planning; a commitment to chronological ordering is made at the
time the actions are found, but the ordering of actions within the plan
is systematically changed later on. By taking into account the preconditions of a later goal during planning for an earlier goal, the ThA
heuristic reduces encounters with goal interactions considerably. Figure 6 shows a task that demonstrates ThA at work:

Figure 6. Task E (initial state and goal condition: on(a,b) ∧ above(c,b))

Here is above_2 instantiated for the achievement of goal above(c,b):

[on(c, Y), above(Y, b)] ⇒ above(c, b)   (7)

A plan is made that would achieve the first conjunct on(c,Y) of the antecedent of (7) from the initial state W0:

[unstack(a, b), unstack(b, c) | stack(c, a)]   (8)
(8) is divided into two parts: the single action stack(c,a)
that will achieve on(c,Y); and the ‘preconditions plan’
[unstack(a,b),unstack(b,c)], the plan that would transform W0
into a state in which stack(c,a) was applicable. Now a plan is made
to achieve the second (now fully instantiated) antecedent conjunct of
rule (7), above(a,b), also from state W0:
[unstack(a, b) | stack(a, b)]   (9)
(9) is similarly divided into a final action, in this case stack(a,b),
and a preconditions plan: [unstack(a,b)]. The preconditions plan for
achieving on(c,Y) then gets spliced onto the final action planned for
the achievement of above(Y,b):
[unstack(a, b), unstack(b, c), stack(a, b)]   (10)
When (10) is applied to W0, Figure 7 is the result:
Figure 7. Task E mid-search: W1
By this point, task goal above(c,b) is still not true. To complete the
solution, the planner again addresses above(c,b), but this time from
the new state. Rule above_2 is again used:

[on(c, Y), above(Y, b)] ⇒ above(c, b)   (11)
but this time no cross-splicing is done, and a different plan is made:
[stack(c,a)]. This is because the first antecedent member on(c,Y) of
(11) has plan [stack(c,a)] made to achieve it, whereupon the second
antecedent member becomes instantiated to above(a,b). This is found
to be true in the current state (W1), so no plan is made to achieve it.
The path succeeds, and the following solution is returned:
[unstack(a, b), unstack(b, c), stack(a, b), stack(c, a)]
In the case of Task E, the cross-splicing alone led to a solution.
There is, however, no guarantee that the application of the preconditions plan for the first conjunct will always attain a state in which the
final action of the plan for the second conjunct can be applied. This
is dealt with in the model by an additional state-space search.
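For illustration, the cross-splice for a two-conjunct antecedent can be pictured as follows; split_plan/3 and think_ahead/3 are assumed names, and the flat-list representation is a simplification of the 'preconditions plan | final action' notation used in (8) and (9):

% Split a plan into its 'preconditions plan' and final action, then splice
% the preconditions plan found for the first conjunct onto the final action
% planned for the second conjunct.
split_plan(Plan, PrePlan, FinalAction) :-
    append(PrePlan, [FinalAction], Plan).

think_ahead(PlanForConjunct1, PlanForConjunct2, Suggested) :-
    split_plan(PlanForConjunct1, Pre1, _Final1),
    split_plan(PlanForConjunct2, _Pre2, Final2),
    append(Pre1, [Final2], Suggested).

% ?- think_ahead([unstack(a,b), unstack(b,c), stack(c,a)],
%                [unstack(a,b), stack(a,b)], S).
% S = [unstack(a,b), unstack(b,c), stack(a,b)]      (plan (10) above)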
7.2 A guarded states check prevents repeated work
Planning actions to achieve above(_,_) goals by using the two above
rules can lead to fruitless repeated reasoning, unless this is checked.
For example, consider again Task E (see Figure 6).
This task has two goals: [on(a,b),above(c,b)]. Here is a trace of
the first stages of reasoning for this task, in which the planner attains
the same state via two different paths:
1. Achieves goal above(c,b) by stacking c onto b (using above_1).
2. The current complete plan is [unstack(a,b),unstack(b,c),stack(c,b)].
2a. Tries to subsequently achieve on(a,b), but can't without goal protection violation.
2b. Fails and backtracks to point (1) to try achieving above(c,b) in a different way (by using above_2).
3. Plans to achieve first conjunct of above_2, on(c,Y), by stack(c,Y).
3a. Preconditions of stack(c,Y) include that c is on the table and that Y is clear.
3b. To get c onto the table, unstacks a from b, and then b from c.
3c. To get Y clear, nothing has to be done, because two blocks are now clear (a and b). Block b is the first clear block the planner finds, so Y gets instantiated to b.
4. Achieves first conjunct on(c,Y) of above_2 by stacking c onto b.
5. The current complete plan is [unstack(a,b),unstack(b,c),stack(c,b)].
By this point, the planner has made the same plan
[unstack(a,b),unstack(b,c),stack(c,b)] twice in two different
ways. If the planner does not fail now and backtrack, it will repeat
the work it did at point (2a) (trying to subsequently achieve on(a,b)
and failing), before backtracking to try to achieve on(c,Y) in a
different way.
Fruitless repeated work like this is prevented by ‘guarding’ each
state that is attained in the search for a solution. If the identical state
is reached for a second time, the planner knows that there is no point
in continuing, and is instructed to fail and backtrack.
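A minimal sketch of such a check (assumed names; the model's state representation may differ): each state is canonicalised by sorting its facts, and the search fails as soon as it reaches a state it has recorded before.

% guard_state(+State, +Visited0, -Visited): fails if State was seen before.
guard_state(State, Visited0, [Canonical|Visited0]) :-
    msort(State, Canonical),            % order-independent representation
    \+ member(Canonical, Visited0).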
For simpler tasks (like Task E), the time wasted doing repeated
work like this is not significant, but for harder tasks, the time grows
very quickly. For Task C, for example, the planner takes 7 seconds to
return the first solution if the guarded states check is made, and 95
seconds if the check is not made.
7.3 Solving the Sussman anomaly
With regard to the Sussman anomaly [27], if the goal condition is
expressed by on(a,b) ∧ on(b,c), as in the original representation of
the problem (see also Figure 8):
task_sussman1(P):- plan(
[ % GOAL CONDITION
on(a,b), on(b,c)],
[ % INITIAL STATE
[isblock(a), on(a,table)],
[isblock(b), on(b,table), clear(b)],
[isblock(c), on(c,a), clear(c)
]],
P).

Figure 8. Task Sussman1 (initial state and goal condition)
the planner is unable to invoke the ThA heuristic (which reduces goal
interaction problems), because ThA is designed for the achievement
of the antecedents of domain-specific inference rules, and Task Sussman1 doesn’t contain any such rules. This means that the planner
fails for this task. However, the planner can achieve the Sussman
anomaly by the appropriate use of an above(_,_) goal, and the inclusion of the above rules in the initial state. Figure 9 shows one option:
Figure 9. Task Sussman2 (initial state and goal condition)
task_sussman2(P):- plan(
[ % GOAL CONDITION
above(a,c),
on(b,c)],
[ % INITIAL STATE
[isblock(a), on(a,table)],
[isblock(b), on(b,table), clear(b)],
[isblock(c), on(c,a), clear(c)],
% RULE: above_1
[on(X,Z) ==> above(X,Z),
% RULE: above_2
(on(X,Y) and above(Y,Z)) ==> above(X,Z)
]],
P).
for which the following plan is returned:
P = [unstack(c, a), stack(b, c), stack(a, b)]?
Alternatively, if there are additional blocks on the table besides a, b
and c, and one requires that no blocks may ever be stacked between
a and b, or between b and c in any solutions, the following goal condition will ensure this:
task_sussman3(P):- plan(
[ % GOAL CONDITION
above(a,c), on(a,b), on(b,c)],
...
P).
Yet another alternative is to specify the goal condition as the single
goal above(a,c). The first solution returned for this task is:
P = [unstack(c, a), stack(a, c)]?;
while the second solution generates the state specified by the Sussman anomaly:
P = [unstack(c, a), stack(b, c), stack(a, b)]?
To solve the Sussman anomaly, then, the planner solves a harder problem, a consequence of which is that there are a number of different goal conditions which will ensure that the configuration of blocks shown in Figure 10 can be achieved from the configuration of blocks shown in Figure 11.

Figure 10. (the goal configuration: a on b, b on c)
Figure 11. (the initial configuration: c on a, with b on the table)
8 SUMMARY AND CONCLUSION

This paper has presented an AI planner that was designed and implemented in preparation for some research into planning dialogue
without speech acts.14 Deliberations on the nature of the effects of
communicative actions, on the intentions of a speaker, and on the
role of the hearer in the evolution of the environment, guided decisions concerning planner design towards a model that could achieve
entailed goals by reasoning about actions, and which embodied a theory of belief representative of human reasoning. The planner therefore has a reasoning-centred approach, rather than the more popular
search-centred approach. This enables it, for example, to make plans
for building blocks-world towers whose configurations are not specified in the goal condition—towers of arbitrary heights. The planner
achieves goals of the form above(X,Y) by knowing that ‘above’ is the
transitive closure of ‘on’, and using that knowledge to reason over
the effects of actions. A theorem prover uses the effects of actions as
hypotheses. A goal is proved by assuming the effect of some action is
true, on the grounds that it would be true in the situation that resulted
from performing that action. The model can thus achieve goals that
do not match action effects, but that are entailed by them.
Decisions made concerning planning approach, logic, theorem
proving, and modelling belief, were considered to be of paramount
importance to the success of the research into planning dialogue
without speech acts, and so care was taken to make suitable choices.
The planner’s epistemic theorem prover therefore embodies a constructive logic, and a deduction model of belief. The planner’s essential design is an interweaving of reasoning, backwards state-space
search via STRIPS operators, and plan-space planning. The search
is enhanced with two specially designed heuristics which enable it to
overcome the difficulties presented by planning for conjunctive goals
using a backwards-searching state-space planner. The LAN (‘look
away now’) heuristic enables the planner to, in certain highly restricted contexts, achieve a goal that is provable in the current state;
the ThA (‘think ahead’) heuristic takes into account the preconditions
of a chronologically later goal during planning for an earlier goal,
which it does by exploiting the information held in the antecedents
of recursive rules.
No claims are being made that the model is competitive in terms
of efficiency with state-of-the-art AI planners for large problems not
requiring inference; the claim that is being made is that it can solve
an essentially more difficult kind of problem than these fast planners
can solve, by understanding and using actions in the wider context of
their positive ramifications. Since the model represents the first steps
towards developing a very particular kind of planning, it suffers from
a number of obvious limitations, including the following:
• Unlike real knowledge states, the initial state is very small;
• The initial state specifies everything the planning agent needs to
know—the agent does not have to cope with an underspecified
initial state, for example uncertain beliefs, or a lack of necessary
information;
• The planner does not take into account possible conflicts between
resources of time, energy, materials, etc.;
• The planner does not take into account the durative nature of actions, or that the universe is dynamic.
14 The research eventually led to the development of a single linguistic act for all contexts [12], and hence the abandonment of a distinct inform operator.
ACKNOWLEDGEMENTS
This research was funded by an EPSRC studentship.
REFERENCES
[1] J. Allen, J. Hendler, and A. Tate. Editors. Readings in planning, 1990.
San Mateo, California: Morgan Kaufmann.
[2] J. F. Allen and C. R. Perrault. Analyzing intention in utterances, 1980.
Artificial Intelligence 15: 143–78. Reprinted in [16], pp. 441–58.
[3] D. E. Appelt. Planning English referring expressions, 1985. Artificial
Intelligence 26: 1–33. Reprinted in [16], pp. 501–17.
[4] J. L. Austin. How to do things with words, 1962. Oxford: Oxford
University Press, 2nd edition.
[5] A. L. Blum and M. L. Furst. Fast planning through planning graph
analysis, 1995. In Artificial Intelligence 90: 281–300, 1997.
[6] B. Bonet and H. Geffner. HSP: Heuristic Search Planner, 1998. Entry at
Artificial Intelligence Planning Systems (AIPS)-98 Planning Competition. AI Magazine 21(2), 2000.
[7] B. C. Bruce. Generation as a social action, 1975. In B. L. Nash-Webber
and R. C. Schank. Editors. Theoretical issues in natural language processing, pp. 74–7. Cambridge, Massachusetts: Ass. for Computational
Linguistics. Reprinted in [16], pp. 419–22.
[8] P. R. Cohen and H. J. Levesque. Rational interaction as the basis for
communication, 1990. In [9], pp. 221–55.
[9] P. R. Cohen, J. Morgan, and M. E. Pollack. Editors. Intentions in communication, 1990. Cambridge, Massachusetts: MIT.
[10] P. R. Cohen and C. R. Perrault. Elements of a plan-based theory of
speech acts, 1979. Cognitive Science 3: 177–212. Reprinted in [16],
pp. 423–40.
[11] E. A. Feigenbaum and J. Feldman. Editors. Computers and thought,
1995. Cambridge, Massachusetts: MIT Press. First published in 1963
by McGraw-Hill Book Company.
[12] D. Field and A. Ramsay. Sarcasm, deception, and stating the obvious: Planning dialogue without speech acts, 2004. Artificial Intelligence
Review 22(2): 149–171.
[13] R. E. Fikes and N. J. Nilsson. STRIPS: A new approach to the application of theorem proving to problem solving, 1971. Artificial Intelligence 2: 189–208. Reprinted in [1], pp. 88–97.
[14] C. Green. Application of theorem proving to problem solving, 1969. In
Proc. 1st IJCAI, pp. 219–39. Reprinted in [1], pp. 67–87.
[15] H. P. Grice. Logic and conversation, 1975. In P. Cole and J. Morgan.
Editors. Syntax and semantics, Vol. 3: Speech acts, 1975, pp. 41–58.
New York: Academic Press.
[16] B. J. Grosz, K. Sparck Jones, and B. L. Webber. Editors. Readings
in natural language processing, 1986. Los Altos, California: Morgan
Kaufmann.
[17] B. J. Grosz and C. L. Sidner. Plans for discourse, 1990. In [9], pp. 416–
44.
[18] J. Hintikka. Knowledge and belief: An introduction to the two notions,
1962. New York: Cornell University Press.
[19] J. Hoffmann and B. Nebel. The FF planning system: Fast plan generation through heuristic search, 2001. Journal of Artificial Intelligence
Research 14: 253–302.
[20] K. Konolige. A deduction model of belief, 1986. London: Pitman.
[21] S. Kripke. Semantical considerations on modal logic, 1963. In Acta
Philosophica Fennica 16: 83–94. Also in L. Linsky. Editor. Reference
and modality (Oxford readings in philosophy), 1971, pp. 63–72. London: Oxford University Press.
[22] J. McCarthy and P. J. Hayes. Some philosophical problems from the
standpoint of artificial intelligence, 1969. Machine Intelligence 4: 463–
502. Reprinted in [1], pp. 393–435.
[23] A. Newell, J. C. Shaw, and H. A. Simon. Empirical explorations with
the logic theory machine, 1957. In Proceedings of the Western Joint
Computer Conference, 15: 218–239, 1957. Reprinted in [11], pp. 109–
133.
[24] A. Newell and H. A. Simon. GPS, a program that simulates human
thought, 1963. In [11], pp. 279–93.
[25] X. Nguyen and S. Kambhampati. Reviving partial order planning,
2001. In Proc. IJCAI, pp. 459–66.
[26] M. E. Pollack. Plans as complex mental attitudes, 1990. In [9], pp. 77–
103.
[27] E. D. Sacerdoti. The nonlinear nature of plans, 1975. In Proc. 4th
IJCAI, Tbilisi, Georgia, USSR, pp. 206–14. Reprinted in [1], pp. 162–
70.
[28] J. R. Searle. What is a speech act?, 1965. In M. Black. Editor. Philosophy in America, pp. 221–39. Allen and Unwin. Reprinted in J. R. Searle.
Editor. The philosophy of language, 1991, pp. 39–53. Oxford: Oxford
University Press.
[29] G. J. Sussman. The virtuous nature of bugs, 1974. In Proc. Artificial Intelligence and the Simulation of Behaviour (AISB) Summer Conference.
Reprinted in [1], pp. 111–17.
[30] L. Wallen. Matrix proofs for modal logics, 1987. Proc. 10th IJCAI,
pp. 917–23.