
AGENTS AND ENVIRONMENTS
Environment types



Fully observable (vs. partially observable): An agent's sensors
give it access to the complete state of the environment at each
point in time.
Deterministic (vs. stochastic): The next state of the
environment is completely determined by the current state
and the action executed by the agent.
Episodic (vs. sequential): The agent's experience is divided into
atomic "episodes" (each episode consists of the agent
perceiving and then performing a single action), and the
choice of action in each episode depends only on the episode
itself.




Static (vs. dynamic): The environment is unchanged
while an agent is deliberating.
Discrete (vs. continuous): A limited number of
distinct, clearly defined percepts and actions.
Single agent (vs. multiagent): An agent operating by
itself in an environment.
Adversarial (vs. benign): There is an opponent in the
environment who is actively trying to thwart you.
Example

Some of these descriptions can be ambiguous, depending on
your assumptions and interpretation of the domain.
[Figure: example games placed along four axes (Continuous, Stochastic, Partially Observable, Adversarial): Chess, Checkers; Robot Soccer; Poker; Hide and Seek; Cards Solitaire; Minesweeper]
Environment types

                    Chess with a clock   Chess without a clock   Taxi driving
Fully observable    Yes                  Yes                     No
Deterministic       Yes                  Yes                     No
Episodic            No                   No                      No
Static              Semi                 Yes                     No
Discrete            Yes                  Yes                     No
Single agent        No                   No                      No?
The real world is partially observable, stochastic, sequential, dynamic,
continuous, and multi-agent.
GAMES
(I.E. ADVERSARIAL SEARCH)
Games vs. search problems

Search: only had to worry about your own actions
Games: the opponent’s moves are often interspersed
with yours, so you need to consider the opponent’s actions

Games typically have time limits

  - Often, an OK decision now is better than a perfect
    decision later
Games





Card games
Strategy games
FPS games
Training games
…
Single Player, Deterministic Games
Two-Player, Deterministic, Zero-Sum Games

Zero-sum: one player’s gain (or loss) of utility is
exactly balanced by the losses (or gains) of utility
of the other player(s)
  - E.g., chess, checkers, rock-paper-scissors, …
S_0: the initial state
Player(s): defines which player has the move in state s
Actions(s): defines the set of legal moves in state s
result(s, a): the transition model, which defines the
result of move a in state s
terminal_test(s): returns true if the game is over; in
that case s is called a terminal state
utility(s, p): a utility function (objective function) that
defines the numeric value of terminal state s for
player p
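To make the six components concrete, here is a minimal Python sketch for a hypothetical toy game that is not from the slides: a tiny Nim variant in which MAX and MIN alternately remove 1 or 2 stones and whoever takes the last stone wins. The function names mirror the notation above; representing a state as a `(stones_left, player_to_move)` pair is an assumption made for illustration.

```python
from typing import List, Tuple

# Hypothetical toy game: a state is (stones_left, player_to_move).
State = Tuple[int, str]

def initial_state(stones: int = 4) -> State:    # S_0
    return (stones, "MAX")

def player(s: State) -> str:                      # Player(s)
    return s[1]

def actions(s: State) -> List[int]:               # Actions(s): how many stones may be taken
    return [n for n in (1, 2) if n <= s[0]]

def result(s: State, a: int) -> State:            # result(s, a): the transition model
    other = "MIN" if s[1] == "MAX" else "MAX"
    return (s[0] - a, other)

def terminal_test(s: State) -> bool:              # terminal_test(s): no stones left
    return s[0] == 0

def utility(s: State, p: str) -> int:             # utility(s, p), for terminal s only
    # The player who just moved took the last stone and wins; the player to move lost.
    winner = "MIN" if s[1] == "MAX" else "MAX"
    return 1 if p == winner else -1

s = initial_state(4)
for a in (1, 2, 1):      # MAX takes 1, MIN takes 2, MAX takes the last stone
    s = result(s, a)
print(terminal_test(s), utility(s, "MAX"), utility(s, "MIN"))  # True 1 -1
```

The two utilities sum to zero in every terminal state, which is exactly the zero-sum property defined above.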
Minimax
[Figure: game tree (2-player, deterministic, turns)]


“Perfect play” for deterministic games
Idea: choose the move to the position with the highest minimax value
= the best achievable payoff against best play
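The idea can be sketched in a few lines of Python, assuming (for illustration only) that the game tree is given explicitly as nested lists: an `int` is a terminal state's utility for MAX and a list holds the successor states. The tree below is hypothetical; MAX picks the root move whose MIN reply yields the highest minimax value.

```python
import math
from typing import List, Tuple, Union

# Nested-list game tree: int = terminal utility for MAX, list = successor states.
Tree = Union[int, List["Tree"]]

def minimax_value(node: Tree, max_to_move: bool) -> float:
    if isinstance(node, int):            # terminal state: return its utility
        return node
    values = [minimax_value(child, not max_to_move) for child in node]
    return max(values) if max_to_move else min(values)

def minimax_decision(root: List[Tree]) -> Tuple[int, float]:
    """Return (index of MAX's best move at the root, its minimax value)."""
    best_move, best_value = -1, -math.inf
    for i, child in enumerate(root):
        v = minimax_value(child, max_to_move=False)   # MIN replies after MAX's move
        if v > best_value:
            best_move, best_value = i, v
    return best_move, best_value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # hypothetical two-ply tree
print(minimax_decision(tree))  # (0, 3)
```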
Is minimax optimal?

Depends
  - If the opponent is not rational, there could be a better play

Yes
  - Under the assumption that both players always make
    the best move
Properties of minimax

Complete? Yes (if tree is finite)

Optimal? Yes (against an optimal opponent)

Time complexity? O(b^d)
  - For chess, b ≈ 35, d ≈ 100 for "reasonable" games,
    so b^d ≈ 35^100 ≈ 10^154
  - Exact solution completely infeasible

Space complexity? O(bd) (depth-first exploration)
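A quick arithmetic check of the chess estimate, using the slide's b ≈ 35 and d ≈ 100; Python integers have arbitrary precision, so b^d can be computed exactly and its number of digits counted.

```python
import math

b, d = 35, 100            # branching factor and game length quoted on the slide
nodes = b ** d            # exact integer
print(len(str(nodes)))                # 155 digits, i.e. on the order of 10^154
print(round(d * math.log10(b), 1))    # 154.4
```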
How to handle suboptimal opponents?

Can build a model of opponent behavior
  - Use that model to guide search instead of assuming MIN plays optimally

Reinforcement learning (later in the semester)
provides another approach
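One way to use an opponent model is an expectimax-style search: MIN's `min(...)` is replaced by an expectation over the opponent's moves under the model. The sketch below is only an illustration, assuming a nested-list tree and a hypothetical `uniform_opponent` model (every legal move equally likely); any model that returns a probability per move could be plugged in.

```python
from typing import Callable, List, Union

Tree = Union[int, List["Tree"]]   # int = terminal utility for MAX

def uniform_opponent(num_moves: int) -> List[float]:
    """Hypothetical opponent model: each legal move is equally likely."""
    return [1.0 / num_moves] * num_moves

def model_value(node: Tree, max_to_move: bool,
                opponent_model: Callable[[int], List[float]]) -> float:
    if isinstance(node, int):
        return float(node)
    values = [model_value(c, not max_to_move, opponent_model) for c in node]
    if max_to_move:
        return max(values)
    # Instead of assuming MIN picks min(values), weight each reply by the model.
    probs = opponent_model(len(values))
    return sum(p * v for p, v in zip(probs, values))

tree = [[3, 3], [2, 20]]   # hypothetical tree
root_values = [model_value(child, False, uniform_opponent) for child in tree]
print(root_values)  # [3.0, 11.0]
```

Here the model prefers the second move (expected 11.0), whereas minimax prefers the first (3 vs. 2): against an opponent who really does play randomly, the model-guided choice does better, but against an optimal opponent it risks ending up with 2.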
α-β pruning

Do we need to explore every node in the search
tree?

Insight: some moves are clearly bad choices
α-β pruning example
[Figure: α-β pruning walkthrough on a game tree]
What is the value of this node? And this one?
The first option is worth 3, so the root is at least
that good.
Now consider the second option.
What is this node worth? At most 2.
But what if the unexplored leaves had values 1 and 99?
It doesn’t matter: they won’t make any difference,
so don’t look at them.
Why didn’t we check this node first?
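The walkthrough can be reproduced with a minimal α-β sketch in Python, assuming a nested-list tree. The leaf values are hypothetical but follow the pattern above: the first option is worth 3, and the second option's first leaf is 2, so its remaining leaves are never examined. `alpha` is the best value MAX can already guarantee on the current path and `beta` the best value MIN can guarantee.

```python
import math
from typing import List, Union

Tree = Union[int, List["Tree"]]   # int = terminal utility for MAX

def alphabeta(node: Tree, alpha: float, beta: float,
              max_to_move: bool, visited: List[int]) -> float:
    """Minimax value of `node` with alpha-beta pruning; examined leaves go into `visited`."""
    if isinstance(node, int):
        visited.append(node)
        return node
    if max_to_move:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            if value >= beta:       # MIN will never let play reach here: prune the rest
                return value
            alpha = max(alpha, value)
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True, visited))
            if value <= alpha:      # MAX already has something at least this good: prune
                return value
            beta = min(beta, value)
    return value

visited: List[int] = []
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # hypothetical leaf values
print(alphabeta(tree, -math.inf, math.inf, True, visited))  # 3
print(visited)  # [3, 12, 8, 2, 14, 5, 2]: leaves 4 and 6 are never examined
```

Replacing the pruned leaves 4 and 6 with 1 and 99 leaves both the returned value and the set of examined leaves unchanged, which is the point made above.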
Properties of α-β

Pruning does not affect the final result,
i.e., it returns the same best move
(caveat: only if the entire tree can be searched!)

Good move ordering improves the effectiveness of pruning

With "perfect ordering," time complexity = O(b^{d/2})

Can come close in practice with various heuristics
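To see the effect of move ordering, the sketch below runs α-β (written here in the compact negamax form, a standard equivalent formulation) on the same hypothetical leaves in two orders and counts the leaves examined. The trees are illustrative assumptions, not from the slides; both orders return the same value, but the good order examines fewer leaves.

```python
import math
from typing import List, Tuple, Union

Tree = Union[int, List["Tree"]]   # int = terminal utility for MAX

def negamax(node: Tree, alpha: float, beta: float, color: int, visited: List[int]) -> float:
    """Alpha-beta in negamax form: color is +1 when MAX moves, -1 when MIN moves.
    Returns the node's value from the point of view of the player to move."""
    if isinstance(node, int):
        visited.append(node)
        return color * node
    best = -math.inf
    for child in node:
        score = -negamax(child, -beta, -alpha, -color, visited)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:      # cutoff: the remaining siblings cannot change the result
            break
    return best

def search(tree: Tree) -> Tuple[float, int]:
    """Return (minimax value for MAX at the root, number of leaves examined)."""
    visited: List[int] = []
    return negamax(tree, -math.inf, math.inf, 1, visited), len(visited)

good = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]   # best MAX move first, MIN's best replies first
bad = [[6, 4, 2], [14, 5, 2], [12, 8, 3]]    # the same leaves in the worst order
print(search(good))  # (3, 5)
print(search(bad))   # (3, 9)
```

With the good order 5 of the 9 leaves are examined; with the bad order all 9 are, i.e., no pruning at all.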
Bounding search

Similar to depth-limited search:
  - Don’t have to search to a terminal state; search to some
    depth instead
  - Find some way of evaluating non-terminal states
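A minimal sketch of depth-limited minimax in Python: terminal states return their utility, and when the depth limit is reached the search returns an evaluation-function estimate instead. The `Node` class and its values are hypothetical; each non-terminal node simply carries a made-up estimate so that the cutoff behavior is visible.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    # Exact utility for MAX if terminal; otherwise a hypothetical Eval(s) estimate.
    value: float
    children: List["Node"] = field(default_factory=list)

def depth_limited_minimax(node: Node, depth: int, max_to_move: bool) -> float:
    if not node.children or depth == 0:
        # Terminal state -> utility; depth cutoff -> evaluation-function estimate.
        return node.value
    values = [depth_limited_minimax(c, depth - 1, not max_to_move) for c in node.children]
    return max(values) if max_to_move else min(values)

# Hypothetical tree: MAX to move, two MIN replies, each leading to terminal leaves.
a = Node(5, [Node(1), Node(9)])   # looks good (Eval = 5), but MIN can force 1
b = Node(2, [Node(4), Node(6)])   # looks worse (Eval = 2), but guarantees at least 4
root = Node(0, [a, b])
print(depth_limited_minimax(root, 1, True))  # 5: trusts the estimates, prefers a
print(depth_limited_minimax(root, 2, True))  # 4: searches to the leaves, prefers b
```

The example shows why the quality of the evaluation function matters so much: a misleading estimate at the cutoff leads straight to a worse move.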
Evaluation function

Way of estimating how good a position is

Humans consider (relatively) few moves and don’t
search very deep,
but they can play many games well

⇒ the evaluation function is key
A LOT of possibilities for the evaluation function
A simple function for chess



White = 9 * #queens + 5 * #rooks + 3 * #bishops + 3 * #knights + #pawns
Black = 9 * #queens + 5 * #rooks + 3 * #bishops + 3 * #knights + #pawns
Utility = White - Black
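The material function can be sketched in Python. Reading piece counts from the piece-placement field of a FEN string (uppercase letters are White pieces, lowercase are Black) is an assumption made here for illustration; the weights are the ones above, and the king does not appear in the formula.

```python
# Piece weights from the formula above; kings and empty-square digits count 0.
WEIGHTS = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1}

def material(fen_placement: str) -> int:
    """Utility = White - Black, computed from a FEN piece-placement field."""
    white = sum(WEIGHTS.get(ch.lower(), 0) for ch in fen_placement if ch.isupper())
    black = sum(WEIGHTS.get(ch, 0) for ch in fen_placement if ch.islower())
    return white - black

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
print(material(start))                                          # 0
print(material("rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"))  # 9: Black is missing its queen
```

In the starting position each side has 9 + 10 + 6 + 6 + 8 = 39 points of material, so the utility is 0.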
Other ways of evaluating a game position?

Features:
  - Spaces you control
  - How compressed your pieces are
  - Threat-To-You – Threat-To-Opponent
  - How much it restricts the opponent’s options
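One common way to combine features like these, stated here as an assumption since the slides do not prescribe one, is a weighted linear sum with hand-tuned or learned weights w_i. The material function above is one instance: the features are the per-piece-type count differences and the weights are 9, 5, 3, 3, 1.

```latex
\mathrm{Eval}(s) = \sum_{i=1}^{n} w_i \, f_i(s),
\qquad \text{e.g. } w = (9, 5, 3, 3, 1),\quad
f_i(s) = \#\mathrm{White}_i(s) - \#\mathrm{Black}_i(s)
```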
Interesting ordering

Game      Branching factor   Computer quality
Go        360                << human
Chess     35                 ≈ human
Othello   10                 >> human
Implications
• Larger branching factor → (relatively) harder
for computers
• People rely more on the evaluation function than on
search
Deterministic games in practice

Othello: human champions refuse to compete against
computers, who are too good.

Chess: Deep Blue defeated human world champion
Garry Kasparov in a six-game match in 1997.

Checkers: Chinook ended the 40-year reign of human
world champion Marion Tinsley in 1994. In 2007, its
developers announced that the program had been
improved to the point where it cannot lose a game.

Go: human champions refuse to compete against
computers, who are too bad.
More on checkers

Checkers has a branching factor of 10


Why isn’t the result like Othello?
Complexity of imagining moves: in Othello, a single move
can change a lot of board positions

A limitation for humans that does not affect computers
Summary

Games are a core (fun) part of AI
  - Illustrate several important points about AI
  - Provide good visuals and demos

Turn-based games (that can fit in memory) are well
addressed

Make many assumptions (optimal opponent, turn-based, no alliances, etc.)
Questions?