Swarm Intelligence for Medical
Treatment Optimisation
Prof. John McCall
Agenda
• Concepts of swarm intelligence
• Particle Swarm Optimisation for cancer chemotherapy design
• Medical Data Modelling and metaheuristic approaches
• Ant Colony Optimisation for medical data modelling
• Open Questions
Natural Swarms
• Large numbers of living organisms acting together as a group
  – Individual actions, social communication
• Many and varied natural examples:
  – Flocking and migrating birds
  – Colonies of bees, wasps, hornets, ants and termites
  – Swarming insects such as bees and locusts
  – Schooling fish
  – Herd animals (migration, protection and stampedes)
  – Football crowds (Mexican wave), peace marchers, riots
• Powerful emergent behaviour
  – Swarm has capabilities far beyond those of the individual
  – In particular, can exhibit mass problem-solving intelligence
(film posters: The Naked Jungle (1954), Them (1954), Piranha (1978), The Birds (1963), (1981), Phase IV (1974))
Swarm intelligence metaheuristics
(diagram: swarm activity samples the search space, guided by social interaction)
Particle Swarm Optimisation (PSO)
• swarm of individual “particles”
  – distributed in a search space
  – each located at a solution
• particles move
  – each particle has a velocity
  – each moves to a new solution at each step
• particles communicate
  – share information about good locations
• memory
  – memory of previous good locations
• particles swarm towards good solutions
Particle Swarm Optimisation
• each particle has a:
  – position, x (solution currently being examined)
  – velocity, v (direction and speed of motion)
  – memory of “best” position with respect to the objectives
• swarm remembers best known position(s)
• swarm moves synchronously over a series of timesteps
• each particle updates x and v based on current knowledge
The PSO metaheuristic
(cycle diagram: evaluate current positions → update and exchange information → change velocities and positions → repeat)
particle swarm optimisation
1. initialise swarm
   random positions, zero velocities, best = initial
2. update the swarm
   1. update the positions
   2. evaluate current positions
   3. update the memory (best positions)
   4. update the velocities
3. if not stopping condition, go to 2.
4. stop and return population and memory
swarm update rules
update the velocities*

  v_i^{k+1} = w·v_i^k + c1·r1·(x_i* − x_i^k) + c2·r2·(x_i** − x_i^k)

where w·v_i^k is the inertia term, the c1 term is the particle best bias (x_i* is particle i’s best known position), the c2 term is the global/neighbourhood best bias (x_i** is the best position known to the swarm or neighbourhood), and r1, r2 are uniform random numbers.

* velocities are typically “clamped” within maximum possible values

update the positions

  x_i^{k+1} = x_i^k + v_i^{k+1}
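The two update rules can be sketched in a few lines of Python. This is a minimal illustration of one synchronous step; the coefficient values and the clamp are illustrative defaults, not the tuned settings used in the experiments later in this talk.

```python
import random

def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5, vmax=1.0):
    """One synchronous PSO update: new velocities, then new positions.
    w, c1, c2 and vmax are illustrative choices; r1, r2 are drawn once
    per particle (per-dimension draws are a common variant)."""
    new_pos, new_vel = [], []
    for x, v, p in zip(positions, velocities, pbest):
        r1, r2 = random.random(), random.random()
        nv = [w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
              for vi, xi, pi, gi in zip(v, x, p, gbest)]
        # clamp each velocity component to [-vmax, vmax]
        nv = [max(-vmax, min(vmax, vi)) for vi in nv]
        new_vel.append(nv)
        new_pos.append([xi + vi for xi, vi in zip(x, nv)])
    return new_pos, new_vel
```

Calling this repeatedly, with pbest/gbest refreshed after each evaluation, gives the full metaheuristic above.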
swarm topologies
• global topology
  – all particles intercommunicate
  – the best position found by any particle is communicated to the entire swarm
  – gbest (position, value)
• neighbour topology
  – particles communicate with neighbours
  – best position known to neighbours is shared
  – nbest (position, value)
swarm topologies
(diagrams: example swarms illustrating a neighbour best topology and a global best topology)
particle swarm visualisation
• Java applet by Mark Sinclair
– code available for download
– PSO Java Applet v1.0 – November 2006
– http://uk.geocities.com/markcsinclair/pso.html
• Visualises a 2-D test function
– Schaffer F5 function
• global optimum marked by a blue cross
• current global best marked by a red dot
Swarm visualisation
Cancer Chemotherapy
• Systemic treatment with toxic drugs
• Often used in combination with surgery and radiotherapy
• Attacks primary and secondary tumours
• Also attacks healthy tissues, leading to toxic side effects
Chemotherapy Simulation

  dN/dt = f(N(t)) − κ·c(t)·N(t)

where f(N) is the tumour growth term, N(t) the tumour population size, c(t) the drug concentration in the blood plasma, and κ the drug potency; the second term models the tumour’s response to the drug.
Objectives of Cancer Chemotherapy
(figure: tumour size N(t) over the treatment period T0 to Tfinal, with thresholds Nmax, Nfinal and Ncure, and patient survival time PST)
Minimise:
• final tumour size;
• overall tumour burden (shaded area under N(t));
• side effects.
Prolong the patient survival time.
Chemotherapy Constraints
• Maximum instantaneous dose

  g1(c) = Cmax_j − C_ij ≥ 0  ∀ i = 1..n, j = 1..d

• Maximum cumulative dose

  g2(c) = Ccum_j − Σ_{i=1}^{n} C_ij ≥ 0  ∀ j = 1..d

• Maximum permissible size of the tumour

  g3(c) = Nmax − N(t_i) ≥ 0  ∀ i = 1..n

• Restriction on the toxic side-effects

  g4(c) = Cs-eff_k − Σ_{j=1}^{d} η_kj·C_ij ≥ 0  ∀ i = 1..n, k = 1..m
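A feasibility test over a dose matrix C can be sketched directly from the four constraints; every argument name here is an illustrative stand-in for the corresponding symbol (η_kj as `eta`, N(t_i) as `N` from a tumour simulation), not the notation of any published implementation.

```python
def feasible(C, c_max, c_cum, eta, c_seff, N, n_max):
    """Check constraints g1-g4 for a dose matrix C[i][j]: dose of drug j
    at time t_i. eta[k][j] is the toxicity of drug j on organ k; N[i] is
    the simulated tumour size at t_i."""
    n, d = len(C), len(C[0])
    g1 = all(C[i][j] <= c_max[j] for i in range(n) for j in range(d))
    g2 = all(sum(C[i][j] for i in range(n)) <= c_cum[j] for j in range(d))
    g3 = all(N[i] <= n_max for i in range(n))
    g4 = all(sum(eta[k][j] * C[i][j] for j in range(d)) <= c_seff[k]
             for i in range(n) for k in range(len(eta)))
    return g1 and g2 and g3 and g4
```

In the experiments below, a solution counts as feasible only when all four checks pass.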
Optimisation of Cancer Chemotherapy
• Two optimisation objectives
  – tumour eradication
  – prolongation of the patient survival time
• Decision vectors C_ij, i = 1..n, j = 1..d, are the concentration levels of anti-cancer drugs in the blood plasma
• State Equation

  dN/dt = N(t)·[ λ·ln(Θ / N(t)) − Σ_{j=1}^{d} Σ_{i=1}^{n} κ_j·C_ij·(H(t − t_i) − H(t − t_{i+1})) ]

(Gompertz growth towards carrying capacity Θ, with a cell-kill term per drug; H is the Heaviside step function)
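The state equation can be integrated numerically; the following is a minimal forward-Euler sketch of a single-drug simplification (one κ, one concentration profile c(t) instead of the double sum), with illustrative parameter values that are not clinically calibrated.

```python
import math

def simulate_tumour(n0, lam, theta, kappa, conc, dt=0.1, steps=1000):
    """Euler integration of a single-drug Gompertz growth-plus-kill model:
        dN/dt = N * (lam * ln(theta / N) - kappa * c(t))
    conc(t) returns the drug concentration at time t. All parameter
    values used below are illustrative only."""
    n = n0
    for k in range(steps):
        t = k * dt
        dn = n * (lam * math.log(theta / n) - kappa * conc(t))
        n = max(n + dn * dt, 1.0)  # keep the cell population positive
    return n

# with no drug, the tumour grows towards the carrying capacity theta
untreated = simulate_tumour(1e9, lam=0.1, theta=1e12, kappa=0.05, conc=lambda t: 0.0)
# a constant dose suppresses growth relative to the untreated case
treated = simulate_tumour(1e9, lam=0.1, theta=1e12, kappa=0.05, conc=lambda t: 10.0)
```

Each candidate treatment schedule is evaluated by running such a simulation and reading off the objectives and constraint values.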
Experiments (Petrovski, Sudha, McCall 2004)
• Search for feasible chemotherapies
  – All constraints met
• Compare Genetic Algorithm with
  – PSO Global Best
  – PSO Local Best
• Desirable properties
  – Find a feasible solution with 100% success rate
  – Find a feasible solution with as few tumour simulations as possible
  – Low variation in time taken to achieve satisfaction
PSO Parameters
Number of particles: N = 50
Topologies: global, and local with neighbourhoods of size 10
Initial velocities: random in [0, 2]
Inertia coefficient: random in [0.5, 1]
Social and cognitive components: c1 = c2 = 4
Velocity clamping: |vmax| ≤ 1
Results
(plot: probability of finding a feasible solution (%) against generations, 0–600, for GA, PSO (Gbest) and PSO (Lbest))
Box plots of run length
(box plots: generations to find a feasible solution for GA (FGENSGA), PSO G-best (FGENSGB) and PSO L-best (FGENNH); N = 27 runs per group)
Multi-objective optimisation
(diagram: a mapping from decision variable space (x1, x2) to objective function space (f1(x), f2(x)))

  maximise  F(x) = (f1(x), f2(x), …, fk(x))^T
  subject to  G(x) = (g1(x), g2(x), …, gm(x)) ≥ 0
Pareto Optimality
(diagram: feasibility region in decision space; dominated and non-dominated regions in objective space)
• Pareto dominance

  x ≻ x′ (x dominates x′) iff  ∀ i ∈ {1, …, k}: fi(x) ≥ fi(x′)  and  ∃ j ∈ {1, …, k}: fj(x) > fj(x′)

• Pareto optimality

  x ∈ Ω is Pareto-optimal iff x is non-dominated, i.e. there is no x′ ∈ Ω with x′ ≻ x

• Pareto optimal set (front)
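These definitions translate directly into code. A minimal sketch for maximisation problems, operating on tuples of objective values:

```python
def dominates(fx, fy):
    """fx dominates fy (maximisation): no worse in every objective,
    strictly better in at least one."""
    return (all(a >= b for a, b in zip(fx, fy))
            and any(a > b for a, b in zip(fx, fy)))

def non_dominated(points):
    """The Pareto front of a list of objective vectors: the points
    that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Multi-objective algorithms such as MOPSO below maintain exactly such a non-dominated set as their archive of leaders.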
Multi-objective PSO (MOPSO) (Coello Coello, Lechuga 2004)
• Map particles onto objective space
• Identify the non-dominated set
• Store these positions
• Assign other particles to these leaders geographically
• Update particle velocities:
  – Inertia component
  – Cognitive component
  – Social component based on
    • Selection from repository
    • Avoiding crowding
(diagram: particles and non-dominated leaders in objective space, f1(x) against f2(x))
MOPSO for cancer chemotherapy optimisation
Petrovski, McCall, Sudha (2009)
(plot: a set of non-dominated solutions found by the MOPSO algorithm, tumour reduction against patient survival)
• Good spread along Pareto front
• High quality global best solution
• Rapid discovery of feasible solutions
Prostate Cancer Management
• Cancer of the prostate gland
• Affects late middle-aged to elderly men
• Second most common cause of cancer death in men
• Most common cancer in UK men
Prostate cancer patient pathway
• Symptoms → Referral
• Initial testing
  – Prostate Specific Antigen
  – Digital Rectal Examination
• Scan and Biopsy
  – Gleason score
• Treatment choices
  – Watchful waiting
  – Hormone therapy
  – Radiotherapy
  – Surgery
Uncertainties
• Symptoms may be from benign conditions
  – Benign prostatic hyperplasia, prostatitis
• Invasive investigation has side effects
  – Impotence, incontinence
• Surgical / radiotherapy treatment
  – Strong side effects on quality of life
  – Success rates relatively low
• Prostate cancer generally slow-growing
  – Age of patient is important
  – Is the cancer localised or metastatic?
Motivation
• Model the prostate cancer patient pathway
• Predict likely outcomes of decisions
• Assist clinicians to:
  – Formulate and recommend management strategies
  – Explore decisions and consequences with patients
Bayesian Networks for Prostate Cancer Management
• Bayesian Networks for medical applications
  – Long history in expert systems (from 1970s)
  – Good fit to probability-based medical decision-making
  – Mainly used for diagnosis and prognosis
• Prostate cancer modelling
  – Statistical techniques (notably Partin tables)
  – ANN approaches (e.g. Prostate Calculator – Crawford et al.)
What is a Bayesian network?
• A representation of the joint probability distribution of a set of random variables
• Represents causal dependencies
• Can be learned from a data set
• Two components to learn:
  – Structure
    • Directed Acyclic Graph
  – Parameters
Example – The Asia Network – Lauritzen & Spiegelhalter (1988)
G = (V, E) where:
V: vertices represent variables of interest
E: edges represent conditional dependencies among the variables
(network diagram: Visit to Asia? → Tuberculosis?; Smoking? → Lung Cancer? and Bronchitis?; Tuberculosis? and Lung Cancer? → Tuberculosis or Lung Cancer? → X-Ray Result and Dyspnea?; Bronchitis? → Dyspnea?)
The structure factorises the joint probability distribution:

  P(X1, …, Xn) = Π_{i=1}^{n} P(Xi | Pa(Xi))
Parameters define conditional probabilities
(network diagram annotated with example tables)
Visit to Asia?: Visit 1.00%, No Visit 99.0%
Smoking?: Smoker 50.00%, Non-Smoker 50.00%
Tuberculosis or Lung Cancer? (a logical OR of its parents):

  Tuberculosis:  Present  Present  Absent   Absent
  Cancer:        Present  Absent   Present  Absent
  True:          100.0%   100.0%   100.0%   0.00%
  False:         0.00%    0.00%    0.00%    100.0%

  P(X1, …, Xn) = Π_{i=1}^{n} P(Xi | Pa(Xi))
BN Structure Learning
• Number of possible networks grows super-exponentially with the number of variables
  – 1 variable → 1
  – 2 variables → 3
  – 3 variables → 25
  – 4 variables → 543
  – 5 variables → 29,281
  – 6 variables → 3,781,503
• 37 variables in prostate cancer data
(diagram: the 25 possible DAG structures on three variables A, B, C)
Approaches to learning BN Structures
• Dependency test based
  – Conditional independence tests (CI)
  – Edges correspond to correlations between the variables
  – Example algorithms: PC, NPC
• Search and score based
  – Local / greedy search strategies
  – Scoring metrics
Search and Score
• Search through the space of possible networks, scoring each network
• The solution is the network which maximises the score
  – Search strategies (greedy / local search / etc.)
  – Scoring metrics – goodness of fit
    • Maximum Likelihood
    • BDe metric: computes the relative posterior probability of a network structure given the data
    • BIC (Bayesian Information Criterion): coincides with the MDL score
BN Structure Learning
Local greedy search: K2 [Cooper & Herskovits, 1992]
– Assume a node ordering
– Start with root node
– Add the parent set that maximises the score
– K2 score (CH) aims to maximise P(Structure | Data):

  P(Bs, D) = P(Bs) · Π_{i=1}^{n} Π_{j=1}^{qi} [ (ri − 1)! / (Nij + ri − 1)! ] · Π_{k=1}^{ri} Nijk!
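The K2 idea — greedily adding the parent that most increases the Cooper–Herskovits score — can be sketched compactly on discrete data stored as a list of dicts. This omits the full node-ordering machinery and the count caching of the real algorithm; it is an illustration of the score and the greedy step only.

```python
import math
from itertools import product

def ch_family_score(data, child, parents, arity):
    """Cooper-Herskovits contribution of one node given a parent set:
    for each parent configuration j, (ri-1)!/(Nij+ri-1)! * prod_k Nijk!."""
    r = arity[child]
    score = 1.0
    for cfg in product(*[range(arity[p]) for p in parents]):
        rows = [row for row in data
                if all(row[p] == v for p, v in zip(parents, cfg))]
        n_ij = len(rows)
        score *= math.factorial(r - 1) / math.factorial(n_ij + r - 1)
        for k in range(r):
            score *= math.factorial(sum(1 for row in rows if row[child] == k))
    return score

def k2_parents(data, child, candidates, arity, max_parents=2):
    """Greedy K2 step for one node: starting from no parents, keep adding
    the candidate that most improves the score until nothing helps."""
    parents = []
    best = ch_family_score(data, child, parents, arity)
    while len(parents) < max_parents:
        options = [c for c in candidates if c not in parents]
        if not options:
            break
        top, c = max((ch_family_score(data, child, parents + [c], arity), c)
                     for c in options)
        if top <= best:
            break
        best, parents = top, parents + [c]
    return parents
```

Running `k2_parents` for each node in turn, restricted to its predecessors in the assumed ordering, yields the K2 network.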
EA for BN structure learning
• [Larrañaga] first to use a Genetic Algorithm to learn BN structures
• [Wong] Hybrid Evolutionary Programming to discover BN structures
• [van Dijk] Build a skeleton (undirected graph) using a CI (χ²) test, then use a GA to turn the skeleton into a DAG by evolving a population of DAGs with the skeleton as template (repair operator for illegal structures)
• [Habrant] Application to time series prediction in finance; used the K2 score, with and without the ordering assumption
• [Novobilski] Evolve a population of legal, fixed-length encoded DAGs
Chain-Model GA (Kabli, McCall, Herrmann 2007)
(flow diagram: a population of node orderings, e.g. X1X2X3X4, X2X3X1X4, X1X4X3X2, is evaluated by scoring each ordering as a chain structure against the data and assigning fitness; selection, crossover and mutation breed one offspring, which is inserted into the population if fitter than the worst individual; at the end of evolution, K2 search is run on the best orderings against the data to produce the final network)
Prostate Cancer Data
• Patients treated at Aberdeen Royal Infirmary
• Retrospective data
  – 320 patients diagnosed and treated for prostate cancer
  – 2.5 year period, 2002–2004
• Data Collection
  – Selection of representative samples by clinician
  – Data from manual patient records collected in a bespoke database
  – Overall 37 patient factors collected
  – Data was discretised using medical knowledge
BN structure learned from PC data
Applications
• Scanning and biopsy decision
  – Uncomfortable process
  – BN helps assess value of scan and biopsy
• Treatment choice
  – Surgery probability of success vs after-care life
• Hospital resource planning
  – Sample model to predict future demand
  – Explore policy changes
• Patient disease education
  – Allow patient to explore what the disease means for them
• Pathological staging
  – Partin tables
• Follow-on £50k project funded by NHS Grampian / NRP in association with BAUS
natural inspiration for ant colony optimisation
• ant colony behaviour
  – foraging for food
• stigmergy
  – ants communicate indirectly through the environment
  – pheromone (scent trails) laid down as ants move
• problem solving potential
  – ant colonies can find shortest paths to food
ant colony intelligence
• foraging behaviour
– many distinct individuals solving a common problem
– accumulation of pheromone biases future search
• search properties
– the ants explore many paths in the early stages
– the ants converge on a common path as evidence builds up
Which search spaces?
• Construction spaces
  – Each ant constructs a solution
  – Series of construction steps
• Ants select steps based on pheromone amounts
• Pheromone deposited backwards along ant path
  – Amount depends on solution evaluation
  – Pheromone evaporates over time
Example: Travelling Salesman Problem
Find the shortest path that visits each of n cities precisely once, then returns to the starting city.
Solution: an ordering in which to visit the cities
Evaluation: the length of the route corresponding to the ordering
Solution construction by ants
• each ant has a:
  – partial solution, s (solution under construction)
  – choice of next construction step
  – knowledge of pheromone associated with each step
• ants choose the next step probabilistically
  – more pheromone increases the chance of selecting a step
• pheromone is dropped equally along solution steps once the full solution is evaluated
notation
m        number of ants in colony
s(i, j)  step (i, j) in construction space
τij      pheromone on step (i, j)
ρ        pheromone evaporation rate
Tk       solution path constructed by ant k
Ck       cost of solution constructed by ant k
Δτij^k   pheromone dropped on step (i, j) by ant k
pheromone update rules
1. evaporate

  τij ← (1 − ρ)·τij

2. calculate new pheromone

  Δτij^k = 1/Ck  if s(i, j) belongs to Tk,  0 otherwise

3. distribute pheromone

  τij ← τij + Σ_{k=1}^{m} Δτij^k
ant colony optimisation
1. initialise ant colony
   set parameters, initialise pheromone
2. while (stopping criterion not met) {
   1. construct ants’ solutions
   2. update pheromones
   }
3. stop: return best solution and pheromones
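Putting the construction and pheromone rules together, a minimal ACO for the TSP can be sketched as follows; it omits the heuristic visibility term used in most ACO variants, and the parameter values are illustrative.

```python
import random

def aco_tsp(dist, n_ants=20, iters=50, rho=0.1, seed=1):
    """Minimal ACO for the TSP: ants build tours step by step with
    probability proportional to pheromone, pheromone evaporates by
    factor (1 - rho), then each ant deposits 1/C_k along its tour."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each edge
    best_tour, best_cost = None, float('inf')
    for _ in range(iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            while len(tour) < n:
                i = tour[-1]
                choices = [j for j in range(n) if j not in tour]
                weights = [tau[i][j] for j in choices]
                tour.append(rng.choices(choices, weights=weights)[0])
            cost = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, cost))
            if cost < best_cost:
                best_tour, best_cost = tour, cost
        # evaporate, then deposit pheromone in proportion to tour quality
        tau = [[(1 - rho) * t for t in row] for row in tau]
        for tour, cost in tours:
            for k in range(n):
                i, j = tour[k], tour[(k + 1) % n]
                tau[i][j] += 1.0 / cost
                tau[j][i] += 1.0 / cost
    return best_tour, best_cost
```

Early iterations explore many tours; as pheromone accumulates on short edges, the colony converges on good paths, exactly the behaviour described above.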
ACO for Bayesian Network Structure Learning
• Growing research area
• ACO-B (L. de Campos 2002)
  – ants construct DAGs then hill-climb best solutions
  – ants construct orderings of nodes and hill-climb
• ACO-E (Daly et al. 2006, 2009)
  – search space of DAG equivalence classes
  – different operators added to extend possible construction steps
• MMACO (P. Pinto 2008)
  – ants used to refine a skeleton BN constructed by MMPC
ChainACO, K2ACO (Wu, McCall, Corne – 2010)
• ChainACO
  – Ants construct orderings
  – Orderings are evaluated as chains using the CH score
  – Phase I ends with the orderings producing the best chains
  – Phase II runs K2 on the best orderings
• K2ACO
  – Ants construct orderings
  – K2 constructs a BN on each ordering
  – BNs evaluated by CH score
• Currently being evaluated on:
  – Benchmarks (ASIA, CAR, ALARM, …)
  – Medical data
Open Questions
• Accuracy / computational cost trade-offs
  – How much data is enough?
  – When does it pay off to use cheap evaluation?
• algorithms vs structures
  – How do particular metaheuristics interact with problem structure?
  – Network topology
  – Inductive bias
• more nodes, more data
  – How can these methods scale effectively?
Thank you!