A New Propagator for Two-Layer Neural Networks in Empirical Model Learning
Michele Lombardi (University of Bologna)
Stefano Gualandi (University of Pavia)

Context
Sometimes, real-world optimization problems are defined over complex domains...
...and complex domains are difficult to model.

Context: Example #1
§ Given a budget
§ Place traffic lights
§ Optimize a traffic metric
§ How do you compute this metric?
§ How is it affected by the decisions?

Context: Example #2
§ Given a fixed budget
§ Design an incentive plan
§ Renewable energy production goals
§ How will people react to the plan?

Context: Example #3
§ Many-core platform
§ Dispatch workload
§ On-line scheduling
§ Avoid loss of efficiency due to thermal controllers
§ How does the temperature behave?
§ How is it affected by the scheduler?

Empirical Model Learning
Empirical Model Learning is a technique to enable optimal decision making over complex systems.
Three main steps:
1 Obtain input/output tuples
2 Extract an approximate model via Machine Learning
3 Encode it using a Combinatorial Optimization technique
How? Simple evaluation? More?

Neuron Constraints
Specific case: Neural Networks.
§ Each neuron computes $y = \sum_i w_i \cdot x_i + b$ and outputs $z = f(y)$
§ $f$ is monotone and non-decreasing (e.g. a sigmoid)
§ Encode each neuron using CP!

Neuron Constraints
§ Network inputs and neuron outputs become decision variables
§ Each neuron becomes a constraint (a Neuron Constraint)
§ Every NN can be encoded via Neuron Constraints

Drawbacks
However, consider this network:
§ Inputs $x_0 \in [-1,1]$, $x_1 \in [-1,1]$
§ Hidden layer $n_0$, $n_1$ (sigmoid); output layer $n_2$ (linear)
§ Weights on the arcs: all equal to 1, except a single $-1$ on one input arc (here taken to be $x_0 \to n_1$)
§ No bias

Drawbacks
Propagation:
§ For $n_0$: weighted sum bounded by 2, output upper bound $\approx 0.96$, supported by $x_0 = 1$, $x_1 = 1$
§ For $n_1$: weighted sum bounded by 2, output upper bound $\approx 0.96$, supported by $x_0 = -1$, $x_1 = 1$
§ For $n_2$: the hidden bounds add up to an output upper bound of $\approx 1.93$
§ The two supports are mutually inconsistent: $x_0$ cannot be 1 and $-1$ at the same time

Drawbacks
The real network maximum is 1.51
§ Instead of 1.93 :-(
What to do? Build a global constraint.
Let's start from a common case (2 layers):
twolayerANN$([x_i], z, [b_j], [w_{j,i}], \hat{b}, [\hat{w}_j])$

The Bounding Problem
§ Given bounds on the network input
§ How do we compute bounds on the network output?
General structure:
§ Hidden neurons: $y_j = b_j + \sum_i w_{j,i} x_i$, followed by the activation $f$ (monotone, non-decreasing)
§ Output: $z = \hat{b} + \sum_j \hat{w}_j f(y_j)$

The Bounding Problem
Upper bound = solution of:
$\max\ z = \hat{b} + \sum_{j=0}^{m-1} \hat{w}_j f(y_j)$
$\text{s.t. } y_j = b_j + \sum_{i=0}^{n-1} w_{j,i} x_i \quad \forall j = 0..m-1$
$x_i \in [\underline{x}_i, \overline{x}_i] \quad \forall i = 0..n-1$
BUT: this is non-linear AND non-convex (the activation terms $f(y_j)$ are all its fault!).
A way out: Lagrangian relaxation, i.e. relax the linking constraints $y_j = b_j + \sum_i w_{j,i} x_i$.

The Bounding Problem
With Lagrangian multipliers $\lambda_j$, the upper bound becomes:
$\max\ z(\lambda) = \hat{b} + \sum_{j=0}^{m-1} \hat{w}_j f(y_j) + \sum_{j=0}^{m-1} \lambda_j \left( b_j + \sum_{i=0}^{n-1} w_{j,i} x_i - y_j \right)$
$x_i \in [\underline{x}_i, \overline{x}_i] \quad \forall i = 0..n-1$
$y_j \in [\underline{y}_j, \overline{y}_j] \quad \forall j = 0..m-1$
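A minimal numeric sketch of the relaxation just defined, instantiated on the small network from the Drawbacks slides. It assumes $f = \tanh$ for the sigmoid units, exploits the separability discussed in the Solving the Relaxation slides below, and uses hand-picked multiplier values and helper names (relax_bound, W, W_OUT) purely for illustration:

```python
import math

# Toy network from the "Drawbacks" slides (assuming f = tanh for the sigmoid units):
#   y0 =  x0 + x1,   y1 = -x0 + x1   (no biases),   z = f(y0) + f(y1)
W = [[1.0, 1.0],      # weights into hidden neuron n0
     [-1.0, 1.0]]     # weights into hidden neuron n1
W_OUT = [1.0, 1.0]    # weights from the hidden layer to the linear output n2
X_LB, X_UB = [-1.0, -1.0], [1.0, 1.0]
Y_LB, Y_UB = [-2.0, -2.0], [2.0, 2.0]   # from interval propagation of the inputs

def relax_bound(lam):
    """Upper bound z(lambda): the relaxed problem separates into an x-part and a y-part."""
    bound = 0.0
    # x-part: linear in each x_i, so each input sits at one of its bounds.
    for i in range(2):
        coeff = sum(lam[j] * W[j][i] for j in range(2))
        bound += coeff * (X_UB[i] if coeff >= 0 else X_LB[i])
    # y-part: one single-variable problem per hidden neuron, solved by checking
    # the box bounds and the stationary points of  w_out_j*tanh(y) - lambda_j*y.
    for j in range(2):
        candidates = [Y_LB[j], Y_UB[j]]
        ratio = lam[j] / W_OUT[j]
        if 0.0 < ratio < 1.0:            # derivative = 0 has a solution
            y_star = math.atanh(math.sqrt(1.0 - ratio))
            candidates += [y_star, -y_star]
        bound += max(W_OUT[j] * math.tanh(y) - lam[j] * y
                     for y in candidates if Y_LB[j] <= y <= Y_UB[j])
    return bound

print(relax_bound([0.0, 0.0]))    # ~1.93: zero multipliers reproduce the plain
                                  #        neuron-by-neuron propagation bound
print(relax_bound([0.42, 0.42]))  # ~1.52: hand-picked multipliers get close to
                                  #        the true network maximum (~1.51)
```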
The Bounding Problem
For every value of the multipliers we get a bound.
(Over)simplified example:
§ $z = f(y)$, $y = 2 \cdot x + 1$, $x \in [\underline{x}, \overline{x}]$
§ Add the derived bound $y \in [\underline{y}, \overline{y}]$ and relax the link between $x$ and $y$
§ The relaxed objective becomes $z = f(y) + \lambda \cdot (2 \cdot x + 1 - y)$
§ For every value of $\lambda$, its maximum over the box is an upper bound $\overline{z}$ on the true maximum
[Figure: feasible space of the simplified example; our bound $\overline{z}$ lies above the true maximum of $z$]

Solving the Relaxation
§ Given bounds on the network input
§ How do we compute bounds on the network output?
The relaxed problem $z(\lambda)$ is separable: an x-part and a y-part.

Solving the Relaxation
The x-part:
$\max\ z_x(\lambda) = \sum_{i=0}^{n-1} \left( \sum_{j=0}^{m-1} \lambda_j w_{j,i} \right) x_i \quad \text{s.t. } x_i \in [\underline{x}_i, \overline{x}_i] \quad \forall i = 0..n-1$
§ If the coefficient of $x_i$ is $\geq 0$: maximize $x_i$; if it is $< 0$: minimize $x_i$
§ Linear problem with box constraints
§ O(nm) to compute the weights
§ O(n) to solve the problem

Solving the Relaxation
The y-part:
$\max\ z_y(\lambda) = \sum_{j=0}^{m-1} \left( \hat{w}_j f(y_j) - \lambda_j y_j \right) \quad \text{s.t. } y_j \in [\underline{y}_j, \overline{y}_j] \quad \forall j = 0..m-1$
§ Sum of single-variable functions
§ Optimum via classical analytic methods (derivative = 0)
§ Best $y_j$ in constant time
§ Solved in O(m)

Finding the optimal multipliers
The Lagrangian problem: which multipliers provide the best upper bound?
$\min\ z(\lambda) \quad \text{s.t. } \lambda \in \mathbb{R}^m$
§ Solved via the subgradient method + deflection
§ Deflection: the update direction is a composition of the current subgradient and the last direction
[Figure: evolution of the bound $z^*(\lambda)$ over the iterations, without (A) and with (B) deflection]
§ The best bound is 1.52! (Real maximum: 1.51; previous bound: 1.93)
§ 100 iterations at the root node
§ During search: 3 iterations per x-update

Experimental Setup
Implemented in Google or-tools.
Test on a workload dispatching problem:
§ 16-20 tasks, 4 cores
§ 6 platforms
§ Thermal controller (lowers efficiency)
§ Efficiency threshold, find a feasible assignment
§ Time limit
§ Static var/val choice heuristic: the structure of the search tree does not depend on the propagation

Branches for 16 Tasks
[Plot: #branches with (LAG) and without (NO LAG) the new propagator, per platform (ptf1, ptf2, ptf3)]

Time for 16 Tasks
[Plot: solution time with (LAG) and without (NO LAG) the new propagator, per platform (ptf1, ptf2, ptf3)]

Branches for 20 Tasks
[Plot: #branches with (LAG) and without (NO LAG) the new propagator, per platform (ptf1, ptf2, ptf3)]

Time for 20 Tasks
[Plot: solution time with (LAG) and without (NO LAG) the new propagator, per platform (ptf1, ptf2, ptf3)]

Conclusions
A new propagator for an important Neural Network class:
§ Impressive reduction of the #branches (in some cases)
§ The propagation time is an issue
Roadmap:
§ A more efficient way to update the multipliers
§ Prune the input variables
§ Comparison with MINLP solvers
§ Comparison with local-search-based solvers
§ Apply EML to different machine learning technologies

The End
Thanks! Questions?
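Backup: a minimal sketch of the deflected subgradient loop from the Finding the optimal multipliers slides. The oracle interface, the deflection coefficient alpha, the diminishing normalized step, and the zero starting multipliers are assumptions of this sketch; the slides only state that the update direction combines the current subgradient with the previous direction and that about 100 iterations are run at the root node. The oracle could be a version of relax_bound above that also returns the maximizing $x$ and $y$, from which the subgradient follows.

```python
import math
from typing import Callable, List, Tuple

# "oracle(lam)" must return (z(lam), subgradient g), where
# g_j = b_j + sum_i w_{j,i}*x_i - y_j at the maximizer of the relaxation.

def deflected_subgradient(oracle: Callable[[List[float]], Tuple[float, List[float]]],
                          m: int, iters: int = 100,
                          step0: float = 0.5, alpha: float = 0.7):
    """Minimize z(lambda) over the multipliers; return the best bound found."""
    lam = [0.0] * m                      # start from the plain propagation bound
    direction = [0.0] * m
    best_bound, best_lam = float("inf"), lam[:]
    for k in range(iters):               # e.g. 100 iterations at the root node
        bound, g = oracle(lam)
        if bound < best_bound:
            best_bound, best_lam = bound, lam[:]
        # Deflection: blend the current subgradient with the previous direction.
        direction = [alpha * gj + (1.0 - alpha) * dj
                     for gj, dj in zip(g, direction)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        # We are minimizing, so step against the (normalized) direction.
        lam = [lj - (step0 / (k + 1)) * dj / norm
               for lj, dj in zip(lam, direction)]
    return best_bound, best_lam
```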