Practical course “Bayesian Optimization”
(also known as “Parameter Tuning and Algorithm Configuration”)

Lectures: Frank Hutter
Exercises & Supervision of Projects: Katharina Eggensperger, Stefan Falkner, Matthias Feurer, Aaron Klein, Marius Lindauer

Overview of Today
• Practical information
• Learning goals
• Bayesian optimization in a nutshell
• Hands-on sessions
  – First steps (Katharina)
  – Setting up VirtualBox (Stefan)
  – Plotting (Aaron)

Practical Information
• Room: Building 074, MST Pool
• Time: Mondays, 14:15-15:45
• Weeks 1-6: Basics (implement everything from scratch)
  – Reading, a few lectures, and implementing what you learned
  – Exercise sheets in weeks 1-4, each with a 2-week deadline. The exercises count for 50% of the grade.
• Weeks 7-14: Project (implement research from a paper)
  – Week 7: Overview of papers
  – Week 9: Short student presentations, one paper each
  – Week 14: Hand in project code & short report; student presentations
  – The project counts for 50% of the grade (equal parts for paper presentation, project code, project presentation, and project report)

Learning Goals
• After this course, you can …
  – Derive & implement Gaussian process regression from scratch (a minimal preview sketch appears at the end of these slides)
  – Derive & implement Bayesian optimization from scratch
  – Implement a new facet of Bayesian optimization from a research paper
  – Effectively use Python to program mathematical software, carry out experiments with it, and plot the results
  – Effectively use state-of-the-art hyperparameter optimization methods
• You will also practice various soft skills:
  – Short presentations, team work, report writing, …

Bayesian Optimization in a Nutshell
• Prominent approach to optimize expensive blackbox functions [Mockus et al., '78]:
    max_{x ∈ X} f(x)
  [Figure: an objective function f(x) plotted over x]
• Efficient in the number of function evaluations
• Works when the objective is nonconvex, noisy, has unknown derivatives, etc.
• Recent convergence results [Srinivas et al., '10; Bull, '11; de Freitas, Smola, Zoghi, '12]
(A minimal code sketch of the Bayesian optimization loop appears at the end of these slides.)

Why is Bayesian Optimization Interesting?
• Currently the leading approach for hyperparameter optimization in machine learning
• Can be used for automatic machine learning:
  – Feature selection
  – Selection of the machine learning algorithm
  – Hyperparameter optimization
  → Effective machine learning off-the-shelf
• Underlying approach for general algorithm configuration
  – Adjust algorithm parameters to gain speed, accuracy, etc.
  – If you have not encountered this problem, trust me: you will

Coming up next …
• The first exercise sheet is out today (due in 2 weeks)
• Next Monday: lecture on Gaussian processes
• I will follow the excellent book “Gaussian Processes for Machine Learning” by Carl Edward Rasmussen and Christopher K. I. Williams
• Before the lecture, read Sections 2.0-2.3 (15 pages) in the book. A free copy is available online: http://www.gaussianprocess.org/gpml/chapters/RW2.pdf
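Preview sketch 1: Gaussian process regression. As a taste of the weeks 1-6 material, here is a minimal sketch of GP posterior prediction, assuming a zero-mean prior, a squared-exponential kernel, and the Cholesky-based computation of Rasmussen & Williams (Chapter 2, Algorithm 2.1). The function names (rbf_kernel, gp_posterior) and default parameters are illustrative assumptions, not the course's reference implementation.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel k(a, b) = s^2 exp(-||a-b||^2 / (2 l^2)).

    A: (n, d) array, B: (m, d) array; returns the (n, m) kernel matrix.
    """
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def gp_posterior(X_train, y_train, X_test, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP at the test points.

    noise_var doubles as observation noise and numerical jitter.
    """
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    # Cholesky factorization for numerical stability (R&W, Algorithm 2.1).
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                       # posterior mean
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - (v ** 2).sum(axis=0)  # posterior variance
    return mean, var
```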
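Preview sketch 2: the Bayesian optimization loop. This is a minimal sketch of the loop behind max_{x ∈ X} f(x) from the “in a nutshell” slide, reusing gp_posterior above. It maximizes over a fixed candidate grid with an upper confidence bound (UCB) acquisition; UCB is just one common acquisition function (the course may use others, e.g. expected improvement), and all names and defaults below are illustrative assumptions.

```python
def bayesian_optimization(f, candidates, n_init=3, n_iters=20, beta=2.0,
                          rng=None):
    """Maximize an expensive blackbox f over a finite set of candidates."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Start with a few random evaluations of the expensive blackbox f.
    idx = rng.choice(len(candidates), size=n_init, replace=False)
    X = candidates[idx]
    y = np.array([f(x) for x in X])
    for _ in range(n_iters):
        mean, var = gp_posterior(X, y, candidates)
        # UCB trades off exploitation (high mean) and exploration (high var).
        ucb = mean + beta * np.sqrt(np.maximum(var, 0.0))
        x_next = candidates[np.argmax(ucb)]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()

# Example: maximize a toy 1-d objective on [0, 10].
cands = np.linspace(0, 10, 200).reshape(-1, 1)
x_best, y_best = bayesian_optimization(
    lambda x: -np.sin(3 * x[0]) - 0.1 * (x[0] - 5) ** 2, cands)
```

Note the key design choice the course will revisit: each of the few, expensive evaluations of f is spent where the GP surrogate is either promising or uncertain, which is why the method is efficient in the number of function evaluations.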