CSC321 Lecture 2: A Simple Learning Algorithm: Linear Regression
Roger Grosse and Nitish Srivastava
January 7, 2015

Outline

In this lecture we will:
- See a simple example of a machine learning model, linear regression.
- Learn how to formulate a supervised learning problem.
- Learn how to train the model.
It's not a neural net algorithm, but it will provide a lot of useful intuition for algorithms we will cover in this course.

A Machine Learning Problem

Suppose we are given some data about basketball players:

  Height in feet    Avg points scored per game
  6.8               9.2
  6.3               11.7
  6.4               15.8
  6.2               8.6
  ...               ...

[Scatter plot: avg points scored per game (roughly 5 to 30) against height in feet (roughly 5.8 to 7.6)]

What is the predicted number of points scored by a new player who is 6.5 feet tall?
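For concreteness, the four rows shown above could be stored as NumPy arrays. This is an illustrative setup, not part of the original slides; the full dataset is not listed, so only these four training cases are used in the sketches below.

```python
import numpy as np

# The four training rows shown in the table above (the full dataset is not given).
x = np.array([6.8, 6.3, 6.4, 6.2])    # inputs: heights in feet
t = np.array([9.2, 11.7, 15.8, 8.6])  # targets: avg points scored per game
```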
Formulate as a Supervised Learning Problem

We are given labelled examples (the training set):
- Inputs: {x^(1), x^(2), ..., x^(N)} - height in feet.
- Targets: {t^(1), t^(2), ..., t^(N)} - avg points scored per game.

Choose a model ≡ make an assumption about the data's behaviour. Let's say we choose

    y = wx + b

We call w the weight and b the bias. These are the trainable parameters.

Learning: extract knowledge from the data to learn the model,

    {x^(1), ..., x^(N)}, {t^(1), ..., t^(N)}  →  w, b

Inference: given a new x and the learned model, make a prediction y,

    x, (w, b)  →  y

Learning

Design an objective function (or loss function) that is
- minimized when the model does what you want it to do;
- easy to minimize (smooth, well-behaved).

Here we want y = wx + b to be close to t for every training case. Therefore one choice could be

    L(w, b) = (1/2) Σ_{i=1}^{N} (w x^(i) + b − t^(i))^2

This is called squared loss. We need to find w, b such that L(w, b) is minimized. Differentiating,

    ∂L/∂w = Σ_i (w x^(i) + b − t^(i)) x^(i)
    ∂L/∂b = Σ_i (w x^(i) + b − t^(i))

Since L is a nonnegative quadratic function of w and b, any critical point is a minimum. Therefore we minimize L by setting ∂L/∂w = 0 and ∂L/∂b = 0, which gives

    w (Σ_i x^(i) x^(i)) + b (Σ_i x^(i)) − Σ_i t^(i) x^(i) = 0
    w (Σ_i x^(i)) + b N − Σ_i t^(i) = 0

Now we have 2 linear equations and 2 unknowns, w and b. Solve!

Inference

[Scatter plot of the training data with the fitted line y = wx + b overlaid]

To make a prediction about a new player, just use y = wx + b.
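The closed-form solution above can be checked numerically. Below is a minimal sketch (not from the slides) that builds the 2x2 linear system from the toy data and solves it with NumPy; the variable names are my own.

```python
import numpy as np

# Toy training data: the four rows shown earlier.
x = np.array([6.8, 6.3, 6.4, 6.2])    # heights in feet
t = np.array([9.2, 11.7, 15.8, 8.6])  # avg points scored per game
N = len(x)

# The two linear equations from setting dL/dw = 0 and dL/db = 0:
#   w * sum(x*x) + b * sum(x) = sum(t*x)
#   w * sum(x)   + b * N      = sum(t)
A = np.array([[np.sum(x * x), np.sum(x)],
              [np.sum(x),     N        ]])
rhs = np.array([np.sum(t * x), np.sum(t)])
w, b = np.linalg.solve(A, rhs)

# Inference: predict the score of a new player who is 6.5 feet tall.
print(f"w = {w:.2f}, b = {b:.2f}, prediction at 6.5 ft: {w * 6.5 + b:.1f}")
```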
Multi-variable Linear Regression

Multi-variable: instead of x ∈ R, we have x = (x_1, x_2, ..., x_M) ∈ R^M. For example,

  Height in feet (x1)   Weight in pounds (x2)   Avg points scored per game (t)
  6.8                   225                     9.2
  6.3                   180                     11.7
  6.4                   190                     15.8
  6.2                   180                     8.6
  ...                   ...                     ...

Choose a model

    y = w_1 x_1 + w_2 x_2 + ... + w_M x_M + b = w^T x + b

Parameters to be learned: w, b.

Objective function:

    L(w, b) = Σ_{i=1}^{N} (w^T x^(i) + b − t^(i))^2

We can use more general basis functions (also called "features"):

    y = w_1 φ_1(x) + w_2 φ_2(x) + ... + w_M φ_M(x) = w^T Φ(x)

Parameters to be learned: w. For example, 1-D polynomial fitting uses

    φ_0(x) = 1, φ_1(x) = x, φ_2(x) = x^2, φ_3(x) = x^3, ..., φ_M(x) = x^M

so that

    y = w_0 φ_0(x) + w_1 φ_1(x) + w_2 φ_2(x) + ... + w_M φ_M(x) = w^T Φ(x),

where the w_0 φ_0(x) term plays the role of the bias.

Note: linear regression means linear in the parameters w, not linear in x.

Learning Multi-variable Linear Regression

Feature matrix: Φ has Φ(x^(i))^T as its i-th row.
Vector of predictions: Φw has w^T Φ(x^(i)) as its i-th entry.

Objective function:

    L(w) = (1/2) Σ_i (w^T Φ(x^(i)) − t^(i))^2 = (1/2) ||Φw − t||^2

The optimum occurs where

    ∇_w L(w) = Φ^T (Φw − t) = 0.

Therefore Φ^T Φ w − Φ^T t = 0, and

    w = (Φ^T Φ)^{-1} Φ^T t.

Question: when will Φ^T Φ be invertible?
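Here is an illustrative sketch of this closed-form solution, not from the original slides. It builds a polynomial feature matrix and solves the normal equations; `poly_features` and `fit` are hypothetical names, `np.linalg.solve` is used rather than forming the inverse explicitly, and the synthetic data (noisy samples of sin(2πx), in the spirit of Bishop's curve-fitting example) is only for illustration.

```python
import numpy as np

def poly_features(x, M):
    """Feature matrix Phi whose rows are [1, x, x^2, ..., x^M] for each input."""
    return np.vander(x, M + 1, increasing=True)

def fit(Phi, t):
    """Solve the normal equations Phi^T Phi w = Phi^T t for w."""
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Example: fit a degree-3 polynomial to 1-D synthetic data.
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(len(x))  # noisy targets
w = fit(poly_features(x, M=3), t)
y_new = poly_features(np.array([0.5]), M=3) @ w             # prediction at x = 0.5
print(w, y_new)
```

Note that np.linalg.solve raises a LinAlgError when Φ^T Φ is singular, which connects to the invertibility question above.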
Fitting polynomials

[Figures from Pattern Recognition and Machine Learning, Christopher Bishop: a 1-D dataset (t against x, with t roughly in [−1, 1] and x in [0, 1]) fit with polynomials of increasing degree]

    M = 0:  y = w_0
    M = 1:  y = w_0 + w_1 x
    M = 3:  y = w_0 + w_1 x + w_2 x^2 + w_3 x^3
    M = 9:  y = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + ... + w_9 x^9

Model selection

Underfitting: the model is too simple - it does not fit the data (e.g. M = 0).
Overfitting: the model is too complex - it fits the training data perfectly but does not generalize (e.g. M = 9).

We need to select a model which is neither too simple nor too complex (e.g. M = 3). Later in this course we will talk more about controlling model complexity.
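To make the underfitting/overfitting contrast concrete, here is a hypothetical experiment, not from the slides: it fits polynomials of degree M = 0, 1, 3, 9 to ten noisy samples of sin(2πx), in the spirit of Bishop's figures, and compares training error with error on held-out data. np.linalg.lstsq is used because the degree-9 design matrix is badly conditioned.

```python
import numpy as np

def poly_features(x, M):
    return np.vander(x, M + 1, increasing=True)

def make_data(n, rng):
    x = rng.uniform(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(n)

rng = np.random.default_rng(0)
x_train, t_train = make_data(10, rng)   # small training set, as in Bishop's figures
x_test, t_test = make_data(100, rng)    # held-out data to measure generalization

for M in [0, 1, 3, 9]:
    Phi = poly_features(x_train, M)
    w = np.linalg.lstsq(Phi, t_train, rcond=None)[0]   # least-squares fit
    train_err = np.mean((Phi @ w - t_train) ** 2)
    test_err = np.mean((poly_features(x_test, M) @ w - t_test) ** 2)
    # Expect: M = 0 underfits (both errors high); M = 9 overfits (train error near 0, test error high).
    print(f"M={M}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```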
Next class

Another machine learning model, an early neural net: the Perceptron.

[Photo: Frank Rosenblatt with the image sensor of the Mark I Perceptron]

Reminder: do the quizzes for video lectures A and B by 11:59pm next Monday.