An Overview of the Classical Regression Model

Assumptions of the Classical Regression Model
Assume we want to investigate the general relationship:
  yt = g(x1t, x2t, …, xKt | β1, β2, …, βK, σ²)
To use linear regression techniques for parameter estimation we need the following assumptions:
A1- g(·) is linear in the parameter vector: E(yt) = Σ(i=1 to K) xitβi
A2- The xit are non-stochastic variables
A3- The et, t = 1, 2, …, T, are independently and identically distributed:
  et = yt − Σ(i=1 to K) xitβi
  E(et) = E[yt − xtβ] = E[yt − E(yt)] = 0

Assumptions of the Classical Regression Model (cont.)
A4- Homoskedastic error term: E[(e − E(e))(e − E(e))'] = E(ee') = σ²IT → V(et) = σ², t = 1, …, T
A5- The determinant of (X'X) is nonzero; the K exogenous variables are not perfectly collinear
  If A5 fails → the parameters cannot be estimated
  T ≥ K: there must be at least as many observations as there are parameters to estimate
  |X'X| ≠ 0 but "very small" → "collinearity problem" → the β's can be estimated but the estimator is imprecise. Why? Var(βS) = σ²(X'X)⁻¹

Assumptions of the Classical Regression Model (cont.)
If we want to use Maximum Likelihood techniques to obtain parameter estimates we also need:
A6- Normality assumption: et ~ N(0, σ²), t = 1, …, T

Overview of the Classical Regression Model
Although the model is linear in terms of the parameters, we can allow for a nonlinear relationship between the exogenous and dependent variables via the use of alternative functional forms or variable construction (Stewart, Ch. 6; Greene, Section 7.3, 124-130)
Example with 4 RHS variables: let xt = [1  zt  ln(zt)  (wt²zt)]  (1 x 4)
  yt = [1  zt  ln(zt)  (wt²zt)]β + et
  yt is a linear function of the βi's
  yt is a nonlinear function of the explanatory variables (e.g., marginal effects)

Overview of the Classical Regression Model
  yt = [1  zt  ln(zt)  (wt²zt)]β + et
Marginal effects can be represented as:
  ∂yt/∂zt = β2 + β3/zt + β4wt²
  ∂yt/∂wt = 2β4wtzt
Note the nonlinear marginal effects with respect to the exogenous variables

Overview of the Classical Regression Model
  Yt = β1 + β2(1/Xt) + et    (with β1 > 0, β2 < 0)
  E[Yt] = β1 + β2/Xt, which approaches the asymptote β1 as Xt grows and equals zero at Xt = −β2/β1
  dYt/dXt = −β2/Xt²
  ηyx = elasticity of Y wrt X = [dYt/dXt]·Xt/Yt = [−β2/Xt²]·Xt/Yt = −β2/(XtYt)

Overview of the Classical Regression Model
  Yt = aXt^β1·exp(et)
  ln(Yt) = β0 + β1ln(Xt) + et,   β0 = ln(a), 0 < β1 < 1, a = exp(β0)
  Elasticity of Y wrt X ≡ dlnYt/dlnXt = β1
  E[ln Yt] = β0 + β1ln Xt
  E(Yt) = aXt^β1·exp(σ²/2), since E[exp(et)] = exp(σ²/2) (which holds when et ~ N(0, σ²))
Sometimes referred to as the Cobb-Douglas functional form when there is more than one exogenous variable

Classical Regression Model Summary
  xt is (1 x K), β is (K x 1)
  yt = E(yt) + et = β0 + Σ(k=2 to K) βk xkt + et = xtβ + et
where xt is non-stochastic and the xt are not all identical
  E(et) = 0,  E(yt) = xtβ   (the conditional mean)
  Var(et) = Var(yt) = σ²
  Cov(et, es) = Cov(yt, ys) = 0 for t ≠ s
  et ~ (0, σ²) and yt ~ (xtβ, σ²)
If et, yt are normally distributed: et ~ N(0, σ²) and yt ~ N(xtβ, σ²)

Example: Food Expenditures, 40 Households
Weekly food expenditure and income data
[Scatter plot: Food Expenditures vs. Household Income]

Example: Food Expenditures, 40 Households
Suppose we have two different estimators to obtain estimates of β0, β1:
  yt = β0⁽⁰⁾ + β1⁽⁰⁾xt
  yt = β0⁽¹⁾ + β1⁽¹⁾xt
Which estimator is "preferred"?
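Before turning to estimation, here is a minimal sketch revisiting the earlier functional-form example (linear in the parameters, nonlinear in the variables). The data, seed, and "true" coefficients below are made up for illustration and are not the course data set.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200

# Hypothetical exogenous variables (assumed values, for illustration only)
z = rng.uniform(1.0, 10.0, T)
w = rng.uniform(0.5, 3.0, T)

# Model linear in the parameters but nonlinear in the variables:
# y_t = b1 + b2*z_t + b3*ln(z_t) + b4*(w_t^2 * z_t) + e_t
X = np.column_stack([np.ones(T), z, np.log(z), w**2 * z])
beta_true = np.array([2.0, 0.5, 1.5, -0.2])      # assumed "true" coefficients
y = X @ beta_true + rng.normal(0.0, 1.0, T)      # e_t ~ N(0, sigma^2)

# OLS still applies because the model is linear in beta
beta_s, *_ = np.linalg.lstsq(X, y, rcond=None)

# Marginal effects evaluated at the sample means (nonlinear in z and w)
z_bar, w_bar = z.mean(), w.mean()
dy_dz = beta_s[1] + beta_s[2] / z_bar + beta_s[3] * w_bar**2
dy_dw = 2 * beta_s[3] * w_bar * z_bar
print(beta_s, dy_dz, dy_dw)
```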
Our Classical Regression Model
Given the above data and theoretical model we need to obtain parameter estimates to place the mean expenditure line in expenditure/income (X) space
We would expect the position of the line to be in the middle of the data points
What do we mean by "middle," given that some et's are positive and others negative?
We need to develop some rule to locate this line

Our Classical Regression Model
Least Squares Estimation rule: choose the line (defined by the coefficients) such that the sum of the squares of the vertical distances from each point to the line (the SSE) is as small as possible
The above line refers to E(yt)
  SSE = Σt et² = Σt (yt − β0 − β1xt)²
Graphically, in the above scatter plot, the vertical distances from each point to the line representing the linear relationship are called residuals or regression errors

Our Classical Regression Model
  yt = E(yt) + et = β0 + β1xt + et
Let β̂0, β̂1 be an initial "guess" of the intercept and slope coefficients
  êt = yt − ŷt = yt − β̂0 − β̂1xt
êt is the initial error term "guess"; ŷt = β̂0 + β̂1xt is the initial conditional mean "guess"

Our Classical Regression Model
  yt = ŷt + êt = β̂0 + β̂1xt + êt
[Figure: scatter of (Xt, yt) with the fitted conditional mean ŷt = β̂0 + β̂1Xt, intercept β̂0, and residuals ê3 = y3 − ŷ3 and ê4 = y4 − ŷ4]
  S ≡ SSE = Σ(t=1 to T) (yt − ŷt)² = Σ(t=1 to T) êt²

Our Classical Regression Model
Note that the SSE can be obtained via the following:
  SSE = e'e, where e = [e1, …, eT]' is (T x 1), e' is (1 x T), and SSE is (1 x 1)

Our Classical Regression Model
Naïve model: yt = μ̂ + et*
[Figure: the same scatter showing residuals measured from the fitted line, êt = yt − ŷt with ŷt = β̂0 + β̂1Xt, and from the sample mean, et* = yt − μ̂]
Note: ê'ê = SSE ≤ SSE* = e*'e*

Our Classical Regression Model
Under the least squares estimation rule we want to choose the values of β0 and β1 that minimize the error sum of squares (SSE)
Can we be assured that whatever values of β0 and β1 we choose do indeed minimize the SSE?

Our Classical Regression Model
  yt = β1 + β2xt + et
  SSE = Σ(t=1 to T) (yt − ŷt)²

Our Classical Regression Model
Let's look at the FOC for the minimization of the sum of squared errors (SSE) as a means of obtaining estimates of β1, β2:
  βS = [β1S, β2S]' = (X'X)⁻¹X'Y
  dimensions: (X'X) is (2 x 2) from (2 x T)(T x 2); X'Y is (2 x 1) from (2 x T)(T x 1); βS is (2 x 1)
  SSE = Σ(t=1 to T) êt² = ê'ê   (1 x 1), where ê = y − XβS
The subscript S denotes an estimated value

Our Classical Regression Model
Can we be assured that the SSE function is convex, not concave, wrt the β's?
  SSE = Y'Y − 2β'X'Y + β'X'Xβ
  ∂SSE/∂β = −2X'Y + 2X'Xβ
The matrix of second derivatives of SSE with respect to β1 and β2 can be shown to be:
  H_SSE = ∂²SSE/∂β∂β' = 2X'X = [2T   2Σx2t; 2Σx2t   2Σx2t²]
H_SSE must be positive definite for convexity
To be positive definite, every principal minor of H_SSE must be positive

Our Classical Regression Model
  H_SSE = 2X'X = [2T   2Σx2t; 2Σx2t   2Σx2t²]
The two diagonal elements must be positive, and
  |H_SSE| = 4TΣx2t² − 4(Σx2t)² = 4[TΣx2t² − (Σx2t)²]
|H_SSE| is positive unless all values of x2t are the same → H_SSE is positive definite → SSE is convex wrt the β's

Our Classical Regression Model
For our 40-household Food Expenditure data:
  X = [1 25.83; 1 34.31; …; 1 115.46]    Y = [9.46; 10.56; …; 48.71]
  βS = (X'X)⁻¹X'Y = [7.3832; 0.2323]

Our Classical Regression Model
[Figure: Food Expenditures vs. Household Income with fitted line; intercept 7.3832, slope dYt/dIt = 0.2323]

Sampling Properties of Estimated Coefficients
"True" relation: Y = Xβ + e
Use the random variable Y to generate an estimate of the unknown coefficients:
  βS = (X'X)⁻¹X'Y
βS is a random variable with a distribution; βS will vary from sample to sample
What is E(βS)?
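Before turning to the sampling properties, here is a minimal numerical sketch of the least squares calculations above. The income values and "true" line are assumptions for illustration, not the actual 40-household sample; the sketch computes βS = (X'X)⁻¹X'Y and the SSE, and checks that H_SSE = 2X'X is positive definite.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 40

# Hypothetical income and food-expenditure data (NOT the actual 40-household sample)
income = rng.uniform(10, 130, T)
X = np.column_stack([np.ones(T), income])        # (T x 2): intercept and income
y = 7.0 + 0.25 * income + rng.normal(0, 6.5, T)  # assumed "true" line plus noise

# Least squares estimator: beta_S = (X'X)^-1 X'Y
XtX = X.T @ X
beta_s = np.linalg.solve(XtX, X.T @ y)

# Residuals and error sum of squares: SSE = e'e
e_hat = y - X @ beta_s
SSE = e_hat @ e_hat

# Convexity check: H_SSE = 2X'X should be positive definite
H = 2 * XtX
eigenvalues = np.linalg.eigvalsh(H)
print(beta_s, SSE, (eigenvalues > 0).all())
```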
Sampling Properties of Estimated Coefficients
Properties of the Least Squares Estimator
Does E(βS) = β (i.e., is βS unbiased)? With Y = Xβ + e:
  E(βS) = E[(X'X)⁻¹X'Y]
        = E[(X'X)⁻¹X'(Xβ + e)]
        = E[(X'X)⁻¹X'Xβ + (X'X)⁻¹X'e]
        = E[Iβ] + E[(X'X)⁻¹X'e]
        = β + (X'X)⁻¹X'E(e) = β    since E(e) = 0
→ βS is an unbiased estimate of the true unknown value β

Sampling Properties of Estimated Coefficients
Properties of the Least Squares Estimator
What is the covariance matrix, Σβ, of the estimated parameters?
  Σβ = σ²(X'X)⁻¹   (K x K), here K = 2
What is a reasonable estimate of σ²?
  σ² ≡ variance of et = E[(et − E(et))²] = E(et²), with E(et) = 0
Up to this point σ² has been assumed known
  êt = yt − ŷt = yt − β1S − x2tβ2S
  σ̂S² = Σ(t=1 to T) êt² / T    due to the iid assumption

Sampling Properties of Estimated Coefficients
Is this an unbiased estimator of σ²?
Standardize the SSE by the number of parameters in the regression model:
  σU² = Σ(t=1 to T) êt² / (T − K) = ê'ê / (T − K)
Given the above:
  σU² = (y − XβS)'(y − XβS) / (T − K)
      = [y'y − y'XβS − βS'X'y + βS'X'XβS] / (T − K)
      = [y'y − y'XβS − βS'X'y + y'X(X'X)⁻¹X'XβS] / (T − K)
      = [y'y − y'XβS − βS'X'y + y'XβS] / (T − K)
      = [y'y − βS'X'y] / (T − K)

Sampling Properties of Estimated Coefficients
In contrast to our least squares estimate of β, which is a linear form of y, the above is a quadratic form of the observable random vector y
This implies that σU² is a random variable and that our estimate of σ² will vary from sample to sample
We have derived E(σU²); let's now evaluate the variance of the random variable σU²
We showed in a previous handout that eS'eS = e'(IT − X(X'X)⁻¹X')e = e'Me, where M is an idempotent matrix and e is the true unknown CRM error
Before we examine the variance of σU², let's talk about the PDF of e'Me/σ²

Sampling Properties of Estimated Coefficients
Let's assume that e ~ N(0, σ²IT)
I will show a little later that βl = βS = (X'X)⁻¹X'Y ~ N(β, σ²(X'X)⁻¹), where βl is the maximum likelihood estimator of the unknown CRM coefficients assuming normality
Given this assumption, let's look at e'Me/σ²
The numerator in the above is a quadratic form involving the normal random vector e
On page 52 of JHGLL, and in Section A.19, the distributional characteristics of quadratic forms of normal RVs are discussed

Sampling Properties of Estimated Coefficients
The implication of this discussion is that, with e ~ N(0, σ²IT) and M idempotent, e'Me/σ² is distributed χ² with DF equal to the rank of M, where the rank of an idempotent matrix equals its trace:
  tr(M) = tr(IT − X(X'X)⁻¹X')
        = tr(IT) − tr[X(X'X)⁻¹X']
        = tr(IT) − tr[X'X(X'X)⁻¹]    using tr(ABC) = tr(CAB)
        = tr(IT) − tr(IK) = T − K = rank of M
  → (T − K)σU²/σ² = eS'eS/σ² = e'Me/σ² ~ χ²(T−K),  where (T − K)/σ² is a constant

Sampling Properties of Estimated Coefficients
We can use the above result to find the variance of σU²
A characteristic of a RV that is distributed χ² is that its variance is equal to twice its DF:
  var[(T − K)σU²/σ²] = 2(T − K)
  [(T − K)²/σ⁴]·var(σU²) = 2(T − K)
  var(σU²) = 2σ⁴/(T − K)
Note that in order to say something about the variance of our estimate of the error-term variance under the CRM we needed the additional normality assumption

Sampling Properties of Estimated Coefficients
Given the normality assumption of the error term, βS = βl ~ N(β, σ²(X'X)⁻¹)
I would like to now show that the random vector βS (= βl) is independent of the random variable σU² (pp. 29-30 of JHGLL)
Since σU² = eS'eS/(T − K), βl and σU² will be independent if eS (= el) and βS (= βl) are independent
Given the above assumptions, both el and βl are normal random vectors
To show that they are independent it is sufficient to show that the matrix containing the covariances between the elements of el and βl is zero
This (T x K) covariance matrix can be represented by E[el(βl − β)']
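Before completing the independence argument, the sampling properties derived above can be illustrated with a small Monte Carlo sketch. The "true" β, σ², regressor values, and seed are assumptions for illustration: over repeated samples the averages of βS and σU² should be close to the true values, and the sample variance of σU² close to 2σ⁴/(T − K).

```python
import numpy as np

rng = np.random.default_rng(2)
T, K, sigma2 = 40, 2, 9.0
beta_true = np.array([5.0, 0.3])                # assumed "true" parameters

x = rng.uniform(10, 130, T)
X = np.column_stack([np.ones(T), x])            # fixed (non-stochastic) regressors
XtX_inv = np.linalg.inv(X.T @ X)

n_reps = 5000
betas = np.empty((n_reps, K))
sigma2_u = np.empty(n_reps)
for r in range(n_reps):
    e = rng.normal(0.0, np.sqrt(sigma2), T)     # e ~ N(0, sigma^2 I_T)
    y = X @ beta_true + e
    b = XtX_inv @ X.T @ y                       # beta_S = (X'X)^-1 X'y
    resid = y - X @ b
    betas[r] = b
    sigma2_u[r] = resid @ resid / (T - K)       # sigma_U^2 = e'e/(T-K)

# Averages should be near the true values (unbiasedness),
# and var(sigma_U^2) should be near 2*sigma^4/(T-K)
print(betas.mean(axis=0), sigma2_u.mean(), sigma2_u.var(), 2 * sigma2**2 / (T - K))
```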
Sampling Properties of Estimated Coefficients
Previously we showed that eS = (IT − X(X'X)⁻¹X')e [= el]
We also know that:
  βS (= βl) = (X'X)⁻¹X'y = (X'X)⁻¹X'(Xβ + e) = β + (X'X)⁻¹X'e    (β is the true unknown value)
  → βl − β = (X'X)⁻¹X'e
With E[ee'] = σ²IT, the covariance matrix can be shown to be:
  E[el(βl − β)'] = E[(IT − X(X'X)⁻¹X') ee' X(X'X)⁻¹]
                 = (IT − X(X'X)⁻¹X') E[ee'] X(X'X)⁻¹
                 = σ²(IT − X(X'X)⁻¹X') X(X'X)⁻¹
                 = σ²[X(X'X)⁻¹ − X(X'X)⁻¹] = 0(T x K)

Sampling Properties of Estimated Coefficients
The above results show that βl and σU² are independent
For a more theoretical treatment refer to Section 2.5.9, bottom of page 52, in JHGLL

Food Expenditure Model Results Summary
  FOODEXP = 7.3832 + 0.2323 INC
            (4.008)  (0.055)    standard errors
  K = 2, T = 40
  SSE = e'e = 1780.4
  σU² = 1780.4/(40 − 2) = 46.853
  (X'X)⁻¹ = [0.342922   −0.0045548; −0.0045548   6.525442e-005]
  ΣβS = σU²(X'X)⁻¹ = [16.0669   −0.2134; −0.2134   0.0030]
  16.0669½ = 4.008

Our Classical Regression Model
In summary, with K regressors, T observations, and our linear model:
The random variable Y is composed of a nonstochastic conditional mean and an unobservable error term: Y = Xβ + e
  βS = (X'X)⁻¹X'Y
βS is a linear function of the observable random vector Y
βS is a random vector with a sampling distribution
βS is unbiased
βS covariance matrix ≡ Σβ = σ²(X'X)⁻¹ → βS ~ (β, σ²(X'X)⁻¹)
Finite sample properties: JHGLL, 198-209
This implies that with e ~ (0T, σ²IT), Y ~ (Xβ, σ²IT)

Our Classical Regression Model
βS was obtained without knowing the distribution of et
Let's compare the above estimate of β with estimates obtained from other linear and unbiased estimators (β*)
  βS = AY, where A = (X'X)⁻¹X'
  β* = CY, where C is a (K x T) matrix that is not a function of Y or the unknown parameters (A is an example of such a matrix)
By assumption, E(βS) = E(β*) = β
We are interested in finding the Best Linear Unbiased Estimator (BLUE) of the true, unknown parameter vector β

Our Classical Regression Model
Is βS BLUE (i.e., minimum variance compared to β*)?
Gauss-Markov Theorem: given the CRM assumptions A1-A5, βS is BLUE
With multiple β's, βS is better than any other linear unbiased estimator β* if:
  Var(a'βS) ≤ Var(a'β*), where a is any (K x 1) constant vector, so a'β is a linear combination of the β's
  → Var(a'β*) − Var(a'βS) ≥ 0
  ΣβS, Σβ* are (K x K) → a'(Σβ*)a − a'(ΣβS)a = a'(Σβ* − ΣβS)a ≥ 0 for all a   (1 x 1), for βS to be best
To determine the above, I need to know the characteristics of definite matrices

Our Classical Regression Model
Is βS BLUE (i.e., minimum variance compared to β*)?
  a'(Σβ* − ΣβS)a ≥ 0 for all a   (1 x 1), for βS to be best
I want to show that this holds for all a, i.e., that (Σβ* − ΣβS) is positive semi-definite
This is a characteristic of a positive semi-definite matrix (JHGLL, p. 960): "A symmetric matrix D is positive semi-definite iff C′DC ≥ 0 for all C"
Let D be Σβ* − ΣβS, which is symmetric
The above shows that βS has the "smallest" variance among all linear unbiased estimators of β → βS is the Best Linear Unbiased Estimator of β, the true unknown parameter vector

Our Classical Regression Model
How well does the estimated equation explain the variance of the dependent variable?
Let's first talk about the variation of the dependent variable:
  Y = Ŷ + ê = XβS + ê    (the part explained by the model plus the unexplained part)
  Y'Y = (XβS + ê)'(XβS + ê) = βS'X'XβS + ê'XβS + βS'X'ê + ê'ê = βS'X'XβS + 2βS'X'ê + ê'ê    (sum of squares of the yt's, 1 x 1)

Our Classical Regression Model
Note that ê = [IT − X(X'X)⁻¹X']Y, so
  Y'Y = βS'X'XβS + 2βS'X'ê + ê'ê
      = βS'X'XβS + 2βS'X'[IT − X(X'X)⁻¹X']Y + ê'ê
and the middle term is 0 given that X'[IT − X(X'X)⁻¹X'] = (X' − X') = 0
  → Y'Y = βS'X'XβS + ê'ê = Ŷ'Ŷ + ê'ê,  where Ŷ = XβS    (sum of squares of the yt's)
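A short sketch tying the estimation results together. The data are hypothetical, not the 40-household food expenditure sample, so the numbers will not match the results summary above; the sketch computes σU², ΣβS = σU²(X'X)⁻¹, and the standard errors, and verifies the decomposition Y'Y = βS'X'XβS + ê'ê numerically.

```python
import numpy as np

def ols_summary(X, y):
    """OLS coefficients, sigma_U^2, coefficient covariance matrix, and standard errors."""
    T, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_s = XtX_inv @ X.T @ y             # beta_S = (X'X)^-1 X'y
    e_hat = y - X @ beta_s
    sse = e_hat @ e_hat                    # SSE = e'e
    sigma2_u = sse / (T - K)               # unbiased estimator of sigma^2
    cov_beta = sigma2_u * XtX_inv          # Sigma_beta_S = sigma_U^2 (X'X)^-1
    std_err = np.sqrt(np.diag(cov_beta))
    # Check the decomposition Y'Y = beta_S'X'X beta_S + e'e
    assert np.isclose(y @ y, beta_s @ (X.T @ X) @ beta_s + sse)
    return beta_s, sigma2_u, cov_beta, std_err

# Hypothetical example (not the 40-household food expenditure data)
rng = np.random.default_rng(3)
income = rng.uniform(10, 130, 40)
X = np.column_stack([np.ones(40), income])
y = 7.4 + 0.23 * income + rng.normal(0, 6.8, 40)
print(ols_summary(X, y))
```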
Let's use the above, but within the framework of deviations from the mean of Y (i.e., our naïve model)

Our Classical Regression Model
We can represent the total variation about the mean of the data via:
  Yt = Ŷt + eSt = xtβS + eSt
Subtract the mean from both sides:
  Yt − Ȳ = (Ŷt − Ȳ) + (Yt − Ŷt)
  Total Variation = Explained by Exogenous Variables + Unexplained Component

Our Classical Regression Model
[Figure: scatter of (Xt, Yt) with the fitted line Ŷt = β̂0 + β̂1Xt and the sample mean Ȳ, showing êt = Yt − Ŷt, Ŷt − Ȳ, and Yt − Ȳ = (Ŷt − Ȳ) + (Yt − Ŷt)]

Our Classical Regression Model
Total variation about the mean:
  Yt − Ȳ = (Ŷt − Ȳ) + (Yt − Ŷt)
  total variation = explained + unexplained
If our goal is to have an accurate prediction of Y, we would like the component explained by our exogenous variables, Ŷt − Ȳ, to be large relative to the error component, Yt − Ŷt
A large unexplained/unpredictable component would mean our prediction could be "way off"

Our Classical Regression Model
Note that:
  Σ(t=1 to T)(Yt − Ȳ)² = Y'Y − TȲ²,   given that 2ȲΣ(t=1 to T)Yt = 2TȲ²
  but Y'Y = Ŷ'Ŷ + ê'ê
  → Y'Y − TȲ² = (Ŷ'Ŷ − TȲ²) + ê'ê
  Total Sum of Squares (TSS) = Explained Sum of Squares (RSS) + Error Sum of Squares (SSE)
TSS: a measure of the total variation of Yt about its mean
RSS: the portion of the total variation in Yt about the sample mean explained by the RHS variables, X
SSE: the portion of the total variation in Yt about the mean not explained by the RHS variables

Our Classical Regression Model
In scalar notation, the above decomposition of deviations from the sample mean can be represented as:
  Σ(t=1 to T)(Yt − Ȳ)² = Σ(t=1 to T)(Ŷt − Ȳ)² + Σ(t=1 to T)êt²
  TSS = RSS + SSE

Our Classical Regression Model
How well does the estimated equation explain the variance of the dependent variable?
R² (the Coefficient of Determination) is the proportion of the total variation about the mean explained by the model:
  R² = RSS/TSS = (Ŷ'Ŷ − TȲ²)/(Y'Y − TȲ²)
But because TSS = RSS + SSE:
  R² = 1 − SSE/TSS = 1 − ê'ê/(Y'Y − TȲ²)
Note: the β's that minimize the SSE thereby maximize the R² value

Our Classical Regression Model
Calculation of R² (Greene: 31-38)
Use the above formulas or the following:
  M0 ≡ [IT − (1/T)ii'],  where i ≡ a column vector of 1's
  the diagonals of M0 are (1 − 1/T); the off-diagonals of M0 are −(1/T)
M0 is a T x T idempotent matrix that transforms any variable into deviations from sample means
  TSS = Y'M0Y = βS'X'M0XβS + eS'eS    (RSS + SSE)
  R² = RSS/TSS = βS'X'M0XβS / (Y'M0Y)

Our Classical Regression Model
  0 ≤ R² ≤ 1 (when an intercept is present)
  R² = 0 → the regression is a horizontal line; all elements of β are zero except the constant term → the predicted value of yt equals the sample mean
  R² = 1 → all residuals are 0; perfect fit
R² will never decrease when another variable is added to a regression (Greene, p. 34)

Our Classical Regression Model
Adjusted R², R̄², controls for DF (i.e., the number of regressors in the model):
  R̄² = 1 − [SSE/(T − K)] / [TSS/(T − 1)]
      = 1 − sU² / [TSS/(T − 1)]
      = 1 − (1 − R²)(T − 1)/(T − K)
When K > 1, adjusted R² < R²

Our Classical Regression Model
Adjusted R² may decline when a variable is added to the model:
  R̄² = 1 − [SSE/(T − K)] / [TSS/(T − 1)]
Whether R̄² rises or falls depends on whether the contribution of the new variable to model fit (as represented by the SSE) offsets the loss due to the correction for DF
When dropping a variable from a model: if its |t-ratio| < 1.0, the Adj. R² will increase; when its |t-ratio| > 1.0, dropping the variable from the regression → the Adj. R² will decrease (Greene, p. 35)
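A small sketch of the fit measures above, again with hypothetical data rather than the course sample: it computes R² and adjusted R² from the SSE and TSS, and illustrates that adding an irrelevant regressor never lowers R² but can lower R̄².

```python
import numpy as np

def r_squared(X, y):
    """R^2 and adjusted R^2 for an OLS regression of y on X (X includes a constant)."""
    T, K = X.shape
    beta_s = np.linalg.solve(X.T @ X, X.T @ y)
    e_hat = y - X @ beta_s
    sse = e_hat @ e_hat
    tss = ((y - y.mean()) ** 2).sum()          # equivalently y'M0y with M0 = I - (1/T)ii'
    r2 = 1.0 - sse / tss
    r2_adj = 1.0 - (1.0 - r2) * (T - 1) / (T - K)
    return r2, r2_adj

# Hypothetical data: adding an irrelevant regressor never lowers R^2,
# but it can lower the adjusted R^2
rng = np.random.default_rng(4)
T = 40
x1 = rng.uniform(10, 130, T)
y = 7.4 + 0.23 * x1 + rng.normal(0, 6.8, T)
X_small = np.column_stack([np.ones(T), x1])
X_big = np.column_stack([X_small, rng.normal(size=T)])   # irrelevant variable z
print(r_squared(X_small, y))
print(r_squared(X_big, y))
```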
Our Classical Regression Model
Change in R² from adding a variable (Greene: 28-31, 34)
R²X,z is the coefficient of determination in the regression of y on X and an additional variable, z
R²X is the coefficient of determination when regressing Y on X alone
r*yz is the partial correlation between y and z after accounting for the effects of X (see Greene: 28-31 for details concerning the calculation of r*yz)
The R² from adding a variable z to the regression is:
  R²X,z = R²X + (1 − R²X)·r*yz²
where (1 − R²X) is the % of variation in Y left unexplained after X, so the change in R² from adding z is (1 − R²X)·r*yz²

Food Expenditure Model Results Summary
  FOODEXP = 7.3832 + 0.2323 INC
            (4.008)  (0.055)    standard errors
  K = 2, T = 40
  TSS = 2607.0,  SSE = e'e = 1780.4,  RSS = 826.6
  σU² = 1780.4/(40 − 2) = 46.853
  R² = 826.6/2607.0 = 0.317
  R̄² = 1 − [1780.4/(40 − 2)] / [2607.0/(40 − 1)] = 0.299
  (X'X)⁻¹ = [0.342922   −0.0045548; −0.0045548   6.525442e-005]
  ΣβS = σU²(X'X)⁻¹ = [16.0669   −0.2134; −0.2134   0.0030]

Prediction Under The Classical Model

Prediction Under The Classical Model
Assume we have the CRM, which satisfies assumptions A1-A5
  βS = (X'X)⁻¹X'Y;  βS is the BLUE of β
We attempt to anticipate new/unknown values of Y0 given known explanatory variables X0 (JHGLL: 209-211)
Remember our assumption on the error variance: E(ee') = σ²IT
This assumption simplifies prediction; after our GLS lectures we will revisit this
Prediction variance: V(e0|X, X0) = σ²IT0 + σ²X0(X'X)⁻¹X0'

Prediction Under The Classical Model
With 2 parameters (one an intercept), and x0 denoting the value of the non-constant regressor at the forecast point:
  var(ê0) = σ²[1 + 1/T + (x0 − x̄)² / Σ(t=1 to T)(xt − x̄)²]
The 2nd and 3rd terms become progressively smaller as we collect more information
The 1st term is constant → no matter how much data one has, one can never predict with certainty
The farther the forecast point is from the center of the data, the greater the degree of uncertainty

Prediction Under The Classical Model
  V(e0|X, X0) = σ²IT0 + σ²X0(X'X)⁻¹X0'
Prediction variability is due to:
  the equation error term: σ²IT0
  variability in estimating the unknown parameters: σ²X0(X'X)⁻¹X0' = X0 Var(βS) X0'

Prediction Under The Classical Model
Let's assume that e ~ N(0, σ²IT) → the following distribution of the prediction error:
  (Ŷ0 − Y0) / [V̂(e0|X, X0)]½ ~ t(T−K)
With 2 parameters:
  V̂(e0) = σU²[1 + 1/T + (x0 − x̄)² / Σ(t=1 to T)(xt − x̄)²]
This implies the following forecast interval:
  Ŷ0 ± t(α/2, T−K)·[V̂(e0|X, X0)]½
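A minimal sketch of prediction under the CRM, with hypothetical data rather than the food expenditure sample: it computes the point forecast Ŷ0 = X0βS, the estimated prediction variance σU²(1 + X0(X'X)⁻¹X0'), and the t-based forecast interval, using the scipy t quantile for t(α/2, T−K).

```python
import numpy as np
from scipy import stats

def forecast_interval(X, y, x0, alpha=0.05):
    """Point prediction and (1 - alpha) forecast interval at a new observation x0."""
    T, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_s = XtX_inv @ X.T @ y
    e_hat = y - X @ beta_s
    sigma2_u = e_hat @ e_hat / (T - K)                 # sigma_U^2
    y0_hat = x0 @ beta_s
    # V(e0) = sigma^2 (1 + x0 (X'X)^-1 x0'): error term plus parameter uncertainty
    var_e0 = sigma2_u * (1.0 + x0 @ XtX_inv @ x0)
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=T - K)
    half_width = t_crit * np.sqrt(var_e0)
    return y0_hat, (y0_hat - half_width, y0_hat + half_width)

# Hypothetical example (not the actual food expenditure data)
rng = np.random.default_rng(5)
income = rng.uniform(10, 130, 40)
X = np.column_stack([np.ones(40), income])
y = 7.4 + 0.23 * income + rng.normal(0, 6.8, 40)
print(forecast_interval(X, y, np.array([1.0, 75.0])))  # forecast at income = 75
```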
© Copyright 2024