STAC67H3: Regression Analysis, Fall 2014
Instructor: Jabed Tomal
Department of Computer and Mathematical Sciences
University of Toronto Scarborough, Toronto, ON, Canada
November 11, 2014

Models for Quantitative and Qualitative Predictors

Polynomial Regression Models

Uses:
Polynomial regression models have two basic types of uses:
1. When the true curvilinear response function is indeed a polynomial function.
2. When the true curvilinear response function is unknown (or complex) but a polynomial function is a good approximation to the true function. [More common]

Danger of polynomial regression models:
1. Polynomial regression models may provide good fits for the data at hand, but may turn in unexpected directions when extrapolated beyond the range of the data.

One Predictor Variable - Second Order:
1. Polynomial regression models may contain one, two, or more than two predictor variables, and each predictor variable may be present in various powers.
2. A polynomial regression model with one predictor variable raised to the first and second powers is
   \[ Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i, \]
   where $x_i = X_i - \bar{X}$.

One Predictor Variable - Second Order:
1. This polynomial regression model is called a second-order model with one predictor variable because the single predictor variable is expressed in the model to the first and second powers.
2. The predictor variable is centered, that is, expressed as a deviation around its mean $\bar{X}$, and the $i$th centered observation is denoted by $x_i$.
3. The reason for centering is to reduce the high correlation between $X$ and $X^2$.

One Predictor Variable - Second Order:
1. The regression coefficients in polynomial regression are frequently written in a slightly different fashion, to reflect the pattern of the exponents:
   \[ Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i \]
2. The response function for the regression model is
   \[ E\{Y\} = \beta_0 + \beta_1 x + \beta_{11} x^2 \]
   This response function is a parabola and is frequently called a quadratic response function.

One Predictor Variable - Second Order:
1. The regression coefficient $\beta_0$ represents the mean response of $Y$ when $x = 0$, i.e., when $X = \bar{X}$.
2. The regression coefficient $\beta_1$ is often called the linear effect coefficient, and $\beta_{11}$ is called the quadratic effect coefficient.

One Predictor Variable - Third Order:
1. The regression model
   \[ Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i, \]
   where $x_i = X_i - \bar{X}$, is a third-order model with one predictor variable.
2. The response function for the regression model is
   \[ E\{Y\} = \beta_0 + \beta_1 x + \beta_{11} x^2 + \beta_{111} x^3 \]

One Predictor Variable - Higher Order:
1. Polynomial models with the predictor variable present in higher powers than the third should be employed with special caution.
2. The interpretation of the coefficients becomes difficult for such models, and the models may be highly erratic for interpolations and even small extrapolations.
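The effect of centering noted above can be checked numerically. The sketch below uses made-up, equally spaced predictor values (an assumption for illustration, not data from these notes) and compares the correlation between $X$ and $X^2$ with the correlation between the centered $x$ and $x^2$.

```python
from math import sqrt

def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

# Hypothetical predictor values, chosen only for illustration.
X = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
X_bar = sum(X) / len(X)
x = [v - X_bar for v in X]  # centered predictor x_i = X_i - X_bar

r_raw = pearson(X, [v ** 2 for v in X])       # correlation of X with X^2
r_centered = pearson(x, [v ** 2 for v in x])  # correlation of x with x^2

print(f"corr(X, X^2) = {r_raw:.4f}")
print(f"corr(x, x^2) = {r_centered:.4f}")
```

Because these illustrative values are symmetric about their mean, the centered correlation vanishes; with asymmetric data, centering typically reduces the correlation without removing it entirely.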
Two Predictor Variables - Second Order:
1. The regression model
   \[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} + \varepsilon_i, \]
   where $x_{i1} = X_{i1} - \bar{X}_1$ and $x_{i2} = X_{i2} - \bar{X}_2$, is a second-order model with two predictor variables.
2. The response function is
   \[ E\{Y\} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2, \]
   which contains separate linear and quadratic components for each of the two predictor variables and a cross-product term.
3. The latter term represents the interaction effect between $x_1$ and $x_2$, and $\beta_{12}$ is often called the interaction effect coefficient.

Three Predictor Variables - Second Order:
1. The regression model
   \[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{33} x_{i3}^2 + \beta_{12} x_{i1} x_{i2} + \beta_{13} x_{i1} x_{i3} + \beta_{23} x_{i2} x_{i3} + \varepsilon_i, \]
   where $x_{i1} = X_{i1} - \bar{X}_1$, $x_{i2} = X_{i2} - \bar{X}_2$ and $x_{i3} = X_{i3} - \bar{X}_3$, is a second-order model with three predictor variables.

Three Predictor Variables - Second Order:
1. The response function is
   \[ E\{Y\} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{33} x_3^2 + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3 \]
2. The coefficients $\beta_{12}$, $\beta_{13}$, and $\beta_{23}$ are interaction effect coefficients for interactions between pairs of predictor variables.

Implementation of Polynomial Regression Models

Fitting of Polynomial Models:
1. Fitting of polynomial regression models presents no new problems as they are special cases of the general linear regression model.
2. Hence, all earlier results on fitting apply, as do the earlier results on making inferences.

Hierarchical Approach to Fitting:
1. First fit a second-order or third-order model and then explore whether a lower-order model is adequate.
2. For instance, with one predictor variable, the model
   \[ Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i \]
   may be fitted with the hope that the cubic term and perhaps even the quadratic term can be dropped.

Hierarchical Approach to Fitting:
1. One would wish to test whether or not $\beta_{111} = 0$, or whether or not both $\beta_{11} = 0$ and $\beta_{111} = 0$.
2. To test whether or not $\beta_{111} = 0$, the appropriate extra sum of squares is $SSR(x^3 \mid x, x^2)$.
3. To test whether or not $\beta_{11} = \beta_{111} = 0$, the appropriate extra sum of squares is
   \[ SSR(x^2, x^3 \mid x) = SSR(x^2 \mid x) + SSR(x^3 \mid x, x^2) \]

Hierarchical Approach to Fitting:
1. With the hierarchical approach, if a polynomial term of a given order is retained, then all related terms of lower order are also retained in the model.
2. Thus, one would not drop the quadratic term of a predictor variable but retain the cubic term in the model.
3. Similarly, an interaction term (second power) would not be retained without retaining the terms for the predictor variables to the first power.
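The extra sums of squares named above for testing $\beta_{111} = 0$ and $\beta_{11} = \beta_{111} = 0$ enter the usual partial F statistics. The following is a sketch that assumes the standard general linear test with $n$ observations and the full third-order model (four regression parameters, hence $n - 4$ error degrees of freedom); $MSE(x, x^2, x^3)$ denotes the error mean square of that full model, and $\alpha$ is the chosen level of significance.

```latex
\begin{align*}
H_0: \beta_{111} = 0: \qquad
  F^* &= \frac{SSR(x^3 \mid x, x^2)/1}{MSE(x, x^2, x^3)},
  & \text{conclude } H_A \text{ if } F^* &> F(1-\alpha;\, 1,\, n-4), \\[4pt]
H_0: \beta_{11} = \beta_{111} = 0: \qquad
  F^* &= \frac{SSR(x^2, x^3 \mid x)/2}{MSE(x, x^2, x^3)},
  & \text{conclude } H_A \text{ if } F^* &> F(1-\alpha;\, 2,\, n-4).
\end{align*}
```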
Regression Function in Terms of X:
1. After a polynomial regression model has been developed, we often wish to express the final model in terms of the original variables rather than keeping it in terms of the centered variables.
2. The fitted second-order model for one predictor variable, expressed in terms of the centered values $x = X - \bar{X}$,
   \[ \hat{Y} = b_0 + b_1 x + b_{11} x^2, \]
   becomes, in terms of the original $X$ variable,
   \[ \hat{Y} = b_0' + b_1' X + b_{11}' X^2, \]
   where
   \[ b_0' = b_0 - b_1 \bar{X} + b_{11} \bar{X}^2, \qquad b_1' = b_1 - 2 b_{11} \bar{X}, \qquad b_{11}' = b_{11}. \]

Exercise 6.5. Brand Preference.
In a small-scale experimental study of the relation between degree of brand liking ($Y$) and moisture content ($X_1$) and sweetness ($X_2$) of the product, the following results were obtained from the experiment based on a completely randomized design (data are coded):

   i:       1    2    3   ···   14   15   16
   X_i1:    4    4    4   ···   10   10   10
   X_i2:    2    4    2   ···    4    2    4
   Y_i:    64   73   61   ···   95   94  100

1. We want to fit the following polynomial regression model:
   \[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{12} x_{i1} x_{i2} + \varepsilon_i, \]
   where $x_{i1} = X_{i1} - \bar{X}_1$ and $x_{i2} = X_{i2} - \bar{X}_2$.
2. The corresponding response function is
   \[ E\{Y\} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{12} x_1 x_2 \]

The fitted polynomial regression model is
   \[ \hat{Y} = 82.219 + 4.425 x_1 + 4.375 x_2 - 0.0938 x_1^2 - 0.500 x_1 x_2 \]

Coefficient of Multiple Determination:
1. The coefficient of multiple determination is $R^2 = 0.9634$. The fitted model explains approximately 96.34% of the variability in the response variable.
2. The adjusted coefficient is $R_a^2 = 0.9501$. It is smaller than the unadjusted coefficient because of the relatively large number of parameters in the polynomial regression function with two predictor variables.

Partial F Test:
1. We now turn to consider whether a first-order model would be sufficient.
   The hypotheses are
   \[ H_0: \beta_{11} = \beta_{12} = 0 \quad \text{versus} \quad H_A: \text{not all } \beta \text{ in } H_0 \text{ equal zero} \]
2. The partial F test statistic is
   \[ F^* = \frac{SSR(x_1 x_2, x_1^2 \mid x_1, x_2)/2}{MSE} = \frac{(2.25 + 20.00)/2}{6.55} = 1.698473 \]

Partial F Test:
1. For level of significance $\alpha = 0.05$, we require $F(0.95; 2, 11) = 3.982298$.
2. Since $F^* = 1.698473 < 3.982298$, we conclude $H_0$, that is, neither a curvature term in $x_1$ nor an interaction term between $x_1$ and $x_2$ is needed.
3. A first-order regression model might be appropriate.

First-Order Model:
1. On the basis of our analysis, we decided to consider the first-order model as follows:
   \[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i \]
2. The fitted model is
   \[ \hat{Y} = 81.750 + 4.425 x_1 + 4.375 x_2 \]
3. Check the residuals, and make inferences about the regression parameters if the model is appropriate.

Interaction Regression Models

1. We have previously noted that regression models with cross-product interaction effects are special cases of the general linear regression model.
2. The regression function
   \[ E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 \]
   contains a cross-product term between $X_1$ and $X_2$, namely $\beta_3 X_1 X_2$.
3. The cross-product term is called an interaction term.
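The partial F test carried out above for the brand preference polynomial model can be reproduced with a few lines of arithmetic. This is a minimal sketch: the helper name `partial_f` is hypothetical, and the sums of squares, MSE, and critical value are simply the figures quoted in the example rather than recomputed from the data.

```python
def partial_f(extra_ss, df_extra, mse_full):
    """Partial F statistic: (extra sum of squares / its df) / full-model MSE."""
    return (extra_ss / df_extra) / mse_full

# Figures quoted in the brand preference example.
extra_ss = 2.25 + 20.00   # SSR(x1*x2, x1^2 | x1, x2)
df_extra = 2              # two coefficients tested: beta_11 and beta_12
mse_full = 6.55           # MSE of the full model (11 error degrees of freedom)
f_crit = 3.982298         # F(0.95; 2, 11), as quoted

f_star = partial_f(extra_ss, df_extra, mse_full)
reject_h0 = f_star > f_crit

print(f"F* = {f_star:.6f}")
print("conclude HA" if reject_h0 else "conclude H0")
```

Keeping the statistic in a small function makes it easy to rerun the comparison with a different set of dropped terms.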
When there are three predictor variables whose effects on the response variable are linear, but the effects on $Y$ of $X_1$ and $X_2$ and of $X_1$ and $X_3$ are interacting, the response function would be modeled as follows using cross-product terms:
   \[ E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_2 + \beta_5 X_1 X_3 \]

Interpretation of Regression Coefficients:
1. The regression model for two quantitative predictor variables with linear effects on $Y$ and interacting effects of $X_1$ and $X_2$ on $Y$ represented by a cross-product term is as follows:
   \[ E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 \]
2. The meaning of the regression coefficients $\beta_1$ and $\beta_2$ here is not the same as that given earlier because of the interaction term $\beta_3 X_1 X_2$.
3. The regression coefficients $\beta_1$ and $\beta_2$ no longer indicate the change in the mean response with a unit increase of the predictor variable, with the other predictor variable held constant at any given level.

Interpretation of Regression Coefficients:
1. The change in the mean response with a unit increase in $X_1$ when $X_2$ is held constant is
   \[ \beta_1 + \beta_3 X_2 \]
2. Similarly, the change in the mean response with a unit increase in $X_2$ when $X_1$ is held constant is
   \[ \beta_2 + \beta_3 X_1 \]
3. Hence, both the effect of $X_1$ for a given level of $X_2$ and the effect of $X_2$ for a given level of $X_1$ depend on the level of the other predictor variable.
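The expression $\beta_1 + \beta_3 X_2$ follows from differencing the response function at $X_1 + 1$ and at $X_1$ with $X_2$ held fixed. The short derivation below spells this out; the conditional notation $E\{Y \mid X_1, X_2\}$ is introduced here only for illustration and is not used elsewhere in these notes.

```latex
\begin{align*}
E\{Y \mid X_1 + 1, X_2\} - E\{Y \mid X_1, X_2\}
  &= \bigl[\beta_0 + \beta_1 (X_1 + 1) + \beta_2 X_2 + \beta_3 (X_1 + 1) X_2\bigr] \\
  &\quad - \bigl[\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2\bigr] \\
  &= \beta_1 + \beta_3 X_2 .
\end{align*}
```

The same steps with the roles of $X_1$ and $X_2$ exchanged give $\beta_2 + \beta_3 X_1$.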
Implementation of Interaction Regression Models:
Two considerations need to be kept in mind when developing regression models with interaction effects.
1. When interaction terms are added to a regression model, high multicollinearities may exist between some of the predictor variables and some of the interaction terms, as well as among some of the interaction terms. A partial remedy to improve computational accuracy is to center the predictor variables, i.e., to use $x_{ik} = X_{ik} - \bar{X}_k$.
2. When the number of predictor variables in the regression model is large, the potential number of interaction terms can become very large. For example, if eight predictor variables are present in the regression model in linear terms, there are potentially 28 pairwise interaction terms that could be added to the regression model. It is therefore desirable to identify in advance, whenever possible, those interactions that are most likely to influence the response variable in important ways.

Exercise 6.5. Brand Preference.
The brand preference data from the polynomial regression example (degree of brand liking $Y$, moisture content $X_1$, and sweetness $X_2$, 16 coded observations from a completely randomized design) are used again here.

Example: Brand Preference.
We want to fit the following regression model with an interaction term:
   \[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2} + \varepsilon_i, \]
where $x_{i1} = X_{i1} - \bar{X}_1$ and $x_{i2} = X_{i2} - \bar{X}_2$.

Example: Brand Preference.
The fitted regression model with the interaction term is
   \[ \hat{Y} = 81.750 + 4.425 x_1 + 4.375 x_2 - 0.500 x_1 x_2 \]

Partial F Test:
1. We now turn to test whether the interaction effect is significant or not. The hypotheses are
   \[ H_0: \beta_{12} = 0 \quad \text{versus} \quad H_A: \beta_{12} \neq 0 \]
2. The partial F test statistic is
   \[ F^* = \frac{SSR(x_1 x_2 \mid x_1, x_2)/1}{MSE} = \frac{20.00}{6.19} = 3.231018 \]

Partial F Test:
1. For level of significance $\alpha = 0.05$, we require $F(0.95; 1, 12) = 4.747225$.
2. Since $F^* = 3.231018 < 4.747225$, we conclude $H_0$, that is, no interaction term between $x_1$ and $x_2$ is needed.
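Although the test does not find the interaction term necessary, the fitted interaction model still illustrates how such a term would be read: the estimated effect of $x_1$ is $b_1 + b_{12} x_2$ and so changes with $x_2$. The sketch below uses the fitted coefficients quoted in the example; the centered sweetness values at which the slope is evaluated are hypothetical and chosen only for illustration.

```python
# Fitted coefficients quoted in the brand preference interaction example.
b0, b1, b2, b12 = 81.750, 4.425, 4.375, -0.500

def fitted(x1, x2):
    """Fitted mean response for centered predictors x1 and x2."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2

def slope_x1(x2):
    """Estimated change in the fitted response per unit increase in x1 at a given x2."""
    return b1 + b12 * x2

# Hypothetical centered sweetness levels, chosen only for illustration.
for x2 in (-1.0, 0.0, 1.0):
    diff = fitted(1.0, x2) - fitted(0.0, x2)  # unit increase in x1 at this x2
    print(f"x2 = {x2:+.1f}: slope = {slope_x1(x2):.3f}, difference = {diff:.3f}")
```

The printed slope and the direct difference of fitted values agree at each level, which is the numerical counterpart of the interpretation $\beta_1 + \beta_3 X_2$ given for interaction models.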
First-Order Model:
1. On the basis of our analysis, we decided to consider the first-order model as follows:
   \[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i \]
2. The fitted model is
   \[ \hat{Y} = 81.750 + 4.425 x_1 + 4.375 x_2 \]
3. Check the residuals, and make inferences about the regression parameters if the model is appropriate.

Qualitative Predictors

1. Many variables of interest in business, economics, and the social and biological sciences are qualitative.
2. Examples of qualitative predictor variables are gender (male, female), purchase status (purchase, no purchase), and disability status (not disabled, partly disabled, fully disabled).
3. In order that such a qualitative variable can be used in a regression model, quantitative indicators for the classes of the qualitative variable must be employed.

In a study of innovation in the insurance industry, an economist wished to relate:
1. The speed with which a particular innovation is adopted ($Y$): the number of months elapsed between the time the first firm adopted the innovation and the time the given firm adopted the innovation.
2. The size of the insurance firm ($X_1$): measured by the amount of total assets of the firm.
3. The type of firm ($X_2$): composed of two classes, stock companies and mutual companies.
1. The qualitative variable ($X_2$) is represented by an indicator variable defined as
   \[ X_2 = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{if mutual company} \end{cases} \]
2. The model with the qualitative predictor variable is
   \[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i \]
3. A qualitative variable with $c$ classes will be represented by $c - 1$ indicator variables, each taking on the values 0 and 1.
4. The qualitative predictor variable disability status can be represented by two indicator variables:
   \[ Z_1 = \begin{cases} 1 & \text{fully disabled} \\ 0 & \text{otherwise} \end{cases} \qquad Z_2 = \begin{cases} 1 & \text{partly disabled} \\ 0 & \text{otherwise} \end{cases} \]
   Here, not disabled is the reference category.

Interpretation of Regression Coefficients:
1. For the insurance innovation example, the model is
   \[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i, \]
   where $X_{i1}$ is the size of the firm and
   \[ X_{i2} = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{if mutual company} \end{cases} \]
2. The response function of this regression model is
   \[ E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]

Interpretation of Regression Coefficients:
1. For a mutual firm, $X_2 = 0$ and the response function becomes
   \[ E\{Y\} = \beta_0 + \beta_1 X_1, \]
   which is a straight line with intercept $\beta_0$ and slope $\beta_1$.
2. For a stock firm, $X_2 = 1$ and the response function becomes
   \[ E\{Y\} = (\beta_0 + \beta_2) + \beta_1 X_1, \]
   which is also a straight line with the same slope $\beta_1$ but with intercept $\beta_0 + \beta_2$.
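The rule of $c - 1$ indicator variables described above for disability status can be sketched in code. The function name `indicator_columns` and its arguments are hypothetical, introduced only for this illustration; the class labels and the choice of not disabled as the reference category follow the text.

```python
def indicator_columns(value, classes, reference):
    """Return the c - 1 indicator values (0 or 1) for one observation.

    `classes` lists all c classes; `reference` is the class coded as all zeros.
    """
    if value not in classes:
        raise ValueError(f"unknown class: {value!r}")
    return [1 if value == c else 0 for c in classes if c != reference]

# Class order determines the indicator order: Z1 = fully, Z2 = partly disabled.
classes = ["fully disabled", "partly disabled", "not disabled"]

for status in classes:
    z1, z2 = indicator_columns(status, classes, reference="not disabled")
    print(f"{status:16s} Z1 = {z1}, Z2 = {z2}")
```

Using only $c - 1$ columns leaves the reference class identified by all indicators being zero, which is the coding given for $Z_1$ and $Z_2$.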
Interpretation of Regression Coefficients:
1. The mean time elapsed before the innovation is adopted, $E\{Y\}$, is a linear function of the size of the firm ($X_1$), with the same slope $\beta_1$ for both types of firms.
2. $\beta_2$ indicates how much higher (lower) the response function for stock firms is than the one for mutual firms, for any given size of firm.
3. In general, $\beta_2$ shows how much higher (lower) the mean response line is for the class coded 1 than the line for the class coded 0, for any given level of $X_1$.

Example:
1. In the insurance innovation example, the economist studied 10 mutual firms and 10 stock firms.
2. The fitted regression model is
   \[ \hat{Y} = 33.87407 - 0.10174 X_1 + 8.05547 X_2 \]

Example:
1. The economist was most interested in the effect of type of firm ($X_2$) on the elapsed time for the innovation to be adopted and wished to obtain a 95% confidence interval for $\beta_2$.
2. The estimates of $\beta_2$ and $\sigma\{b_2\}$ are $b_2 = 8.055469$ and $s\{b_2\} = 1.459106$, respectively. We require $t(0.975; 17) = 2.110$ and obtain the 95% confidence interval for $\beta_2$ as follows:
   \[ b_2 \pm t(0.975; 17) \times s\{b_2\} = 8.055469 \pm 2.110 \times 1.459106 = (4.98, 11.13) \]

Example:
1. With 95% confidence, we conclude that stock companies tend to adopt the innovation somewhere between 5 and 11 months later, on the average, than mutual companies, for any given size of firm.
2. A formal test of $H_0: \beta_2 = 0$ versus $H_A: \beta_2 \neq 0$ with level of significance 0.05 would lead to $H_A$, that type of firm has an effect, since the 95% confidence interval for $\beta_2$ does not include zero.

Qualitative Predictor with More than Two Classes:
1. Consider the regression of tool wear ($Y$) on tool speed ($X_1$) and tool model, where the latter is a qualitative variable with four classes (M1, M2, M3, M4). We therefore require three indicator variables:
   \[ X_2 = \begin{cases} 1 & \text{if tool model M1} \\ 0 & \text{otherwise} \end{cases} \qquad X_3 = \begin{cases} 1 & \text{if tool model M2} \\ 0 & \text{otherwise} \end{cases} \qquad X_4 = \begin{cases} 1 & \text{if tool model M3} \\ 0 & \text{otherwise} \end{cases} \]

Qualitative Predictor with More than Two Classes:
1. A first-order regression model is
   \[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \varepsilon_i \]
2. The response function for the regression model is
   \[ E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 \]

Qualitative Predictor with More than Two Classes:
1. For tool model M4, for which $X_2 = 0$, $X_3 = 0$ and $X_4 = 0$, the response function is
   \[ E\{Y\} = \beta_0 + \beta_1 X_1 \]
2. For tool model M1, for which $X_2 = 1$, $X_3 = 0$ and $X_4 = 0$:
   \[ E\{Y\} = (\beta_0 + \beta_2) + \beta_1 X_1 \]
3. For tool model M2, for which $X_2 = 0$, $X_3 = 1$ and $X_4 = 0$:
   \[ E\{Y\} = (\beta_0 + \beta_3) + \beta_1 X_1 \]
4. For tool model M3, for which $X_2 = 0$, $X_3 = 0$ and $X_4 = 1$:
   \[ E\{Y\} = (\beta_0 + \beta_4) + \beta_1 X_1 \]

Qualitative Predictor with More than Two Classes:
1. Thus, the response function implies that the regression of tool wear on tool speed is linear, with the same slope for all four tool models.
2. The coefficients $\beta_2$, $\beta_3$ and $\beta_4$ indicate, respectively, how much higher (lower) the response functions for tool models M1, M2, and M3 are than the one for tool model M4, for any given level of tool speed.
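The parallel-lines reading of the tool wear model can be made concrete with a small sketch. All coefficient values below are hypothetical, chosen only to show the structure (the notes give no fitted values for this example); the function names are likewise illustrative.

```python
# Hypothetical coefficient values, chosen only to illustrate the structure.
beta0, beta1 = 10.0, 0.5                 # intercept for M4 and common slope
beta2, beta3, beta4 = 2.0, -1.5, 0.75    # shifts for M1, M2, M3 relative to M4

def indicators(model):
    """Return (X2, X3, X4) for a tool model; M4 is the reference, coded (0, 0, 0)."""
    return tuple(1 if model == m else 0 for m in ("M1", "M2", "M3"))

def mean_wear(speed, model):
    """Mean tool wear under the first-order model with three indicator variables."""
    x2, x3, x4 = indicators(model)
    return beta0 + beta1 * speed + beta2 * x2 + beta3 * x3 + beta4 * x4

for m in ("M1", "M2", "M3", "M4"):
    gap = mean_wear(100.0, m) - mean_wear(100.0, "M4")
    print(m, indicators(m), f"difference from M4 = {gap:+.2f}")
```

The difference from M4 is the same at every speed, which is what a common slope with class-specific intercepts implies.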