Introductory Econometrics
Problem set 1

Jan Zouhar
Department of Econometrics, University of Economics, Prague, [email protected]

Due date: March 26

Problem 1.1. In an attempt to prove that a tall person earns more than a short one ceteris paribus, your friend Peter ran a simple regression of wage.usd (hourly wage, in U.S. dollars) on height.cm (height of a person in centimetres), using data on a random sample of 300 employees. The estimated equation is

$$\widehat{\text{wage.usd}} = 7 + 0.02\,\text{height.cm}. \tag{1}$$

Next, Peter thought it might be a good idea to estimate the log-level model, yielding him the equation

$$\widehat{\log(\text{wage.usd})} = 1.9 + 0.0025\,\text{height.cm}. \tag{2}$$

a) Give an example of a descriptive interpretation of the estimated slope coefficients in (1) and (2).
b) Give an example of a causal interpretation of the estimated slope coefficients in (2).
c) What is the interpretation of the intercept in (1)?
d) Do you think the estimated slope quantifies a causal relationship? Can you give an example of a $y \leftarrow z \rightarrow x$ relationship that might spoil the causal interpretation?

Problem 1.2. Apart from wage.usd and height.cm, Peter's dataset (from the previous problem) also contains data on respondents' wage and height in euros and metres, respectively, stored in variables wage.eur and height.m, and for each observation it holds that

$$\text{wage.eur} = 0.8\,\text{wage.usd}, \qquad \text{height.m} = 0.01\,\text{height.cm}.$$

Write down the sample regression function (= estimated equation) in the regression of . . .

a) wage.eur on height.cm.
b) wage.eur on height.m.
c) log(wage.eur) on height.cm.
d) log(wage.eur) on height.m.

Problem 1.3. Using the data in attend.gdt, you are supposed to study the relationship between class attendance at a university and the resulting test score. First, read the description of the dataset (Data → Dataset info) and the individual variables (Descriptive label in the main window). (If needed, browse the web for additional information about ACT scores and GPAs.)

a) Regress final on attend.
Report the estimated equation and create a scatter plot with the actual and fitted values (i.e., the actual data points and the regression line). Looking at the plot, do you think that the homoskedasticity assumption holds in this model? Or does the variance of the final score vary systematically with attendance rate? Explain.
b) Regress final on attend and priGPA. Report the estimated equation again. Explain the difference in the estimated slope coefficient of the attend variable (compared with your previous equation). Which of the results do you consider a more accurate estimate of the causal effect of class attendance on the final score?
c) Regress final on attend, priGPA, and ACT. Again, explain the differences between the attend coefficients.
d) Interpret the R-squared in your last equation. Does its low value render the coefficient estimates unreliable?
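
The unit changes in Problem 1.2 can also be explored numerically before working them out on paper. The Python sketch below is only an illustration under stated assumptions: the data are synthetic (they are not Peter's sample), the helper `ols` and the data-generating numbers are made up for this example, and gretl remains the tool the problem set expects. It fits the simple regression by hand and compares the coefficients across the rescaled variables.

```python
import math
import random


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Synthetic data, assumed for illustration only (not the sample from Problem 1.1).
rng = random.Random(0)
height_cm = [rng.gauss(172, 9) for _ in range(300)]
wage_usd = [math.exp(1.5 + 0.004 * h + rng.gauss(0, 0.3)) for h in height_cm]

# The unit conversions stated in Problem 1.2.
wage_eur = [0.8 * w for w in wage_usd]
height_m = [0.01 * h for h in height_cm]

# Level-level regressions.
b0, b1 = ols(height_cm, wage_usd)   # baseline: wage.usd on height.cm
c0, c1 = ols(height_cm, wage_eur)   # dependent variable rescaled
d0, d1 = ols(height_m, wage_usd)    # regressor rescaled

# Log-level regressions.
l0, l1 = ols(height_cm, [math.log(w) for w in wage_usd])
m0, m1 = ols(height_cm, [math.log(w) for w in wage_eur])

print("wage.usd on height.cm:     ", b0, b1)
print("wage.eur on height.cm:     ", c0, c1)
print("wage.usd on height.m:      ", d0, d1)
print("log(wage.usd) on height.cm:", l0, l1)
print("log(wage.eur) on height.cm:", m0, m1)
```

Comparing the printed pairs shows which coefficients react to a change of units in the dependent variable, which react to a change of units in the regressor, and what taking logs does to a multiplicative rescaling.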
© Copyright 2024