The problem set again uses the Stata dataset WAGE2.dta. The dataset
contains information on monthly earnings, employment history,
education, demographic characteristics, and two test scores for 935 men in
1980:
wage      monthly earnings (in 1976 USD)
hours     average weekly hours of work
IQ        IQ (intelligence quotient) score
educ      years of education
age       age in years
married   =1 if the person is married; 0 otherwise
black     =1 if the person is black; 0 otherwise
Problem 1: R² and hypothesis testing in simple regression (20 points total)
- (5 points) Estimate the following linear regression model and paste your
results below:
lwage = β₀ + β₁ educ + u
Hint: You did this for PS2.
What is the R² of the regression and how do you interpret it? What does it tell
you about the extent to which wages are causally affected by education?
- (5 points) Using the information on SST, SSE, and SSR in the Stata output,
show how the R² for this regression was calculated.
Hint: R² = SSE/SST, so plug the figures for SSE and SST into this formula and
solve. In the Stata output, SSE is the Model sum of squares, SSR is the
Residual sum of squares, and SST is the Total sum of squares.
- (10 points) Use the output from your regression to test the hypothesis that
education is unrelated to log-wage. Use the steps shown in the cookbook that
we reviewed in class. (You can find this in the “Cookbooks & crib sheets”
folder on D2L.) Start by stating the null and the alternative hypotheses in
terms of the notation used in class. Use a significance level of 1%. What do
you conclude and why? Be sure to explain your reasoning.
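The R² calculation and the t test in Problem 1 could be sketched in Stata along the following lines. This is a minimal sketch, not the official solution. It assumes that WAGE2.dta sits in the current working directory and already contains lwage. It relies on the results that regress stores in e(): e(mss) is the Model sum of squares (SSE in the notation above) and e(rss) is the Residual sum of squares (SSR). The scalar names are made up for illustration.

```stata
* Assumption: WAGE2.dta is in the current working directory
use "WAGE2.dta", clear

* Simple regression of log wage on education
regress lwage educ

* R-squared by hand: SSE/SST, where SST = SSE + SSR
scalar sse = e(mss)
scalar ssr = e(rss)
scalar sst = sse + ssr
display "R2 by hand     = " sse/sst
display "R2 from Stata  = " e(r2)

* Two-sided t test of H0: beta1 = 0 at the 1% significance level
scalar tstat = _b[educ]/_se[educ]
scalar crit  = invttail(e(df_r), 0.005)
display "t statistic    = " tstat
display "critical value = " crit
display "reject H0 (1 = yes, 0 = no): " (abs(tstat) > crit)
```

The two R² lines should agree. The last line only reports the mechanical decision rule; the written answer still needs the hypotheses and the reasoning spelled out.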
Problem 2: Ability bias in the estimated return to education (30 points total)
Suppose that the population model for log-earnings (lwage) is given by:
lwage = β₀ + β₁ educ + β₂ ability + v
A person’s ability is not observed, so you estimate the simple linear
regression model:
lwage = β₀ + β₁ educ + u
where ability is included in the error term u.
- (5 points) What is the OLS estimate of the slope coefficient on education
from the simple linear regression and how do you interpret it?
Hint: You did this in Problem Set 2, and you estimated this regression above
in Problem 1. The dependent variable is in log terms, so interpret the slope
coefficient as a percentage change.
- (5 points) What is the key condition for the OLS estimator of β₁ you
obtained in question 1 to be an unbiased estimator of the effect of education
on the wage? Does this condition hold? Why or why not?
Hint: Think of ability as an omitted variable in the lwage model. (Be sure to
read the section on “Omitted Variable Bias: The Simple Case” in section 3.3
of Wooldridge.)
- (5 points) What is the direction of the bias of the OLS estimator of β₁ in the
simple linear regression model? Explain your answer using the 2x2 OVB
matrix in Wooldridge and the slides.
Hint: Think about two things. First, how are lwage and ability related in
the population wage model (that is, what is the likely sign of the
slope coefficient on ability)? Second, how are the variables ability and educ
related — that is, do workers with more ability tend to have more or less
education than workers with less ability, on average?
- (5 points) Estimate a model of lwage on educ and IQ. (In Stata, type: reg
lwage educ IQ.) What is the OLS estimate of β₁ from this regression and how
do you interpret it? Why is it smaller than the estimate you obtained in
question 1? Explain.
- (5 points) Now estimate the auxiliary regression of IQ on educ. Interpret
the coefficient on educ in this regression. Is the relationship you estimate
between IQ and educ consistent with your answer to question 3 above?
- (5 points) Show that the bias you discussed in question 4 can be written in
terms of the OVB formula.
Hint: First, write down the OVB formula:
$\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$
An easy way to write this without all the Greek letters and superscripts is:
beta1-tilde = beta1-hat + (beta2-hat)(delta1-tilde). Now plug in the numbers
for each term and show that the OVB formula holds. Remember that
beta1-tilde comes from the regression of lwage on just educ; that beta1-hat
and beta2-hat come from the regression of lwage on both educ and IQ; and
that delta1-tilde comes from the regression of IQ on educ.
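The three regressions behind the OVB formula could be collected in Stata roughly as follows. This is a sketch under stated assumptions rather than the official solution: WAGE2.dta is assumed to be in the current working directory, and the scalar names (b1_tilde, b1_hat, b2_hat, d1_tilde) are hypothetical labels chosen to match the hint.

```stata
* Assumption: WAGE2.dta is in the current working directory
use "WAGE2.dta", clear

* Short regression: lwage on educ only (gives beta1-tilde)
regress lwage educ
scalar b1_tilde = _b[educ]

* Long regression: lwage on educ and IQ (gives beta1-hat and beta2-hat)
regress lwage educ IQ
scalar b1_hat = _b[educ]
scalar b2_hat = _b[IQ]

* Auxiliary regression: IQ on educ (gives delta1-tilde)
regress IQ educ
scalar d1_tilde = _b[educ]

* Compare the two sides of the OVB formula
display "beta1-tilde                        = " b1_tilde
display "beta1-hat + beta2-hat*delta1-tilde = " b1_hat + b2_hat*d1_tilde
display "implied bias (beta2-hat*delta1-tilde) = " b2_hat*d1_tilde
```

The two sides match exactly only when all three regressions use the same observations, so it is worth checking that each regression reports the same number of observations.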
Sample Solution