Could you assist me with question 4?
This question deals with multivariate regression, not bivariate regression.
It requires Stata. In addition to the answers and output, a log of the Stata commands is required.
THE UNIVERSITY OF SYDNEY
FACULTY OF ARTS AND SOCIAL SCIENCES
SCHOOL OF ECONOMICS
ECMT1020: Introduction to Econometrics
Due date: 27 May 2016, by 4pm
Deadline: 3 June 2016, by 4pm
• This assignment consists of four questions. The first three questions are worth 20 marks each and
the last question is worth 40 marks, so the entire assignment is worth 100 marks. Partial credit
may be given for each sub-question. Your mark for this assignment determines 5% of your final
grade for this course.
• This paper consists of one front page and three pages with questions. There are four pages in total.
• Use Stata, and no other software, to perform the calculations for question 4. The data set that you
need can be found on Blackboard. In addition to your answers to the questions, also include the
relevant Stata commands and output, for example by copying and pasting.
• Assignments must be submitted in hard copy (printed, legibly handwritten, or a combination of
both) via the drop boxes in the School of Economics foyer, which is located on the second floor
of the Merewether Building (H04). All submissions must include a completed, signed and dated
"Individual Assessment Cover Sheet", which can also be found on Blackboard.
• Assignments not submitted on or before the due date stated above are subject to penalty; refer
to sydney.edu.au/arts/current_students/late_work.shtml. That is, two marks will be subtracted for
each working day or part thereof that has passed after the due date. Concretely, submissions
received after 4pm on 27 May but before 4pm on 30 May will be subject to a two-mark penalty,
submissions received between 4pm on 30 May and 4pm on 31 May incur a four-mark penalty, et
cetera. After the deadline, assessments cannot be accepted and a mark of 0 will be awarded.
Question 1 (20 marks). In this question, I'm after intuitive answers rather than mathematical ones.
Indeed, some of the underlying mathematical arguments are far beyond the scope of this course.
(a) (4 marks) Without even thinking about the central limit theorem or any other asymptotic arguments, explain why a data set with n = 30 observations is nowhere near large enough to
estimate a model with k = 40 parameters using ordinary least squares regression.
(b) (4 marks) To estimate the partial effect of $x_2$ on $y$ while $x_3$ is kept constant, we need to estimate a
regression model like $y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u$. Some students find this counterintuitive:
if we're keeping $x_3$ constant, why should it be in our model at all? Explain why this is
the right thing to do.
(c) (4 marks) Consider again the model with two explanatory variables from part (b). There are two
situations in which the partial effect of $x_2$ on $y$ is equal to the total effect of $x_2$ on $y$;
what are these situations?
(d) (4 marks) Why should none of $R^2$, $\bar{R}^2$, information criteria, and F tests be used to compare models
with $y_t$ on the left hand side to models with $\ln y_t$ instead?
(e) (4 marks) Recall that the coefficient estimators $b_2$ and $b_3$ are random variables. Why does it make
sense that these two random variables are usually correlated with each other?
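
As a rough numerical illustration of the difference between a partial and a total effect (parts (b) and (c)), the sketch below builds a tiny data set in which $x_2$ and $x_3$ are correlated and then regresses $y$ on $x_2$ alone. Everything in it is an assumption made for illustration: the data, the "true" coefficient values, and the helper name `slope` are all made up, and the error term is set to zero so that the arithmetic is exact.

```python
def slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

# Made-up regressors; x3 tends to move together with x2.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

# Hypothetical true coefficients (illustration only, not from the assignment).
beta1, beta2, beta3 = 1.0, 2.0, 3.0

# y is generated without an error term so the comparison below is exact.
y = [beta1 + beta2 * a + beta3 * b for a, b in zip(x2, x3)]

# Total effect: slope from regressing y on x2 only, ignoring x3.
total_effect = slope(x2, y)

# How x3 moves with x2: slope from regressing x3 on x2.
d = slope(x2, x3)

print(f"partial effect of x2 (beta2)      : {beta2:.4f}")
print(f"total effect of x2 (y on x2 only) : {total_effect:.4f}")
print(f"beta2 + beta3 * d                 : {beta2 + beta3 * d:.4f}")
```

In this made-up example the slope from the short regression differs from $\beta_2$ because it also picks up the part of $x_3$ that moves with $x_2$; including $x_3$ in the model is what separates the two.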
Question 2 (20 marks). We have often used the result that TSS = ExpSS + RSS, for example to justify
the use of $R^2$ and of F tests. The purpose of this question is to prove that result.
(a) (6 marks) Prove that $(y_i - \bar{y})^2 = (y_i - \hat{y}_i)^2 + (\hat{y}_i - \bar{y})^2 + 2 (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})$. (Hint: for this part,
the definitions of $\hat{y}_i$ and $\bar{y}$ are completely irrelevant; you may find it easier to just call
them a and b or something similar.)
(b) (6 marks) Use the result of part (a) to establish that $\mathrm{TSS} = \mathrm{ExpSS} + \mathrm{RSS} + 2 (n - 1)\,\mathrm{Cov}[e, \hat{y}]$.
(c) (6 marks) Prove that $\mathrm{Cov}[e, \hat{y}] = 0$. You may take it as given that the residual is uncorrelated with
each of the regressors.
(d) (2 marks) Complete the proof that TSS = ExpSS + RSS.
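
Before attempting the proof, it can help to see the decomposition hold numerically. The following is a minimal sketch, not a proof: it fits a bivariate regression with an intercept to a handful of made-up data points (the numbers are assumptions chosen only for illustration) and compares both sides of the identity from part (b).

```python
# Made-up data, purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.9]
n = len(y)

x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates for the bivariate model y = b1 + b2 * x + e.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

y_hat = [b1 + b2 * xi for xi in x]           # fitted values
e = [yi - fi for yi, fi in zip(y, y_hat)]    # residuals

tss = sum((yi - y_bar) ** 2 for yi in y)         # total sum of squares
exp_ss = sum((fi - y_bar) ** 2 for fi in y_hat)  # explained sum of squares
rss = sum(ei ** 2 for ei in e)                   # residual sum of squares

# Sample covariance between residuals and fitted values (divisor n - 1).
e_bar = sum(e) / n
y_hat_bar = sum(y_hat) / n
cov_e_yhat = sum((ei - e_bar) * (fi - y_hat_bar)
                 for ei, fi in zip(e, y_hat)) / (n - 1)

print(f"TSS                       = {tss:.6f}")
print(f"ExpSS + RSS               = {exp_ss + rss:.6f}")
print(f"2 (n - 1) Cov[e, y_hat]   = {2 * (n - 1) * cov_e_yhat:.6f}")
```

Up to floating-point rounding, the cross term is zero and the first two printed values agree, which is what parts (c) and (d) ask you to establish in general.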
Question 3 (20 marks). We have collected data on the annual number of cars of twenty different brands
sold in Australia (sales, in number of cars), as well as each brand's average retail price (price, in
dollars), their annual marketing expenditure (mark, also in dollars), and whether or not they assemble
some of their cars domestically (domestic, dummy variable). We wish to investigate how all of these
factors influence sales, and we settle on the following regression model:
. regress lnsales lnprice lnmark domestic

---------+---------------------------
   Model | 15.2271807
Residual | 4.73289348
---------+---------------------------
   Total | 19.9600742

Number of obs
Prob > F

         |      Coef.   Std. Err.   [95% Conf. Interval]
---------+----------------------------------------------------------
 lnprice | -1.389525    .2392136
  lnmark |  .1775161
domestic |  .6156965
   _cons |  21.35649
--------------------------------------------------------------------

(a) (8 marks) I have removed four numbers from this table, indicated by "XXXX". Compute them.
(b) (4 marks) Describe what the coefficient estimate -1.389525 means, in economic terms.
(c) (3 marks) Suppose we wish to test the claim that the Australian car market is completely price-driven, so that marketing and whether production is done domestically are irrelevant.
Regressing lnsales only on lnprice gave an RSS of 10.69, whereas regressing
lnsales only on lnmark and domestic gave an RSS of 14.72. Which of these two
numbers is useful for testing our claim, and why?
(d) (5 marks) Test the claim described in part (c).
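
One way to set up the arithmetic for a test like the one in part (d) is the usual F statistic that compares a restricted with an unrestricted residual sum of squares. The sketch below assumes n = 20 (the twenty brands), k = 4 estimated parameters in the full model, and that the restricted model is the regression of lnsales on lnprice alone; these choices are my reading of the question, not something the paper states as a worked solution. The two helper functions are hypothetical names and rely on a closed form for the F distribution that holds only when the numerator has exactly 2 degrees of freedom.

```python
rss_restricted = 10.69          # lnsales regressed on lnprice only, from part (c)
rss_unrestricted = 4.73289348   # Residual row of the regression output

n = 20   # assumption: one observation per brand
k = 4    # constant, lnprice, lnmark, domestic
q = 2    # restrictions: coefficients on lnmark and domestic both equal zero

df_resid = n - k

# F statistic: (increase in RSS per restriction) / (unrestricted RSS per residual df)
f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)


def f_upper_tail_2df(f, d2):
    """P(F > f) for an F(2, d2) variable; this closed form is valid only for 2 numerator df."""
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)


def f_critical_2df(alpha, d2):
    """Critical value c with P(F > c) = alpha for F(2, d2); inverse of the formula above."""
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)


f_crit = f_critical_2df(0.05, df_resid)
p_value = f_upper_tail_2df(f_stat, df_resid)

print(f"F statistic          : {f_stat:.3f}")
print(f"5% critical value    : {f_crit:.3f}  (F with {q} and {df_resid} df)")
print(f"p-value              : {p_value:.4f}")
print("reject at 5% level   :", f_stat > f_crit)
```

Under these assumptions the statistic comes out at roughly 10.07 against a 5% critical value of roughly 3.63. In a written answer you would normally read the critical value from an F table rather than compute it.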
Question 4 (40 marks). The data set education.dta contains data on the years of education of a
random sample of 718 Americans, as well as the same information for both of their parents. It is likely
that parents' achievements have some predictive power for their children's outcomes, as a result of both
a hereditary component of intelligence and the possibility that higher educated parents stimulate their
children more to do well at school. Thus, we consider the model $\mathit{educ}_i = \beta_1 + \beta_2 \mathit{meduc}_i + \beta_3 \mathit{feduc}_i + u_i$.
(a) (5 marks) We will ignore any heteroskedasticity and autocorrelation problems in the remainder of
this question. However, discuss whether it is likely that these problems are present in
(b) (5 marks) Estimate this model, and provide interpretations for the three estimated coefficients.
(c) (5 marks) Give 95% confidence intervals for both the conditional mean and the actual value of
education for people whose mother has 16 years of education, while the father has 12.
(d) (5 marks) Explain what the restriction $\beta_2 = \beta_3$ means, in economic terms.
(e) (5 marks) Test the restriction in part (d), and show that it cannot be rejected. (Note: I want to see
that you have estimated both the restricted and the unrestricted model. Feel free to use
Stata's test command to check your result, but using only that would be too easy.)
(f) (5 marks) Use the restricted model from part (e) to repeat the prediction exercise in part (c). Intuitively, why are the resulting confidence intervals narrower this time?
(g) (5 marks) Go back to the original model, where $\beta_2$ and $\beta_3$ are allowed to be different. Now, what
would the restriction $\beta_2 + \beta_3 = 1$ mean, in economic terms?
(h) (5 marks) Test the restriction in part (g). (The same note as in part (e) applies.)
This is the last page of the assignment.