ECO-5006A: Introductory Econometrics

Autumn 2020
Stata Project
Please read the Project Assignment Brief and the instructions below very carefully before attempting any of the questions. The Assignment Brief is available in the ‘Project’
section of the module’s Blackboard site and provides some general information and further
instructions.
• All the statistical analysis needs to be done using the econometrics software package Stata.
• The data set PROJECT 2020.dta contains information on 28,424 university graduates in full-time employment, based on the DLHE Survey of 2016/17. Please refer to the Assignment Brief for
more information about this data set.
• The questions of the Project are based on the following main topic:
“Does studying Economics pay off, relative to other subjects in Social Sciences?
Evidence using data of recent UK graduates”
• In particular, is there evidence that Economics graduates ‘do better/worse’ in the graduate
market, relative to graduates who studied the other subjects available in the data? And how
much better/worse do they do? By ‘doing better/worse’, we mean whether:
– they earn more/less based on self-reported salaries of graduates (before any deductions)
– they are more/less likely to secure a managerial / professional position after graduation
Thus, in your analysis, you need to use both outcome variables ‘salary’ and ‘professional’.
• In questions that require you to use Stata commands to get your answer, make sure you clearly
show these Stata commands within your answers.
• Presentation of your answers matters: all graphs, equations, results and discussions need to be well presented.
• For the main text, you need to use the ‘Calibri’ font at size 11, with 1.15 line spacing. For
text within tables, you can use smaller font, up to size 9. You are also allowed to change the
size of your graphs, as long as the graphs are still clear to read (i.e. clear legends, titles, etc.)
• Please make sure your answer to each part of the Project (i.e. part (a), part (b), etc.) starts
on a new page.
QUESTIONS
(a) [12 Marks]
Investigate the main question of the Project by using descriptive statistics only; e.g. by appropriate use of means, medians, variances, graphs, etc. Don’t forget that you need to investigate
this in terms of both outcome variables ‘salary’ and ‘professional’. There is no word limit for
this question, but your answer needs to be presented within two A4 sides (so, all tables, graphs
and discussions need to fit within two A4 sides, i.e. both sides of one sheet).
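As a starting point for part (a), the descriptive comparison could be sketched in Stata roughly as follows. This is only an illustration: it assumes the outcome variables are named salary and professional and that the subject variable is named subject, so check the actual names with -describe- before running it.

```stata
* Minimal sketch. Assumed variable names: salary, professional, subject.
use "PROJECT 2020.dta", clear

* Mean, median, spread and group size of salary by subject
tabstat salary, by(subject) statistics(mean median sd n)

* Share of graduates in professional roles by subject
* (the mean of a 0/1 variable is a proportion)
tabstat professional, by(subject) statistics(mean n)

* Distribution of salary across subjects
graph box salary, over(subject, label(angle(45)))

* Share in professional roles, shown as a bar chart
graph bar (mean) professional, over(subject, label(angle(45))) ///
    ytitle("Share in professional roles")
```

Which statistics and graphs are most informative is a judgement call; the commands above are one option among several.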
(b) [50 Marks]
In this part you need to investigate the main question of the Project by using regression
analysis (i.e. appropriate MLR models). Don’t forget that you need to investigate this in
terms of both outcome variables, ‘salary’ and ‘professional’. That is, you will need two separate
MLR models, one using ‘salary’ as the dependent variable, and one using ‘professional’ as the
dependent variable.
So, in this part, you need to investigate whether Economics graduates are expected to ‘earn
more/less’ relative to each of the other subjects, and whether Economics graduates are more/less
likely ‘to get into professional roles’, holding other variables fixed (i.e. if Economics graduates
had the same tariff scores, the same socio-economic background, etc., as the graduates of the
other subjects).
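One way to write down the two models described above is sketched below. The notation is an assumption made for illustration and does not come from the Assignment Brief: D_{ij} equals 1 if graduate i studied non-Economics subject j (Economics is the omitted base category), and x_i collects whatever other explanatory variables are held fixed.

```latex
\begin{align}
\log(\text{salary}_i) &= \beta_0 + \sum_{j=1}^{J} \delta_j D_{ij} + \mathbf{x}_i'\boldsymbol{\gamma} + u_i \\
\text{professional}_i &= \alpha_0 + \sum_{j=1}^{J} \theta_j D_{ij} + \mathbf{x}_i'\boldsymbol{\pi} + v_i
\end{align}
% Interpretation relative to Economics, holding x_i fixed:
%   salary model:       approximate difference 100*delta_j percent,
%                       exact difference 100*(exp(delta_j) - 1) percent
%   professional model: difference in probability of 100*theta_j percentage points
```

Under this setup, a negative \delta_j means that graduates of subject j are predicted to earn less than comparable Economics graduates, and a negative \theta_j means that they are less likely to be in a professional role.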
Here are some important instructions/notes. Please read these very carefully:
(1) The dependent variable salary must be used in logarithmic form (i.e. the natural log
of salary). Note that the professional dependent variable is binary (taking value 0 for
‘non-professional’ and value 1 for ‘professional’). An example for an MLR model with a
binary dependent variable (that is, the Linear Probability Model) has been covered in the
live lecture of Week 11.
(2) Your main explanatory variable (i.e. subject) is categorical, so it needs to be added in
the MLR model in the form of dummy variables. We have seen numerous examples of how
to include categorical variables in the MLR model using dummy variables from Week 9
onwards (both in the asynchronous lecture notes/videos and in the synchronous sessions).
(3) In your discussion of the results of your regression models, you need to provide an appropriate interpretation of the coefficients of the subject-related dummy variables. You need
to provide this interpretation:
(i) for models that don’t include any other explanatory variables. That is, in the models where you regress log(salary) or professional just on the subject-related dummy
variables. Let’s call this the ‘empty’ model.
(ii) for models that include the rest of the explanatory variables. See point (6) for instructions related to what other explanatory variables to include in your models. Let’s call
this the ‘full’ model.
(4) You also need to conduct hypothesis testing, to test:
(i) whether there is statistical evidence that the mean log of salary of Economics graduates
differs from the mean log of salary of graduates of each of the other subjects (i.e. using
the results of your ‘salary’ model), and whether Economics graduates are more/less
likely to get into professional jobs (i.e. using the results of the ‘professional’ model),
holding the other variables fixed. You can do this by commenting on the relevant p-values obtained in Stata, and you need to do this for both the empty and the full
models.
(ii) whether there is evidence of joint significance of the subject dummy variables using F-tests (this was covered in Week 10), again for both the ‘salary’ and the ‘professional’
models, holding the other variables fixed. For the F-tests, please provide the obtained
F-statistics as well as the p-values. Note also that F-testing needs to be done only on
your full models.
(5) Following your interpretation of coefficients and hypothesis testing, also comment on how
much your results have changed by including the additional explanatory variables (i.e. how
much the effects of subject dummy variables on the outcome variables have changed, by
‘holding these characteristics/factors fixed’ across graduates), both in terms of magnitude
and statistical significance.
(6) It is up to you to decide which other explanatory variables you add to your ‘salary’ and ‘professional’ models, and it is not necessary that both the ‘salary’ model and the ‘professional’
model include the same variables. Note that categorical variables (such as degree class
or region) need to be added as dummy variables. For each explanatory variable that you
add, you need to offer a short justification of why it is important for this variable to be
included in the model (about 100-150 words for the justification of each variable). Note
that you need to present a single justification for both ‘Salary’ and ‘Professional’ models,
instead of a 100-150 words justification for ‘Salary’ and then another 100-150 words justification for ‘Professional’. Also, for the variables that you decide not to add to your model,
you also need to provide justification as to why these were not added (about 50-100 words
for each variable not added). For an example, please see the uploaded ‘Example of variable justification for Part B – a hypothetical example’ document, available in the SAMPLE
EXAMPLE folder.
(7) Note that variables tariff and age must be included in the full models.
– For tariff, you need to decide whether you use it in its linear form, or whether you
include a quadratic term / replace it by the natural log of tariff. Your choice needs
to be justified within your justifications above.
– For the variable representing the graduates’ age, it must be included in the model as a
quadratic function (i.e. add both age and age²). You also need to provide two graphs,
one for the predicted log of salary against age, and one for the predicted probability
of getting into professional employment against age. Then, based on these graphs, you
need to discuss the relationship between age and salary/professional (in about 200-250
words overall).
– Note that, for tariff and age, you don’t need to explain why these variables have been
added to the model.
(8) Your ‘salary’ regression model needs to be tested for violation of MLR5 (i.e. whether there
is a heteroskedasticity problem). Conduct this test only for your full model and present
your test statistic as well as the p-value of this test. If there is statistical evidence of
heteroskedasticity, then the standard errors presented in your regressions must be made
‘robust to heteroskedasticity’. Also, if your model is made robust to heteroskedasticity,
then note that the hypothesis testing under point (4) needs to be done on the ‘robust’
model. Please note that testing for heteroskedasticity and correcting the standard errors
is covered in the material of Week 12. Also note that in your ‘professional’ regression
model, heteroskedasticity robust standard errors must be used (you don’t need to test for
heteroskedasticity in the ‘Professional’ model).
(9) Note that all your regression results need to be presented in one or two tables. There
are Stata commands that create such tables automatically, such as the ‘outreg2’ command.
This was discussed in the Support Session of Week 10 (an extract of the video recording
where I discuss this can also be found in the ‘Introduction to Stata Material’ section on the
module’s BB). My suggestion is to have one table with 5 columns. A first column for the
variables, two columns for the estimates of the ‘salary’ models (one for the empty and one
for the full model), and two columns for the estimates of the ‘professional’ models (again,
one for the empty and one for the full model).
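The workflow set out in points (1) to (9) could be sketched in Stata along the following lines. Several things here are assumptions rather than facts about the data set: the variable names (salary, professional, subject, tariff, age), that subject is a numeric, value-labelled variable with Economics coded 1, and the age range used for the graphs. The full models below contain only the compulsory variables, so your own justified controls would need to be added, and the heteroskedasticity test taught in Week 12 may differ from the one shown.

```stata
* Minimal sketch under the assumptions stated above, not a model answer.
use "PROJECT 2020.dta", clear
gen lsalary = ln(salary)

* Empty models: subject dummies only, Economics (assumed code 1) as base
regress lsalary ib1.subject
estimates store sal_empty
regress professional ib1.subject, vce(robust)
estimates store prof_empty

* Full salary model: tariff, a quadratic in age, plus your own controls
regress lsalary ib1.subject tariff c.age##c.age
estat hettest          // Breusch-Pagan / Cook-Weisberg test

* If homoskedasticity is rejected, re-estimate with robust standard errors
regress lsalary ib1.subject tariff c.age##c.age, vce(robust)
testparm i.subject     // joint F-test of the subject dummies
estimates store sal_full

* Predicted log salary against age (age range 21 to 60 is an assumption)
margins, at(age=(21(1)60))
marginsplot, name(age_salary, replace)

* Full professional model (LPM): robust standard errors are required
regress professional ib1.subject tariff c.age##c.age, vce(robust)
testparm i.subject
estimates store prof_full

* Predicted probability of professional employment against age
margins, at(age=(21(1)60))
marginsplot, name(age_prof, replace)

* Quick side-by-side check of all four models before building the table
estimates table sal_empty sal_full prof_empty prof_full, ///
    b(%9.3f) se(%9.3f) stats(N r2)
```

The built-in -estimates table- command is used here only as a quick check; the formatted table for submission can be produced with outreg2 as suggested above.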
(c) [9 Marks]
Provide theoretical justification of your main findings in this project. This discussion needs to
focus on the main topic of the project (i.e. ‘does studying Economics pay off relative to the other
subjects?’). You also need to provide up to three academic references as part of your justification.
Note that these academic references can be either published papers in academic journals or other
academic reports published by academic institutions (such as the Institute for Fiscal Studies).
Newspaper articles are not valid academic references. The answer to this part must be contained
within one side of a page.
(d) [9 Marks]
Identify the 3 most important problems/limitations in your models/results and explain:
(i) why these are important problems/limitations (in terms of affecting the reliability of your
estimated coefficients)
(ii) how each of these problems/limitations could be addressed.
The answer to this part must be contained within one side of a page.
(e) [10 Marks]
Present the main findings of your regression analysis within a single graph. Note that this can
be a ‘combined graph’ and it can be produced either in Stata or Excel. A suggestion of what
kind of graph to create is provided in the video recording of the Project Discussion Sessions,
Session C (available in the Project section on BB). If the graph has been done in Stata, you
don’t need to provide your Stata command. Also provide a discussion/summary of the findings
presented in this graph and try to avoid using technical language (i.e. econometric terminology
that would not make sense to a non-specialist). The answer to this part must be contained
within one side of a page.
(f) [10 Marks]
Only for the ‘salary’ regression model, in your full model, add interaction terms between the
subject dummy variables and one of the explanatory variables. Estimate this model, present
your results and provide a discussion of the additional insights obtained following the model
with the interaction terms. Note that for this part, you can just copy/paste the Stata output
instead of creating your own table. Within your answer you also need to justify why you have
picked this explanatory variable for the interaction terms. The answer to this part must be
contained within two sides of a page, i.e. both sides of one sheet.
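A model with interaction terms of this kind could be sketched in Stata as below. The choice of tariff as the interacted variable is purely illustrative, and the same assumptions apply as elsewhere: the variable names, Economics coded 1 in a numeric subject variable, and robust standard errors being appropriate. Other controls from your full model are omitted here for brevity.

```stata
* Minimal sketch; tariff is used for the interaction only as an example.
use "PROJECT 2020.dta", clear
gen lsalary = ln(salary)

* subject##c.tariff adds the subject dummies, tariff, and their interactions
regress lsalary ib1.subject##c.tariff c.age##c.age, vce(robust)

* Joint test of whether the tariff slope differs across subjects
testparm i.subject#c.tariff
```

In this specification each interaction coefficient measures how the effect of tariff on log salary for a given subject differs from its effect for Economics graduates.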

– END OF QUESTIONS –