Below is a table with problem sets. Answers to your problem sets should be typed up, preferably using LaTeX/Markdown. Hand in your problem sets through Canvas.

Problem sets with deadlines
PS # Deadline
1 Sep 29 at 12:30
2 Oct 6 at 12:30
3 Nov 1 at 12:30
4 Nov 17 at 12:30
5 TBD at 12:30

PS1 - Extremum estimation

This problem set requires you to read Newey and McFadden (1994) to establish the consistency of maximum likelihood methods in general, and logit in particular.

In Newey and McFadden, you will find the consistency result for extremum estimators that we also discussed in class. Link the results in Newey and McFadden to the general extremum estimation results.

  1. What are the conditions on ML to get consistency?
  2. What are the conditions on the logit to guarantee consistency of the maximum likelihood estimator?

Pay particular attention to uniform convergence through the uniform law of large numbers, and to identification through the information inequality.

The answers are scattered throughout Newey and McFadden (for ML and probit, not logit). Your goal is to gather the scraps and to organize, verify, complete, explain, and extend them.
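
As a numerical companion to the consistency question, the following is a minimal sketch, not part of the assignment, that treats the logit MLE as an extremum estimator maximizing the average log-likelihood. The function name `logit_mle`, the true parameter `theta0`, the sample sizes, and the use of Newton's method are all illustrative assumptions. It only illustrates what consistency looks like in a simulation; it proves nothing.

```python
import numpy as np

def logit_mle(X, y, iters=25):
    """Maximize the average logit log-likelihood Q_n(theta) by Newton's method."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ theta))         # fitted probabilities
        grad = X.T @ (y - p) / len(y)                # gradient of Q_n
        hess = -(X.T * (p * (1 - p))) @ X / len(y)   # Hessian of Q_n (negative definite)
        theta = theta - np.linalg.solve(hess, grad)  # Newton step
    return theta

rng = np.random.default_rng(0)
theta0 = np.array([0.5, -1.0])   # hypothetical true parameter
errors = {}
for n in (200, 20_000):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ theta0))).astype(float)
    # largest absolute estimation error across the two coefficients
    errors[n] = float(np.max(np.abs(logit_mle(X, y) - theta0)))
```

Comparing `errors[200]` with `errors[20_000]` gives a rough feel for how the estimation error behaves as the sample grows; the formal argument still has to go through the uniform law of large numbers and identification.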

PS2 - Latent variables

This problem set is about latent variable models.

  1. Ordered probit. Source: Wooldridge, 16.3. \(Y\) takes values in \(\left\{ 0,\cdots,J\right\}\). The underlying latent variable \(y^{*}\) is determined by \(y^{*}=x\beta+u,\,\,\,\left.u\right|x\sim\mathcal{N}\left(0,\exp\left(2x\delta\right)\right)\) and \[ y =\begin{cases} 0\text{ if }y^{*}\leq\alpha_{1} \\ 1\text{ if }\alpha_{1}<y^{*}\leq\alpha_{2} \\ \vdots \\ J\text{ if }y^{*}>\alpha_{J} \end{cases} \]
  • Derive the response probabilities \(P\left(\left.y=j\right|x\right)\).

  • Write down the log-likelihood as a function of all parameters, \(\alpha,\beta,\delta\).

  2. Unordered choice. For the multinomial logit model, the choice probabilities are given by \[P\left(\left.y_{i}=j\right|X_{i}\right)=\frac{\exp\left(x_{ij}\beta\right)}{\sum_{l=0}^{J}\exp\left(x_{il}\beta\right)}.\]
  • Assume that \(x_{ij}\) contains a constant for each \(j\). Is \(\beta\) identified? Explain. If it is not identified, suggest a normalization restriction so that the parameters of the resulting model are identified.
  • Obtain an expression for the “cross-marginal-effect” of a unit, ceteris paribus, change in the value of the \(k\)-th regressor for category \(j\) on the probability of choosing category \(m\neq j\), \[\frac{\partial P\left(\left.y_{i}=m\right|X_{i}\right)}{\partial x_{ij,k}}.\]
  3. Censored ordered logit. Consider the following model: There is one dependent variable, \(y\), and a vector of \(K\) covariates \(x\). Underlying the relationship between \(y\) and \(x\) is a latent variable model: \[\begin{aligned} y_{1}^{*} &=x\beta+u_{1},\\ y_{2}^{*} &=x\gamma+u_{2}. \end{aligned}\] In this model, if \(y_{1}^{*}<0\), then \(y=0\). However, if \(y_{1}^{*}\geq 0\), then \(y\) follows an ordered logit model with underlying variable \(y_{2}^{*}\). To be more precise, \[ y=\begin{cases} 0 & \text{if }y_{1}^{*}<0,\\ 1 & \text{if }y_{1}^{*}\geq0\text{ and }y_{2}^{*}\leq\alpha_{1},\\ 2 & \text{if }y_{1}^{*}\geq0\text{ and }\alpha_{1}<y_{2}^{*}\leq\alpha_{2},\\ \vdots\\ J & \text{if }y_{1}^{*}\geq0\text{ and }\alpha_{J-1}<y_{2}^{*}. \end{cases} \] Assume that \(u_{1}\) and \(u_{2}\) are conditionally independent, and that \(u_{1}|x\sim LOG\left(0,1\right)\) and \(u_{2}|x\sim LOG\left(0,1\right)\).
  • Compute \(P\left(y=0|x\right)\).
  • Compute \(P\left(y=j|x\right)\) for \(j\not\in\left\{ 0,1,J\right\}\).
  • What is the effect of a ceteris paribus change in \(x_{k}\), the \(k\)th component of \(x\), on \(P\left(y=1|x\right)\)?
  • Describe the expression that you obtain, i.e. try to name the various terms in the expression. Do this in the context of an economic example, and give your interpretation in terms of that example.
  • What is the effect of a change in \(x_{k}\), the \(k\)th component of \(x\), on \(P\left(y=j|x\right)\)? Assume that \(j\not\in\left\{ 0,1,J\right\}\). I am looking for the marginal effect.
  • Construct the log-likelihood function for estimation of \[\theta=\left(\beta,\gamma,\alpha_{1},\cdots,\alpha_{J-1}\right).\]
  • What would \[Q_{0}\left(\theta\right)=\text{plim }Q_{n}\left(\theta\right)\] be in this case? Is this objective function continuous in its parameters?
  • Discuss whether the assumptions for consistency of the extremum estimator will hold in this case. Start by stating clearly what the objective function is.
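
For the unordered choice question above, a numerical check can help catch algebra mistakes in a derived cross-marginal effect. The sketch below evaluates the multinomial logit choice probabilities exactly as given in the question and approximates \(\partial P\left(\left.y_{i}=m\right|X_{i}\right)/\partial x_{ij,k}\) by a central finite difference. The function names `mnl_probs` and `cross_effect_fd` and all numerical values are hypothetical.

```python
import numpy as np

def mnl_probs(X, beta):
    """Choice probabilities for one individual; X has one row x_ij per alternative j = 0..J."""
    v = X @ beta
    v = v - v.max()            # subtract the largest index for numerical stability
    e = np.exp(v)
    return e / e.sum()

def cross_effect_fd(X, beta, m, j, k, eps=1e-6):
    """Central finite difference of P(y = m | X) with respect to x_{ij,k}."""
    X_up, X_dn = X.copy(), X.copy()
    X_up[j, k] += eps
    X_dn[j, k] -= eps
    return (mnl_probs(X_up, beta)[m] - mnl_probs(X_dn, beta)[m]) / (2 * eps)

# Hypothetical numbers: J + 1 = 3 alternatives, K = 2 regressors.
X = np.array([[0.2, 1.0], [0.5, -0.3], [-0.1, 0.4]])
beta = np.array([1.0, -0.5])
probs = mnl_probs(X, beta)
effect = cross_effect_fd(X, beta, m=0, j=1, k=0)
```

Plugging the same `X` and `beta` into an analytical expression and comparing the result with `effect` is a quick sanity check on the derivation.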

Practice questions

  1. What is the difference between the partial effect at the average (PEA) and the average partial effect (APE)? Use the binary choice model as a running example.
  • Describe the difference using mathematics.
  • Describe the difference using words.
  • Name one drawback of the PEA.
  2. Exercise 19.4 in Wooldridge (page 846).

  3. Exercise 19.11 in Wooldridge (pages 847-848), subquestions (a) and (b).

  4. Censored Tobit model. Consider the censored Tobit model, \(y_{i}^{*} =x_{i}\beta+u_{i},\,\left.u_{i}\right|x_{i}\sim\mathcal{N}\left(0,\sigma^{2}\right)\), with \[y_{i} =\begin{cases} y_{i}^{*} & \text{ if }y_{i}^{*}\geq0,\\ 0 & \text{ if }y_{i}^{*}<0, \end{cases}\] where \(x_{i}\) is a \(K-\)dimensional vector of explanatory variables, \(y_{i}^{*}\) is a latent variable, and \(y_{i}\) is a dependent variable.

  • Derive an expression for \(E\left(\left.y_{i}\right|x_{i}\right)\) and show that \(E\left(\left.y_{i}\right|x_{i}\right)\geq E\left(\left.y_{i}^{*}\right|x_{i}\right).\)
  • Now, use it to compute the marginal effect \[\frac{\partial E\left(\left.y_{i}\right|x_{i}\right)}{\partial x_{ik}},\] where \(x_{ik}\) is the \(k-\)th element of \(x_{i}\).
  • Can you determine the sign of the marginal effect?
  • How would you estimate \(\beta\)? Give enough detail so that a first-year PhD student could program the estimator. Note: read the next question before you start writing your answer.
  • Describe the asymptotic distribution of the estimator you propose in the previous question.
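
To build intuition for the censored Tobit question above, the following sketch simulates the model as stated and compares the sample means of \(y_{i}\) and \(y_{i}^{*}\). The parameter values, the sample size, and the choice of a single standard normal regressor plus a constant are illustrative assumptions. The comparison is for unconditional sample means, so it only hints at the conditional inequality and does not replace the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta = np.array([0.5, 1.0])   # hypothetical coefficient values
sigma = 1.0                   # hypothetical error standard deviation

x = np.column_stack([np.ones(n), rng.normal(size=n)])   # constant and one regressor
y_star = x @ beta + sigma * rng.normal(size=n)          # latent variable
y = np.where(y_star >= 0, y_star, 0.0)                  # censoring rule from the model

share_censored = float(np.mean(y == 0.0))               # fraction of observations at zero
mean_y, mean_y_star = float(y.mean()), float(y_star.mean())
```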

PS3 - Semi- and nonparametrics

You observe a random sample \((Y_i,X_i)\) on a scalar dependent variable \(Y\) and a scalar explanatory variable \(X\). The true DGP has \[ Y = \beta_0 + \beta_1 X + u\] and \(E(u|X)=0\). Show that the local linear estimator is unbiased.
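
A small Monte Carlo experiment can be used to check the claim numerically before proving it. The sketch below implements a local linear estimator as kernel-weighted least squares of \(Y\) on a constant and \(X-x_{0}\), and averages the estimate of \(E(Y|X=x_{0})\) over repeated samples from a linear DGP. The Gaussian kernel, the bandwidth `h`, the evaluation point `x0`, and the parameter values are assumptions made for illustration; the problem set does not prescribe them.

```python
import numpy as np

def local_linear(x0, X, Y, h):
    """Local linear estimate of E[Y | X = x0] with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)            # kernel weights
    Z = np.column_stack([np.ones_like(X), X - x0])    # constant and centered regressor
    ZtW = Z.T * w                                     # Z'W without building a diagonal matrix
    coef = np.linalg.solve(ZtW @ Z, ZtW @ Y)          # weighted least squares
    return coef[0]                                    # intercept = fitted value at x0

rng = np.random.default_rng(0)
beta0, beta1, x0, h = 1.0, 2.0, 0.5, 0.3   # hypothetical values
estimates = []
for _ in range(500):
    X = rng.uniform(-1, 1, size=200)
    u = rng.normal(size=200)                # E(u | X) = 0 by construction
    Y = beta0 + beta1 * X + u
    estimates.append(local_linear(x0, X, Y, h))

mc_mean = float(np.mean(estimates))         # Monte Carlo average of the estimates
truth = beta0 + beta1 * x0                  # true conditional mean at x0
```

The average `mc_mean` can be compared with `truth` for several bandwidths; the proof should explain why the bandwidth does not matter for the bias in this DGP.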

PS4 - Program evaluation

  1. Choose a paper from the American Economic Review or the American Economic Journal: Applied (or a top field journal, check with me) that uses any of the methods discussed during the program evaluation lectures. A more advanced method is acceptable, too.[1] The paper must be published in 2010 or later. For the paper that you select, answer the questions below. For this assignment, I would encourage you not to work in groups.
  • Give a short summary of the paper. In particular, discuss the research question, data used, and findings of the authors.
  • Discuss why the authors used the method that they did. Remind the reader about the details of the method used. In particular:
    • what precisely is the object that they are estimating;
    • what is the underlying model;
    • what are the required assumptions for the method to be consistent for the object that they are estimating?
  • Give arguments for why the method that was used is not appropriate. Which assumptions can you argue to be violated?
  • Speculate how another method (possibly requiring a slightly different data set) can be used to answer the same question. For example, if your chosen paper uses propensity score matching, explain how a difference-in-differences approach would work to answer the same question. Which method do you prefer?
  2. The following exercise is inspired by Rohini Pande’s problem sets in EC730 at Yale. In 1996, the Chicago Public Schools instituted an “accountability policy” that tied summer school and promotional decisions to performance on standardized tests. Third graders must obtain a minimum score of 2.8 in both reading and math achievement on the Iowa Test of Basic Skills (ITBS) in order to advance. Someone who received 2.8 on reading but 2.7 on math had to go to remedial summer school. Assume data on the cohort of students who were in the third and sixth grade from the 1993-1994 school year to the 1998-1999 school year is available. We are interested in estimating the treatment effect of remedial summer school \[Y_{i,t+1}=X_{it}\beta+D_{it}\tau+u_{i}+\epsilon_{i,t+1}\] where \(Y\) is the outcome, \(X\) is a vector of demographic and past performance variables, and \(D\) is a binary variable that takes on a value of one if a student went to summer school and zero otherwise. Furthermore, \(u_{i}\) is a fixed effect that captures time-invariant student characteristics (like ability), and \(\epsilon_{i,t+1}\) is an error term.
  • What is the problem with doing OLS?
  • How could you use the policy described to design an RDD evaluation?
  • What exactly does the RDD method identify? Be precise. Use math. Use (conditional) expectations.
  • Is it fuzzy or sharp RDD?
  • Assume it is sharp. Under what assumptions is the RDD valid, i.e. under what conditions does the RDD method consistently estimate the parameter that you defined above when stating what the RDD method identifies?
  3. Mining in Peru. This question is about the paper “The Persistent Effects of Peru’s Mining Mita” by Melissa Dell, Econometrica 78 (6).
  • How does the geographical discontinuity in this paper differ from the standard RDD setup discussed in class?
  • Explain why it is useful to have the results in columns (1)-(3) in Table III. How do you interpret the differences in results across the columns?
  • Consider an alternative approach to estimating the long-run mita effect that uses a propensity score approach with “distance to the Mita boundary” as one of the propensity score covariates. What are the advantages and disadvantages with respect to the RDD approach taken in this paper? Think about data requirements, the identifiable parameter, and the required identifying assumptions.
  4. In the course, we used the potential outcomes framework in the context of program evaluation. We can also use this framework to think about missing data issues. Let \(Y_{i}\left(1\right)\) denote the random variable that we would like to estimate the mean of, i.e. we are interested in \(\mu_{1}=E\left(Y_{i}\left(1\right)\right)\). The binary random variable \(D_{i}\in\left\{ 0,1\right\}\) is an indicator that is equal to 1 if we observe \(Y_{i}\left(1\right)\) and that is equal to zero if we do not observe it. Define \(Y_{i}=D_{i}Y_{i}\left(1\right)\) and assume that a random sample of \(\left\{ \left(Y_{i},D_{i}\right),\,i=1,\cdots,n\right\}\) is available to the researcher.
  • Assume that \(D_{i}\perp Y_{i}\left(1\right)\). Show that \(\bar{Y}=\sum_{i}Y_{i}\big/\sum_{i}D_{i}\) is a consistent estimator for \(\mu_{1}\).
  • Assume that there is a random vector \(X_{i}\) such that \(\left.D_{i}\perp Y_{i}\left(1\right)\right|X_{i}\). Assume that you have access to a random sample \[\left\{\left(Y_{i},D_{i},X_{i}\right),\,i=1,\cdots,n\right\}.\] In this setting, how would you estimate \(\mu_{1}\)?
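
For the missing data question above, the estimator \(\bar{Y}=\sum_{i}Y_{i}\big/\sum_{i}D_{i}\) can be tried out on simulated data in which \(D_{i}\) is drawn independently of \(Y_{i}\left(1\right)\). The sketch below does only that; the normal distribution for \(Y_{i}\left(1\right)\), the value of \(\mu_{1}\), and the observation probability `p_obs` are hypothetical choices, and the simulation is not a substitute for the consistency argument.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu1, p_obs = 100_000, 2.0, 0.6   # hypothetical values

y1 = mu1 + rng.normal(size=n)                     # Y_i(1), only partly observed
d = (rng.uniform(size=n) < p_obs).astype(float)   # D_i, drawn independently of Y_i(1)
y = d * y1                                        # Y_i = D_i * Y_i(1)

y_bar = float(y.sum() / d.sum())                  # estimator from the question
naive_mean = float(y.mean())                      # mean of Y_i, which treats missing values as zeros
```

Comparing `y_bar` and `naive_mean` with `mu1` shows why the denominator is \(\sum_{i}D_{i}\) rather than \(n\).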

PS5 - Panel data

[TBD]

Question 1. Democracy and income

This question is about a paper by Acemoglu, Johnson, Robinson, and Yared, titled “Democracy and Income”. You can find it through Acemoglu’s website http://economics.mit.edu/faculty/acemoglu/paper. Read the paper and answer the following questions:

  1. In this paper, what does the unobserved heterogeneity \(c_{i}\) capture?

  2. Based on your answer to (1), is it reasonable to make the random effects assumption?

  3. For this paper, is a strict exogeneity assumption for the covariates reasonable? Give an argument that undermines strict exogeneity.

  4. Answer the same question for sequential exogeneity.

  5. The authors say: “While the fixed effects estimation is useful in removing the influence of long-run determinants of both democracy and income, it does not necessarily estimate the causal effect of income on democracy.” Explain this statement.

Question 2. Immigrants’ Labor Supply and Exchange Rate Volatility

This question is based on Arash Nekoei (2013, AEJ: Applied, “Immigrants’ Labor Supply and Exchange Rate Volatility”). The author considers the relationship between the labor supply decision of immigrants to the U.S. and the real exchange rate of the immigrant’s home country vs. the United States. The idea: the immigrant transfers some of her US earnings to relatives in her home country. If the exchange rate worsens, that remittance will be less valuable, and this may affect the number of hours she chooses to work. We use a model that is slightly different from that in the paper: \[Z_{c,t}=\delta_{t}+\delta_{c}+E_{c,t}\beta+X_{c,t}\theta+u_{c,t},\] where \(c\) stands for home country (73 countries), and \(t\) stands for time (years: 1994-2011). \(Z\) and \(E\) represent the log of earnings and the real exchange rate, and \(X\) is a vector of observables, including “years since arrival”, gender, education, and marital status. These variables represent averages across immigrants from a given home country. \(\delta_{t}\) and \(\delta_{c}\) are time and country dummies.

  1. What do the time and country dummies capture?

  2. Is it appropriate to use a random effects estimator? [...] and words.

  3. In the notation of this model, state the assumption of “strict exogeneity conditional on the unobserved effect”. Do you think “strict exogeneity conditional on the unobserved effect” holds in the setting of this paper? Why/why not?

  4. Same as (3), for sequential exogeneity.

Question 3. Dynamic panel data.

The Arellano-Bond estimator for the linear dynamic panel data model works by first-differencing \[y_{i,t}=\alpha_{i}+\rho y_{i,t-1}+u_{i,t}\] and then using \(y_{i,t-2}\), \(y_{i,t-3}\) as instruments for the differenced equation.

This is a GMM estimator based on moment conditions \(E(Z_{i}'\Delta u_{i})=0\), where \[\Delta u_{i}=\left(\begin{matrix}u_{i,2}-u_{i,1}\\ u_{i,3}-u_{i,2}\\ \vdots\\ u_{i,T}-u_{i,T-1} \end{matrix}\right).\]

  1. Assume \(T=5\). Construct the matrix \(Z_{i}\) that leads to the Arellano-Bond estimator.

  2. Explain why this only works (i.e. the estimator is only consistent) when there is no serial correlation in the error terms \(u_{i,t}\).

  3. What do you do when there is serial correlation in the error terms? In other words: how do you modify the Arellano-Bond estimator to allow for serial correlation in the error terms?
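
To see the mechanics behind the differenced equation in a simulation, the sketch below generates data from the dynamic panel model with serially uncorrelated errors and computes two sample covariances: one between the differenced regressor \(\Delta y_{i,t-1}\) and the differenced error \(\Delta u_{i,t}\), and one between the lagged level \(y_{i,t-2}\) and \(\Delta u_{i,t}\). The values of `n`, `T`, `rho`, the normal distributions, and the initial condition are assumptions made for illustration. It does not construct \(Z_{i}\) or implement the GMM estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, rho = 200_000, 5, 0.5             # hypothetical values

alpha = rng.normal(size=n)              # individual effects
u = rng.normal(size=(n, T + 1))         # u[:, t] for t = 0..T, no serial correlation
y = np.empty((n, T + 1))
y[:, 0] = alpha / (1 - rho) + u[:, 0]   # arbitrary initial condition (assumption)
for t in range(1, T + 1):
    y[:, t] = alpha + rho * y[:, t - 1] + u[:, t]

t = 3
d_u = u[:, t] - u[:, t - 1]             # differenced error
d_ylag = y[:, t - 1] - y[:, t - 2]      # differenced lagged dependent variable

cov_regressor = float(np.cov(d_ylag, d_u)[0, 1])         # regressor vs. differenced error
cov_instrument = float(np.cov(y[:, t - 2], d_u)[0, 1])   # lagged level vs. differenced error
```

Re-running the simulation with serially correlated `u` (for example an AR(1) process) and looking at `cov_instrument` again is one way to explore questions 2 and 3 numerically.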

Question 4. Nonlinear panel data. [TBD]


  [1] In that case, scratch question (4) below, and replace it with a more detailed explanation of the methods used by the author(s).