---
title: "Program evaluation"
output:
  beamer_presentation:
    toc: true
    slide_level: 2
    fig_width: 3
    fig_height: 3
header-includes:
- \usepackage{amsmath}
- \usepackage{amssymb}
---
# Introduction
## Readings
- Imbens and Wooldridge
- precise readings: website
- Lecture 1 (2 hours)
- through propensity score methods
- Lecture 2 (1 hour): differences-in-differences
- Lecture 3 (1 hour): LATE
- Lecture 4 (2 hours): RDD
## Introduction
- Program evaluation: incredible growth in the last decades
- Theoretical advances
- Widely used in empirical economics
- labor
- development
- Theoretical advances started in biostatistics.
- Philosophy + notation different from regression
## Goal
- **Goal** of program evaluation is to quantify the effect that an intervention has on some outcome.
- The philosophy behind program evaluation starts from the "scientific ideal" of the **randomized control trial** (RCT).
- Common **tools** are
- RCT
- Differences-in-differences
- Matching
- Propensity score methods
- Instrumental variable methods
- Regression discontinuity design
# RCTs
## Setup
1. Randomly draw a sample of individuals from the study population
2. Randomly **assign** each individual to
- **treatment group**
- **control**
3. Apply treatment
4. Wait
5. Compare outcomes in treatment group to control group
## Applicability
If you can design an RCT that answers your research question: great.
However:
1. May be unethical
- smoking and birthweight
2. It may be too costly
- I may want to research whether buying grad students a Ferrari increases seminar attendance
3. External validity
## Selection
- Results of an RCT are believable because individuals are **randomly assigned** to treatment.
- Without an RCT, we worry about **selection**
1. Selection by the researcher:
2. **Self-selection:**
- people who select into receiving the treatment are different from those who choose not to select into the treatment
## Self-selection in economics
As economists, we are very concerned with **self-selection**.
- people who are motivated to take up treatment have different preferences than those who do not.
- preferences are likely to influence decisions in other dimensions as well
- Program evaluation methods confront the self-selection problem
# Examples
## Examples
What follows are several examples of empirical questions tackled with various program evaluation methods.
## Example (1): RCT
- Burde and Linden (2013), "Bringing education to Afghan Girls", AEJ: Applied 5 (3).
- In Ghor (province in Northwest of Afghanistan), only 29% of families live within 5km of a primary school.
- Authors look at 31 villages in Ghor.
- In 13 randomly selected villages among these, primary schools are built.
- One year after the schools are constructed, the authors return to obtain data.
Question of interest: _How does the presence of a primary school within 5 km / in your village affect school enrollment and test scores?_
## RCT 1, Findings
(for girls)
- Treatment increases enrollment by 52%
- Conditional on enrollment, test scores are up by 1.28 sd.
\bigskip
Q: How did the authors compute these findings?
Q: Do we believe these findings? Why?
## RCT 1, Issues
1. Ethical?
2. Costs? (this is World Bank stuff)
## Example 2: Skytrain
To examine the effect of the existence of a Skytrain station within walking distance from a housing unit on the price of that housing unit...
Compare values of the houses close to Skytrain to value of houses not close to Skytrain.
- Q: What do we think the effect is?
- Q: What could be the selection effect?
- Why was the Skytrain station built where it is built?
- Q: What would we find by comparing means?
- Q: Solution?
## Example 2: Tool
Differences-in-differences.
## Example 3: Labor market programs
Labor market training programs were the first policy interventions that were systematically analyzed using a program evaluation perspective by economists.
- The government cares about people who are not doing well on the labor market. To improve their prospects, it designs a program that offers
- interview training
- job-specific training
- work experience by means of subsidized jobs
- Eligibility: all individuals who are on the labor market, and who have been unemployed for $>6$ months.
- Treatment group: Eligible ones that applied
## Example 3: Tool
When we compare the means of those who enrolled and who did not enroll, what are we estimating?
- Issue: self-selection.
- Tool: propensity score matching
## Example 4: Texting bans
- Existing literature:
$$P(\text{death} \mid \text{driving} + \text{phone}) = 4 \times P(\text{death} \mid \text{driving} + \text{no phone})$$
- People continue to text. Why?
$$ Y_{im} = \alpha_i + \delta_m + X_{im}\beta + \omega B_{im} + u_{im} $$
where:
- $i$ is state, $m$ is month
- $Y$ is (log of) traffic fatalities
- $X$ includes
- population
- proportion male
- unemployment
- gas tax
- $B$: is a texting ban in place?
## Example 4: Texting bans (2)
- What's in $\alpha_i$?
- Correlated with $X$?
- Correlated with $B$?
## Example 4: Texting bans (3)
Finding: $\hat{\omega} = -0.0374.$
- Interpret this finding.
Details:
- No effect for "weak bans"
- No effect except for single-occupancy vehicles
- Effect starts when findings are announced, and disappears four months after the ban takes effect
## Example 4: Tool
- Issue: unobservables are correlated with treatment status
- Tool: Fixed effects panel data methods.
## Example 5: Vietnam draft lottery
- Issue: self-selection
- Tool: instrumental variables.
# Rubin causal model + RCT
## RCM
We will use the **Rubin Causal Model** (or: potential outcomes framework) to talk about:
- underlying random variables
- available data
- parameters of interest
- modelling assumptions
We look at this first in the context of an RCT.
## Potential outcomes
Key modelling innovation in the RCM is the distinction between potential and observable outcomes.
For every individual, we have
- Potential outcomes:
- $Y(1)$: observed outcome if treatment were applied
- $Y(0)$: observed outcome if no treatment applied
- Treatment status:
- $D \in \{0,1\}$: 1 if treated, 0 if not
## Available data
For each individual, we observe:
- Observed outcome:
- $Y = D Y(1) + (1-D)Y(0) = Y(D)$
- Treatment status $D$
We assume a random sample $((Y_i,D_i),~i=1,\cdots,n)$, and assume $n \to \infty$
## Parameter of interest
- We would like the distribution of $Y(1)-Y(0)$
- Infeasible unless strong assumptions are imposed
- Instead: target $ATE=E(Y(1)-Y(0))$
- Other targets possible
## Infeasible estimator
The "sample analog" of the ATE is
$$ \hat{ATE} = \frac{1}{n} \sum_i (Y_i(1)-Y_i(0)) $$
Not feasible, since we only observe one of $(Y_i(1),Y_i(0))$ for each $i$. Program evaluation setup is reminiscent of missing data setup.
## RCT Assumptions
**Randomization.**
$$(Y(1),Y(0)) \perp D$$
\bigskip
**Overlap.** $P(D=1) \in (0,1)$
## Identification
Under the randomization assumption,
$$
\begin{aligned}
ATE &= E(Y(1)-Y(0)) \\
&= E(Y(1)) - E(Y(0)) \\
&= E(Y(1)|D=1) - E(Y(0)|D=0) \\
&= E(Y|D=1) - E(Y|D=0)
\end{aligned}
$$
## Feasible estimator
Sample analogs of both terms are available, so a feasible estimator is
$$ \hat{ATE} = \frac{\sum_i D_i Y_i}{\sum_i D_i} -
\frac{\sum_i (1-D_i) Y_i}{\sum_i (1-D_i)}$$
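## Feasible estimator: sketch
A minimal Python sketch of the difference-in-means estimator above, run on simulated RCT data. The function names, the constant treatment effect of 2, and the 50/50 assignment probability are illustrative assumptions, not part of the lecture material.

```python
import random


def diff_in_means(y, d):
    """Feasible ATE estimator: mean(Y | D=1) - mean(Y | D=0)."""
    n1 = sum(d)
    n0 = len(d) - n1
    if n1 == 0 or n0 == 0:
        # Sample counterpart of the overlap condition P(D=1) in (0,1).
        raise ValueError("need both treated and control units")
    mean1 = sum(yi for yi, di in zip(y, d) if di == 1) / n1
    mean0 = sum(yi for yi, di in zip(y, d) if di == 0) / n0
    return mean1 - mean0


def simulate_rct(n, true_ate=2.0, seed=0):
    """Simulated RCT (assumption): Y(1) = Y(0) + true_ate, D independent coin flip."""
    rng = random.Random(seed)
    y, d = [], []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)
        y1 = y0 + true_ate
        di = 1 if rng.random() < 0.5 else 0  # randomization: D independent of (Y(1), Y(0))
        d.append(di)
        y.append(y1 if di == 1 else y0)      # only one potential outcome is observed
    return y, d


if __name__ == "__main__":
    y, d = simulate_rct(20000)
    print("estimated ATE:", diff_in_means(y, d))
```

With a large simulated sample the estimate should land close to the assumed effect of 2. The `ValueError` shows why overlap is needed: without both groups, one of the two means is undefined.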
## Overlap
Why did we need the overlap condition?
## What next?
For each of the methods that follows, we will:
1. Use the RCM to model available data
2. Define parameter of interest
3. Introduce assumptions
4. Show that (2) is identified under (1)+(3)
5. Discuss estimation
## Takeaways
For a given research question, (1) + your ability to defend (3) determines whether you can get (2).
\bigskip
or
\bigskip
Any of the methods that we discuss below is a reasonable estimator (consistent) only if
- you have access to (1),
- are willing to assume (3), and
- are interested in (2).
## Parameters of interest
Potential parameters of interest include:
- ATE: $E(Y(1)-Y(0))$
- ATT: $E(Y(1)-Y(0)|D=1)$
- CATE: $E(Y(1)-Y(0)|X=x)$, for some RV $X$
- ..., see Imbens and Wooldridge, Section 3
# Propensity score
## Potential outcomes
For every individual, we have
- Potential outcomes:
- $Y(1)$: observed outcome if treatment were applied
- $Y(0)$: observed outcome if no treatment applied
- Treatment status:
- $D \in \{0,1\}$: 1 if treated, 0 if not
- Confounders $X \in \mathcal{X} \subset \mathbb{R}^k$
## Available data
For each individual, we observe:
- Observed outcome:
- $Y = D Y(1) + (1-D)Y(0) = Y(D)$
- Treatment status $D$
- Confounders $X \in \mathcal{X} \subset \mathbb{R}^k$
We assume a random sample $((Y_i,D_i,X_i),~i=1,\cdots,n)$, and assume $n \to \infty$
## Parameter of interest
$$ATE=E(Y(1)-Y(0))$$
Alternatively, could estimate the
## Mean comparison
Why is
$$ \hat{ATE} = \frac{\sum_i D_i Y_i}{\sum_i D_i} -
\frac{\sum_i (1-D_i) Y_i}{\sum_i (1-D_i)}$$
not a good estimator, in general?
## IPW Assumptions
**Unconfoundedness.**
$$(Y(1),Y(0)) \perp D \mid X$$
\bigskip
**Overlap.** $p(x)\in(0,1) \forall x \in \mathcal{X}$, where
$$p(x) \equiv P(D=1|X=x)$$
## Interpretation
- Unconfoundedness:
1. local experiment at every $x \in \mathcal{X}$
2. individual gain $= f(X) + u,~u \perp X$
- Overlap:
1. use (1) above, and intuition from RCT
- Example: Malmendier and Tate (QJE, 2009), "Superstar CEOs"
## Identification
Under unconfoundedness and overlap,
$$
\begin{aligned}
E\left[\frac{DY}{p(X)} \right] &=
E\left[\frac{DY(1)}{p(X)} \right] \\
&= E_X\left[E_{(Y(1),D|X)}\left[\left. \frac{DY(1)}{p(X)} \right|X \right]\right] \\
&= E_X\left[\frac{1}{p(X)}E_{(Y(1),D|X)}\left[DY(1) | X\right]\right] \\
&= E_X\left[\frac{1}{p(X)}E_{(D|X)}\left[D | X\right]E_{(Y(1)|X)}\left[Y(1) | X\right]\right] \\
&= E_X\left[E_{(Y(1)|X)}\left[Y(1) | X\right]\right] \\
&= E(Y(1))
\end{aligned}
$$
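First equality follows from $DY = D(DY(1)+(1-D)Y(0))$, $D^2 = D$, and $D(1-D)=0$. The fourth uses unconfoundedness; the fifth uses $E[D|X] = p(X)$, with $p(X) > 0$ by overlap.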
## Identification (2)
Similarly for the control outcome. Put together:
$$ E\left[\frac{DY}{p(X)} - \frac{(1-D)Y}{1-p(X)} \right]
= E(Y(1)-Y(0)).$$
We have translated
- the expression on the RHS, for which no sample analog is available
- using the assumptions
- into a quantity for which we can obtain the sample analog
Q: How to obtain the sample analog for $p(x)$?
## Feasible estimator
1. Use NP to estimate $\hat{p}(x)$
2. $\hat{ATE} = \frac{1}{n} \sum_i \left[\frac{D_iY_i}{\hat{p}(X_i)} - \frac{(1-D_i)Y_i}{1-\hat{p}(X_i)} \right]$
This is called the _IPW estimator_
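## Feasible estimator: sketch
A minimal Python sketch of the two steps above for a **discrete** confounder, where the nonparametric estimate of $p(x)$ is just the treated share in each cell. The function names and the simulation design (three values of $X$, true ATE of 1, treatment more likely at high $X$) are illustrative assumptions.

```python
import random
from collections import defaultdict


def estimate_pscore(d, x):
    """Cell-frequency estimate of p(x) = P(D=1 | X=x) for discrete X."""
    n_cell = defaultdict(int)
    n_treated = defaultdict(int)
    for di, xi in zip(d, x):
        n_cell[xi] += 1
        n_treated[xi] += di
    return {xv: n_treated[xv] / n_cell[xv] for xv in n_cell}


def ipw_ate(y, d, x):
    """IPW estimator: average of D*Y/p_hat(X) - (1-D)*Y/(1-p_hat(X))."""
    p_hat = estimate_pscore(d, x)
    if any(p <= 0.0 or p >= 1.0 for p in p_hat.values()):
        raise ValueError("estimated propensity score violates overlap")
    total = 0.0
    for yi, di, xi in zip(y, d, x):
        p = p_hat[xi]
        total += di * yi / p - (1 - di) * yi / (1 - p)
    return total / len(y)


def simulate_confounded(n, seed=0):
    """Simulated data (assumption): X raises both Y(0) and the chance of treatment."""
    rng = random.Random(seed)
    p_true = {0: 0.2, 1: 0.5, 2: 0.8}
    y, d, x = [], [], []
    for _ in range(n):
        xi = rng.choice([0, 1, 2])
        di = 1 if rng.random() < p_true[xi] else 0
        y0 = xi + rng.gauss(0.0, 1.0)
        y1 = y0 + 1.0                      # true ATE = 1
        y.append(y1 if di == 1 else y0)
        d.append(di)
        x.append(xi)
    return y, d, x


if __name__ == "__main__":
    y, d, x = simulate_confounded(30000)
    print("IPW estimate of ATE:", ipw_ate(y, d, x))
```

In this design the raw mean comparison is biased upward because treated units have higher $X$ on average, while the IPW estimate should be close to 1. With continuous $X$, the cell frequencies would have to be replaced by a kernel or series estimator.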
## Properties
This is a semiparametric estimator. Hirano, Imbens, Ridder (2003, ECTA) show that
1. This estimator is consistent and asymptotically normal (a la Newey, 1994)
2. The estimator is efficient **even when** $p(x)$ is known, or if it is known that $p$ is logistic
- _IPW_ paradox
- see Hahn (1998); Graham (2011)
- the NP estimator implicitly uses **all** the moment conditions implied by the unconfoundedness assumption
## Other estimators
- Many other consistent estimators for the ATE and the CATE have been proposed, see Imbens and Wooldridge (2009)
- Recommendation: doubly robust estimators, see e.g. Firpo and Rothe (2015); Chernozhukov, Escanciano, Ichimura, Newey (2016)
Finally: strict overlap, Khan and Tamer (2010, ECTA)
# DID
## DID
6. Assumptions
7. Identification
8. Estimation
- analog estimation
- OLS
## Graphical
Whiteboard: ``sketch_61.png``
- horizontal difference: "other things" that changed over time
- vertical difference: selection
- after conditioning on $X$: selection on unobservables
- why may IPW not work?
## Potential outcomes
For every individual, we have
- Potential outcomes:
- $Y_0(1)$: period-0 observed outcome if treatment were applied
- $Y_0(0)$: period-0 observed outcome if no treatment applied
- $Y_1(1)$: period-1 observed outcome if treatment were applied
- $Y_1(0)$: period-1 observed outcome if no treatment applied
- Treatment status:
- $D \in \{0,1\}$: 1 if treated, 0 if not
- applies to both periods
- $D_0 = D_1 = D$
- $D_0:$ "Will you be treated?"
- All that follows can be conditional on $X$
## Available data
For each individual, we observe:
- Observed outcome:
- $Y_t = D Y_t(1) + (1-D)Y_t(0) = Y_t(D)$ for $t = 0,1$
- Treatment status $D$
We assume a random sample $((Y_{i0},Y_{i1},D_i,X_i),~i=1,\cdots,n)$, and assume $n \to \infty$
## Available moments
The available data above implies:
Moment | Observed
------ | --------
$E(Y_0(0)|D=0)$ | Yes
$E(Y_0(0)|D=1)$ | No
$E(Y_0(1)|D=0)$ | No
$E(Y_0(1)|D=1)$ | Yes
$E(Y_1(0)|D=0)$ | Yes
$E(Y_1(0)|D=1)$ | No
$E(Y_1(1)|D=0)$ | No
$E(Y_1(1)|D=1)$ | Yes
Table: Available moments.
## Parameter of interest
$$ATT=E(Y_1(1)-Y_1(0)|D=1)$$
Notes:
1. First term is observable
2. Under conditioning, you can get $CATT(x)=E(Y_1(1)-Y_1(0)|D=1,X=x)$
## Approach
Make assumptions that allow us to express the counterfactual
$$E(Y_1(0) | D=1)$$
in terms of available moments.
## DID Assumption 1
**Parallel paths.**
$$ E(Y_1(0)- Y_0(0) | D=0) = E(Y_1(0)- Y_0(0) | D=1)$$
## Interpretation
If the treatment never happens, the **change** in outcomes is mean-independent of treatment assignment.
\bigskip
Rules out
1. Macro shocks that affect groups differentially
2. Concurrent programs whose assignment is correlated with $D$
3.
## What is missing?
We need one additional assumption: an assumption on the differences does not pin down the counterfactual, which is in levels.
We need an assumption on period-0 counterfactual outcomes for the treatment group.
## DID Assumption 2
**No anticipation.** $$E(Y_0(0)|D=1) = E(Y_0(1)|D=1)$$
Interpretation: see [Heckman and Smith 99](https://athens.src.uchicago.edu/jenni/dvmaster/FILES/ash_dip.pdf) for a discussion of the _Ashenfelter dip_
## Identification
Assuming parallel paths and no anticipation
$$
\begin{aligned}
E(Y_1(1)-Y_1(0)|D=1) &= E((Y_1(1)-Y_1(0)) - (Y_0(1)-Y_0(0)) |D=1) \\
&= E((Y_1(1)-Y_0(1)) - (Y_1(0)-Y_0(0)) |D=1) \\
&= E(Y_1(1)-Y_0(1)|D=1) - E(Y_1(0)-Y_0(0) |D=0)
\end{aligned}
$$
First equality follows from no anticipation, second from rearranging, third from parallel paths.
## Estimator
Under additional linearity assumptions, we have:
$$ E(Y_{it}|D_i,t) = \beta_0 + \beta_1 D_i + \beta_2 t + \beta_3 D_it $$
which implies
$$
\begin{aligned}
E(Y_{it}|D_i=1,t=1) &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\
E(Y_{it}|D_i=1,t=0) &= \beta_0 + \beta_1 \\
E(Y_{it}|D_i=0,t=1) &= \beta_0 + \beta_2 \\
E(Y_{it}|D_i=0,t=0) &= \beta_0 \\
\end{aligned}
$$
so that $ATT=\beta_3$! This gives an easy way to compute the ATT, and allows for incorporation of time-varying control variables $X_{it}$.
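## Estimator: sketch
From the four conditional means above, $\beta_3$ equals the change for the treated minus the change for the controls. A minimal Python sketch computes that double difference from cell means. The function names and the simulation (level gap of 3 between groups, common trend of 2, effect of 1.5) are illustrative assumptions.

```python
import random


def did_estimate(y, d, t):
    """2x2 DID: (mean_11 - mean_10) - (mean_01 - mean_00), cells indexed (D, t)."""
    sums = {(g, s): 0.0 for g in (0, 1) for s in (0, 1)}
    counts = {(g, s): 0 for g in (0, 1) for s in (0, 1)}
    for yi, di, ti in zip(y, d, t):
        sums[(di, ti)] += yi
        counts[(di, ti)] += 1
    if min(counts.values()) == 0:
        raise ValueError("every (D, t) cell needs at least one observation")
    m = {k: sums[k] / counts[k] for k in sums}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


def simulate_panel(n, effect=1.5, seed=0):
    """Simulated two-period panel (assumption): parallel paths hold by construction."""
    rng = random.Random(seed)
    y, d, t = [], [], []
    for _ in range(n):
        di = 1 if rng.random() < 0.5 else 0
        level = 3.0 * di                                  # selection in levels
        for ti in (0, 1):
            y0 = level + 2.0 * ti + rng.gauss(0.0, 1.0)   # common time trend
            y.append(y0 + effect * di * ti)               # effect only for treated, period 1
            d.append(di)
            t.append(ti)
    return y, d, t


if __name__ == "__main__":
    y, d, t = simulate_panel(5000)
    print("DID estimate of ATT:", did_estimate(y, d, t))
```

The level gap and the time trend both difference out, so the estimate should be close to the assumed effect of 1.5. OLS on the saturated regression with $D_i$, $t$, and $D_i t$ returns the same number as its coefficient on the interaction.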
## Estimation (2)
Alternative: nonparametric estimation of $E(Y_{it}|D_i,t,X_i)$, i.e. four functions ($D_i \in \{0,1\},t \in \{0,1\}$) of $x$.
A model more general than DID that uses this approach is [Athey and Imbens 06](http://onlinelibrary.wiley.com/doi/10.1111/j.1468-0262.2006.00668.x/abstract)
## Application
Card and Krueger (1994). "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania".
# IV + LATE
## PE Overview
Under random assignment and unconfoundedness, cross-sectional data and (conditional) random assignment assumptions were enough to estimate (C)ATEs.
Without unconfoundedness, we need more data, replacement assumptions, or less ambitious estimands.
## PE Overview (2)
- DID
- need $T>1$
- need parallel paths and no anticipation
- can only get ATT
- IV
- need instrument
- need exclusion restriction
- can only get LATE
- RDD
- need a discontinuity
- can only get $CATE(X=c)$
## IV + LATE
Written notes.
# RDD
## Example: Beland (AEJ, 2015)
\includegraphics[scale=0.32]{Beland-RDD.png}
_Does party affiliation of the governor affect the labor market?_
- $Y$: Total hours worked per year
- $X$: Margin of victory of Democratic gubernatorial candidate
- Democratic governor if $X \geq 0$
## Example: Shigeoka (AER, 2014)
\includegraphics[scale=0.32]{Shigeoka-RDD.png}
_Does cost sharing affect health care utilization?_
- $Y$: Number of outpatient visits
- $X$: Age
- Sharply lower cost sharing if $X \geq 70$
## Example: Lee (2007)
\includegraphics[scale=0.45]{Lee-RDD.png}
_Is there an incumbency advantage in elections?_
## Example: minimum drinking age
\includegraphics[scale=0.50]{Drinking-RDD.png}
## Example: others
\includegraphics[scale=0.19]{papers-using-RDD.png}
## Example: others
\includegraphics[scale=0.19]{papers-using-RDD-2.png}
## Example: others
\includegraphics[scale=0.19]{papers-using-RDD-3.png}
## Example: others
\includegraphics[scale=0.19]{papers-using-RDD-4.png}
## OLS
In above examples, linear regression of $Y$ on $X$ can be severely biased for the causal effect of $X$ on $Y$ due to endogeneity.
Why?
1. Drinking age
2. Democratic governors
## Potential outcomes
$$(Y(0),Y(1),X)$$
- $Y(0),Y(1)$ are potential outcomes
- $X$ is called the running (forcing) variable
Note:
- other variables may influence $Y$
## Available data
- Data:
$$(X, Y(D))$$
- Institutional setting: clear cutoff values: $$D=1\{X\geq c\}$$ for **known** $c$.
## Parameter of interest
We can only hope to estimate the causal effect **at the cutoff**:
$$\tau \equiv E ( Y(1) - Y(0) |X = c)$$
## Assumption
Let
$$
\begin{aligned}
\mu_0(x) &= E(Y(0)|X=x) \\
\mu_1(x) &= E(Y(1)|X=x)
\end{aligned}
$$
be the potential outcome equations.
**Continuity.** $\mu_0$ and $\mu_1$ are continuous in $x$.
## Assumption (graph)
``sketch62.png``
- What can be observed?
- No restrictions on $\mu_1,\mu_0$
- except: smoothness
- consequence: estimated effect is local
## Identification
Under **continuity**
$$
\begin{aligned}
\tau &= E ( Y(1) - Y(0) |X = c)\\
&= \lim_{x \downarrow c} E(Y|X=x) - \lim_{x \uparrow c} E(Y|X=x)
\end{aligned}
$$
because
$$Y = 1\{X \geq c\}Y(1) + 1\{X < c\}Y(0)$$
## Estimation (1)
1. Estimate $Y_i = \beta_{0,0} + \beta_{1,0} (X_i -c) + u_i$ using $D_i = 0$, i.e. $X_i < c$
- yields $\hat{\beta}_{0,0}$
- estimator for $\lim_{x \uparrow c} E(Y|X=x)$
2. Estimate $Y_i = \beta_{0,1} + \beta_{1,1} (X_i -c) + u_i$ using $D_i = 1$, i.e. $X_i \geq c$
- yields $\hat{\beta}_{0,1}$
- estimator for $\lim_{x \downarrow c} E(Y|X=x)$
3. Estimated: $\hat{\tau} = \hat{\beta}_{0,1} - \hat{\beta}_{0,0}$
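## Estimation (1): sketch
The three steps above can be sketched in a few lines of Python: fit a line in $(X_i - c)$ on each side of the cutoff and difference the intercepts. The function names and the simulated design (cutoff at 0, jump of 2, linear conditional means) are illustrative assumptions. The sketch uses all data on each side, not a local bandwidth.

```python
import random


def ols_line(x, y):
    """Simple OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def rdd_estimate(x, y, c):
    """Sharp RDD: difference of intercepts from separate fits in (X - c)."""
    left = [(xi - c, yi) for xi, yi in zip(x, y) if xi < c]     # D = 0
    right = [(xi - c, yi) for xi, yi in zip(x, y) if xi >= c]   # D = 1
    b00, _ = ols_line([a for a, _ in left], [b for _, b in left])
    b01, _ = ols_line([a for a, _ in right], [b for _, b in right])
    return b01 - b00


def simulate_rdd(n, jump=2.0, seed=0):
    """Simulated data (assumption): linear means, discontinuity of size `jump` at 0."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        xi = rng.uniform(-1.0, 1.0)
        di = 1 if xi >= 0.0 else 0
        x.append(xi)
        y.append(1.0 + 0.5 * xi + jump * di + rng.gauss(0.0, 0.5))
    return x, y


if __name__ == "__main__":
    x, y = simulate_rdd(4000)
    print("estimated jump at cutoff:", rdd_estimate(x, y, c=0.0))
```

Because the simulated conditional means really are linear, the global fit works here. When they are not, a global linear fit can mistake curvature for a jump, which is the motivation for restricting to a window around $c$.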
## Estimation (1): Danger
from Waldinger's [notes](https://www2.warwick.ac.uk/fac/soc/economics/staff/ffwaldinger/teaching/ec9a8/slides/lecture_4_-_rdd.pdf)
\includegraphics[scale=0.42]{Waldinger-RDD.png}
## Estimation (2)
- Local linear estimation
- rate-optimal, see [Porter, 2003](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.540&rep=rep1&type=pdf)
- Choosing $h$
- Why not use standard CV?
- Imbens and Kalyanaraman (2012, REStud)
## RDD issue (1): Manipulation
RDD assumes an experiment at $X=c$:
> transparent rules with criteria based on clear cutoff values, rather than on discretion of administrators [Imbens]
Problem: manipulation of the running variable.
## RDD issue (1): Example
> [...] study the impact of summer school and
grade retention on test scores, where the treatments depend discontinuously on separate pre-tests. In that context, because the treatment assignment rule is public knowledge, it is possible that those grading the pre-test would be motivated to influence a student's treatment assignment by strategically mismeasuring the student's actual score. [McCrary (JoE, 2007) about Jacob and Lefgren (REStat, 2004)]
## RDD issue (1): Example
> Hahn et al. (1999) study the impact of equal employment opportunity laws on employment of racial minorities, taking advantage of the fact that the 1964 Civil Rights Act, as amended, covers only those firms with 15 or more employees. Employers presumably maintain perfect control over labor inputs. This raises the possibility that a firm owner with a taste for discrimination, who would otherwise find it profit-maximizing to employ 15, 16, or 17 employees, for example, would elect to employ 14 employees to preclude the possibility of litigation alleging violations of the Civil Rights Act (cf., Becker, 1957). [McCrary (JoE, 2007)]
## RDD issue (1): Visually
(from McCrary, JoE, 2007)
\includegraphics[scale=0.42]{mccrary-test.png}
## RDD issue (1): Test
Density test.
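## RDD issue (1): Test, crude sketch
As a rough illustration of the idea only, the Python sketch below compares the number of observations just below and just above the cutoff. This is **not** McCrary's local linear density estimator; it is a simpler bin-count check, and the function name and the window width `h` are hypothetical choices. If the density of $X$ is continuous at $c$ and `h` is small, each observation in the window falls on either side with probability close to 1/2.

```python
import math


def bin_count_check(x, c, h):
    """Compare counts in [c-h, c) and [c, c+h).

    Returns (n_below, n_above, z), where z is the normal-approximation
    statistic for H0: an observation in the window is equally likely
    to fall on either side of the cutoff.
    """
    n_below = sum(1 for xi in x if c - h <= xi < c)
    n_above = sum(1 for xi in x if c <= xi < c + h)
    n = n_below + n_above
    if n == 0:
        raise ValueError("no observations within h of the cutoff")
    # Under H0, n_above ~ Binomial(n, 1/2), so (n_above - n_below) / sqrt(n)
    # is approximately standard normal.
    z = (n_above - n_below) / math.sqrt(n)
    return n_below, n_above, z


if __name__ == "__main__":
    # Hypothetical running-variable values bunched just above a cutoff at 0.
    x = [-0.05] * 10 + [0.05] * 30
    print(bin_count_check(x, c=0.0, h=0.1))
```

A large positive or negative `z` points to bunching on one side of the cutoff. The result is sensitive to `h`, and a sloped density can produce imbalance without any manipulation.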
## RDD issue (2): Other changes
Age $\geq 21$: does anything else change sharply at 21?
## RDD remarks
- No overlap!
- No external validity
## FRDD: Example
Fuzzy RDD: Angrist and Lavy (99)
\includegraphics[scale=0.32]{AngristLavy-FRDD.png}
## FRDD: Example (2)
vanderKlaauw (2002): financial aid
\includegraphics[scale=0.45]{vdK-FRDD-2.png}
## FRDD: Example (2) cont'd
\includegraphics[scale=0.45]{vdK-FRDD-1.png}
## FRDD: Example (3)
Buser, T. (2015), "The effect of income on religiousness", AEJ: Applied.
- $Y$
- church attendance
- $X$
- wealth
- cutoff: certain percentile
- treatment
- at cutoff, eligible for transfer
- fuzzy: not all families collect transfer
## FRDD: Graphical
``sketch62.png``, FRD.
## FRDD: Setup
Denote the propensity score by
$$p(x) = P(D=1|X=x).$$
FRDD requires
$$
p^-(c) \equiv \lim_{x \uparrow c}p(x)
\neq
\lim_{x \downarrow c}p(x) \equiv p^+(c)
$$
In sharp RDD, $$p^-(c) = 0 \neq 1 = p^+(c)$$
## FRDD: Observed
You observe
$$(Y,X,D)$$
and you know $c$.
Therefore, you know $p^-(c)$, $p^+(c)$, and
$$
\begin{aligned}
\mu^-(c) &= \lim_{x \uparrow c} E(Y|X=x) \\
&= p^-(c) \mu_1(c) + (1-p^-(c)) \mu_0(c) \\
\mu^+(c) &= \lim_{x \downarrow c} E(Y|X=x) \\
&= p^+(c) \mu_1(c) + (1-p^+(c)) \mu_0(c)
\end{aligned}
$$
## FRDD: Identification
It follows that
$$ \mu^+(c) - \mu^-(c) = (\mu_1(c)-\mu_0(c)) \times (p^+(c)-p^-(c))$$
- two equations in two unknowns
- observed jump is diluted by non-1 jump in probabilities
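## FRDD: Identification (worked)
Spelling out the step above, taking the expressions for $\mu^+(c)$ and $\mu^-(c)$ in terms of $\mu_1(c)$, $\mu_0(c)$ as given and writing $\tau = \mu_1(c)-\mu_0(c)$ for the effect at the cutoff:
$$
\begin{aligned}
\mu^+(c) - \mu^-(c) &= \left[p^+(c)\mu_1(c) + (1-p^+(c))\mu_0(c)\right] - \left[p^-(c)\mu_1(c) + (1-p^-(c))\mu_0(c)\right] \\
&= (p^+(c)-p^-(c))\,\mu_1(c) - (p^+(c)-p^-(c))\,\mu_0(c) \\
&= (p^+(c)-p^-(c))\,\tau
\end{aligned}
$$
Since $p^+(c) \neq p^-(c)$, we can divide:
$$ \tau = \frac{\mu^+(c) - \mu^-(c)}{p^+(c)-p^-(c)} $$
In sharp RDD the denominator equals 1, so this reduces to the jump in $E(Y|X=x)$ at $c$.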
## FRDD: Estimation
- Estimate four quantities using LL
- Or: use instrumental variables
- Say $E = 1\{X \geq c\}$
- local to $X_i=c$, $E$ is an instrument for $D$
- interactions of $(E,X)$ are instruments for $D$
## FRDD: Estimand
Using the IV analog, Hahn, Todd, vanderKlaauw (ECTA, 2001) show that estimand is LATE.
- Assignment: $E$
- Treatment: $D$
# RKD
## RKD: Example
from Fabian Waldinger's [notes](https://www2.warwick.ac.uk/fac/soc/economics/staff/ffwaldinger/teaching/ec9a8/slides/lecture_4_-_rdd.pdf)
\includegraphics[scale=0.38]{Waldinger-RKD.png}