---
title: "Probability theory"
output:
  beamer_presentation:
    toc: true
    slide_level: 2
header-includes:
  - \usepackage{amsmath}
# Readings
## Readings
Probability theory is covered in Appendix B.
- Verify that you know the contents of Appendix B, but skip:
- in B.3: "Skewness and kurtosis"
- in B.1: "Continuous random variables" (will be covered next week)
- B.5: "The Normal and Related distributions" (Normal will be covered next week, so consider it part of next week's reading)
# Quiz
## Goal
This week's lectures deal with probability theory. This lecture will be like your BUEC 333 review of probability theory, but we will go a bit faster: there, I would spend a lot of time cultivating intuition, working examples, etc.
Here, the goal is to go straight for the tools needed for the remainder of this course. The key concept we will need is how to work with expectations and conditional expectations.
Refer to Appendix B for any gaps. And: ask me! Please let me know if I went too fast, if you need more context, etc. On top of the upcoming homework, try the exercises in Wooldridge. I can give you more practice material if you feel you need a refresher.
## Outline
- Goal:
- Review probability theory concepts
- as quickly as possible
- focus on discrete random variables
- Focus on conditional expectations
- Strategy:
- Solve problems to check/update your understanding
## Problem 1
You are going to throw two dice. Let $X_{1}$ be the number of eyes showing on the first die, and $X_{2}$ the number of eyes on the second die. Define the random variable $Y$ as the sum of the dice: $Y=X_{1}+X_{2}$.
1. Write down the sample space of $Y$
2. Write down the conditional pdf of $Y$ given $X_{1}\leq 2$
3. What is the marginal cdf of $Y$? Give the function, and draw a picture.
4. Find $E[X_{2}|Y=y]$
5. Find $Cov\left(X_{1},Y\right)$
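Once you have worked the problem by hand, a short enumeration can check the answers. This is only a sketch, not part of the problem set; it assumes the two dice are fair and independent:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (x1, x2) of two fair, independent dice.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

# Marginal pmf of Y = X1 + X2; its sample space is {2, ..., 12}.
f_y = {}
for x1, x2 in outcomes:
    f_y[x1 + x2] = f_y.get(x1 + x2, Fraction(0)) + p

# E[X2 | Y = y]: average X2 over the (equally likely) outcomes with X1 + X2 = y.
def cond_exp_x2(y):
    consistent = [(x1, x2) for x1, x2 in outcomes if x1 + x2 == y]
    return Fraction(sum(x2 for _, x2 in consistent), len(consistent))

# Cov(X1, Y) = E(X1 * Y) - E(X1) * E(Y)
E = lambda g: sum(g(x1, x2) * p for x1, x2 in outcomes)
cov = (E(lambda x1, x2: x1 * (x1 + x2))
       - E(lambda x1, x2: x1) * E(lambda x1, x2: x1 + x2))
```

Using exact `Fraction` arithmetic avoids floating-point noise when comparing against the hand-derived answers.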
## Problem 2
[update for 2017]
# Basic concepts
## Discrete random variables
In the remainder of today's lecture, we will work with discrete random variables.
To do this, we need to define what a discrete random variable is. Doing so properly would require outlining a theory of measure and probability, in which there is a sample space $\Omega$, and a random variable is a function that assigns to every outcome in $\Omega$ a value in $\mathbb{R}$: a random variable is a function.
In this course, we will take some serious shortcuts. We will go with an intuitive understanding that a (discrete) random variable is a mathematical object that can take a certain number of outcomes with certain probabilities. We do not (yet) know which outcome it is going to take. For example:
- tomorrow's weather,
- outcome of a coinflip,
- wage of the next randomly selected person
We associate with a (discrete) random variable $X$ a set of $k$ possible outcomes $\{x_1,\ldots,x_k\}$. With each outcome is associated a probability $P(X=x_j)=f_x(x_j)$, for $j=1,\ldots,k$. If it is clear from the context, we write $p_j = f_x(x_j)$.
Q: What is the interpretation of probability?
A discrete random variable is then fully determined by $\{x_1,\ldots,x_k\}$ and $\{p_1,\ldots,p_k\}$, along with the restrictions that $p_j \geq 0$ for all $j$ and that $\sum_j p_j = 1$: if you know the possible outcomes and the associated probabilities, you know everything there is to know about the random variable $X$.
## Expectation
The definition of the expectation is $$E(X) = \sum_j p_j x_j.$$
Q: What is the interpretation of the expectation?
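As a quick numeric illustration of the definition (a fair six-sided die is a made-up example, not from the slide):

```python
from fractions import Fraction

# A fair six-sided die: outcomes x_j, each with probability p_j = 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

# E(X) = sum_j p_j x_j
EX = sum(p * x for p, x in zip(probs, outcomes))
```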
## Expectation: properties
We will now prove two properties of the expectation:
1. For any discrete random variable $X$ with mean $E(X)$, and any two reals $a,b$, we have that $E(aX+b)=aE(X)+b$.
- requires the definition of the transformation $aX+b$
2. For any $n$ random variables $X_i, i=1,\ldots,n$, and any $n$ reals $(a_1,\ldots,a_n)$, we have that $E(\sum_i a_i X_i) = \sum_i a_i E(X_i)$.
- prove this under the assumption that the $X_i$ have the same outcomes
- remark: this second property implies the first
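The proofs are what matter, but a numeric sanity check can catch misunderstandings. A sketch with a hypothetical discrete random variable (outcomes and probabilities made up for illustration):

```python
from fractions import Fraction

# A hypothetical discrete random variable X: outcomes with probabilities.
xs = [0, 1, 4]
ps = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
E = lambda vals: sum(p * v for p, v in zip(ps, vals))

# Property 1: E(aX + b) = a E(X) + b
a, b = 3, -2
assert E([a * x + b for x in xs]) == a * E(xs) + b

# Property 2, with X1 = X and X2 = X^2 defined on the same outcomes:
a1, a2 = 2, 5
lhs = E([a1 * x + a2 * x**2 for x in xs])
rhs = a1 * E(xs) + a2 * E([x**2 for x in xs])
```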
## Self-study
Variance, covariance, correlation, standardization: you need to know what these are and how to work with them. Know the rules of computation for $Cov$ and $Var$, e.g.
- $Var(aX+bY)=a^2 Var(X) + b^2 Var(Y) + 2ab\, Cov(X,Y)$
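The Var rule can be checked mechanically on a concrete case. A sketch using a made-up joint distribution in which $X$ and $Y$ are dependent:

```python
from fractions import Fraction

# A made-up joint pmf on {0,1} x {0,1}, chosen so that Cov(X, Y) != 0.
f = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 8),
     (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 4)}

E = lambda g: sum(g(x, y) * p for (x, y), p in f.items())
EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VarX = E(lambda x, y: (x - EX) ** 2)
VarY = E(lambda x, y: (y - EY) ** 2)
CovXY = E(lambda x, y: (x - EX) * (y - EY))

# Var(aX + bY) computed directly, then via the rule:
a, b = 2, -3
lhs = E(lambda x, y: (a * x + b * y - (a * EX + b * EY)) ** 2)
rhs = a ** 2 * VarX + b ** 2 * VarY + 2 * a * b * CovXY
```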
## Conditional expectations
The most important concept in this course is that of a conditional expectation. I need you to know exactly what conditional expectations are.
Demonstrate using Chetty's results on Oakland v San Francisco
Why? Throughout this course, we are interested in $E(Y|X)$, the expectation of $Y$ as a function of $X$. In regression analysis, we set
- $E(Y|X) = \beta_0 + \beta_1 X$
and everything is built around that!
## Conditional expectation: definition
Consider two random variables, $X$ and $Y$. Assume that $X$ takes values in $\{x_1,\ldots,x_k\}$ and that $Y$ takes values in $\{y_1,\ldots,y_m\}$.
The conditional expectation of $Y$ given $X=x_j$ is $$E(Y|X=x_j) = \sum_{l=1}^m y_l P(Y=y_l | X=x_j).$$
Q: Is this a random variable?
## Random variables
We will work with two discrete **random variables** $X$ and $Y$, with **outcomes**
$\{x_1,\cdots,x_k\}$ and $\{y_1,\cdots,y_m\}$
## Joint distribution
The **joint distribution** is given by $$f_{x,y}(x,y)\equiv P(X=x,Y=y).$$
Think of it as a table,
| y/x | $x_1$ | $x_2$ | $\cdots$ | $x_k$ |
| -------- | ------------------ | ------------------ | -------- | ------------------ |
| $y_1$ | $f_{x,y}(x_1,y_1)$ | $f_{x,y}(x_2,y_1)$ | | $f_{x,y}(x_k,y_1)$ |
| $y_2$ | $f_{x,y}(x_1,y_2)$ | $f_{x,y}(x_2,y_2)$ | | $f_{x,y}(x_k,y_2)$ |
| $\vdots$ | | | | |
| $y_m$ | $f_{x,y}(x_1,y_m)$ | $f_{x,y}(x_2,y_m)$ | | $f_{x,y}(x_k,y_m)$ |
## Marginal distribution
If we do not know anything about $Y$, and are only interested in $X$, then we work with the **marginal distribution**
$$f_x(x) \equiv P(X=x) \equiv \sum_y f_{x,y}(x,y)$$
where the sum is over all the outcomes $y$. The marginal distribution for $Y$, $f_y(y)$, is defined similarly.
## Conditional distribution
If we know that $Y=y$, we should take that into account when assigning probabilities to outcomes of $X$.
The **conditional probability distribution** is
$$ f_{x|y}(x|y) \equiv P(X=x|Y=y) \equiv \frac{f_{x,y}(x,y)}{f_y(y)} $$
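The chain joint $\to$ marginal $\to$ conditional is a mechanical computation. A sketch with a made-up joint table (numbers chosen only for illustration):

```python
from fractions import Fraction

# A made-up joint pmf f_{x,y}, stored as a dict keyed by (x, y).
f_xy = {(1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
        (2, 0): Fraction(1, 8), (2, 1): Fraction(3, 8)}

# Marginal: f_y(y) = sum over x of f_{x,y}(x, y)  (sum along a row of the table)
f_y = {}
for (x, y), p in f_xy.items():
    f_y[y] = f_y.get(y, Fraction(0)) + p

# Conditional: f_{x|y}(x|y) = f_{x,y}(x, y) / f_y(y)
f_x_given_y = {(x, y): p / f_y[y] for (x, y), p in f_xy.items()}
```

Note that for each fixed $y$, the conditional probabilities over $x$ sum to one, as any pmf must.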
## Independence
Two random variables $X$ and $Y$ with outcomes $\{x_1,\cdots,x_k\}$
and $\{y_1,\cdots,y_m\}$ are **independent** if and only if
$$ f_{x,y}(x,y) = f_x(x)f_y(y) \text{ for all outcomes }x\text{ and }y.$$
## Independence: conditional version
If the random variables $X$ and $Y$ are independent, then
$$
\begin{aligned}
f_{x|y}(x|y) &= \frac{f_{x,y}(x,y)}{f_y(y)} \\
&= \frac{f_{x}(x)f_y(y)}{f_y(y)} \\
&= f_x(x).
\end{aligned}
$$
Information about $Y$ does not affect the probability with which we see outcomes of $X$.
# Conditional expectations
## Definition
The conditional expectation of $Y$ given $X=x$ is
$$ E(Y|X=x) = \sum_y y f_{y|x}(y|x).$$
This definition is similar to that of the *unconditional* expectation $E(Y)$, but uses the *conditional* probability distribution instead of the *marginal*.
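The definition translates directly into a computation on a joint table. A sketch with hypothetical numbers (the joint pmf below is invented for illustration):

```python
from fractions import Fraction

# A hypothetical joint pmf of (X, Y), keyed by (x, y).
f_xy = {(0, 1): Fraction(1, 6), (0, 3): Fraction(1, 3),
        (1, 1): Fraction(1, 4), (1, 3): Fraction(1, 4)}

# E(Y|X=x) = sum_y y f_{y|x}(y|x), with f_{y|x}(y|x) = f_{x,y}(x, y) / f_x(x)
def cond_exp_y(x):
    f_x = sum(p for (xx, _), p in f_xy.items() if xx == x)
    return sum(y * p / f_x for (xx, y), p in f_xy.items() if xx == x)
```

Evaluated at each outcome $x$, this gives a number; viewed as a function of the random variable $X$, it is the random variable $E(Y|X)$ of the next slide.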
## Definition (2)
Sometimes we will use the object $E(Y|X)$. This object differs from $E(Y|X=x)$.
\bigskip
Importantly, $E(Y|X)$ is a random variable.
## Importance
Conditional expectations are the main object of interest for modern work in applied econometrics.
- There are settings in which $\Delta = E(Y|X=x') - E(Y|X=x)$ can be interpreted as the causal effect of $X$ on $Y$
- It is interesting to think about settings in which this is not the case;
- It is then interesting to formulate other conditional expectations that **do** admit that interpretation
- In BUEC 333, the course revolves around $E(Y|X)=\beta_0 + \beta_1 X$
- Why the linearity?
## CE.1
The first property (CE.1) of conditional expectations says that, for any random variable $X$, and for any function $c:\{x_1,\cdots,x_k\}\to\mathbb{R}$,
$$ E(c(X)|X) = c(X) $$
- What does this mean?
- Is it a random variable?
## CE.2 - Linearity
The second property states that, for any two random variables $X$ and $Y$, and for any functions $a,b:\{x_1,\cdots,x_k\}\to\mathbb{R}$,
$$ E(a(X)Y + b(X)|X) = a(X) E(Y|X) + b(X).$$
Just like the unconditional expectation, the conditional expectation operator is linear.
## CE.3 - Independence
Let $X$ and $Y$ be two independent random variables. Then $E(Y|X)=E(Y)$.
## LIE
The *L*aw of *I*terated *E*xpectations (LIE) says that, for any two random variables $X$ and $Y$,
$$ E(E(Y|X))=E(Y).$$
Proof: [blackboard]
\bigskip
A more advanced version states that, for any three random variables $X,Y,Z$,
$$ E(Y|X)=E(E(Y|X,Z)|X).$$
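After the blackboard proof, the basic version of the LIE can also be verified numerically. A sketch with a made-up joint pmf:

```python
from fractions import Fraction

# A made-up joint pmf of (X, Y).
f_xy = {(0, 1): Fraction(1, 6), (0, 3): Fraction(1, 3),
        (1, 1): Fraction(1, 4), (1, 3): Fraction(1, 4)}

# Marginal of X
f_x = {}
for (x, _), p in f_xy.items():
    f_x[x] = f_x.get(x, Fraction(0)) + p

# Conditional means E(Y|X=x)
def cond_EY(x):
    return sum(y * p for (xx, y), p in f_xy.items() if xx == x) / f_x[x]

# LIE: E(E(Y|X)) weights the conditional means by the marginal pmf of X
lie = sum(f_x[x] * cond_EY(x) for x in f_x)
EY = sum(y * p for (_, y), p in f_xy.items())
```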
## Notions of unrelatedness
1. Remember: $X \perp Y \Rightarrow E(Y|X)=E(Y)$
2. CE.5: $E(Y|X)=E(Y) \Rightarrow Cov(X,Y)=0$
Independence is stronger than mean independence, which in turn is stronger than uncorrelatedness.
## The reverse is not true!
| y/x | -1 | 0 | 1 |
| --- | --- | --- | --- |
| 0 | 0 | 1/3 | 0 |
| 1 | 1/3 | 0 | 1/3 |
Table: $Y=X^2$
- What is $Cov(X,Y)$?
- What is $E(Y|X)$?
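Work out both answers by hand first; the following sketch then checks them directly from the table:

```python
from fractions import Fraction

# The joint pmf from the table: X in {-1, 0, 1}, Y = X^2.
f_xy = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

E = lambda g: sum(g(x, y) * p for (x, y), p in f_xy.items())
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

# E(Y|X=x): here it equals x^2, which varies with x even though Cov(X, Y) = 0,
# so Y is uncorrelated with X but NOT mean-independent of X.
def cond_EY(x):
    mass = [(y, p) for (xx, y), p in f_xy.items() if xx == x]
    total = sum(p for _, p in mass)
    return sum(y * p for y, p in mass) / total
```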
## Notes on the notions
1. If $u$ and $X$ are random variables and we assume that $E(u|X)=0$, then:
- $E(u)=0$
- $Cov(X,u)=0$
2. If $X$ and $Y$ jointly follow a Normal distribution, all three notions coincide