---
title: "Introduction"
output:
  beamer_presentation:
    toc: true
    slide_level: 2
header-includes:
- \usepackage{amsmath}
---
# Readings
## Readings
- Read Chapter 1
- Appendix A:
    - A.1 is covered below
    - verify that you know the material in A.2-A.5
- Some lecture material is not covered in Appendix A:
    - proof strategy
    - convergence of sequences
# Admin
## Slides
... are empty. I will update them for 2017.
\bigskip
Note to self:
- notes are in OneNote
- plus some handwritten notes
## Admin overview
b. Two lectures
    i. Tuesday: two hours of lecture
    ii. Thursday: one hour of lecture, two hours of R. Also functions as office hours
    iii. Later on, Thursday will become three hours of R
    iv. In test weeks (7 and 13):
        1) Tuesday: office hours
        2) Thursday: test
c. Syllabus: this Thursday

4. Course rules:
    a. Attendance is required.
    b. Active participation is expected. I have not decided on an incentive system yet, but I will not answer your emails, write reference letters, etc., if you are not active.
    c. Code of conduct. You are expected to be familiar with the Code of Academic Integrity and Good Conduct. This is a course designed for HON students, so I expect there to be no problems. In the unlikely event that I catch somebody copying on assignments, cheating, or lying, I will go for the maximum penalty.
    d. Attendance is required at all lectures, unless I mention otherwise. If you have a valid reason for missing a lecture, you have to tell me before the day of the lecture at the latest.
    e. Reference your code. Using fellow students and Google is a great way to learn R. However, copying code without attribution is plagiarism. Make sure to properly reference your code.
    f. Midterm and final. You are allowed to bring one sheet of A4/letter-size paper, written on both sides, and a non-graphing calculator.
    g. Office hours. The Thursday lab doubles as office hours.
    h. No emails.
    i. Regrading. Come and talk to me. Typically, I will then ask you to send me an email with a clear reason for regrading, and I will re-mark your entire assignment/test (with the possibility of a lower grade).
    j. Illness. If you are ill and cannot hand in an assignment: contact me. If you miss the midterm, final, or the term paper deadline because of illness, please follow the steps in my BUEC 333 syllabus.
## Term paper etc
While the lectures will provide you with the background on the models and techniques, we
will be working through interesting, recent publications in top economics
journals. We will read each paper, get the data, and estimate the models that the authors
estimate. We may estimate *better* models! This way, we learn about 3, and we also learn
(third objective) about programming in R for data analysis, and (fourth) about research at
the frontier of economics.

One outcome is that we will write R code to replicate the results
in existing papers. This code should be well-commented and ready for publication online.
I plan to post all the code on the course website as a study tool for next year's class.

You will all be reading all four assigned papers. For part of the five hours per week,
you will be teaching each other what the papers are about, how *you* coded the
analysis, why *you* think certain models are reasonable, and why *you* think the authors
did a good/bad job. Learning to present technical material in front of an audience is the
final (fifth) objective of this course.
## Grading guidelines

Proposed change:

- two tests, each 25%
- assignments and student presentations, total of 25%
- your own replication / term paper, 25%

## Student presentations

The student presentations in weeks 1-6 will be you working out the answers to
mathematical problem sets in front of your colleagues, and walking them through it.

The student presentations in weeks 8-14 will be you explaining the assigned
paper to your colleagues, and explaining to them how you have been replicating it.
It's a workshop, where we all figure out what the paper is about, and whether
we can manage to replicate it.

## Term paper

Counts for 25 percent. Pick a replicable paper.

Grading, 10 points for each of:

- intro, data description, and conclusion
- model description, and discussion of causality in the framework of your paper
- discussion of results
- R code
- creativity (extension)
# Course content intro
## Overview
1. Personal introductions: who are you, who am I?
2. What is the goal of this course? Econometrics is the use of data, in combination with statistical methods, to estimate economic quantities and to test economic theories. Microeconometrics is econometrics that uses microdata, that is, data collected at the level of the individual, firm, household, etc.
    a. One goal is to prepare those of you who are headed for grad school in terms of modern econometrics. Grad school is tough, and this course may be a first taste. Students usually curse at me halfway through the course. On the flip side, they are often quite pleased in hindsight.
    b. Relationship to BUEC 333: this is a much more advanced version of BUEC 333. It has more of everything:
        i. Understanding (and proving!) the theory
        ii. Understanding how to apply the methods in more advanced settings
        iii. More programming:
            1) You need to learn how to write functions and scripts that use R for simulation studies
            2) You need to learn how to find, import, and analyze data to replicate papers in top economics journals
        iv. Matrix algebra
3. Course organization
    a. Two parts:
        i. Part 1: Mathematical background
            1) How to prove things
            2) Probability
            3) Stats
            4) Matrices
            5) OLS using matrices
        ii. Part 2: Using the math for
            1) Methods
                a) IV
                b) Panel data
                c) Program evaluation methods
            2) Applications:
                a) Institutions and growth using the pattern of colonization
                b) Estimating the return to government spending using data on the mafia!
                c) Using data on twins to learn about the returns to schooling
                d)
## Causality and observational data
Usually, at this point, I would introduce what this course is about using some empirical examples.
However, we need all the time we have in this course, given the very large gap between the end of BUEC 333 and where I want to get you.
Still, let me give you a tiny example of why it may not be trivial to infer causal effects from observational data.
Observational data is data that was gathered without intervening in the world. It is the opposite of experimental data, which you generate yourself by intervening.
## Crop yield
See written notes.
## Regression models
To answer questions about causal effects using observational data, we use regression models.
1. A regression model consists of an equation like
    $$Y_i = \beta_0 + \beta_1 X_i + u_i$$
    and some assumptions on the distributions of the random variables in this model ($u_i$ and $X_i$). For
    example, we may say that $u_i$ has no relationship with $X_i$. We may say that $u_i$ has the
    same variance for everybody. We may say that $X_i$ has no outliers. These are distributional
    assumptions.
2. For a given model, I will tell you about procedures that you can use to estimate
    quantities of interest (like $\beta_1$ in the example above). Learning techniques that
    are more advanced than OLS is one of the objectives of this course.
3. These distributional assumptions have real-world interpretations. In a given application,
    I need you to know what those are, and I need you to be able to discuss whether they are
    reasonable *in that application*. This is a second objective of this course.
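
To see why an assumption such as "$u_i$ has no relationship with $X_i$" matters for estimating $\beta_1$, here is a short derivation for the model $Y_i = \beta_0 + \beta_1 X_i + u_i$. It is a sketch under two assumptions that the slides do not state explicitly: "no relationship" is read as $\operatorname{Cov}(X_i, u_i) = 0$, and $\operatorname{Var}(X_i) > 0$.

$$
\begin{aligned}
\operatorname{Cov}(X_i, Y_i) &= \operatorname{Cov}(X_i,\ \beta_0 + \beta_1 X_i + u_i) \\
&= \beta_1 \operatorname{Var}(X_i) + \operatorname{Cov}(X_i, u_i) \\
&= \beta_1 \operatorname{Var}(X_i)
\quad\Longrightarrow\quad
\beta_1 = \frac{\operatorname{Cov}(X_i, Y_i)}{\operatorname{Var}(X_i)}.
\end{aligned}
$$

If instead $\operatorname{Cov}(X_i, u_i) \neq 0$, the same ratio equals $\beta_1 + \operatorname{Cov}(X_i, u_i)/\operatorname{Var}(X_i)$, which is not $\beta_1$. This is why the plausibility of the assumption has to be discussed in each application.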
# Proofs
## Proofs
In this course, you will see and do mathematical proofs. I will now show you what a proof is, and what I expect your proofs to look like. To practice proofs, we will look at two simple examples, using natural and real numbers.
## Natural numbers
A natural number...
## Real numbers
A real number...
## Real numbers (cont'd)
- Add triangle inequality and absolute value, norms, etc. (for the proof of $a_n + b_n \to a + b$)
- i.e. the metric-space version (see also MIT OpenCourseWare): norm, distance, ..., preparing for vectors.
## Proof format
In this class, we will be looking at _for all_ proofs:
> For each $x \in U_x$: $P(x)$
For example
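
One possible illustration of this format takes $U_x = \mathbb{R}$ and lets $P(x)$ be the statement $x^2 + 1 \geq 2x$. The claim is chosen here for convenience and is not taken from the lecture notes.

> For each $x \in \mathbb{R}$: $x^2 + 1 \geq 2x$

Proof. Let $x$ be an arbitrary real number. Then

$$
x^2 + 1 - 2x = (x - 1)^2 \geq 0,
$$

because the square of a real number is never negative. Adding $2x$ to both sides gives $x^2 + 1 \geq 2x$. Since $x$ was arbitrary, the statement holds for each $x \in \mathbb{R}$.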
# Sequences
## Intro
First, introduce what a deterministic sequence is. Give a few examples. Sequences can be finite or infinite. In this course, deterministic sequences always live in $\mathbb{R}$.
Working with sequences is fundamental to probability theory and statistics. For example, we can have a sequence of observations.
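A basic operation for sequences is summation. The summation operator, applied to a sequence $\{x_i, i=1,\dots,n\}$, is defined as $\sum_{i=1}^n x_i = x_1 + x_2 + \dots + x_n$.
There are three properties of summation that we will now prove:

1. Sum of a constant: $\sum_{i=1}^n c = nc$.
2. Sum of a multiplied sequence is a multiple of the sum: $\sum_{i=1}^n c x_i = c \sum_{i=1}^n x_i$.
3. Linearity: $\sum_{i=1}^n (a x_i + b y_i) = a \sum_{i=1}^n x_i + b \sum_{i=1}^n y_i$.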
## Summation
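
A sketch of how the proofs of the three summation properties can go, writing each sum out term by term. The constants $a$, $b$, $c$ and the second sequence $\{y_i, i=1,\dots,n\}$ are notation assumed here, and "linearity" is read as the statement in the third line.

$$
\begin{aligned}
\sum_{i=1}^n c &= \underbrace{c + c + \dots + c}_{n \text{ terms}} = nc, \\
\sum_{i=1}^n c x_i &= c x_1 + \dots + c x_n = c\,(x_1 + \dots + x_n) = c \sum_{i=1}^n x_i, \\
\sum_{i=1}^n (a x_i + b y_i) &= (a x_1 + \dots + a x_n) + (b y_1 + \dots + b y_n) \\
&= a \sum_{i=1}^n x_i + b \sum_{i=1}^n y_i.
\end{aligned}
$$

The second line factors out $c$ using the distributive law. The third line first reorders the terms of a finite sum and then applies the second property twice.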
## Convergence
- Additional information: [Khan Academy](https://www.khanacademy.org/math/calculus-home/series-calc/seq-conv-diverg-calc/v/proving-a-sequence-converges), especially video 5.
- picture
## Convergence (2)
Let us use the three properties, which will be useful later when we arrive at the statistics part of this course. For a sequence $\{x_i, i=1,\dots,n\}$, define the average $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$.
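
One standard place where the summation properties are used is in showing that deviations from the average sum to zero. This example is an illustrative choice assumed here, not something stated in the notes. It uses $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$, together with linearity of the sum and the fact that summing a constant $n$ times gives $n$ times the constant.

$$
\begin{aligned}
\sum_{i=1}^n (x_i - \bar{x}) &= \sum_{i=1}^n x_i - \sum_{i=1}^n \bar{x} && \text{(linearity)} \\
&= \sum_{i=1}^n x_i - n\bar{x} && \text{(sum of a constant)} \\
&= \sum_{i=1}^n x_i - n \cdot \frac{1}{n} \sum_{i=1}^n x_i = 0 && \text{(definition of } \bar{x}\text{)}.
\end{aligned}
$$
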
There is a lot of other math in Appendix A. I would like you to read through it carefully, and verify that you know everything that is in there. It is background knowledge for this course, and I will rely on it.
Now, briefly, a more advanced topic: convergence of sequences. So far, we have looked at one property of a sequence: its sum. Another property, for infinite sequences, is the limit. The limit is the value that the elements of a sequence tend to. For example, we think that the sequence $\{1/i, i=1,2,\dots\}$ tends to 0. We call 0 the limit of this sequence.
Formally, we say that
Definition. A sequence $\{x_i, i=1,2,\dots\}$ converges to its limit $c$ if, for each $\epsilon > 0$, there exists an $N$ such that, for each $i > N$, $|x_i - c| < \epsilon$.

To prove: The sequence $\{1/i, i=1,2,\dots\}$ converges to 0.
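
A sketch of the proof, applying the definition of convergence with $x_i = 1/i$ and $c = 0$. The only outside fact used is that for any real number there is a natural number at least as large (the Archimedean property), which is assumed here rather than proved.

Proof. Let $\epsilon > 0$ be arbitrary. Choose a natural number $N$ with $N \geq 1/\epsilon$. Then, for each $i > N$,

$$
\left| \frac{1}{i} - 0 \right| = \frac{1}{i} < \frac{1}{N} \leq \epsilon .
$$

Since $\epsilon$ was arbitrary, such an $N$ exists for each $\epsilon > 0$, so the sequence converges to 0.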