Notes of Econ 424 Part 1 Excel Fall

  • Slides: 83
Download presentation
Notes of Econ 424 Part 1: Excel Fall 2008 Ginger Z. Jin http: //www.

Notes of Econ 424 Part 1: Excel Fall 2008 Ginger Z. Jin http: //www. glue. umd. edu/~ginger/ 1

Class 1: Introduction • • • Goal of this class Syllabus First-class questionnaire Logins

Class 1: Introduction • • • Goal of this class Syllabus First-class questionnaire Logins A peek at the data collected by the questionnaire 2

"There are three kinds of lies – lies, damned lies and statistics. " ---

"There are three kinds of lies – lies, damned lies and statistics. " --- (? ) Mark Twain Winston Churchill Benjamin Disraeli 3

Statistical manipulation – Randomness behind each statistics • News tend to report outliers or

Statistical manipulation – Randomness behind each statistics • News tend to report outliers or new observation that deviates from the status quo • Newsworthy is not equal to good statistics – Correlation and causation • Harvard graduates earn more than high school dropouts, this does not mean these dropouts, if given a Harvard diploma, will earn as much – Report favorable information while hide measurement error / variable definition 4

Examples of statistical manipulation • Executives at XYZ Corporation make an average annual salary

Examples of statistical manipulation • Executives at XYZ Corporation make an average annual salary of $250, 000. – one earns $2 mill, the other 11 earn $90, 000. • 88% of those surveyed prefer QRS brand potato chips. – only surveyed 9 people, 8 said yes. • ABC ink jet printers use 22% less ink. – compared to what? • One drug treatment program has a 90% success rate. – Drug free at the end of the program, or x months after finishing the program? • Graph manipulation: – same data but different scale Source: http: //www. effectivemeetings. com/productivity/communication/statmanipulation. asp 5

Real examples • Mc. Cain Radio Ad in Florida – “Colombia Trade”: Three-quarters of

Real examples • Mc. Cain Radio Ad in Florida – “Colombia Trade”: Three-quarters of Florida’s exports are with Latin America. – According to trade statistics generated from the U. S. Dept. of Commerce, Florida’s international exports totaled nearly $45 billion in 2007, while exports to Latin America and the Caribbean totaled nearly $24 billion, comprising only 53% of exports. The three-quarters figure comes from the share of exports that originates in Florida or passes through Florida on its way out of the country. Source: http: //www. factcheck. org/elections 2008/errors_en_espantildeol. html • Democratic National Committee TV Ad: 1. 8 million jobs lost (as compared to 2001). – In truth, total nonfarm employment was up by nearly 5. 4 million compared to when Bush took office. The 1. 8 million figure comes from the number of people without jobs which was higher by 1. 8 million than when Bush took office. However, this drop only means that job growth did not keep pace with the number of people seeking work. Source: http: //www. factcheck. org/elections-2008/dnc_vs_mccain. html 6

My goal At the end of the semester: • You feel comfortable collecting, locating

My goal At the end of the semester: • You feel comfortable collecting, locating and analyzing real data • You are able to read, interpret and criticize statistics generated by other people • Given a real data set, you are able to generate basic statistics by yourself and give them meaningful interpretations 7

Syllabus is at: http: //www. glue. umd. edu/~ginger/ Click on “ECON 424” at the

Syllabus is at: http: //www. glue. umd. edu/~ginger/ Click on “ECON 424” at the bottom Assign project 1 8

Introduction to Excel Handout from UMD peer training is available at http: //www. glue.

Introduction to Excel Handout from UMD peer training is available at http: //www. glue. umd. edu/~ginger/teaching/econ 424 spring 2008/exceltutor. html. Main contents: • Open a file in N: and save it (as. xls or. csv) in M: . • Define observation and variable • Hide/unhide rows/columns, freeze panes • Change excel settings (Tools+options) • Change cell/column/row formats • Highlight (shift+, ctrl+) • Formulas/dragdowns • Charts 9

Extra excel to be covered in class • Text to columns • Insert text

Extra excel to be covered in class • Text to columns • Insert text box in excel • Transport excel table and chart into MS words • Generate random numbers 10

Class 2: Data Collection (e. stat chapter 3) • Start with a question –

Class 2: Data Collection (e. stat chapter 3) • Start with a question – characterize music downloads among 18 -25 • Define observation – an individual, a class, a school? • Define variables – use or not, intensity of usage, scope of usage • Define sample – Students enrolled? Students attending the first class and love music? 11

Data Collection (continued) • Methods of data collection – Field collection by hand –

Data Collection (continued) • Methods of data collection – Field collection by hand – Experiment – Survey / Questionnaire • • In class By phone By email Follow up survey – Existing data sources (library, internet, etc. ) 12

Data Collection (continued) • Things that need special attention – Measurement error • data

Data Collection (continued) • Things that need special attention – Measurement error • data collector’s preference, if revealed in the survey, may bias respondent answer • self report may be biased in a specific direction • anonymous vs. identity-revealing – Sample selection • sample not representative by design (those attending first class are different from those who don’t attend) • missing values generate sample selection (need a follow up? ) – Sample size and balance • variations in the studied variables • trade off between statistical power and cost of data collection • similar size of comparable subsamples 13

Role playing • To better understand peer-to-peer file sharing in the music industry, what

Role playing • To better understand peer-to-peer file sharing in the music industry, what statistics would you generate and analyze if you are the head of: – – – Sony/BMG A Rock Star A new, unknown artist Potential Consumers Apple/Steve Jobs Kazaa (or another file-sharing program) 14

Role playing tasks • Specify a question • Define a data set that will

Role playing tasks • Specify a question • Define a data set that will help you answer the question. What statistics do you want most from this data set? • How would you interpret the statistics if it turns out to be …. (imagine the possible outcomes)? • What cannot the statistics say about? • Identify another statistics that would help you most but you cannot get internally 15

Class 3: Data Description Before summarizing the data, clean it first! • Sample –

Class 3: Data Description Before summarizing the data, clean it first! • Sample – every student that attended 1 st class (may be different from the official roster) • Unit of observation? • # of observations? • Variable(s)? • Missing values? • Abnormal values? – Delete them, clean them? Be aware of the assumptions you are making. • What’s the # of observations after all the cleaning? • Take a note of all the above! 16

Class 3: Data Description (e. stat chapter 4) • Mean unweighted: Excel: =average(data) weighted:

Class 3: Data Description (e. stat chapter 4) • Mean unweighted: Excel: =average(data) weighted: Example: compute GPA (e. stat Figure 4. 7) 17

Class 3: Data Description • Median Define: middle point in the data set 50%

Class 3: Data Description • Median Define: middle point in the data set 50% observations >= median 50% observations <= median Excel: median(data) If the distribution is symmetric, median=mean. Unlike mean, median is insensitive to outliers. Example: e. stat Figure 4. 8 18

Class 3: Data Description • Trimmed mean: Ignores a percentage of values that are

Class 3: Data Description • Trimmed mean: Ignores a percentage of values that are extreme and compute mean for the rest. Excel: trimmean(data, percent) Example: e. stat Figure 4. 10 19

Class 3: Data Description • Order statistics 1 st quartile (25% obs below) =quartile(data,

Class 3: Data Description • Order statistics 1 st quartile (25% obs below) =quartile(data, 1) 2 nd quartile = median =quartile(data, 2) 3 rd quartile (75% obs below) =quartile(data, 3) 4 th quartile = maximum =quartile(data, 4)=max(data) 60 th percentile (60% obs below) =percentile(data, 0. 6) 0 percentile = minimum =min(data) range = max - min Interquartile range = 3 rd quartile – 1 st quartile Interquartile ratio = 3 rd quartile / 1 st quartile Example: e. stat Figure 4. 15 20

Class 3: Data Description • Sample variance and standard deviation Var(x): =var(data) Std dev:

Class 3: Data Description • Sample variance and standard deviation Var(x): =var(data) Std dev: =stdev(data) Note: sample variance not equal to population variance Example: e. stat Figure 4. 15 21

Class 3: Data Description • Other jargons Mode: the most common value =mode(data) Skewness:

Class 3: Data Description • Other jargons Mode: the most common value =mode(data) Skewness: asymmetry, long right tail = positively skewed =skew(data) Kurtosis: peakedness, positive if peakier than normal distribution =kurt(data) Example: e. stat sections 4 -13, 4 -18, 4 -20. 22

Class 3: Data Description Summary: N: sharenotesdata-summary-formula. doc Exercise: N: sharenotesdata-summary-exercise. xls Variables already

Class 3: Data Description Summary: N: sharenotesdata-summary-formula. doc Exercise: N: sharenotesdata-summary-exercise. xls Variables already summarized: age, gender, music_source, ipod_own, num_cds Your exercise: num_song_purchased, last_acq, where_acq Optional: num_free_download, num_songs 23

Class 4: Histogram (e. stat Section 4 -3) • Histogram: A column chart in

Class 4: Histogram (e. stat Section 4 -3) • Histogram: A column chart in which line segments are graphed for the frequencies of classes across the class intervals (bins) and then each segment is connected to the X-axis to form a rectangle. 24

Class 4: Histogram • Steps in drawing histogram: – Define bins • Must cover

Class 4: Histogram • Steps in drawing histogram: – Define bins • Must cover all the data (start from min or less, stop at max or more) • Equal width • The number of bins is between 5 and 20 – Count frequency in each bin • Method 1: =countif(. . ) • Method 2: =frequency(data, bins) • Method 3: Tools – data analysis – histogram – Plot histogram • Note: histogram is a frequency chart that shows the distribution of the raw data. It is not equal to highlighting the raw data and plotting them directly. 25

Relative frequency polygon • • (Absolute) frequency: number of observations per bin draw histogram

Relative frequency polygon • • (Absolute) frequency: number of observations per bin draw histogram as a column bar chart Relative frequency: percentage of observations per bin draw relative frequency polygon as a line chart Relative frequency polygon is more convenient to compare two data sets, especially when they differ in the number of observations. N: sharenotesdata-summary-example-fall 2007. xls 26

Prepare for Project 1 • Require original data collected by yourself, no simple download

Prepare for Project 1 • Require original data collected by yourself, no simple download from a published source • Submit two files: – one excel (for data details) – one word document (for description and summary) • Common mistakes last semester – Compare two relative frequencies using different bin definitions – Use histograms/relative frequency polygon to infer correlation – Treating categorical variable as continuous – Spelling mistakes, forgot to label data sheets, data description lack of details 27

Class 5 -6: Probability Theory (e. stat Chapters 5 and 6) • Population: entire

Class 5 -6: Probability Theory (e. stat Chapters 5 and 6) • Population: entire set of events that occur in a given universe. – – – Event probability Probability density function (PDF, ) Cumulative density function (CDF, ) Population statistics (mean, variance, etc. ) Certainty about a random process • Sample: a subset of a population – Data analysis – Random by nature – Sample statistics are random variables, but population statistics are not! 28

Population mean • For discrete • For continuous : : 29

Population mean • For discrete • For continuous : : 29

Population variance and standard deviation • For discrete • For continuous : : 30

Population variance and standard deviation • For discrete • For continuous : : 30

Compute by hand or excel: • Bernoulli distribution (flip a coin) With probability •

Compute by hand or excel: • Bernoulli distribution (flip a coin) With probability • Flip n coins and define • Roll a die ? Each value with probability • Roll two dice? • Roll n dice? 31

How to simulate in excel? Bernoulli Distribution With probability Hint: =rand() provides a random

How to simulate in excel? Bernoulli Distribution With probability Hint: =rand() provides a random number between 0 and 1. You can use rand(), but your formula must return integer 0 or 1. Answer: =if(rand()<p, 1, 0) What about flip n coins? 32

How to simulate in excel? Roll a die Each value with probability Hint: Your

How to simulate in excel? Roll a die Each value with probability Hint: Your formula must return an integer between 1 and 6, with equal probability. Answer: =round(rand()*6+0. 5, 0), or =randbetween(1, 6) Note: randbetween function may not exist in some versions of Excel. What about roll n dice? 33

Compute by hand or excel: • uniform distribution between (a, b) Answer: 1/(b-a) pdf

Compute by hand or excel: • uniform distribution between (a, b) Answer: 1/(b-a) pdf a b x 34

Normal distribution • Normal distribution: Example: 35

Normal distribution • Normal distribution: Example: 35

What about ? Note: if both x 1 and x 2 are normally distributed,

What about ? Note: if both x 1 and x 2 are normally distributed, so is x 1+x 2. But if x 1 and x 2 are uniformly distributed, x 1+x 2 is not uniformly distributed. 36

Normalize • If uniform on (a, b) then uniform on (0, 1) • If

Normalize • If uniform on (a, b) then uniform on (0, 1) • If then 37

How to simulate in excel? • Uniform on (a, b): • Hint: rand() gives

How to simulate in excel? • Uniform on (a, b): • Hint: rand() gives you uniform on (0, 1). You need to adjust it to fit in the range of (a, b). • Answer: =a+rand()*(b-a) • Normal: • Answer: =norminv(rand(), miu, sigma) 38

Class 7: Central limit theorem (e. stat section 10 -05) Given any population with

Class 7: Central limit theorem (e. stat section 10 -05) Given any population with mean and standard deviation , for a large sample (N>30), we have: Simulation: show-central-limit-theorem. xls Assign project 2 39

Show CLT in Excel (1) • Choose a population – type of distribution, e.

Show CLT in Excel (1) • Choose a population – type of distribution, e. g. Bernoulli – distribution parameters, e. g. p=0. 3. • Simulate data – 200 samples – each sample of size N • Lock in the simulated data so the samples do not change later on – copy, paste special 40

Show CLT in Excel (2) • Calculate sample mean for each sample • Compare

Show CLT in Excel (2) • Calculate sample mean for each sample • Compare the distributions of (1) all the raw data and (2) all the sample means – bin range must be wide enough to cover the most dispersed distribution – bin width must be narrow enough so that there at least 4 -5 bins for the most concentrated distribution 41

Class 8: Introduction to Crystal Ball 42

Class 8: Introduction to Crystal Ball 42

Summary of project 2 • Most students perform well • Correct answer: 43

Summary of project 2 • Most students perform well • Correct answer: 43

Common mistakes in project 2 • You should simulate from normal/uniform distribution, not from

Common mistakes in project 2 • You should simulate from normal/uniform distribution, not from Bernoulli. • You should put raw data and sample mean in ONE relative frequency polygon. To do so, you must have the same bin definition for all the data series plotted in the graph. 44

Common mistakes in project 2 • Raw data relative frequency > 1. Correct: Relative

Common mistakes in project 2 • Raw data relative frequency > 1. Correct: Relative freq=abs freq/(100 XN) instead of abs freq/100. Wrong! 45

Common mistakes in project 2 • For normal distribution, bins start from miu-sigma. Correct:

Common mistakes in project 2 • For normal distribution, bins start from miu-sigma. Correct: bin should start from miu-3*sigma. Wrong 46

Class 9: Mean estimation • According to CLT, sample mean is an unbiased estimate

Class 9: Mean estimation • According to CLT, sample mean is an unbiased estimate of population mean, but with some errors. • What is the distribution of if ? Answer: 47

95% chance within this range 48

95% chance within this range 48

But … • Usually we try to guess what is • We don’t know

But … • Usually we try to guess what is • We don’t know either • Use the sample to take a guess on 49

Confidence Interval • Given confidence level (for example) • Where s=est. std. dev. =stdev(data)

Confidence Interval • Given confidence level (for example) • Where s=est. std. dev. =stdev(data) t= t value = tinv(1 -alpha, N-1) • Or xbar +/- confidence(1 -alpha, stddev, size) • Note: In Excel, confidence function assumes we know population standard deviation and therefore does not use t-value • Exercise: e. stat problems 12. 1 and 12. 2 50

Class 10: Hypothesis testing • Null hypothesis H 0: • Alternative hypothesis: H 1:

Class 10: Hypothesis testing • Null hypothesis H 0: • Alternative hypothesis: H 1: two-tail test one-tail test 51

Logic • • Assume H 0 is right Choose a confidence level Compute prob(get

Logic • • Assume H 0 is right Choose a confidence level Compute prob(get the sample mean) Reject H 0 if prob(. . ) is too small, otherwise accept H 0 52

Types of Error • Type I Error: reject correct H 0 (false neg) •

Types of Error • Type I Error: reject correct H 0 (false neg) • Type 2 Error: accept wrong H 0 (false pos) 53

In practice • Method 1 (two tail test only) – Compute and confidence interval

In practice • Method 1 (two tail test only) – Compute and confidence interval – Accept H 0 if falls in the confidence interval – Reject H 0 if falls out of the confidence interval Reject ((1 -alpha)/2) Accept (prob=alpha) Reject ((1 -alpha)/2) 54

In practice • Method 2: – Compute t-statistics Degree of freedom – Compute critical

In practice • Method 2: – Compute t-statistics Degree of freedom – Compute critical value =(+/-)tinv(1 -alpha, N-1) (if two-tail) =(+/-)tinv((1 -alpha)*2, N-1) (if one-tail) – Accept (reject) H 0 if t falls in (out of) the critical value 55

Two-tail critical values Reject ((1 -alpha)/2) Accept (prob=alpha) t Reject ((1 -alpha)/2) 56

Two-tail critical values Reject ((1 -alpha)/2) Accept (prob=alpha) t Reject ((1 -alpha)/2) 56

H 1: One-tail critical value Accept (prob=alpha) Reject (1 -alpha) t 57

H 1: One-tail critical value Accept (prob=alpha) Reject (1 -alpha) t 57

In practice • Method 3: – Compute t-statistics – Compute p-value = prob(t-stat>=|t|) =tdist(|t|,

In practice • Method 3: – Compute t-statistics – Compute p-value = prob(t-stat>=|t|) =tdist(|t|, N-1, 2) (if two-tail) =tdist(|t|, N-1, 1) (if one-tail) – Reject H 0 if p-value<(1 -alpha) Accept H 0 if p-value>(1 -alpha) 58

Exercise • • What is H 0? What is H 1? Two-tail or one-tail?

Exercise • • What is H 0? What is H 1? Two-tail or one-tail? Choose a method E. stat problems: – two-tail test: e. stat 13. 13 and 13. 17 – One-tail test: e. stat 13. 20 and 13. E 10 59

Two-tail vs. one-tail • Two-tail test does not indicate which direction to go if

Two-tail vs. one-tail • Two-tail test does not indicate which direction to go if we reject H 0: , so the alternative is H 1: . • One-tail test has a strong view of one direction. For example, a saleman wants to know whether sales have increased from the past, in which case the alternative is H 1: . If he is worried if the sales have decreased, the alternative will point to another direction where H 1 is. 60

In Excel (alpha=95%): Two-tail • H 0: One-tail • H 0: • H 1:

In Excel (alpha=95%): Two-tail • H 0: One-tail • H 0: • H 1: • T-stat: • Crit. Val: (+/-) tinv(0. 05, N-1) • Crit. Val: = + tinv(0. 10, N-1) = -tinv(0. 10, N-1) • Reject if t > + crit. val. or t< - crit. val • Reject if t> crit. val. • Reject if t<crit. val. 61

Three methods, same result Two-tail • H 0: • H 1: • Conf. Interval

Three methods, same result Two-tail • H 0: • H 1: • Conf. Interval [ , • T-stat: • Crit. Val: (+/-) tinv(0. 05, N-1) • P-value: tdist(|t-stat|, N-1, 2) • Reject if t > + crit. val. or t< - crit. val • Reject if p<0. 05 ] tinv(0. 05, N-1) • Reject if is out of conf. interval 62

Class 11: two sample testing • One sample test H 0: • What if

Class 11: two sample testing • One sample test H 0: • What if we don’t know but have two samples • Can we compare and ? • Yes, must account for errors in both 63

Two independent samples • H 0: • H 1: (two-tail) (one-tail) • Independent errors

Two independent samples • H 0: • H 1: (two-tail) (one-tail) • Independent errors in the two samples are independent • Degree of freedom [min(N 1, N 2)-1] • Exercise: e. stat problems 14. 5, 14. 6 64

Two matched samples (same subjects, N 1=N 2) • H 0: • H 1:

Two matched samples (same subjects, N 1=N 2) • H 0: • H 1: • • • (two-tail) (one-tail) Generate a new variable dx=x 1 -x 2 Transform H 0: Now is a one-sample test Degree of freedom N 1 -1 Example: e. stat figure 14. 8 Exercise: e. stat problem 14. 7 65

How to tell if two samples are matched or not? • N 1=N 2

How to tell if two samples are matched or not? • N 1=N 2 for matched pairs • Same subjects? • If resorting one sample does not affect the comparison, they are independent 66

Class 12: regression • “Regress Y on X” means: Error term Independent variable(s) Dependent

Class 12: regression • “Regress Y on X” means: Error term Independent variable(s) Dependent variable coefficients 67

Ordinary least squares (OLS) Goal: Find a linear line that best fits the data

Ordinary least squares (OLS) Goal: Find a linear line that best fits the data Best: Solution: Slope (random!) Intercept (random!) Average point line is always on the 68

Test coefficients • Estimated coefficients are random numbers! – depend on data • The

Test coefficients • Estimated coefficients are random numbers! – depend on data • The point estimates should be judged together with their standard errors • Hypothesis test H 0: • T-statistics • Critical values (assuming conf. level=95%) =tinv(0. 05, N-2) for two-tail Two-tail or one-tail depends on H 1 =tinv(0. 10, N-2) for one-tail • P-values =tdist(|t|, N-2, 2 or 1) 69

Measure the fit of OLS • Total sum of squared deviations (TSS): • Decompose

Measure the fit of OLS • Total sum of squared deviations (TSS): • Decompose TSS: – Explained by the model: – Unexplained residuals: • R-square: (explained)/(total) 70

F-test k=number of coefficients N=number of observations H 0: all the coefficients except the

F-test k=number of coefficients N=number of observations H 0: all the coefficients except the constant term are zero. H 1: some of the non-constant coefficients are not zero. 71

 • Correlation coefficient: Note: 1. Correlation coefficient is between -1 and 1. What

• Correlation coefficient: Note: 1. Correlation coefficient is between -1 and 1. What does it mean if correlation coefficient is equal to -1, 0, or 1? 2. correlation coefficient is symmetric, i. e corr(x, y)=corr(y, x), but OLS coefficients aren’t. This means regress y on x is not equivalent to regress x on y. 3. Correlation coefficient is always of the same sign as the OLS slope coefficient. 4. R-square = (correlation coefficient)^2 72

Regression in Excel • Method 1: =linest(data of y, data of x, include const?

Regression in Excel • Method 1: =linest(data of y, data of x, include const? , other statistics? ) No output labels, must be familiar with the output layout • Method 2: Tools - data analysis – regression • Note: you could have multiple x, but they must be adjacent to each other. • Example: e. stat Figure 19. 7 • Exercise: e. stat problems 19. E 1, 19. E 2, 19. E 3 73

Assumptions and Caveats in OLS (e. stat section 22 -04) • • Errors have

Assumptions and Caveats in OLS (e. stat section 22 -04) • • Errors have mean zero. Errors have a constant standard deviation. Errors are drawn independently. Errors are uncorrelated with x. All the important x are included in the regression. (omitted variable bias, see e. stat Figure 22. 4. ) • Errors are distributed normally. 74

Class 14: Midterm Review – open book – understand the concepts – use them

Class 14: Midterm Review – open book – understand the concepts – use them in real examples – 9: 30 -10: 45 am, Plant Sciences 1129 – If you cannot attend the midterm for reasons that are consistent with University Policy, please let me know AT LEAST 12 hours BEFORE the midterm time, otherwise your midterm grade will be counted as zero. 75

Concepts to grasp (1) • Population / sample • Population – Cdf (prob(var<x)) –

Concepts to grasp (1) • Population / sample • Population – Cdf (prob(var<x)) – Pdf (first derivative of cdf) – population mean, population std. dev. • Sample – Histogram, frequency polygon, quartiles, percentiles, sample mean, sample std. dev. , skewness, kurtosis • Population sample – Central limit theorem xbar~N(µ, σ/sqrt(n)) • Sample Population – Xbar is a proxy of µ with noise 76

Concepts to grasp (2) • Inference – Type I error, Type II error –

Concepts to grasp (2) • Inference – Type I error, Type II error – Confidence level α – Confidence interval – Hypothesis testing • • H 0 H 1 Accept/reject? One-tail, two-tail test 77

Summary of Excel (1) • Basic excel – open, save and close files –

Summary of Excel (1) • Basic excel – open, save and close files – cut, paste and paste special – change format for cell, row or columns – sort data by one or two variables – chart wizard – freeze panes – drag cells – use excel functions 78

Summary of Excel (2) • Data description – mean, median, trimmed mean – standard

Summary of Excel (2) • Data description – mean, median, trimmed mean – standard deviation, variance – quartiles – mode, skewness, kurtosis – histogram (absolute frequency) – relative frequency polygon 79

Summary of Excel (3) • Probability theory – PDF, CDF – mean and standard

Summary of Excel (3) • Probability theory – PDF, CDF – mean and standard deviation – bernoulli, binomial – uniform, normal – how to simulate them in Excel? – Central limit theorem – how to see central limit theorem in excel? 80

Summary of Excel (4) • Estimation and Hypothesis testing – – – – –

Summary of Excel (4) • Estimation and Hypothesis testing – – – – – use sample mean to estimate population mean confidence interval type I error and type II error null hypothesis (H 0) and alternative hypothesis (H 1) one-tail vs. two-tail t-statistics, critical value, p-value one-sample test two-sample test (independent) two-sample test (matched pair) 81

Summary of Excel (5) • Linear regression – model • one variable on the

Summary of Excel (5) • Linear regression – model • one variable on the right hand side • more than one variables on the right hand side • create and use binary variables – fit of the model • • R square F test scatter plot correlation coefficient – coefficient estimates • point estimate • hypothesis testing • omitted variable bias 82

Class 15: Midterm Grades 83

Class 15: Midterm Grades 83