Welcome to QM 222 D1 Project Section

Welcome to QM 222 D1 (Project Section)
Making Decisions with Data, Class 1
Professor Shulamit (Shu) Kahn, skahn@bu.edu
TAs: Danika Guiley, dnkgly@bu.edu; Justin Padilla, jpadilla@bu.edu
QM 222 Fall 2016 Section D1

Today’s Agenda
• Introductions
• Go over syllabus together.
• What this course, your project (and most data analysis) are about.
• Datasets.
• Jeopardy

Self-introduction card
(Front)
1) Your full name.
2) What I should call you in class.
3) An interesting fact about yourself.
(Back)
4) Any requests, concerns, or things you want to share with me.

My self-introduction
1. Full name: Shulamit Kahn
2. What you should call me in class: Professor Kahn, Shu, Professor, Prof.
3. Interesting fact: I am on the Board of Trustees of a school for street children in Kenya.
4. Any requests, concerns, or things I want to share with you:
• Throughout the semester, please come talk to me about your ideas and about anything you don’t understand.

Where is Kenya?

More about me
• Ph.D. in Economics from MIT
• Recently, I’ve been doing quantitative research, using methods we’ll be learning, on:
  • Entrepreneurship and immigration
  • International diffusion of science
  • Science careers
  • Engineering careers
  • Women in science

Today’s Agenda
• Introductions
• Go over syllabus together.
• What this course (and most data analysis) is about.
• Datasets.
• If time: Jeopardy

Syllabus – Reaching Me and TAs
• Me: Professor Shulamit Kahn
  • Email: skahn@bu.edu
  • Office: 518C
  • Office hours: Mon 2:15-3:30, Wed 11:15-12:15 (or by appointment: just email with “D1 appointment” in the subject)
  • Mail file for all assignments: 520F
• Teaching Assistants
  • Danika Guiley: dnkgly@bu.edu
    • Office hours: Tue 7:00-8:00 pm (location TBA), Sun 1:00-3:00 (undergrad lounge)
  • Justin Padilla: jpadilla@bu.edu
    • Office hours: Wed 5:00-6:00, Thu 5:00-6:00

Two Websites
• The main website for this course is http://sites.bu.edu/qm222projectcourse.
  • Here you will find this syllabus and schedule, lecture slides, information on datasets, assignments, instructions on using Stata, practice tests, etc.
• The website for handing in online assignments is our section’s QuestromTools (QM 222 D1 Fall 2016) website. Grades will also be posted on this website.

Handing in assignments: 5th Floor, Room 520F
[Floor map; labels: Charles River, Commonwealth Ave., Atrium]

In 520F, put problem sets and projects in the vertical file labeled with your class time and my name (Kahn).

How this section is different from other QM 222 sections
• The best way to learn data analysis is to do it.
• The best way to do it is to be interested in what you are researching.
• This section revolves around one major project of your choice.
• In this section, you prepare your own data set.
• Rather than problem sets, it has assignments that are stages of this project.
• It has one midterm but no final, and tests are worth less.
• It has a smaller class size and 2 TAs so we can supervise you. (We’d do them all this way if we could supervise 300 projects.)
• I don’t think about a grade distribution here: I’d be happy with all A’s and A-’s.
• It is easier to do well on the project than on tests because I give you feedback and chances to improve.

Also, we use different statistical software: Stata rather than Excel
• Stata costs $75, or you can use it in the lab.
• It is very simple and quick to do statistical analysis in Stata, compared to Excel.
• Excel does only limited kinds of statistical analysis, with a limited number of variables.
• Excel cannot deal with large data sets.
• Statistical programs let you keep a “log” of everything you have done so you can go back and look at it.
• Statistical programs let you put the commands you want into a file and run them all at once.

Getting Stata
• Stata is available in the lab, but…
• I encourage you to buy your own copy of Stata.
  • Buy Stata/IC.
  • DO NOT BUY SMALL STATA – IT WILL NOT FIT MOST DATA SETS you will want to use.
  • It costs $75 for one semester, or $125 for a year.
  • Here’s the website to buy Stata: http://www.stata.com/order/new/edu/gradplans/studentpricing/
• The alternative is to use Stata on some of the computers in the open-access computer lab, room 328.

Other things you’ll need
• Required reading: QM 222: Making Decisions with Data (course notes), available at FedEx, 115 Cummington St.
  • Only use the Notes for Project Section D1 (BLUE).
• We will also use Excel (to prepare you for QM 323 and because graphs are easier in Excel).

You will need the most up-to-date version of Excel. You can download the current version for free from this website: http://www.bu.edu/tech/services/support/desktop/distribution/microsoft/studentoffice/
Then you will need to install the Data Analysis add-in. The TA will help with this on Friday.

Course Components and Grading
You’ll get the higher of these two grades:

Course Component                               | Weight in Final Grade V1 | Weight in Final Grade V2
Project                                        | 44%                      | 68%
Test                                           | 43%                      | 20%
(Timely) assignment and appointment completion | 6%                       | 6%
Attendance, Class Participation, Presentation  | 5%                       | 4%
Ungraded Research Obligation (URO)             | 2%                       | 2%
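A small Python sketch of the “higher of two grades” rule may make the arithmetic concrete. The weights are copied from the grading table. The sketch assumes, hypothetically, that every component is scored on a 0-100 scale, and the component names are made up for illustration.

```python
# Weights (percent) from the grading table; each version sums to 100.
WEIGHTS_V1 = {"project": 44, "test": 43, "assignments": 6, "participation": 5, "uro": 2}
WEIGHTS_V2 = {"project": 68, "test": 20, "assignments": 6, "participation": 4, "uro": 2}


def weighted_grade(scores, weights):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    return sum(scores[name] * weight for name, weight in weights.items()) / 100


def final_grade(scores):
    """The course gives you the higher of the two weighted grades."""
    return max(weighted_grade(scores, WEIGHTS_V1), weighted_grade(scores, WEIGHTS_V2))


# Hypothetical student: strong project, weaker test, full marks elsewhere.
student = {"project": 90, "test": 70, "assignments": 100, "participation": 100, "uro": 100}
print(weighted_grade(student, WEIGHTS_V1), weighted_grade(student, WEIGHTS_V2), final_grade(student))
```

In this example, V1 works out to 82.7 and V2 to 87.2, so the final grade would be 87.2. A student whose test score is stronger than their project score would instead benefit from V1.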

More on course components: Test
• One test: Monday, October 31 or Tuesday, November 1, 6 pm
• The test is closed book except for 1-2 sheets of paper (to be clarified).
• Let me know in September if you get extra time for tests or if you cannot make this time.
• Bring a plain calculator, NOT a cell phone or laptop, to the test.
• There is no final exam. However, those unhappy with their test score can choose to take the regular QM 222 final to replace it.
  • You will have to inform Prof. Kahn by December 7th if you plan to do this.

Final Project logistics
• The final project is due December 15th, 6 pm.
• The assignments before that lead up to the final project.
• A draft is due November 22.
  • I will read ONE draft of your final project and give you comments (and a preliminary score and achievable score).
  • If you hand it in by Nov. 21, I’ll give feedback by Dec. 5.
  • If you hand it in by Dec. 2, I’ll give feedback in the order I receive drafts, but certainly by Dec. 12th.
  • The earlier you get me that draft, the earlier you’ll get my comments and the more complete they will be.
• Grades are MUCH higher when you can revise your paper.

Timely completion of assignments and appointments
• There are 7 assignments, including the final draft, together worth 6 points of your grade.
• All assignments are available on the website.
• Assignments are graded only for completion and timeliness of all parts.
• Each assignment is worth a different number of points.
  • Assignment 1 is due Sept 19 (6 pm) and is worth ½ point.
  • If you hand it in late, but by Sept 28, 6 pm, you get half of that.
  • See the schedule and syllabus for other due dates (and the last day for half credit).
• All assignments ask for a hard copy and an online component… Why?
  • I respond to the content of the hard copy.
  • The TAs grade the online part.
  • For full credit, hand in both copies with all subsections answered, and all requested data sets and files posted.
• There are 2 required meetings with me. The first, between Sept 12th and Sept 23rd, is to discuss your topic and dataset.

Classroom participation
Class participation reflects:
• Attendance. You are allowed 2 absences. (max 2.5 pts)
• Your presentation (max 1 pt).
• Contributions to class discussions: questions, answers, thoughtful comments (max 1 pt).
  • This includes respectful and professional behavior in class.
• Your two appointments with me.
• Pick the seat you want and sit there in the next class.
• Please bring a tent card with your name.

Professional Behavior
• Respect your fellow students and professor by not:
  • Arriving late, leaving early, or leaving for a bit in the middle of class.
  • Talking when others are talking.
  • Texting in class. Turn off your phones: if I see you using your cell phone, I will mark you as absent for that class.
  • Doing other things not related to class (browsing the web, reading newspapers, etc.). Same as above.
• I ask that you close your laptop unless we are using it in class.

Friday sections
• This Friday is required (Excel setup, Excel review, Stata setup), MCS B33.
• Two other Fridays, which I will teach to replace cancelled classes, are also required:
  • Friday Sept. 23 (replacing Monday Oct. 3) and
  • Friday Oct. 14 (replacing Wednesday Oct. 12)
  • See the schedule (hard copy and online).
• Other Friday sections will be TA office hours.

URO (ungraded research obligation) is worth 2 points
• URO: Everyone in QM 222 must participate in two Friday experiments (45 minutes each, between 10 am and 6 pm).
  • One experiment Sept 9th – Oct 7th
  • One experiment Oct. 14th – Nov. 11th
• You can replace it by asking for extra assignments, but I need to know before the midterm if you plan to do that.
• Sign up today!

Today’s Agenda
• Introductions
• Go over syllabus together.
• What this course (and most data analysis) is about.
• Datasets.
• Jeopardy

What this course and most data analysis is about:
Well-formed Question + Data + Statistics + Analytical Thinking → Answers, Recommendations
• In QM 222 and your project, you’ll do this!
• The focus is hands-on: working with data in Excel, doing it to learn it.
• Different from QM 221: learning how to calculate the statistics is only a small part of this course. What is more important is what the statistics tell us.
• Another difference is that this class is about using data, statistics and analysis to understand the relationships between variables (things that vary).

Example 1 of what this course is about
• At Comcast, what do they want to know? What question(s) do they want to answer?
• Why do they want to know the answer?
• What data do they use to answer the question? What might be better data?
• What might Comcast do with the results of their data analysis?
• We said that QM 222 is about the relationships between variables. What “relationship(s)” is this about?
• Comcast Business Exec: https://www.youtube.com/watch?v=3Z-0nLr6g0k

Example 2 of what this course is about
• At Target, what question(s) do they want to answer?
• Why do they want to know this answer?
• What data do they use to answer it?
• What does Target do with the results of the data analysis?
• What “relationship(s)” is this about?
• http://www.nytimes.com/video/magazine/100000001367956/timescast--retailers-predictions.html

Comparing the two examples
• Both examples are about relationships between “variables” (things that vary).
• Comcast wants to know what causes customer satisfaction or dissatisfaction, in order to affect that satisfaction…
• Target wants to know what information (that they have) is associated or correlated with pregnancy.
  • Why? In order to affect sales. Clearly they had already done data analysis showing how important it was to attract women when they are pregnant.

The project
• The project assignment is on the “sites” website. It says:
  • Your project should use regression and any other relevant statistics to answer a question of your choice, whose answer will be useful to your client.
  • The regression(s) will be measuring relationships between variables in order to answer the question.
• Your topic must be approved by Prof. Kahn.
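To give a feel for what “measuring a relationship” with a regression means, here is a minimal Python sketch of the simplest case: an ordinary least-squares line y = a + b·x with one explanatory variable. This is an illustration, not the course’s method. In the project you would run regressions in Stata (for example, `regress y x`), usually with several explanatory variables. The advertising and sales numbers are hypothetical.

```python
def simple_regression(x, y):
    """Ordinary least-squares fit of y = a + b*x. Returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope b = sum of cross-deviations / sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x does not vary, so the slope is undefined")
    b = sxy / sxx
    # The fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: advertising spending (x) and units sold (y).
advertising = [1, 2, 3, 4]
sales = [3, 5, 7, 9]
intercept, slope = simple_regression(advertising, sales)
print(f"sales = {intercept:.1f} + {slope:.1f} * advertising")
```

This prints `sales = 1.0 + 2.0 * advertising`. The slope is the “relationship”: each extra unit of advertising is associated with 2 more units sold in this made-up data.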

The project cont.
• Your final project should be written in the form of a report to a client who would be interested in knowing your results. The client can be one or more people at a company, governmental unit, or other organization.
• The final project should include a 1-page executive summary and an 8-20 page report (including tables or graphs).
• More is not better. (I do not require or expect 20 pages, but am allowing it.)

Some of previous years’ questions/topics:
• Predicting job satisfaction in the workplace.
• How does working long days affect people’s happiness with their marriage?
• What kinds of households tend to own more vehicles?
• Who is the typical smoker?
• Predicting Flow of the Elbow River at Bragg Creek during the spring/summer.
• What drives dividends?
• Financial Ratios and Profitability in the Apparel Industry.
• Impact of Advertising on Sales of … (student’s own product he sold to students)
• International Students in the United States: A Statistical Analysis.
• How MPG affects sales prices of cars.
• Factors that affect whether a pregnant woman 18-26 years old will have an abortion.
• Lots of sports topics: how injuries affect a basketball player’s later performance, the impact of offensive coordinators on team success in the NFL, what’s more important to winning golf tournaments (putting or long drives), etc.

Today’s Agenda
• Introductions
• Go over syllabus together.
• What this course (and most data analysis) is about.
• Datasets.
• Jeopardy

In data-sets
• Each row is an observation.
  • One occurrence of the thing you are examining.
  • n is the number of observations.
  • In the data set on the next page, the observation is one movie.

What data sets look like in Excel (movie data from IMDB, Metacritic)

In data-sets cont.
• Each row is an observation: one occurrence of the thing you are examining. n is the number of observations. In the movie data, the observation is one movie.
• Each column is a variable.
  • Something you know about the observation.
  • Here: the name, year, metascore, budget, etc.
• How many observations do you need for your project?
  • If the data is on individual people, you need at least 300 observations, but ideally thousands.
  • If using data on companies, teams, etc.: at least 100.
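The row/column vocabulary can be sketched in Python as a list of dictionaries. Each dictionary is one observation (row), and each key is one variable (column). The movie titles and numbers below are hypothetical placeholders, not real IMDB data. The `MIN_OBS` thresholds simply restate the slide’s guideline, and the names are made up for illustration.

```python
# Hypothetical placeholder rows: one dictionary per movie (observation).
movies = [
    {"name": "Movie A", "year": 2014, "metascore": 72, "budget": 40_000_000},
    {"name": "Movie B", "year": 2015, "metascore": 55, "budget": 15_000_000},
    {"name": "Movie C", "year": 2015, "metascore": 81, "budget": 90_000_000},
]

# Suggested minimum number of observations, by kind of observation.
MIN_OBS = {"people": 300, "organizations": 100}


def describe(rows):
    """Return n (the number of observations) and the variable (column) names."""
    n = len(rows)
    variables = list(rows[0].keys()) if rows else []
    return n, variables


def enough_observations(n, unit):
    """True if n meets the suggested minimum for this kind of observation."""
    return n >= MIN_OBS[unit]


n, variables = describe(movies)
print(n, variables)
print("enough for a project on companies/teams?", enough_observations(n, "organizations"))
```

Three rows are far too few: a real project data set would need at least 100 rows of this kind.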

Kinds of Data and Datasets
• Numerical data v. categorical data
  • You must not use categorical data as if it were numerical.
  • But you CAN use it. We’ll learn how.
• Cross-sectional v. time series v. panel data sets
  • Cross section: at one point in time, each observation is a different person, company, etc.
  • v. Time series: each observation is a different point in time.
  • v. Cross section-time series: each observation is a different company, etc. at a specific point in time.
  • v. Panel data or longitudinal data: the same people/companies, etc. are observed at different points in time. Each observation is a specific company, etc. at a specific point in time.
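One way to see the difference between these data-set types is to look only at which entity and which point in time each row describes. The Python function below is a rough, hypothetical heuristic written for illustration, not a standard classification rule. It assumes each entity appears at most once per period.

```python
from collections import Counter


def classify(observations):
    """Roughly classify a data set from its (entity, period) pairs, one per row.

    Heuristic only; assumes each entity appears at most once per period.
    """
    if not observations:
        raise ValueError("no observations")
    entities = {entity for entity, _ in observations}
    periods = {period for _, period in observations}
    if len(periods) == 1:
        return "cross section"  # one point in time, many entities
    if len(entities) == 1:
        return "time series"  # one entity, many points in time
    rows_per_entity = Counter(entity for entity, _ in observations)
    if any(count > 1 for count in rows_per_entity.values()):
        return "panel"  # some entities are observed more than once
    return "cross section-time series"  # different entities at different times


print(classify([("Firm A", 2015), ("Firm B", 2015), ("Firm C", 2015)]))
print(classify([("Firm A", 2014), ("Firm A", 2015), ("Firm B", 2014), ("Firm B", 2015)]))
```

Real data can be mixed. A survey that interviews mostly different people each year, plus a few repeat respondents, would be labelled “panel” by this rule, which is why it is only a rough check.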

Finding a topic: some tips
• It needs to be about how variables relate to each other.
• Students sometimes start by looking at what data is available.
  • I will give some suggestions of available data sets,
  • or you can find other data sets (I’m happy to help find them),
  • or you can collect data yourself.
• Some data sets you might think about using are on the website sites.bu.edu/qm222projectcourse/data-set.
• All data sets come with codebooks that list the variables available and how they are coded in the dataset.
  • Surveys often give you the questionnaires.
• No ideas? Looking through the codebooks of some of the data sets I suggest might help you think of ideas.

Finding data: some tips
• Finding good data is usually the hardest part of making data-based business decisions, and the hardest part of your project.
• Finding data is like going on a treasure hunt when you aren’t sure there is actually a treasure at the end.
• Warning: Never presume that you can find the exact data you want.
• When you think you have found a question and the data to answer it, you really need to check that the data has the variables you want, for the years you want, with enough observations.

After you find the data, you need to clean and prepare the data
• Some observations may be missing an important variable. You can’t use those observations.
• Data may be in strings (words) that need to be converted into numbers.
• You may need to combine several variables to make the one you want (for instance, dividing births by the population to get the birth rate).
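The three cleaning steps can be sketched in Python on a few hypothetical rows (made-up states and counts, stored as text the way a downloaded file often arrives). The sketch converts strings such as "1,200" to numbers, drops observations missing births or population, and builds birth_rate = births / population. The set of markers treated as missing is an assumption; check each dataset’s codebook for how it codes missing values.

```python
def to_number(text):
    """Convert a string such as ' 1,200 ' to a float; return None if missing."""
    text = text.strip().replace(",", "")
    # Assumed missing-value markers; "." is how Stata displays a missing number.
    if text in ("", "NA", "."):
        return None
    return float(text)


def clean(rows):
    """Drop rows missing births or population; add birth_rate = births / population."""
    cleaned = []
    for row in rows:
        births = to_number(row["births"])
        population = to_number(row["population"])
        if births is None or population is None or population == 0:
            continue  # can't use this observation
        cleaned.append({
            "state": row["state"],
            "births": births,
            "population": population,
            "birth_rate": births / population,
        })
    return cleaned


# Hypothetical raw rows, with numbers stored as text and one missing value.
raw = [
    {"state": "A", "births": "1,200", "population": "100000"},
    {"state": "B", "births": "", "population": "50000"},
    {"state": "C", "births": "900", "population": "60,000"},
]
for row in clean(raw):
    print(row["state"], row["birth_rate"])
```

State B is dropped because its births value is missing. The other two rows gain a numeric birth_rate column.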

Some data sets – described at more length on the website
• Compustat
• American Community Survey
• General Social Survey
• Add Health

Compustat N. America
• Compustat North America is a database of U.S. and Canadian fundamental and market information on active and inactive publicly held companies. It provides more than 300 annual and 100 quarterly Income Statement, Balance Sheet, Statement of Cash Flows, and supplemental data items on more than 24,000 publicly held companies.
• Files are available in both annual and quarterly formats.
• Files contain information on aggregates, industry segments, banks, market prices, dividends, earnings, and ratios, among other things.
• For instance, you might have ideas related to total earnings, earnings per share, price per share, sales, expenses, R&D expenses, COGS, net income, net income from continuing operations, market-to-book ratio, market value of equity, book value of equity, total debt, dividends per share, operating cash flow, executive compensation, etc.
• The Pardee librarians are the best resource on this. Justin can also help.
• The Compustat page on sites.bu.edu/sm222projectcourse will tell you how to access this data.

American Community Survey (ACS)
• The American Community Survey (ACS) is run by the Bureau of the Census.
• The survey covers topics such as demographics, education, income, employment, occupation and many housing-related variables.
• ACS samples about 1 in every 40 addresses every year, or 250,000 addresses every month.
• Data is available for the US or by state, for a 1-year, 3-year or 5-year period (not all variables are in each of the 1-, 3- and 5-year files).
• Danika will help you with this dataset.

General Social Survey
• Tracks Americans over the last four decades. More than 1,000 people are surveyed most years from 1972 through 2014.
• The GSS asks demographic, behavioral, and attitudinal questions of a nationally representative sample of adults.
• Mostly, different people are surveyed each year.
• However, there are also a few panels, i.e. some people are interviewed several times in a row.
  • This allows you to look at how a person changes over time, perhaps in response to something that happened the first time they were interviewed.

Add Health: panel data
• The National Longitudinal Study of Adolescent to Adult Health (Add Health) is a longitudinal study of a nationally representative sample of adolescents who were in grades 7-12 in the United States in 1994-95.
• The Add Health cohort has been followed into young adulthood with four in-home interviews, the most recent in 2008, when the sample was aged 24-32.
• Add Health has data on:
  • Education, family
  • Friends and social networks
  • Physical health and psychological well-being
  • Reproductive health, drug and substance abuse
  • Occupation, information on jobs or lack of jobs, income, etc.
• Students who used Add Health last year often found that they needed to merge data from several different parts of Add Health (not hard to do in Stata).
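Merging means lining up rows from two files that describe the same respondent, using an identifier both files share. The Python sketch below does a one-to-one merge on a hypothetical `respondent_id` column, using made-up values. Add Health’s actual file layout and ID variable are not described here, so check its codebook. This sketch keeps only respondents found in both files. By contrast, Stata’s `merge` command keeps unmatched rows by default and records the match status in a `_merge` variable.

```python
def merge_one_to_one(left, right, key):
    """Inner join of two lists of dicts on `key`; each key value must be unique."""
    right_by_key = {}
    for row in right:
        if row[key] in right_by_key:
            raise ValueError(f"duplicate {key} in right table: {row[key]!r}")
        right_by_key[row[key]] = row
    merged, seen = [], set()
    for row in left:
        if row[key] in seen:
            raise ValueError(f"duplicate {key} in left table: {row[key]!r}")
        seen.add(row[key])
        match = right_by_key.get(row[key])
        if match is not None:
            # Combine both rows' variables (right-hand values win on name clashes).
            merged.append({**row, **match})
    return merged


# Hypothetical extracts from two parts of a survey, sharing respondent_id.
wave1 = [
    {"respondent_id": 1, "grade": 9},
    {"respondent_id": 2, "grade": 11},
    {"respondent_id": 3, "grade": 8},
]
wave4 = [
    {"respondent_id": 1, "income": 30000},
    {"respondent_id": 3, "income": 45000},
]
print(merge_one_to_one(wave1, wave4, "respondent_id"))
```

Respondent 2 drops out because that ID has no match in the second file. Checking how many rows fail to match is a useful sanity check after any merge.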

Other data sets you can find
• There are MANY sports data sets…
• And data analysis is increasingly important in sports.
  • See e.g. http://www.forbes.com/sites/bernardmarr/2015/03/25/big-data-the-winning-formula-in-sports/
• Danika will help you with sports datasets.

Other data sets you can find
• Current Population Survey (find it where you find the ACS, on IPUMS): a national survey with more on working and work history.
• The decennial Censuses (find them where you find the ACS, on IPUMS) for historical data.
• National Health and Nutrition Examination Survey (latest 2015-16).
• NLSY: a longitudinal survey that follows people over time and asks a lot of questions.
• ATUS: American Time Use Survey (find it where you find the ACS, on IPUMS)… might be harder to use.

If time: Jeopardy!

The next week
• Go to Friday section at 1 pm.
• Buy Stata (if you plan to).
• Think about a topic (and look at Assignment 1).
  • Talk with friends about what interests you.
  • Then think of a question about that topic that is about relationships between variables and will be of interest to a client.
  • If nothing hits you, look through the questionnaires, codebooks, etc. of the datasets we’ve told you about at sites.bu.edu/qm222projectcourse.
• Bring your laptop and tent card to class next Monday.