Statistics & Data Analysis
Course Number: B01.1305
Course Section: 31
Meeting Time: Wednesday 6-8:50 pm
Midterm Review
Professor S. D. Balkin -- March 12, 2003
Midterm Format
§ Open book and open notes
  • No solution guides or other resources are permitted
§ A scientific calculator will be required
§ All questions will be short answer
§ Entire class period is available for exam
Exam Coverage
§ Chapter 1
  • Understand reasons for statistics
§ Chapter 2
  • Distinguish between qualitative and quantitative variables
  • Describe and interpret plots of data
  • Understand and calculate measures of center
  • Understand and calculate measures of variation
Exam Coverage
§ Chapter 3
  • Understand different sources of probabilities
  • Understand and use basic principles of probability
    • Addition
    • Complements
    • Multiplication
  • Calculate conditional and unconditional probabilities
  • Understand, use and determine statistical independence
  • Be able to construct and interpret probability tables and trees
§ Chapter 4
  • Understand probability distributions
  • Calculate the expected value and standard deviation of a probability distribution
Exam Coverage
§ Chapter 5: Some Special Probability Distributions
  • Calculate probability of an event using
    • Counting methods
    • Binomial distribution
    • Normal distribution
§ Chapter 6: Random Samples and Sampling Distributions
  • Understand and identify sources of sample bias
  • Understand the difference between the distribution of a summary statistic and the distribution of a population
  • Identify the sampling distribution of the sample mean
  • Understand the use of the Central Limit Theorem
  • Interpret a normal probability plot
Exam Coverage
§ Chapter 7: Point and Interval Estimation
  • Understand unbiased and efficient estimators
  • Calculate and interpret confidence intervals
    • For population mean with standard deviation known
    • For population proportion
    • For population mean with standard deviation unknown
  • Determine sample sizes for a given confidence level and tolerance width
  • Understand t-distribution
  • Understand key assumptions underlying confidence interval methods
Practice Problems with Answers in Book
§ Chapter 2: 2.26
§ Chapter 3: 3.35, 3.36, 3.47, 3.48, 3.53, 3.54, 3.55, 3.59, 3.60, 3.63, 3.64, 3.65, 3.66, 3.67, 3.68
§ Chapter 4: 4.35, 4.36
§ Chapter 5: 5.37, 5.38, 5.40, 5.41
§ Chapter 6: 6.29, 6.35, 6.36, 6.37
§ Chapter 7: 7.41, 7.42, 7.47, 7.48, 7.59, 7.60, 7.76, 7.77
Interpretation Review
• Mode: value or category with the highest frequency in the data
• Median: middle value when the data are arranged from lowest to highest
• Mean: sum of measurements divided by the number of measurements
• Variance: average of the squared deviations from the mean
• Empirical Rule: for roughly bell-shaped data, about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean
• IQR: 75th percentile - 25th percentile
• Random Variable: quantitative result from an experiment that is subject to random variability
• Expected Value: probability-weighted average of possible values
• Permutations: number of ordered sequences of k symbols chosen from r symbols
• Combinations: number of unordered subsets of k symbols chosen from r symbols
• Central Limit Theorem: for any population, the sampling distribution of the sample mean is approximately normal if the sample size is sufficiently large
• Interval estimate: states the range within which a population parameter probably lies
• 95% Confidence interval:
  • About 95% of similarly constructed intervals will contain the parameter being estimated
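The definitions above can be checked on a small example. The following is a minimal sketch using only the Python standard library; the data set and the probability distribution are made up for illustration and do not come from the course. Note that textbooks use slightly different quartile conventions, so the IQR from `statistics.quantiles` may differ from a hand calculation.

```python
import math
import statistics

# Hypothetical measurements, invented for illustration only
data = [2, 3, 3, 5, 7, 8, 12]

mean = statistics.mean(data)          # sum of measurements / number of measurements
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
sample_variance = statistics.variance(data)  # squared deviations averaged with n - 1

# Quartiles (default "exclusive" method); IQR = 75th percentile - 25th percentile
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Expected value: probability-weighted average of possible values
# (hypothetical distribution of a random variable taking values 0, 1, 2)
values = [0, 1, 2]
probs = [0.25, 0.50, 0.25]
expected_value = sum(v * p for v, p in zip(values, probs))

# Permutations and combinations of r = 5 symbols taken k = 2 at a time
n_permutations = math.perm(5, 2)  # ordered sequences
n_combinations = math.comb(5, 2)  # unordered subsets

print(mean, median, mode, sample_variance, iqr)
print(expected_value, n_permutations, n_combinations)
```

Permutations count ordered arrangements, so there are always at least as many permutations as combinations for the same r and k.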
Question #1
§ Fortune magazine publishes a list of the world's billionaires each year. The 1992 list includes 233 individuals. Describe this distribution of wealth. Why do you think the distribution is the way it is (Hint: is this a representative sample)?
Question #2
§ As a marketing consultant, you observed 50 consecutive shoppers at a grocery store, and recorded how much money each shopper spent in the store.
§ (a) Create and interpret a histogram of these data.
§ (b) Create and interpret a stem-and-leaf plot of these data.
§ (c) Create and interpret a boxplot of these data.
§ (d) Provide your client with an executive summary of your analysis.
Question #3
§ A narcotics enforcement unit works with customs officers at an airport that serves international travelers on a route that has plausible links to the drug trade. This enforcement unit has developed a smuggler profile that it uses to initiate full searches of people who meet the profile. These profiles typically require meeting a number of conditions such as (a) male under 40, (b) traveling alone, (c) loose clothing, and so on. Fully 100% of the travelers who meet the profile were searched, and 10% of those who did not meet the profile were searched. After collecting considerable data, these figures resulted:
  • Percentage of people who meet the profile: 4%
  • Percentage of people who meet the profile and then are found to have illegal drugs: 35%
  • Percentage of people who do not meet the profile and then are found to have illegal drugs: 3%
§ (a) Based on these figures, what percentage of travelers on this particular route is carrying illegal drugs?
§ (b) What percentage of the drug-carrying travelers will be captured by this procedure? Assume that all drug carriers who are searched will be captured.
§ (c) Given that a traveler is carrying illegal drugs (whether captured or not), what is the probability that this person will meet the profile?
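One way to organize Question #3 is with the multiplication rule and the law of total probability. The sketch below is one possible reading, not an official solution: it assumes the 35% and 3% figures are conditional rates (the share of carriers within each group) and that, among travelers who do not meet the profile, the 10% who are searched are chosen without regard to whether they carry drugs. All variable names are illustrative.

```python
# Figures from the question, read as conditional probabilities (assumption)
p_profile = 0.04                   # P(meets profile)
p_drugs_given_profile = 0.35       # P(drugs | meets profile)
p_drugs_given_no_profile = 0.03    # P(drugs | does not meet profile)
p_search_given_profile = 1.00      # everyone who meets the profile is searched
p_search_given_no_profile = 0.10   # assumed independent of carrying drugs

# Multiplication rule: joint probabilities for each branch of the tree
p_drugs_and_profile = p_profile * p_drugs_given_profile
p_drugs_and_no_profile = (1 - p_profile) * p_drugs_given_no_profile

# (a) Total probability that a traveler carries drugs
p_drugs = p_drugs_and_profile + p_drugs_and_no_profile

# (b) Share of drug carriers who are searched (and therefore captured)
p_captured = (p_drugs_and_profile * p_search_given_profile
              + p_drugs_and_no_profile * p_search_given_no_profile)
share_captured = p_captured / p_drugs

# (c) Conditional probability P(meets profile | drugs)
p_profile_given_drugs = p_drugs_and_profile / p_drugs

print(f"(a) P(drugs) = {p_drugs:.4f}")
print(f"(b) share of carriers captured = {share_captured:.4f}")
print(f"(c) P(profile | drugs) = {p_profile_given_drugs:.4f}")
```

If the 35% and 3% figures are instead read as joint percentages of all travelers, the calculation changes, so the interpretation should be stated explicitly in an exam answer.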
Question #4
§ A restaurant has collected data on its customers' orders and has estimated probabilities about what happens after the main course. It was found that 20% of the customers had dessert only, 40% had coffee only, and 30% had both dessert and coffee.
§ (a) Draw a probability tree for this situation.
§ (b) Find the probability of the event "had coffee."
§ (c) Find the probability of the event "did not have dessert."
§ (d) What percentage of customers will have "neither coffee nor dessert"?
§ (e) What percentage of customers will have "coffee OR dessert"?
§ (f) Are the events "had coffee" and "had dessert" mutually exclusive? How do you know?
§ (g) Given that a customer had coffee, what is the probability that the same customer had dessert?
§ (h) Are "had dessert" and "had coffee" independent events? How do you know?
§ (i) Find the conditional probability of having dessert GIVEN that the customer did not have coffee.
§ (j) Find the conditional probability of having dessert GIVEN that the customer did have coffee.
§ (k) Based on your analyses above, who is more likely to order dessert, a customer who orders coffee, or one who does not?
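The parts of Question #4 all follow from the four cells of a two-way probability table. The following sketch, with illustrative variable names, shows how each quantity can be derived from the three stated percentages; it is a worked check rather than the course's official solution.

```python
# Joint probabilities stated in the question
p_dessert_only = 0.20
p_coffee_only = 0.40
p_both = 0.30
# The four cells must sum to 1, so the remaining cell is "neither"
p_neither = 1 - (p_dessert_only + p_coffee_only + p_both)

# Marginal probabilities
p_coffee = p_coffee_only + p_both
p_dessert = p_dessert_only + p_both
p_no_dessert = 1 - p_dessert

# Addition rule: P(coffee OR dessert)
p_coffee_or_dessert = p_coffee + p_dessert - p_both

# Mutually exclusive events cannot occur together, i.e. P(both) would be 0
mutually_exclusive = p_both == 0

# Conditional probabilities
p_dessert_given_coffee = p_both / p_coffee
p_dessert_given_no_coffee = p_dessert_only / (1 - p_coffee)

# Independence would require P(both) = P(coffee) * P(dessert)
independent = abs(p_both - p_coffee * p_dessert) < 1e-9

print(p_coffee, p_no_dessert, p_neither, p_coffee_or_dessert)
print(p_dessert_given_coffee, p_dessert_given_no_coffee)
print(mutually_exclusive, independent)
```

Comparing the two conditional probabilities answers part (k) directly: the group with the larger conditional probability of dessert is the one more likely to order it.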
Question #5
§ Acorn is the acronym for Association of Community Organizations for Reform Now. These data were presented by Acorn to a Joint Congressional Hearing on discrimination in lending. Acorn concluded, "Banks generally have exhibited a pervasive pattern of lending practices that have the effect, intended or not, of racial discrimination. Wide disparities in rejection rates for minority and white applicants, even in comparable income groups, were found in all SMA's, and at nearly every institution studied."
§ The data provided are as follows:
  • Data: bankdata.txt
  • Number of cases: 20
  • Variable names:
    · Name of bank
    · MIN = refusal rate for minority applicants
    · WHITE = refusal rate for white applicants
    · HIMIN = refusal rate for high income minority applicants
    · HIWHITE = refusal rate for high income white applicants
§ Using the data provided and the methods learned in class, write a short argument in support of or disputing Acorn's claim that banks have exhibited racial discrimination. Use both graphics and text to help make your case.
Question #6
§ Research on insider traders who were arrested revealed that 38% of them committed some other white-collar crime.
§ What is the probability that of the last 100 arrested insider traders, 30 committed another crime?
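Question #6 fits the binomial model if each of the 100 arrested traders is treated as an independent trial with success probability 0.38. The sketch below assumes that independence and reads "30 committed another crime" as exactly 30; the cumulative "at most 30" probability is included in case the other reading is intended. The helper name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 100   # number of arrested insider traders
p = 0.38  # probability that a trader committed another white-collar crime

# Reading 1: exactly 30 of the 100 committed another crime
p_exactly_30 = binomial_pmf(30, n, p)

# Reading 2: at most 30 of the 100 committed another crime
p_at_most_30 = sum(binomial_pmf(k, n, p) for k in range(31))

print(f"P(X = 30)  = {p_exactly_30:.4f}")
print(f"P(X <= 30) = {p_at_most_30:.4f}")
```

Because n is large, a normal approximation with mean np = 38 and standard deviation sqrt(np(1 - p)) would also be a reasonable hand method.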
Question #7
Question #8
§ Identify a situation relating to your work or business interests in which statistical sampling might be (or has been) helpful.
§ (a) Describe the population and indicate how a sample could be chosen.
§ (b) Identify a population parameter of interest and indicate how a sample statistic could shed light on this unknown.
§ (c) Explain the concept of the sampling distribution of this statistic for your particular example.
Question #9
Question #10
§ A city decides to determine the mean expenditure per tourist per visit. A random sample of 100 finds that the average expenditure is $800. The standard deviation of expenditures for all tourists is $120.
§ (a) What is the standard deviation of the sample mean, given that the standard deviation of the whole population is $120 and the number of people sampled is 100?
§ (b) What is a 95% confidence interval for the mean expenditure per tourist? Provide an interpretation.
§ (c) If the city wants to determine the average expenditure within plus or minus $20, how many people does it need to sample?
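Since the population standard deviation is given, Question #10 calls for the z-based interval for a mean. The sketch below works through the three parts; it assumes part (c) keeps the same 95% confidence level, which the question does not state explicitly.

```python
import math
from statistics import NormalDist

n = 100          # sample size
x_bar = 800.0    # sample mean expenditure, in dollars
sigma = 120.0    # known population standard deviation, in dollars
confidence = 0.95

# (a) Standard deviation (standard error) of the sample mean: sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)

# (b) z-based confidence interval: x_bar +/- z * sigma / sqrt(n)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
margin = z * standard_error
ci_lower, ci_upper = x_bar - margin, x_bar + margin

# (c) Sample size for a margin of +/- $20 at the same (assumed) 95% level:
#     n >= (z * sigma / E)^2, rounded up to the next whole person
tolerance = 20.0
required_n = math.ceil((z * sigma / tolerance) ** 2)

print(f"(a) standard error = {standard_error:.2f}")
print(f"(b) 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
print(f"(c) required sample size = {required_n}")
```

The sample size is always rounded up, because rounding down would give a margin slightly wider than the requested tolerance.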
Question #11
§ In border towns such as Detroit and Buffalo, Canadian coins frequently end up in business cash registers. Canadian denominations are identical to U.S. denominations, and the coins are virtually identical in size, color, and weight. At present, the exchange rate favors the U.S., and banks encourage their customers to sort out the Canadian coins.
§ A Buffalo bank has been monitoring the deposits of one of its large customers, a supermarket. The bank has recorded on 45 days the face value of Canadian coins per $100 deposited. For these 45 days, the average amount was $3.46, with a standard deviation of $0.52. Give a 95% confidence interval for the population mean.
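Here the standard deviation comes from the sample, so a t-based interval with n - 1 = 44 degrees of freedom is the natural choice. The sketch below hard-codes the critical value as approximately 2.015, taken from a standard t-table rather than computed, because the Python standard library has no t-distribution quantile function. It also assumes the 45 daily values behave like a random sample.

```python
import math

n = 45          # number of days recorded
x_bar = 3.46    # sample mean, dollars of Canadian coins per $100 deposited
s = 0.52        # sample standard deviation, in dollars

# Critical value t_{0.025} with n - 1 = 44 degrees of freedom.
# Assumption: approximately 2.015, read from a t-table (not computed here).
t_critical = 2.015

# Estimated standard error of the sample mean: s / sqrt(n)
standard_error = s / math.sqrt(n)

# t-based interval: x_bar +/- t * s / sqrt(n)
margin = t_critical * standard_error
ci_lower, ci_upper = x_bar - margin, x_bar + margin

print(f"standard error = {standard_error:.4f}")
print(f"95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
```

With 44 degrees of freedom the t critical value is only slightly larger than the z value of 1.96, so the z-based interval would be marginally narrower.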