CHS 221 VISUALIZING DATA 1 Week 3 Dr. Wajed Hatamleh http://staff.ksu.edu.sa/whatamleh/en

VISUALIZING DATA
• Depict the nature or shape of the data distribution
• In a graph: different graphs are used for different types of data

HISTOGRAM
• Another common graphical presentation of quantitative data is a histogram.
• The variable of interest is placed on the horizontal axis.
• A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.

HISTOGRAMS
• Used for quantitative data
• Similar to a bar graph, with an X and Y axis, but adjacent values are on a continuum, so the bars touch one another
• Data values on the X axis are arranged from lowest to highest
• Bars are drawn to a height that shows the frequency or percentage (Y axis)

HISTOGRAMS (CONT’D)
Example of a histogram: heart rate data
[Figure: histogram; vertical axis: frequency (f); horizontal axis: heart rate in bpm]

Histogram
A bar graph in which the horizontal scale represents the classes of data values and the vertical scale represents the frequencies. (Figure 2-1)

Relative Frequency Histogram
Has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies. (Figure 2-2)
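The counting behind a histogram and a relative frequency histogram can be sketched in a few lines of Python. This is only an illustration: the heart-rate readings, the class width of 10 bpm, and the helper name `class_frequencies` are assumptions made for the example, not values from the slides.

```python
# Hypothetical heart-rate readings in bpm (illustrative values, not from the slides).
heart_rates = [62, 64, 66, 68, 70, 71, 72, 72, 74, 75, 76, 78, 80, 83, 88]


def class_frequencies(values, start, width, n_classes):
    """Count how many values fall in each class interval [lower, lower + width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


# Assumed classes: 60-69, 70-79, 80-89.
start, width, n_classes = 60, 10, 3
freqs = class_frequencies(heart_rates, start, width, n_classes)
total = sum(freqs)
# Relative frequency = class frequency divided by the total number of values.
rel_freqs = [f / total for f in freqs]

for i, (f, rf) in enumerate(zip(freqs, rel_freqs)):
    lower = start + i * width
    # A row of '#' stands in for the rectangle drawn above each class interval.
    print(f"{lower}-{lower + width - 1}: {'#' * f}  f={f}  rel={rf:.2f}")
```

Both lists describe bars of the same shape; only the vertical scale differs, which is why the two histograms look alike.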

Histogram and Relative Frequency Histogram
(Figure 2-1 and Figure 2-2)

Ogive
• An ogive is a graph of a cumulative distribution.
• The data values are shown on the horizontal axis.
• Shown on the vertical axis are the:
  • cumulative frequencies, or
  • cumulative relative frequencies, or
  • cumulative percent frequencies
• The frequency (one of the above) of each class is plotted as a point.
• The plotted points are connected by straight lines.

Ogive
A line graph that depicts cumulative frequencies. (Figure 2-4)
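As a rough sketch of how the points of an ogive are obtained, the running total of the class frequencies can be computed as below. The class boundaries and frequencies are hypothetical values chosen for the example.

```python
from itertools import accumulate

# Hypothetical class upper boundaries and class frequencies (illustrative only).
upper_boundaries = [69, 79, 89]
frequencies = [4, 8, 3]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]
# The same running total expressed as a percentage of all values.
cumulative_percent = [100 * c / total for c in cumulative]

# Each (x, y) pair is one plotted point; an ogive joins them with straight lines.
ogive_points = list(zip(upper_boundaries, cumulative))
print(ogive_points)
```

The last cumulative frequency always equals the total number of values, so the last cumulative percent is 100.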

BAR GRAPHS
• Used for qualitative data
• Bar graphs have a horizontal dimension (X axis) that specifies categories (i.e., data values)
• The vertical dimension (Y axis) specifies either frequencies or percentages
• The bar for each category is drawn to the height that indicates its frequency or percentage

BAR GRAPHS
Example of a bar graph. Note that the bars do not touch each other.

PIE CHART
• Also used for qualitative data
• The circle is divided into pie-shaped wedges corresponding to the percentages for a given category or data value
• All pieces add up to 100%
• Place wedges in order, with the biggest wedge starting at “12 o’clock”

PIE CHART
Example of a pie chart, for the same marital status data
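The percentages behind a bar graph or pie chart can be sketched as follows. The marital status counts are invented for illustration; the actual data behind the slide figures are not given in the text.

```python
# Hypothetical marital status counts (illustrative; not the data behind the slides).
counts = {"Married": 12, "Single": 5, "Divorced": 2, "Widowed": 1}

total = sum(counts.values())
# Percentage for each category: the bar height or the wedge size.
percentages = {category: 100 * n / total for category, n in counts.items()}

# Order the wedges from biggest to smallest, so the biggest starts at 12 o'clock.
wedges = sorted(percentages.items(), key=lambda item: item[1], reverse=True)

for category, pct in wedges:
    print(f"{category}: {pct:.0f}%")
```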

Recap
In this section we have discussed graphs that are pictures of distributions. Keep in mind that the object of this section is not just to construct graphs, but to learn something about the data sets, that is, to understand the nature of their distributions.

CHARACTERISTICS OF A DATA DISTRIBUTION
• Central tendency
• Variability
Both central tendency and variability can be expressed by indexes that are descriptive statistics.

CENTRAL TENDENCY
• Indexes of central tendency provide a single number to characterize a distribution
• Measures of central tendency come from the center of the distribution of data values, indicating what is “typical” and where data values tend to cluster
• Popularly called an “average”

CENTRAL TENDENCY INDEXES
Three alternative indexes:
• The mode
• The median
• The mean

THE MODE
• The mode is the score value with the highest frequency; the most “popular” score
• Age: 26 27 27 28 29 30 31
• Mode = 27

THE MODE: ADVANTAGES
• Can be used with data measured on any measurement level (including nominal level)
• Easy to “compute”
• Reflects an actual value in the distribution, so it is easy to understand
• Useful when there are 2+ “popular” scores (i.e., in multimodal distributions)

Mode
• Denoted by M
• A data set may be: bimodal, multimodal, or have no mode
• The only measure of central tendency that can be used with qualitative data

Examples
a. 5.40 1.10 0.42 0.73 0.48 1.10 → Mode is 1.10
b. 27 27 27 55 55 55 88 88 99 → Bimodal: 27 & 55
c. 1 2 3 6 7 8 9 10 → No mode
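The three examples above can be reproduced with a short Python sketch. The function name `modes` is hypothetical, and treating a data set in which no value repeats as having no mode is a convention assumed here to match example c.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency (empty list: no mode)."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if no value repeats, the data set has no mode.
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))      # one mode
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))      # bimodal
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))                 # no mode
```

Returning a list rather than a single number keeps bimodal and multimodal data sets from being silently reduced to one value.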

THE MODE: DISADVANTAGES
• Ignores most information in the distribution
• Tends to be unstable (i.e., its value varies a lot from one sample to the next)
• Some distributions may not have a mode (e.g., 10, 11, 12)

THE MEDIAN
• The median is the score that divides the distribution into two equal halves
• 50% are below the median, 50% above
• Age: 26 27 27 28 29 30 31
• Median (Mdn) = 28

Even number of values (no exact middle; the middle is shared by two numbers):
  Data:     5.40 1.10 0.42 0.73 0.48 1.10
  In order: 0.42 0.48 0.73 1.10 1.10 5.40
  MEDIAN is (0.73 + 1.10) / 2 = 0.915

Odd number of values (exact middle):
  Data:     5.40 1.10 0.42 0.73 0.48 1.10 0.66
  In order: 0.42 0.48 0.66 0.73 1.10 1.10 5.40
  MEDIAN is 0.73
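The two cases, an odd and an even number of values, can be sketched in Python as follows. The function name `median` is just a label for this example; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: there is an exact middle.
        return ordered[middle]
    # Even number of values: the middle is shared by two numbers.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([26, 27, 27, 28, 29, 30, 31]))                  # ages from the slides
print(median([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))          # even number of values
print(median([5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]))    # odd number of values
```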

THE MEDIAN: ADVANTAGES
• Not influenced by outliers
• Particularly good index of what is “typical” when the distribution is skewed
• Easy to “compute”

THE MEDIAN: DISADVANTAGES
• Does not take actual data values into account; it is only an index of position
• The value of the median is not necessarily an actual data value, so it is more difficult to understand than the mode

THE MEAN
• The mean is the arithmetic average
• Data values are summed and divided by N
• Age: 26 27 27 28 29 30 31
• Mean = 28.3

THE MEAN (CONT’D)
• Most frequently used measure of central tendency
• Equation: M = ΣX ÷ N
• Where:
  M = sample mean
  Σ = the sum of
  X = actual data values
  N = number of people

THE MEAN: ADVANTAGES
• The balance point in the distribution: the sum of deviations above the mean always exactly balances those below it
• Does not ignore any information
• The most stable index of central tendency
• Many inferential statistics are based on the mean
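The formula M = ΣX ÷ N and the balance-point property can be checked with a small Python sketch using the age data from the slides. The function name `mean` is only a label for the example.

```python
ages = [26, 27, 27, 28, 29, 30, 31]


def mean(values):
    """M = sum of the data values divided by the number of values."""
    return sum(values) / len(values)


m = mean(ages)
print(round(m, 1))  # 28.3, as on the slide

# Balance point: deviations above the mean cancel those below it,
# so they sum to zero (up to floating-point rounding).
deviations = [x - m for x in ages]
print(abs(sum(deviations)) < 1e-9)
```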

THE MEAN: DISADVANTAGES
• Sensitive to outliers
• Gives a distorted view of what is “typical” when data are skewed
• The value of the mean is often not an actual data value
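A quick way to see the sensitivity to outliers is to add one extreme value to the age data and compare how far the mean and the median move. The extra value of 90 is a hypothetical outlier added only for this illustration.

```python
import statistics

ages = [26, 27, 27, 28, 29, 30, 31]
# Hypothetical outlier appended for illustration; it is not part of the slide data.
with_outlier = ages + [90]

mean_shift = statistics.mean(with_outlier) - statistics.mean(ages)
median_shift = statistics.median(with_outlier) - statistics.median(ages)

print(f"mean moves by {mean_shift:.1f}, median moves by {median_shift:.1f}")
```

In this sketch the mean rises by several years while the median moves by half a year, which is why the median is often preferred for skewed data.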

THE MEAN: SYMBOLS
• Sample means:
  • In reports, usually symbolized as M
  • In statistical formulas, usually symbolized as $\bar{X}$ (pronounced X bar)
• Population means: the Greek letter μ (mu)

Notation
• $\bar{x}$ is pronounced ‘x-bar’ and denotes the mean of a set of sample values: $\bar{x} = \frac{\sum x}{n}$
• $\mu$ is pronounced ‘mu’ and denotes the mean of all values in a population: $\mu = \frac{\sum x}{N}$

Best Measure of Center

Definitions
• Symmetric: Data is symmetric if the left half of its histogram is roughly a mirror image of its right half.
• Skewed: Data is skewed if it is not symmetric and if it extends more to one side than the other.

Skewness
(Figure 2-11)

Recap
In this section we have discussed:
• Types of Measures of Center: Mean, Median, Mode
• Mean from a frequency distribution
• Best Measures of Center
• Skewness

MEASURES OF VARIATION
Because this section introduces the concept of variation, this is one of the most important sections in the entire book.

DEFINITION
The range of a set of data is the difference between the highest value and the lowest value:
Range = (highest value) − (lowest value)

DEFINITION
The standard deviation of a set of sample values is a measure of variation of values about the mean.

SAMPLE STANDARD DEVIATION FORMULA
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

SAMPLE STANDARD DEVIATION (SHORTCUT FORMULA)
$s = \sqrt{\dfrac{n\left(\sum x^2\right) - \left(\sum x\right)^2}{n(n - 1)}}$
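The definition formula and the shortcut formula give the same result, which can be verified with a short Python sketch. The function names are hypothetical; the data are the ages used earlier in the slides.

```python
import math


def sd_definition(values):
    """s = sqrt( sum((x - x_bar)^2) / (n - 1) )"""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def sd_shortcut(values):
    """s = sqrt( (n * sum(x^2) - (sum(x))^2) / (n * (n - 1)) )"""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


ages = [26, 27, 27, 28, 29, 30, 31]
print(max(ages) - min(ages))          # range: highest value minus lowest value
print(round(sd_definition(ages), 2))  # sample standard deviation, definition formula
print(round(sd_shortcut(ages), 2))    # same value from the shortcut formula
```

The shortcut avoids computing the mean first, which is convenient by hand; in code, `statistics.stdev` from the standard library computes the same sample standard deviation.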

Standard Deviation Key Points
• The standard deviation is a measure of variation of all values from the mean
• The value of the standard deviation s is usually positive
• The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others)
• The units of the standard deviation s are the same as the units of the original data values

Definition
Empirical (68-95-99.7) Rule
For data sets having a distribution that is approximately bell shaped, the following properties apply:
• About 68% of all values fall within 1 standard deviation of the mean
• About 95% of all values fall within 2 standard deviations of the mean
• About 99.7% of all values fall within 3 standard deviations of the mean
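As a sketch of how the rule is applied, the intervals for 1, 2, and 3 standard deviations can be computed from a mean and a standard deviation. The function name `empirical_rule_intervals` and the example values (mean 100, standard deviation 15) are assumptions made for illustration, and the percentages hold only for approximately bell-shaped data.

```python
def empirical_rule_intervals(mean, sd):
    """Map k (1, 2, 3) to (lower, upper, approximate percent of values inside)."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in coverage.items()}


# Hypothetical bell-shaped data set with mean 100 and standard deviation 15.
intervals = empirical_rule_intervals(100, 15)
for k, (lower, upper, pct) in intervals.items():
    print(f"about {pct}% of values fall between {lower} and {upper}")
```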

The Empirical Rule
(Figure 2-13)

ARE YOU READY
Post test Time

Which measure of center is the only one that can be used with data at the categorical level of measurement?
A. Mean
B. Median
C. Mode

Which of the following measures of center is not affected by outliers?
A. Mean
B. Median
C. Mode

Find the mode(s) for the given sample data.
79, 25, 79, 13, 25, 29, 56, 79
A. 79
B. 48.1
C. 42.5
D. 25

Which is not true about the variance?
A. It is the square of the standard deviation.
B. It is a measure of the spread of data.
C. The units of the variance are different from the units of the original data set.
D. It is not affected by outliers.

EXERCISE TIME

EXERCISE 1
1. The following 10 data values are diastolic blood pressure readings. Compute the mean, the range, and the SD for these data.
130 110 160 120 170 120 150 140 160 140

EXERCISE 2
The following are the fasting blood glucose levels of 10 children:
 1. 56     6. 56
 2. 62     7. 65
 3. 63     8. 68
 4. 65     9. 70
 5. 65    10. 72
Compute the:
a. range
b. standard deviation

EXERCISE 3
3. The fifteen patients making initial visits to a rural health department travelled these distances:

Patient            1   2   3   4   5   6   7
Distance (Miles)   5   9  11   3  12  13  12

Patient            8   9  10  11  12  13  14  15
Distance (Miles)   6  13   7   3  15  12  15   5

Find:
a. Range
b. Standard Deviation

ANSWER
1. Range = 60; SD = 20
2. Range = 16; SD = 4.4
3. Range = 12; SD = 4.2
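The answers to Exercises 1 and 3 can be checked with Python's standard `statistics` module, where `statistics.stdev` computes the sample standard deviation (the n − 1 formula). This is only a sketch: the variable names are made up for the example, and the Exercise 3 list assumes the distances are read in patient order 1 to 15.

```python
import statistics


def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


# Exercise 1: diastolic blood pressure readings.
exercise_1 = [130, 110, 160, 120, 170, 120, 150, 140, 160, 140]
# Exercise 3: distances in miles, assumed to be in patient order 1 to 15.
exercise_3 = [5, 9, 11, 3, 12, 13, 12, 6, 13, 7, 3, 15, 12, 15, 5]

print(statistics.mean(exercise_1))             # mean for Exercise 1
print(data_range(exercise_1))                  # 60
print(round(statistics.stdev(exercise_1), 1))  # 20.0
print(data_range(exercise_3))                  # 12
print(round(statistics.stdev(exercise_3), 1))  # 4.2
```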