VARIABLES 2
CONTENTS • What are variables? • Variable Types • Distributions of Values • The Mean Explained
WHAT ARE VARIABLES? • Variables • Values • An example hypothesis
VARIABLES • Variables describe the differences between (i) people or (ii) situations • Every variable has a set of values • Each participant will have one value for each variable
VARIABLES • Variables describe the differences between (i) people or (ii) situations • Example variables: Gender, Age, IQ, Diligence, Condition, Phase
VALUES Every variable has a set of values; each participant will have one value from that set.
Variable     Values
Gender       female, male, other
Age          0-100
IQ           75-125
Diligence    0-10
Condition    Expt/Control group
Phase        pre-/post-treatment
VALUES • Variables ‣ ways in which participants or their situation vary • Values ‣ each person has a single value for each variable
AN EXAMPLE HYPOTHESIS Females are more interesting than males • Variable 1: variations in participant gender ‣ Values: females, males • Variable 2: variations in Interestingness ‣ Values: scores • The relationship between variables ‣ Hypothesis: Does Gender (variable 1) affect Interestingness (variable 2)?
VARIABLES • It is crucial to distinguish between: ‣ the names of variables ‣ the values of variables • Gender is the name of a variable • Male and Female are the values of a variable • The test: ‣ Everyone has a <name of property> ‣ Only some are <value of property> • e.g. Everyone has a gender • e.g. Only some are female
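The name/value test can be sketched in a few lines of Python. The participant records below are made up purely for illustration; they do not come from the slides.

```python
# Hypothetical participants: names and values are invented for illustration.
participants = [
    {"gender": "female", "age": 34},
    {"gender": "male", "age": 27},
    {"gender": "female", "age": 45},
]

# "Everyone has a <name of property>": every record has a gender.
everyone_has_gender = all("gender" in p for p in participants)

# "Only some are <value of property>": some, but not all, records are female.
is_female = [p["gender"] == "female" for p in participants]
only_some_are_female = any(is_female) and not all(is_female)

print(everyone_has_gender, only_some_are_female)
```

Here "gender" passes the test for a variable name, while "female" behaves like a value.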
VARIABLE TYPES • Variable Types • Measuring Variables • Categorical variables • Interval variables
VARIABLE TYPES • All variables can be labelled by type of measurement • There are two primary types: ‣ Categorical ‣ Interval • There is one hybrid type: ‣ Ordinal • Very rarely there is a subtype of Interval: ‣ Ratio
MEASURING VARIABLES • There are 2 important properties of a value • Discrete or continuous ‣ discrete: fixed (small) set of values ‣ continuous: unlimited • Ordered or not ‣ ordered: quantities (value 1 > value 2 > value 3) ‣ not: categories
MEASURING VARIABLES • How to measure the variables
Basic Measure   Discrete   Ordered   Values     Variable Type
categories      X                    labels     Categorical
                X          X         integers   Ordinal
quantities                 X         numbers    Interval
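As a rough sketch, the two properties (discrete or continuous, ordered or not) can be mapped onto a variable type. The function name `variable_type` is hypothetical, and reading Ordinal as "discrete and ordered" is an assumption based on its integer values and its description as a hybrid type.

```python
def variable_type(discrete: bool, ordered: bool) -> str:
    """Map the two properties of a value onto a variable type.

    Assumption: Ordinal is treated as discrete AND ordered (integers).
    The unordered, continuous combination is not named in the slides.
    """
    if ordered:
        return "Ordinal" if discrete else "Interval"
    if discrete:
        return "Categorical"
    raise ValueError("unordered continuous values are not covered here")


print(variable_type(discrete=True, ordered=False))   # labels, e.g. hair colour
print(variable_type(discrete=True, ordered=True))    # integers
print(variable_type(discrete=False, ordered=True))   # numbers, e.g. weight
```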
CATEGORICAL • Values have labels (not numbers) which describe categories or groups ‣ hair colour: brown, blonde, black, ginger ‣ exam result: pass, fail • Can be a situation ‣ which group: experimental or control ‣ trial phase: start, at 3 months, at 6 months
INTERVAL • Numerical data where the difference between values is meaningful ‣ weight: the difference between being 60 kg and 70 kg is the same as the difference between 70 kg and 80 kg ‣ exam grade: 0-100 • Includes negative values, decimals etc.
MEASURING VARIABLES • Categorical variables ‣ values are types not quantities ‣ values are represented by discrete labels • Interval variables ‣ values are quantities on a scale ‣ values are represented by continuous numbers
Contents • Distribution of Values • Central Tendency ‣ Categorical Variables ‣ Ordinal Variables ‣ Interval Variables • Dispersion ‣ Categorical Variables ‣ Ordinal Variables ‣ Interval Variables • Normal Distribution
DISTRIBUTION OF VALUES • A population, or a sample, has a ‣ distribution of values for each variable • Each value has a ‣ frequency in the population • These distribution graphs show the frequency of different values in the population • How can we describe that distribution?
DISTRIBUTION OF VALUES • Central tendency ‣ what is the most likely/typical value? • Dispersion ‣ what is the range/spread of values?
CENTRAL TENDENCY Central tendency = a single value that summarises a distribution: the best estimate of the value of the next person.
Variable Type   Measure
Categorical     mode = most common value
Ordinal         median = middle value
Interval        mean = sum of the values divided by the number of values
CATEGORICAL VARIABLES Car Make: Ford, Nissan, Ford, VW, Ford, Nissan, VW, Ford, Skoda Mode = Ford • Mode = most common value • There may be more than one mode
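A minimal sketch of finding the mode for the car makes on this slide, using Python's standard library. `multimode` returns every value that ties for most common, which covers the "more than one mode" case.

```python
from collections import Counter
from statistics import multimode

car_makes = ["Ford", "Nissan", "Ford", "VW", "Ford",
             "Nissan", "VW", "Ford", "Skoda"]

counts = Counter(car_makes)      # frequency of each make
modes = multimode(car_makes)     # list of all most-common values

print(counts["Ford"], modes)
```

With this data Ford appears 4 times, so it is the single mode.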
ORDINAL VARIABLES Exam Score: 19, 32, 40, 44, 44, 44, 45, 46, 56 Median = 44 • Median = middle value ‣ find by sorting data into order ‣ select middle value ‣ if even number of values, median = midway between two middle values • Same number of values below and above median
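The median steps can be sketched as follows. The nine exam scores are from the slide; the four-value list is a hypothetical cut-down version, included only to show the even-count rule.

```python
from statistics import median

exam_scores = [19, 32, 40, 44, 44, 44, 45, 46, 56]

# Odd number of values: sort, then take the middle one.
ordered = sorted(exam_scores)
middle_value = ordered[len(ordered) // 2]

# statistics.median does the same, and handles the even case too.
odd_median = median(exam_scores)

# Hypothetical even-length example: midway between 32 and 40.
even_median = median([19, 32, 40, 44])

print(middle_value, odd_median, even_median)
```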
INTERVAL VARIABLES Exam Score: 19, 32, 40, 44, 44, 44, 45, 46, 56 Sum = 370, n = 9, Mean = 370/9 = 41.1 • Mode & median can be used • Mean, often called 'average' ‣ Calculated by adding all of the values together ‣ then dividing by the number of values
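The arithmetic on this slide can be checked with a short sketch:

```python
exam_scores = [19, 32, 40, 44, 44, 44, 45, 46, 56]

total = sum(exam_scores)   # add all of the values together
n = len(exam_scores)       # number of values
mean = total / n           # divide the sum by the count

print(total, n, round(mean, 1))
```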
DISPERSION A value to describe the spread of a distribution
Variable Type   Measure
Categorical     number of categories
Ordinal         interquartile range
Interval        standard deviation
CATEGORICAL VARIABLES • Number of different categories • how equally distributed the values are
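One possible way to look at both points, sketched with the car makes from the central tendency slide. Reporting each category's share is just one simple reading of "how equally distributed"; the slides do not prescribe a particular measure.

```python
from collections import Counter

car_makes = ["Ford", "Nissan", "Ford", "VW", "Ford",
             "Nissan", "VW", "Ford", "Skoda"]

counts = Counter(car_makes)
n_categories = len(counts)   # number of different categories

# Share of the sample in each category; equal shares would all be 1/n_categories.
proportions = {make: c / len(car_makes) for make, c in counts.items()}

print(n_categories)
for make, share in proportions.items():
    print(f"{make:7s} {share:.2f}")
```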
ORDINAL VARIABLES • Range ‣ largest value minus smallest • Inter-quartile range ‣ range of central half of distribution
ORDINAL VARIABLES Exam Score: 19, 32, 40, 44, 44, 44, 45, 46, 56 • Take the middle 50% of the data • IQR = top minus bottom of this range
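A sketch of the range and inter-quartile range for these exam scores. Quartile conventions differ between textbooks and packages; the "inclusive" method of `statistics.quantiles` is an assumption here, not something the slides specify, and other conventions give a different IQR for the same data.

```python
from statistics import quantiles

exam_scores = [19, 32, 40, 44, 44, 44, 45, 46, 56]

# Range: largest value minus smallest.
value_range = max(exam_scores) - min(exam_scores)

# Quartiles cut the sorted data into four parts; q1..q3 bound the central half.
q1, q2, q3 = quantiles(exam_scores, n=4, method="inclusive")
iqr = q3 - q1

print(value_range, q1, q3, iqr)
```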
INTERVAL VARIABLES • Three options: • Range ‣ largest value minus smallest • Inter-quartile range ‣ range of central half of distribution • Standard deviation
INTERVAL VARIABLES deviation = value - mean (mean = 41.1; deviations shown rounded to whole numbers)
Exam Score   Deviation
19           -22
32           -9
40           -1
44           3    (this score occurs three times)
45           4
46           5
56           15
SD = sqrt(sum(deviation²)/(n-1)) = 10.4
NB: when calculating for a sample, divide by (n-1); for a population, divide by n
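The standard deviation calculation can be reproduced step by step. This sketch uses all nine exam scores (44 appears three times) and compares the hand calculation with `statistics.stdev` (sample, divides by n-1) and `statistics.pstdev` (population, divides by n).

```python
from math import sqrt
from statistics import mean, pstdev, stdev

exam_scores = [19, 32, 40, 44, 44, 44, 45, 46, 56]

m = mean(exam_scores)
deviations = [x - m for x in exam_scores]   # deviation = value - mean
ssq = sum(d ** 2 for d in deviations)       # sum of squared deviations
n = len(exam_scores)

sample_sd = sqrt(ssq / (n - 1))     # sample: divide by (n-1)
population_sd = sqrt(ssq / n)       # population: divide by n

print(round(sample_sd, 1), round(population_sd, 1))
```

The sample version rounds to the 10.4 quoted on the slide; the population version is slightly smaller.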
NORMAL DISTRIBUTION • This specific shape is called the normal distribution • Many variables follow this pattern • it happens when a variable is the result of many different independent events
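A small simulation can illustrate the last point. Each simulated score is the sum of 100 independent yes/no events; the coin-flip setup, the seed, and the sample size are all hypothetical choices made for this sketch.

```python
import random
from collections import Counter
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def one_score(n_events: int = 100) -> int:
    """Add up many independent 0/1 events (hypothetical coin flips)."""
    return sum(rng.random() < 0.5 for _ in range(n_events))


scores = [one_score() for _ in range(5000)]
counts = Counter(scores)
peak_value = counts.most_common(1)[0][0]

# Crude text histogram: the bars rise towards the middle and fall away again.
for value in range(36, 65, 2):
    print(f"{value:3d} {'#' * (counts[value] // 10)}")
print("mean:", round(mean(scores), 1), "peak:", peak_value)
```

The frequencies pile up around the centre and thin out symmetrically on both sides, which is the bell shape described above.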
NORMAL DISTRIBUTION • The mean is at the peak of the normal distribution • About 68% of values fall within one standard deviation of the mean • About 95% fall within 2 standard deviations
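The two percentages can be checked against the normal distribution in Python's standard library. The helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_1 = share_within(1)
within_2 = share_within(2)

print(f"{within_1:.1%} {within_2:.1%}")
```

The same proportions hold for any mean and standard deviation, because the shares are measured in units of the standard deviation.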
THE MEAN EXPLAINED Contents • Mean: what it really is • Mean & Deviations • Standard Deviation
MEAN: WHAT IT REALLY IS • There is a better way of understanding what a mean is • We are looking for a single value that is the best match to the data it summarizes • So we focus on the differences between the data and the mean – these are called deviations
MEAN & DEVIATIONS [Figure: data points plotted against the mean] • Horizontal line is the mean • Vertical lines are deviations
MEAN & DEVIATIONS • A deviation is the signed distance between a data point and the mean (value minus mean) ‣ it is positive or negative • Mean (property 1) ‣ the value that satisfies: sum of all deviations is zero
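Property 1 can be checked with the exam scores used earlier in these slides. The comparison value of 44 (the median) is chosen here only to show that another reference point does not balance the deviations.

```python
exam_scores = [19, 32, 40, 44, 44, 44, 45, 46, 56]
mean = sum(exam_scores) / len(exam_scores)

# Deviations from the mean: positives and negatives cancel out.
deviations = [x - mean for x in exam_scores]
sum_at_mean = sum(deviations)

# Deviations from a different reference value do not cancel.
sum_at_other = sum(x - 44 for x in exam_scores)

print(round(sum_at_mean, 9), sum_at_other)
```

The first sum is zero up to floating-point rounding; the second is not.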
MEAN & DEVIATIONS • Squared deviations: ‣ all the values are positive numbers (or zero) • Mean (property 2) ‣ the value that satisfies: sum of all squared deviations is smallest possible
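Property 2 can be illustrated by trying a few candidate values around the mean and comparing their sums of squared deviations. The helper `ssq` and the particular candidates are assumptions made for this sketch; it is a spot check, not a proof.

```python
exam_scores = [19, 32, 40, 44, 44, 44, 45, 46, 56]
mean = sum(exam_scores) / len(exam_scores)


def ssq(values, centre):
    """Sum of squared deviations of the values from a candidate centre."""
    return sum((x - centre) ** 2 for x in values)


candidates = [mean - 2, mean - 1, mean, mean + 1, mean + 2]
best = min(candidates, key=lambda c: ssq(exam_scores, c))

for c in candidates:
    print(round(c, 1), round(ssq(exam_scores, c), 1))
```

Moving the candidate away from the mean in either direction makes the sum of squared deviations larger.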
STANDARD DEVIATION • The mean is the place where the sum of squared deviations (SSQ) is lowest • Think about the SSQ ‣ it is bigger if the distribution is more spread out ‣ so it measures dispersion • Standard deviation is the square root of SSQ divided by (n-1) for a sample, or by n for a population ‣ the square root is taken because then it changes linearly with the width of the distribution
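The "changes linearly" point can be seen by stretching a distribution to twice its width. Doubling every exam score is a hypothetical manipulation used only to widen the data.

```python
from statistics import mean, stdev

exam_scores = [19, 32, 40, 44, 44, 44, 45, 46, 56]
doubled = [2 * x for x in exam_scores]   # same shape, twice as wide


def ssq(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


ssq_ratio = ssq(doubled) / ssq(exam_scores)      # grows with the square of the width
sd_ratio = stdev(doubled) / stdev(exam_scores)   # grows in step with the width

print(round(ssq_ratio, 6), round(sd_ratio, 6))
```

The SSQ quadruples while the standard deviation only doubles, so the standard deviation tracks the width of the distribution directly.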
MEAN AND STANDARD DEVIATION [Figure: distribution of values with the mean and the standard deviation marked; horizontal axis: value]