Measuring Observed Race: Preliminary Findings from the Observed Measures Supplement
Anthony Daniel Perez and Charles Hirschman
CSDE Colloquium Presentation, UW-BHS Workshop, 19 October 2007

Example of Self-Reported Race • 2nd-generation Thai immigrant with complex ancestry: – Mixed Thai and Chinese – Also part white, black, and Native American

Observed Race?

Other Opinions • “In America… when you look like me, you're black.” – Colin Powell • “It's not that we don't respect Tiger Woods's right to call himself a Cablinasian. We just don't think it will help him get a cab in D.C.” – Lonnae O'Neal Parker

A Question of Perspective • Physical appearance is important – Racial profiling – Some forms of interpersonal discrimination • Reflected race may be a good proxy for appearance, but it rests on assumptions • When appearance outweighs identity, race is better measured directly: external or observed race

The Observed Measures Supplement (OMS) • Auxiliary data collection effort – External measurement of BHS respondents’ race, body type, and physical attractiveness – Sourced from high school yearbooks purchased during BHS survey years

OMS Pre-test Design and Characteristics • Web-based questionnaire – Raters drawn from UW summer classes – 19 raters × 25 pictures = 475 ratings – Three dimensions measured • Pre-test pictures selected for ambiguity • Key questions of interest: – Are measures of observed race reliable? – How many ratings are needed?

Race Question from OMS Pre-test • “What is this person’s racial/ethnic background? Check all that apply.” – Hispanic/Latino – White – Black – American Indian or Alaska Native – Asian – Native Hawaiian/Pacific Islander – Other (please specify) • Note: Lena Horne is not an actual UW-BHS respondent

High Reliability Cases

Others Prove More Challenging

Are Ratings Reliable? Summary of Inter-Rater Agreement (IRA) • Little agreement among raters – Pictures assigned 1.4 races on average – Only three were unanimously monoracial – Just two rated consistently by all raters • But with 19 raters, some variation is expected, even invited by the choice of pre-test pictures – How do we choose?

How to Obtain Consistent Ratings • Many options if number of ratings is large – Majority rating (e.g., 90% black) – Modal rating if no majority – Pool racial categories to reduce variation • But it’s impractical to collect dozens of ratings per respondent (BHS population just under 10,000 people) • Is it possible to make do with fewer ratings, and if so, how few?
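The majority and modal rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the OMS procedure: the function name `consensus_rating`, the way ties are reported, and the toy ratings are all assumptions introduced here.

```python
from collections import Counter


def consensus_rating(ratings):
    """Pick one rating for a picture from many raters' answers.

    `ratings` is a non-empty list of hashable answers, e.g. frozensets of
    the categories a rater checked. Returns (rating, rule), where rule is
    "majority", "modal", or "tie" (no single most common answer).
    """
    ranked = Counter(ratings).most_common()
    top_rating, top_count = ranked[0]
    if 2 * top_count > len(ratings):
        # More than half of the raters gave exactly this answer.
        return top_rating, "majority"
    if len(ranked) > 1 and ranked[1][1] == top_count:
        # Two or more answers share first place, so there is no mode to pick.
        return None, "tie"
    return top_rating, "modal"


# Hypothetical ratings for one picture (not OMS data).
black = frozenset({"Black"})
mixed = frozenset({"Black", "White"})
print(consensus_rating([black] * 9 + [mixed]))
```

Storing each answer as a frozenset treats a "check all that apply" response as one whole combination, so "Black" and "Black + White" count as different ratings.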

Redefining Inter-rater Agreement • Treat IRA as the proportion of consistent ratings across repeated trials • Calculate using combinatorial analysis • Sets of two, three, or five raters • “Consistent” rating defined as a simple majority (e.g., 2/2, 2/3, 3/5)
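One way to read this redefinition is to enumerate every possible set of k raters for a picture and count the share of sets in which a simple majority gave the identical rating. The sketch below follows that reading; the function name `subset_agreement`, the exact consistency test, and the five-rater toy data are assumptions, since the slides do not spell out the computation.

```python
from collections import Counter
from itertools import combinations


def subset_agreement(ratings, k):
    """Share of all k-rater subsets in which a simple majority agrees.

    `ratings` holds one answer per rater for a single picture. A subset
    is "consistent" when at least k // 2 + 1 of its raters gave the same
    answer (2 of 2, 2 of 3, 3 of 5).
    """
    threshold = k // 2 + 1
    subsets = list(combinations(ratings, k))
    consistent = sum(
        1 for subset in subsets
        if Counter(subset).most_common(1)[0][1] >= threshold
    )
    return consistent / len(subsets)


# Hypothetical picture rated by five raters (not OMS data).
toy = ["Black", "Black", "Black", "Black", "Black+White"]
for k in (2, 3):
    print(k, subset_agreement(toy, k))
```

In the toy data, 6 of the 10 possible pairs agree, while every one of the 10 possible triples contains at least two matching ratings, which is the kind of gain from moving to three raters that the slides go on to examine.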

Pairing Raters • Does little to reduce uncertainty – Probability of any two raters agreeing on all pictures is zero – Probability of agreement on any photo is less than 0.5 – Only 11 of 25 photos have more than a 50/50 chance of being rated consistently • Uncertainty exacerbated by failure to resolve ties (problem with all even numbers) • How about three raters, or even five?
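A simplified model helps show why pairs do so little and odd-sized panels do more. The sketch assumes each rater independently gives a picture's most common rating with probability `p`; that independence assumption, the value `p = 0.7`, and the function name `majority_probability` are illustrative and do not come from the OMS analysis, which works from the actual ratings.

```python
from math import comb


def majority_probability(p, k):
    """Chance that more than half of k independent raters give the
    same target rating, when each gives it with probability p."""
    threshold = k // 2 + 1
    return sum(
        comb(k, j) * p**j * (1 - p) ** (k - j)
        for j in range(threshold, k + 1)
    )


# Illustrative value of p only; not an OMS estimate.
for k in (2, 3, 5):
    print(k, round(majority_probability(0.7, k), 3))
```

With `p = 0.7`, a pair agrees on the target rating about 49% of the time, because a 1/1 split counts as a failure. Three raters reach a majority about 78% of the time and five about 84%.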





Summary • Can we reliably measure observed race? – Yes, perhaps with as few as three raters – 75% average IRA for a 127-“category” race question – 92% for a six-category “best race” question (choose only one) – Moderate gains in both measures using five raters • Estimates are almost certainly conservative – Sampled pictures chosen for ambiguity; not representative of UW-BHS – Raters are younger and more diverse than state or national population – Both sources increase uncertainty of observed race, so IRA is biased downward
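The slides do not say where the figure of 127 "categories" comes from. One reading consistent with the pre-test race question is that it counts every non-empty combination of the seven check-all-that-apply options, which gives 2^7 − 1 = 127. The short check below enumerates those combinations; the variable names are introduced here for illustration.

```python
from itertools import combinations

# The seven options listed in the OMS pre-test race question.
OPTIONS = [
    "Hispanic/Latino",
    "White",
    "Black",
    "American Indian or Alaska Native",
    "Asian",
    "Native Hawaiian/Pacific Islander",
    "Other",
]

# Every way a rater could check at least one box.
non_empty = [
    combo
    for size in range(1, len(OPTIONS) + 1)
    for combo in combinations(OPTIONS, size)
]
print(len(non_empty))
```

With so many possible answers, exact agreement is a demanding standard, which may help explain why agreement is lower on this question than on the six-category one.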