The Effect of Familiarity with the Response Category Labels on Item Response to Likert Scales
Bert Weijters, Maggie Geuens, Hans Baumgartner

The response category labeling effect
Motivating Example
§ a French researcher wants to replicate, using consumer self-report data collected in France, an empirical finding that was established in the U.S.;
§ in the English questionnaire, a Likert scale with endpoints of ‘strongly disagree’ and ‘strongly agree’ was used;
§ should the French researcher use ‘fortement d’accord’ or ‘tout à fait d’accord’?

Research questions
§ Do the labels attached to the response scale categories influence response behavior (i.e., how many respondents endorse the extreme scale categories)?
§ What causes this effect?
§ How can the effect be mitigated?
§ What are the implications for multilingual and monolingual surveys?

Research questions (cont’d)
§ various characteristics of rating scales have been studied, but the problem of choosing appropriate labels for the response categories has been largely ignored;
§ this is surprising because category labels typically apply to many if not all of the items in a questionnaire;
§ where differences in responding to survey items as a function of the category labels have been acknowledged, the effect has generally been attributed to the perceived intensity of the labels (intensity hypothesis);
§ in this research we propose the familiarity hypothesis (i.e., scale categories marked by labels that are used more often in day-to-day language are more likely to be endorsed) and contrast it with the intensity hypothesis;

Literature review
§ certain aspects of the labels attached to the response categories on rating scales can have systematic effects on people’s responses to questionnaires:
□ the range of response alternatives provided can influence respondents’ answers to questions (Schwarz et al. 1988);
□ the response alternatives provided (e.g., feeling really irritated “several times a day” to “less than twice a week” vs. “more than once every 3 months” to “less than once a year”) may affect the interpretation of the question (Schwarz et al. 1988);
□ use of different numeric values (-5 to +5 vs. 0 to 10) can change the meaning of endpoint labels such as “not at all successful” (Schwarz et al. 1991);
§ can endpoint labels in Likert scales that differ in terms of the amplifier used (e.g., “strongly” vs. “completely” (dis)agree) change responses?

Inferences based on the range of response alternatives provided
Schwarz et al. (1985): question about hours spent watching TV

Response options (low range)    %       Response options (high range)    %
< ½ hr                          7.4     < 2½ hrs                         62.5
½ to 1 hr                       17.7    2½ to 3 hrs                      23.4
1 to 1½ hrs                     26.5    3 to 3½ hrs                      7.8
1½ to 2 hrs                     14.7    3½ to 4 hrs                      4.7
2 to 2½ hrs                     17.7    4 to 4½ hrs                      1.6
> 2½ hrs                        16.2    > 4½ hrs                         0.0

The intensity hypothesis
§ intensity is defined as the degree or extent of the attribute expressed by the label (e.g., degree of agreement or disagreement, extent of liking);
§ prior research shows that scale anchors in general (e.g., adjectives for evaluating products, such as “good”, “terrific”, or “superior”, as in Wildt and Mazis 1978) and amplifiers used in Likert scales (e.g., “slightly”, “somewhat” or “very much” agree, as in Spector 1976) differ in perceived intensity;
§ more intense labels represent more extreme positions, which should be endorsed less often (e.g., agree vs. strongly agree; superior vs. very good);

The intensity hypothesis (cont’d)
§ Wyatt and Meyers (1987) found that when the extremes of the response scale were anchored by narrower or less absolute labels (i.e., “agree” and “disagree”), responses were distributed more evenly across all five scale steps, whereas when the response scale was bordered by wider or more absolute labels (i.e., “strongly agree” and “strongly disagree”), responses were concentrated more on the intermediate scale steps;
§ even more subtle adverbial modifiers (e.g., strongly vs. completely agree) may influence response behavior;

The intensity hypothesis (cont’d)
§ there is prior evidence that different intensities are associated with different adverbs (e.g., Cliff 1959; Smith et al. 2009), but little evidence that different adverbs lead to differential category endorsement;
H_intensity: Endpoint response categories are endorsed less frequently if their labels are more intense.

The familiarity hypothesis
§ according to the open choice model in linguistics, the only constraint on the concatenation of words is that the rules of grammar be respected;
§ in contrast, the idiom principle states that combinations of words are used in conventional patterns, which leads to the phenomenon of collocation: certain word combinations co-occur more often than would be expected based on their individual frequencies (e.g., strong tea vs. powerful tea);
§ “of the large repertoire of amplifiers available for expressing a high degree of intensity, speakers rely on a rather limited set of items, and only a few of these are used with great frequency” (Altenberg 1991, p. 133)

The familiarity hypothesis (cont’d)
§ formulaic sequences such as collocations are not only used more frequently by language users and are thus more familiar, but are also processed more quickly (Conklin and Schmitt 2008; Durrant 2008);
§ based on research on meta-cognitive experiences, this suggests that more familiar, high-frequency labels are more likely to be endorsed (Alter et al. 2007; Fang, Singh, and Ahluwalia 2007; Hawkins and Hoch 1992; Unkelbach 2007; Winkielman et al. 2003):
□ repeated exposure to a stimulus has beneficial effects on processing fluency;
□ repeated and more fluently processed statements are more likely to be rated as true;
□ stimulus repetition and fluent processing increase liking, preference and confidence judgments;

The familiarity hypothesis (cont’d)
§ since collocations have been shown to be processed more quickly, familiar (vs. unfamiliar) labels, because of their greater processing fluency, should be chosen more confidently as the true and preferred response option;
§ Arce-Ferrer (2006) showed that respondents who were less familiar with the meaning of the intermediate scale categories were more likely to engage in extreme responding and therefore less likely to endorse response options with which they were not familiar;
H_familiarity: Endpoint response categories are endorsed more frequently if their labels are more familiar.

Two alternative hypotheses to explain the effect of response category labels
Intensity hypothesis:
H_intensity: Endpoint response categories are endorsed less frequently if their labels are more intense.
Familiarity hypothesis:
H_familiarity: Endpoint response categories are endorsed more frequently if their labels are more familiar.

Study 1: Scaling intensity and familiarity
• Do different methods for scaling the intensity and familiarity of response category labels lead to similar results? If the intensity or familiarity of scale labels is to have a reliable effect on responses to questionnaires, consistent differences in the perceived intensity and fluency of category labels should emerge across respondents.
• Can we identify endpoint labels that vary significantly in intensity and familiarity for use in subsequent studies? We need two labels that imply contradictory responses under the intensity and familiarity hypotheses.

Study 1 (cont’d)
• Label intensity
– Direct ratings of intensity (0 = neutral; 10 = 100% agreement)
– Pairwise comparisons of intensity (“Which expression indicates the stronger sense of agreement?”)
• Label familiarity
– Direct ratings of familiarity (0 = we never use this term in day-to-day language; 10 = we use this term very often in day-to-day language)
– Pairwise comparisons of familiarity (“Which expression is more commonly used in day-to-day language?”)
– Lexical decision task (press a button labeled ‘end category label’ or ‘not an end category label’ for six endpoint labels and five non-endpoint labels)
– Word frequency counts in corpora of texts (Google hits, available for specific word combinations in particular countries and languages)

Study 1: Method
§ Sample 1: 83 undergraduates; pairwise comparisons of intensity and familiarity of six endpoint labels;
§ Sample 2: 112 respondents (mean age 32.03, 66% female) from an online panel; direct ratings of intensity and familiarity on 11-point scales;
§ Sample 3: 125 undergraduates (57% female); lexical decision task;

Study 1: Results

Study 1: Results (cont’d)
§ for intensity, the correlation of the means obtained from the paired comparison and direct rating tasks is .92;
§ the correlations of the means derived from the four familiarity methods range from .94 to .97;
§ thus, there is considerable consistency in respondents’ judgments of the perceived intensity and familiarity of different category labels;
§ ‘sterk eens’ (strongly agree) consistently emerged as one of the least intense and least familiar labels, while ‘volledig eens’ (completely agree) surfaced as one of the most intense and most familiar labels;

Study 2
§ Direct test of the intensity and familiarity hypotheses: the endorsement rate for a high-intensity, high-familiarity label should be relatively low if the intensity hypothesis is true, and relatively high if the familiarity hypothesis is true.

Measuring response distributions
§ A major challenge is to measure differences in response distributions that are not item-specific and are independent of substantive content;
§ To do this, we need to observe patterns of responses across heterogeneous items (i.e., items that do not share common content but have the same response format):
□ Deliberately designed scales consisting of heterogeneous items (Greenleaf 1992)
□ Random samples of items from scale inventories (Weijters, Geuens & Schillewaert 2010)
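As a minimal sketch of how such a content-free measure might be computed, the snippet below counts endpoint responses for one respondent across a set of heterogeneous items. The function name and the example answers are hypothetical and serve only as an illustration; they are not taken from the studies.

```python
from typing import Sequence


def count_endpoint_responses(
    responses: Sequence[int], scale_min: int = 1, scale_max: int = 5
) -> int:
    """Count how many item responses fall on either endpoint of the scale."""
    return sum(1 for r in responses if r in (scale_min, scale_max))


# Hypothetical answers of one respondent to 16 heterogeneous items (5-point scale).
example = [5, 3, 1, 4, 2, 5, 3, 3, 4, 1, 2, 5, 4, 3, 2, 1]
n_extreme = count_endpoint_responses(example)
```

Because the items share no common content, a count like this reflects how the respondent uses the response format rather than what any single item asks.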

Study 2: Method
§ online survey with Dutch-speaking panel members of an online market agency (N = 218);
§ the respondents ranged in age from 20 to 65 years (M = 43.2, SD = 11.7), 47% were female, and 58% had schooling beyond secondary school;
§ respondents were randomly assigned to questionnaires varying the endpoint labels (5-point scale):
□ ‘completely (dis)agree’ (high intensity/familiarity)
□ ‘strongly (dis)agree’ (low intensity/familiarity)
§ questionnaire consisted of
□ 16 heterogeneous items (4 pages with 4 items per page) taken from unrelated scales (e.g., “Air pollution is an important worldwide problem”, “I often give compliments to others”);
□ pairwise comparisons of the two response category labels in terms of intensity and familiarity;

Study 2: Results
§ The manipulation of intensity/familiarity was successful;
§ The findings support the familiarity hypothesis:

Label              Intensity   Familiarity   Mean number of extreme responses
Strongly agree     22%         10%           3.1 (.26)
Completely agree   78%         90%           4.4 (.33)

Study 3
§ the results of Study 2 are presumably due to the fact that more familiar labels are more easily processed and that this ease of processing inadvertently influences respondents’ answers to survey questions;
§ as long as the relevance of meta-cognitive experiences is not called into question, people consider this information as diagnostic and incorporate it into their judgments by relying on naïve theories such as, “If the information comes to my mind easily, it must be true or I must like it”;
§ however, when the diagnosticity or informational value of meta-cognitive experiences is called into question, people discount this information and either turn to alternative naïve theories such as “The information comes to mind easily because I have often heard it” or use the cognitive content of the stimulus;

Study 3 (cont’d)
§ this suggests that the previously observed familiarity effect should disappear once respondents are made aware that more familiar response labels may attract more responses and that this may lead them to select the category label “completely (dis)agree” more readily;

Study 3: Method
§ online survey with 122 respondents from a university panel (67.2% women, average age of 29 years);
§ 2 x 2 between-participant design:
□ intensity/familiarity of the endpoint labels manipulated as in the previous study (‘completely agree’ vs. ‘strongly agree’);
□ awareness of the label familiarity effect (depending on whether the following instructions were given before or after the collection of the DV):
“In questionnaires, there are several different ways of labeling response categories (e.g., “strongly (dis)agree” or “completely (dis)agree”). Previous research has shown that labels that are used more commonly in day-to-day language are more often selected as a response. This happens irrespective of the true opinion of the respondent on the subject of the question.”
§ DV is the number of endpoint responses to 16 heterogeneous questions;

Study 3: Results
§ the findings support the familiarity hypothesis in the unaware condition, but the effect goes away when respondents are made aware of the label familiarity effect:
[Figure: number of endpoint responses (y-axis, 1 to 5) by awareness condition, with separate lines for the ‘strongly’ and ‘completely’ labels]

Implications of the category labeling effect for cross-cultural research
§ response category labels can affect findings in a single-language context (e.g., meta-analytic comparisons), but they are particularly important in cross-cultural research, where labels have to be translated;
§ two types of translation:
□ literal
□ idiomatic
§ some authors have emphasized the need to choose scale anchors that are equal in intensity (e.g., Harzing 2006), and prior research has demonstrated that supposedly similar terms may differ in intensity across languages (e.g., ‘definitely’ vs. ‘bestimmt’; see Smith et al. 2009);
§ however, translated adverbial modifiers may also differ in familiarity;

Schematic representation of the translation process (based on Bassetti and Cook 2011)

Study 4: Method
§ approx. 200 English- or French-speaking respondents in each of five regions (nationality/language combinations) of North America and Europe;
§ six endpoint labels in each language;
§ 16 heterogeneous items from Greenleaf (1992), rated on 5-point scales;
§ pairwise comparisons of the six labels plus “agree” or “d’accord” in terms of intensity and familiarity;

Study 4: Method (cont’d)

Language   France   USA   Canada   UK    Total
French     227      0     203      0     430
English    0        185   196      187   568
Total      227      185   399      187   998

Endpoint label versions:

English             French
Strongly agree      Fortement d'accord
Completely agree    Complètement d'accord
Extremely agree     Extrêmement d'accord
Definitely agree    Définitivement d'accord
Fully agree         Entièrement d'accord
Very much agree     Tout à fait d'accord

Study 4: Results
[Figure: intensity and familiarity ratings by region]
Note: The correlation between the familiarity ratings and the natural logarithm of the number of Google hits was at least .88.

Study 4: Results
§ linear regression of the number of endpoint responses on
□ label intensity
□ label familiarity
□ 4 dummy variables representing the five regions
§ only label familiarity had a significant effect (standardized B = .38, p < .05, R² = .14);
§ in other words, the number of endpoint responses increases as a function of label familiarity, regardless of country and language;

Study 5
§ demonstration that familiarity is a viable determinant of differences in extreme responding between regions in a large-scale international survey;
§ illustration of how to construct and use relative measures of familiarity and extreme responding based on secondary data only;

Study 5: Method
§ 13,520 respondents from 17 European regions;
§ 16 heterogeneous items based on Greenleaf (1992);
§ use of fully labeled 7-point response scales;
§ familiarity: relative measure of familiarity as the natural logarithm of the ratio of the number of Google hits for the 1st and 7th category (strongly agree or disagree) to the number of Google hits for the 2nd and 6th category (agree or disagree);
§ endorsement: relative endorsement of the 1st and 7th vs. the 2nd and 6th response categories (natural logarithm);
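The two relative measures can be sketched as log ratios. The snippet below is an illustration under stated assumptions, not the authors' procedure: it assumes that the counts for the 1st and 7th category (and likewise for the 2nd and 6th) are pooled before the ratio is taken, and all numbers and names are hypothetical.

```python
import math


def log_ratio(endpoint: float, adjacent: float) -> float:
    """Natural log of endpoint / adjacent; negative when the endpoint count is smaller."""
    if endpoint <= 0 or adjacent <= 0:
        raise ValueError("both quantities must be positive")
    return math.log(endpoint / adjacent)


# Hypothetical pooled counts for one region (assumption: categories 1 and 7
# are summed, and categories 2 and 6 are summed, before taking the ratio).
hits_endpoint = 20_000       # search hits for the endpoint labels
hits_adjacent = 1_000_000    # search hits for the adjacent labels
n_endpoint = 300             # survey responses in categories 1 and 7
n_adjacent = 900             # survey responses in categories 2 and 6

relative_familiarity = log_ratio(hits_endpoint, hits_adjacent)
relative_endorsement = log_ratio(n_endpoint, n_adjacent)
```

Expressing both measures relative to the adjacent category keeps them comparable across regions whose overall search volumes and sample sizes differ.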

Study 5: Sample

Region                  N       % female   M age   SD age
Belgium, Dutch          644     51%        41.0    11.1
Belgium, French         371     51%        40.5    11.7
UK, English             908     56%        41.8    11.3
Germany, German         993     50%        39.3    11.0
Hungary, Hungarian      1003    51%        38.3    11.8
Slovakia, Slovakian     1063    50%        38.2    12.1
Poland, Polish          802     37%        32.2    11.0
Netherlands, Dutch      1046    50%        40.8    11.4
France, French          1000    51%        39.4    11.9
Spain, Spanish          934     50%        37.8    10.5
Romania, Romanian       970     50%        37.9    11.5
Turkey, Turkish         914     43%        32.5    9.4
Italy, Italian          939     50%        39.0    10.6
Switzerland, French     303     51%        42.5    9.7
Switzerland, German     606     48%        43.5    9.4
Switzerland, Italian    50      56%        32.9    8.7
Sweden, Swedish         974     49%        39.9    11.3
Total                   13520   49%        38.7    11.4

Study 5: Results
[Figure: relative endorsement of endpoints in survey data (y-axis) plotted against relative endpoint label familiarity, based on search engine data (x-axis)]
Note: Standardized B = .68, p < .05, R² = 46%.

Study 5: Results (cont’d)
§ prior research has generally attributed differences in response distributions in cross-cultural comparisons to nationality and national culture;
§ our findings demonstrate that different labels may vary in terms of familiarity, which can lead to different response patterns across languages;
§ in particular, if the endpoint label used in a certain language is more familiar than the one used in another language (relative to the adjacent category label), it is likely that the endpoint will be selected more frequently in the former than in the latter language;

Discussion: Summary of findings
§ response category labels that are more commonly used in day-to-day language (i.e., that are more familiar) lead to higher endorsement of their associated response categories;
§ respondents do not simply scale response categories along an intensity dimension and then map their latent response to the best-matching category, but they are also influenced by the familiarity of the labels;
§ the category label familiarity effect can be eliminated by making respondents aware of the potentially biasing effect of label familiarity;
§ the problem may be particularly serious in cross-cultural research when different languages are used;
§ however, researchers can control for differences in label familiarity across languages based on secondary data;

Consequences of the response category label effect
§ if certain labels attract more responses, this leads to bias;
§ Baumgartner and Steenkamp (2001) discuss how extreme responding biases scale scores:
□ if the modal scale response is above the midpoint, average scores will be inflated;
□ if the modal scale response is below the midpoint, average scores will be deflated;
§ relationships between variables can also be biased;

Consequences of the response category label effect (cont’d)
§ imagine a situation in which the strength of a relationship is compared across two groups and labels that differ in familiarity are used to collect data in the two groups;
§ the DV, an attitudinal variable (ATT), is measured on an agreement rating scale, and the IV (e.g., AGE in years) is measured on an objective scale and hence not affected by differences in label familiarity;
§ compared to respondents in the unfamiliar label condition, respondents in the familiar label condition who have a moderately positive or negative true attitude will exhibit a more extreme positive or negative observed attitude because they are more likely to endorse the endpoints;
§ this can result in a steeper observed slope and thus a stronger relationship between the objective antecedent and the observed attitude in the familiar label condition;
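The slope argument above can be illustrated with a small deterministic toy example. Everything in it is an assumption made for illustration: the latent attitude is taken to rise linearly with age, and a more familiar endpoint label is modeled as a lower latent threshold for choosing the endpoint. It is not the authors' analysis.

```python
def observed_response(true_attitude: float, endpoint_threshold: float) -> int:
    """Map a latent attitude (midpoint 0) to a rating from 1 to 5.

    endpoint_threshold is the latent distance from the midpoint beyond which
    an endpoint is chosen; a lower value stands in for a more familiar
    endpoint label (a modeling assumption).
    """
    if true_attitude >= endpoint_threshold:
        return 5
    if true_attitude <= -endpoint_threshold:
        return 1
    if true_attitude >= 0.5:
        return 4
    if true_attitude <= -0.5:
        return 2
    return 3


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


ages = list(range(20, 71))
# Hypothetical latent attitude: rises linearly from -1.25 at age 20 to 1.25 at age 70.
latent = [(age - 45) / 20 for age in ages]

unfamiliar = [observed_response(t, 1.5) for t in latent]  # endpoints never reached
familiar = [observed_response(t, 1.0) for t in latent]    # endpoints reached at the extremes

slope_unfamiliar = ols_slope(ages, unfamiliar)
slope_familiar = ols_slope(ages, familiar)
```

With identical latent attitudes in both conditions, the familiar-label condition yields the steeper observed slope simply because moderately extreme respondents are pushed onto the endpoints.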

Consequences of the response category label effect (cont’d)
§ using data from Study 2:
□ ATT: “I try to avoid food that is high in cholesterol”
□ IV: Age in years
[Figure: “I try to avoid food that is high in cholesterol” (y-axis, 1 to 5) plotted against age (x-axis, 20 to 70), with separate lines for the ‘completely’ and ‘strongly’ label conditions]

Implications for multilingual survey research
§ translations usually imply a trade-off between the attempt to be literal and the attempt to be idiomatic;
§ optimizing equivalence: use response category labels that are equally familiar in different languages (rather than literal translations or words with equal intensity);
□ e.g., the German and Dutch labels “vollkommen einverstanden” and “volledig eens” are literal translations (similar to “completely agree”), but in German this expression is more familiar, resulting in more endpoint responses than in Dutch (based on Study 5);
§ back-translation of response category labels may not help because it may result in literal rather than idiomatic translations and the familiarity of the labels in different languages may differ;

Identifying appropriate endpoint labels in two languages