Progress from the OSF “Evaluation Instrument Workgroup” on the Artificial Social Agent Measurement Instrument

Content
• Brief introduction
• Method – determine model
• Results
• What is next

Acknowledgement: Siska Fitrianie, Ph.D.

Literature study – Most Used Questionnaires

Questionnaire                                                 Frequency
Relational Agent Group                                        14
The Working Alliance Inventory (Horvath and Greenberg, 1989)  6
Social Presence (Harms and Biocca, 2004)                      5
Presence Scale (Nowak and Biocca, 2003)                       4
Presence (Witmer and Singer, 1998)                            4
The Godspeed Questionnaire (Bartneck et al., 2009)            3
Warmth, Competence and Human-Like (Bergmann et al., 2012)     3

81 IVA papers, 2013-2018; total constructs = 189

Problems
1. Difficult to compare the results of different user studies, because the constructs being measured seemingly differ or different questionnaires are used
2. Difficult to make a normative statement about the perceived “quality” of an ASA
3. Difficult to replicate a study when it requires creating an ASA with the same “quality” level
4. Difficult to make a useful statement about the effect of an ASA without insight into the “quality” of the ASA
5. Much redundant research work goes into developing a one-off questionnaire for each study
6. The reliability and validity of one-off questionnaires are questionable
7. Questionnaires designed for other domains do not always fit well in the ASA domain

Goal
• “A validated standardized questionnaire instrument for evaluating human interaction with Artificial Social Agents”
• Purpose of the instrument:
  • Ability to make a statement about the various aspects and dimensions expected to be relevant for capturing ASA quality
  • Ability to make a “standardized” statement about the “quality” of ASAs
  • Grounding the quality values in ASA examples

What are the various aspects and dimensions needed to capture ASA quality?

Relationship between successive steps that influence and are influenced by the human-ASA interaction.
[Diagram labels: External Entities; Interaction with Artificial Social Agents; Process; Outcome; Context Dependent; education; bonding; physical exercises; weight loss]

Interaction with artificial social agent
[Model diagram labels: Human’s Attributes to Support Interaction; External Entities; Human Interaction; Process; Agent’s Basic Properties; Human’s Impressions Left by Interaction; (Human-Agent) Interaction Quality; Agent’s Social Traits; Agent’s Impressions Left by Interaction; Agent’s Role Performance; Outcome; Context Dependent]

Expert Review
• Two Qualtrics surveys
  • Study 1: Classifying 189 constructs into 10 categories in the Model
  • Study 2: Identifying new categories
• Participants: members of the OSF “Evaluation Instrument Workgroup”
• 11-17 raters per construct (median = 13)

Correlation matrix

Majority analysis
• Single category majority (>= 50% of the raters)
• Two category majority coalition
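The two majority rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides give the 50% threshold for the single-category case, but they do not define the "two category majority coalition", so the sketch assumes it means that the two most frequently chosen categories together reach the same threshold. The function name `classify_majority` and the example ratings are hypothetical, not taken from the workgroup's analysis.

```python
from collections import Counter


def classify_majority(ratings, threshold=0.5):
    """Classify one construct from the categories its raters chose.

    Returns ("single", (cat,)) when one category reaches the threshold,
    ("two", (cat1, cat2)) when only the top two categories together do
    (ASSUMED definition of a two-category coalition), else ("none", ()).
    """
    if not ratings:
        raise ValueError("a construct needs at least one rating")
    n_raters = len(ratings)
    ranked = Counter(ratings).most_common(2)  # up to two (category, votes) pairs
    top_cat, top_votes = ranked[0]
    if top_votes / n_raters >= threshold:
        return ("single", (top_cat,))
    if len(ranked) == 2:
        second_cat, second_votes = ranked[1]
        if (top_votes + second_votes) / n_raters >= threshold:
            return ("two", (top_cat, second_cat))
    return ("none", ())


# Hypothetical ratings from 13 raters per construct (illustrative only).
single_example = ["ASoc"] * 7 + ["AImpr"] * 4 + ["ARole"] * 2
coalition_example = ["HImpr"] * 5 + ["Int.Q"] * 4 + ["HAttr"] * 2 + ["Agent"] * 2
split_example = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 2 + ["E"] * 2

print(classify_majority(single_example))
print(classify_majority(coalition_example))
print(classify_majority(split_example))
```

Applying the single-category rule first means a construct only counts as a two-category case when no single category already reaches the threshold.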

[Model diagram labels: External Entities; Human’s Impressions Left by Interaction; Human’s Attributes to Support Interaction; Agent; Interaction; (Human-Agent) Interaction Quality; Process; Outcome; Context Dependent; Agent’s Basic Properties; Agent’s Social Traits; Agent’s Role Performance; Agent’s Impressions Left by Interaction]

Category labels and their abbreviations:
• [External]
• Human
• Interaction
• HImpr: Human’s Impressions Left by Interaction
• Int.Q: (Human-Agent) Interaction Quality
• HAttr: Human’s Attributes to Support Interaction
• Agent
• AProp: Agent’s Basic Properties
• ASoc: Agent’s Social Traits
• ARole: Agent’s Role Performance
• AImpr: Agent’s Impressions Left by Interaction

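[Figure: number of constructs per category or pair of categories, over [External], Human, HImpr, HAttr, Interaction, Int.Q, Agent, AProp, ASoc, ARole and AImpr. The 25 counts sum to 188; which count belongs to which category is not recoverable from the slide text.]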

Majority     single category   two category   single+two categories
constructs   89                99             188
coverage     47%               52%            99%

[Figure: number of constructs per category or pair of categories; the counts sum to 188.]
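The coverage percentages can be checked with a short calculation. This sketch assumes the denominator is the 189 constructs classified in Study 1, which is consistent with the reported figures; the helper name `coverage_percent` is hypothetical.

```python
TOTAL_CONSTRUCTS = 189  # constructs classified in Study 1 (assumed denominator)


def coverage_percent(n_constructs, total=TOTAL_CONSTRUCTS):
    """Share of all constructs covered, rounded to a whole percent."""
    return round(100 * n_constructs / total)


single_category = 89
two_category = 99
combined = single_category + two_category

for label, count in [
    ("single category", single_category),
    ("two category", two_category),
    ("single+two categories", combined),
]:
    print(f"{label}: {count}/{TOTAL_CONSTRUCTS} = {coverage_percent(count)}%")
```

Under this assumption, one of the 189 constructs falls under neither majority rule.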

What is next

Automatic text clustering of questionnaire items

Card sorting

The Route
1. Determine the process, and get people involved
2. Determine the model
   1. Examine existing questionnaires
   2. Discussion among experts
3. Determine the constructs and dimensions
   1. Face validity among experts
   2. Grouping of existing constructs
4. Determine initial set of construct items
   1. Content validity analysis – study into experts’ agreement on which items measure which constructs
   2. Reformulating into easy-to-understand item questions
5. Confirmatory factor analysis to examine construct validity
6. Establish final item set, creating long and short questionnaire versions
7. Criterion validity
   1. Predictive validity: agreement with future observations
   2. Concurrent validity: agreement with another ‘valid’ measure collected at the same time
8. Translate questionnaire (forward/backward translation)
9. Develop normative data set

Join our research at: Open Science Framework “Evaluation Instrument Workgroup” http://osf.io/6duf7/