Automated Personality Classification
A. KARTELJ and V. FILIPOVIC, School of Mathematics, University of Belgrade, Serbia
V. MILUTINOVIC, School of Electrical Engineering, University of Belgrade, Serbia

Agenda
• Problem overview
• Classification of the existing solutions
• Presentation of the existing solutions
• Comparison of the solutions
• Work in progress: Bayesian Structure Learning for the APC
• Future work: Video Based APC
• Conclusions
3. 10. 2012 MULTI 2012

Problem Overview

The Big 5 Model

The Steps in Our Research
1. Survey paper (under review at ACM CSUR)
2. Research paper: A new APC model based on Bayesian structure learning (in progress)
3. Real-purpose application of the APC model from step 2
4. Go to step 3

Elements of APC
• Corpus: Essay, weblog, email, news group, Twitter counts...
• Personality measurement: Questionnaire (internet and written). We are searching for an alternative!
• Model: Stylistic analysis, linguistic features, machine learning techniques

Applications

Mining People’s Characteristics

Classification of Solutions
• C1 criterion separates solutions by type of conversation (1 = self-reflexive, N = continuous)
• C2 criterion separates solutions by approach (TD = top-down, DD = data-driven, or HY = hybrid)

Linguistic Styles: Language Use as an Individual Difference
Pennebaker and King [1999]

LIWC and MRC Features

Feature | Type | Example
Anger words | LIWC | Hate, kill
Metaphysical issues | LIWC | God, heaven, coffin
Physical state / function | LIWC | Ache, breast, sleep
Inclusive words | LIWC | With, and, include
Social processes | LIWC | Talk, us, friend
Family members | LIWC | Mom, brother, cousin
Past tense verbs | LIWC | Walked, were, had
References to friends | LIWC | Pal, buddy, coworker
Imagery of words | MRC | Low: future, peace – High: table, car
Syllables per word | MRC | Low: a – High: uncompromisingly
Concreteness | MRC | Low: patience, candor – High: ship
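
As a rough illustration of how category features of this kind can be turned into numbers, the sketch below counts what fraction of a text's tokens fall into each category. The lexicon `TOY_LEXICON` and the function `category_rates` are hypothetical names introduced here; the word lists contain only the example words from the table and are not the actual LIWC dictionary.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon built only from the example words in the table;
# the real LIWC dictionary is far larger and is not reproduced here.
TOY_LEXICON = {
    "anger": {"hate", "kill"},
    "inclusive": {"with", "and", "include"},
    "social": {"talk", "us", "friend"},
    "family": {"mom", "brother", "cousin"},
}


def category_rates(text, lexicon=TOY_LEXICON):
    """Return, for each category, the fraction of tokens that belong to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {category: 0.0 for category in lexicon}
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    return {category: counts[category] / len(tokens) for category in lexicon}
```

Calling `category_rates` on an essay or blog post yields one feature per category; a vector of such rates is the sort of input a correlation analysis or a classifier could then work with.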

What Are They Blogging About? Personality, Topic and Motivation in Blogs
Gill et al. [2009]

Taking Care of the Linguistic Features of Extraversion
Gill and Oberlander [2002]

Personality Based Latent Friendship Mining
Wang et al. [2009]

A Comparative Evaluation of Personality Estimation Algorithms for the TWIN Recommender System
Roshchina et al. [2011]

Predicting Personality with Social Media
Golbeck et al. [2011]

Our Twitter Profiles, Our Selves: Predicting Personality with Twitter
Quercia et al. [2011]

Paper | Input | Corpus | Features | Algorithm | Software | Citations | I | S | A | R
[Pennebaker and King 1999] | text | essays | LIWC | correlations | n/a | 455 | H | H | H | M
[Mairesse et al. 2007] | text, speech | essays | LIWC, MRC | C4.5, NB, SMO, M5' | Weka | 99 | M | M | H | M
[Gill et al. 2009] | text | weblogs (14.8 words) | LIWC | linear regression | n/a | 26 | H | H | M | M
[Yarkoni 2010] | text | weblogs (100K words) | LIWC | correlations | n/a | 21 | H | M | M | M
[Gill and Oberlander 2002] | text | emails (105 students) | bigrams | bigram analysis | n/a | 49 | L | M | M | L
[Nowson et al. 2005] | text | weblogs (410K words) | word list | correlations | n/a | 48 | L | H | H | L
[Oberlander 2006] | text | weblogs (410K words) | N-grams | NB, SMO | Weka | 53 | H | M | H | M
[Wang et al. 2009] | text | weblogs (200 pairs) | lexical freq., TFIDF | logistic regression | Minitab | 1 | H | M | M | M
[Iacobelli et al. 2011] | text | weblogs (3000) | LIWC, bigrams | SVM, SMO, NB... | Weka | 1 | H | H | M | H
[Argamon et al. 2005] | text | essays | word list, conj. | SMO | Weka | 38 | H | M | M | M
[Argamon et al. 2007] | text | essays | word list, conj. | SMO | Weka, ATMan | 45 | H | M | M | M
[Mairesse and Walker 2006] | text, conv. | 96 persons extracts (≈100K words) | LIWC, MRC, utterance… | RankBoost | n/a | 22 | M | M | H | M
[Rigby and Hassan 2007] | text | mail. lists (140K emails) | LIWC | C4.5 | Weka, SPSS | 30 | M | H | M | L
[Roshchina et al. 2011] | text | TripAdvisor reviews | LIWC, MRC | Linear, M5, SVM | Weka | 2 | H | M | L | M
[Quercia et al. 2011] | meta | 335 Twitter users | Twitter counts | M5' rules | Weka | 5 | M | H | M | M
[Golbeck et al. 2011] | text, meta | 279 FB users | 5 classes (161 in total) | M5' rules, Gaussian processes | Weka | 12 | H | M | M | M

Naive Bayes Classifier
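
The slide itself carries no text beyond its title, so the following is only a generic sketch of a multinomial Naive Bayes text classifier with Laplace smoothing, not the authors' model. The function names `train_naive_bayes` and `predict` are hypothetical, and the labels are whatever the caller supplies (for example, a high or low score on one Big 5 trait).

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Estimate log-priors and Laplace-smoothed log-likelihoods.

    docs is a list of token lists, labels a parallel list of class labels.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {}
    for label, n_docs in class_counts.items():
        total = sum(word_counts[label].values())
        log_prior = math.log(n_docs / len(labels))
        # Add-one smoothing keeps unseen (class, word) pairs from scoring -inf.
        log_like = {
            word: math.log((word_counts[label][word] + 1) / (total + len(vocab)))
            for word in vocab
        }
        model[label] = (log_prior, log_like)
    return model


def predict(model, tokens):
    """Return the label with the highest log-posterior score.

    Words that never occurred in training are ignored.
    """
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[w] for w in tokens if w in log_like)

    return max(model, key=score)
```

The "naive" part is the sum over per-word log-likelihoods in `score`: each word is treated as independent of the others once the class is known.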

Naive Bayes and Bayesian Network
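
The slide's figure is not preserved here; as a reminder of the standard textbook relationship between the two models, the factorizations can be written as follows. The symbols C (class), x_i (features), X_i (network nodes) and Pa(X_i) (parents of X_i in the graph) are notation chosen for this note, not taken from the slides.

```latex
% Naive Bayes: every feature x_i depends only on the class C
P(C, x_1, \dots, x_n) = P(C) \prod_{i=1}^{n} P(x_i \mid C)

% General Bayesian network: every node X_i depends on its parents in the DAG
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

Naive Bayes is the special case in which the class node is the only parent of every feature node; a general Bayesian network also allows dependencies among the features themselves.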

Bayesian Network for the APC

Bayesian Network Structure Learning
1. Obtain corpus (training set T)
2. Fit an appropriate network structure to T by:
   a) ILP formulation + solver (CPLEX, Gurobi…) on smaller instances
   b) Metaheuristic on larger instances
3. Validate quality of the metaheuristic approach
4. Compare obtained APC accuracy with other approaches
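
The slides do not say which metaheuristic or scoring function is intended, so the sketch below swaps in the simplest stand-in: a greedy hill climb that only adds edges, scored with BIC on binary variables. All names (`family_bic`, `creates_cycle`, `hill_climb`, `max_parents`) are hypothetical, and the ILP route with CPLEX or Gurobi is not shown.

```python
import itertools
import math
from collections import Counter


def family_bic(data, child, parents):
    """BIC contribution of one binary node given its parent set."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_totals = Counter(tuple(row[p] for p in parents) for row in data)
    log_lik = sum(
        count * math.log(count / parent_totals[config])
        for (config, _value), count in joint.items()
    )
    n_params = 2 ** len(parents)  # one free parameter per parent configuration
    return log_lik - 0.5 * math.log(n) * n_params


def creates_cycle(parents, child, new_parent):
    """True if adding new_parent -> child would close a directed cycle."""
    # A cycle appears exactly when child is already an ancestor of new_parent.
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def hill_climb(data, n_vars, max_parents=2):
    """Greedy search: repeatedly add the single edge that most improves BIC."""
    parents = {v: [] for v in range(n_vars)}
    scores = {v: family_bic(data, v, parents[v]) for v in range(n_vars)}
    while True:
        best_gain, best_edge = 1e-9, None
        for src, dst in itertools.permutations(range(n_vars), 2):
            if src in parents[dst] or len(parents[dst]) >= max_parents:
                continue
            if creates_cycle(parents, dst, src):
                continue
            gain = family_bic(data, dst, parents[dst] + [src]) - scores[dst]
            if gain > best_gain:
                best_gain, best_edge = gain, (src, dst)
        if best_edge is None:
            return parents  # no edge addition improves the score
        src, dst = best_edge
        parents[dst].append(src)
        scores[dst] = family_bic(data, dst, parents[dst])
```

Here `data` is a list of rows of 0/1 values and the result maps each variable index to its list of parents. A fuller local search would also try deleting and reversing edges, and comparing its score against the ILP optimum on small instances is one way to carry out the validation step.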

Other Ideas
• Games with a purpose (GWAP)
• Clustering personality characteristics

Packing everything together: Video Based APC

Conclusions
• Classification of the existing solutions (Survey paper)
• Filling the gaps inside classification tree
• Introducing Bayesian Structure Learning for the APC
• Utilizing metaheuristics in dealing with high dimensionality
• APC potential: social networks, recommender, and expert systems

THANK YOU.
Aleksandar Kartelj, kartelj@matf.bg.ac.rs
Vladimir Filipovic, vladaf@matf.bg.ac.rs
Veljko Milutinovic, vm@etf.bg.ac.rs