LightSIDE Tutorial
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
Introduction
What is machine learning?
- Automatically or semi-automatically
  - Inducing rules from data
  - Making predictions
Diagram: Data → Learning Algorithm → Model; New Data → Classification Engine → Prediction
http://lightsidelabs.com/research
Automatic Analysis Of Conversation
Effective data representations make problems learnable…
- Machine learning isn’t magic
- But it can be useful for identifying meaningful patterns in your data when used properly
- Proper use requires insight into your data
SouFLé Framework (Howley et al., 2013): What properties of discourse are important for learning discussions?
SouFLé Framework (Howley et al., 2013): What properties of discourse are important for learning discussions? Person
SouFLé Framework (Howley et al., 2013): Transactive Knowledge Integration, Person
SouFLé Framework (Howley et al., 2013): Transactive Knowledge Integration, Engagement, Person
SouFLé Framework (Howley et al., 2013): Authority, Transactive Knowledge Integration, Engagement, Person
Definition of Transactivity
- building on an idea expressed earlier in a conversation
- using a reasoning statement
“I think the tube will get heavier because water is going in.”
“That’s true, but the important point is that water can flow in, but starch can’t flow out.”
Transactivity (Berkowitz & Gibbs, 1983)
- Findings
  - Moderating effect on learning (Joshi & Rosé, 2007; Russell, 2005; Kruger & Tomasello, 1986; Teasley, 1995)
  - Moderating effect on knowledge sharing in working groups (Gweon et al., 2011)
- Computational Work
  - Can be automatically detected in:
    - Threaded group discussions (Kappa .69) (Rosé et al., 2008)
    - Transcribed classroom discussions (Kappa .69) (Ai et al., 2010)
    - Speech from dyadic discussions (R = .37) (Gweon et al., 2012)
  - Predictable from a measure of speech style accommodation computed by an unsupervised Dynamic Bayesian Network (Jain et al., 2012)
Identifying Transactivity in Threaded Discussions
- Social modes of co-construction (Weinberger & Fischer, 2006)
  - To what degree or in what ways learners refer to the contributions of their learning partners
- TagHelper tools achieves reliability of .69 Kappa (Rosé et al., 2008)
AUTHOR: Hans
Michael blames his poor achievements on a lack of giftedness in mathematics.
---------------
From this one can conclude that his attribution is internal and stable. Internal because it comes from within himself. And stable because it is something that can't be changed.
AUTHOR: Gerry
>Michael blames his poor achievements on a lack of giftedness in mathematics. From…
------------
Wow, that was a really good work. Right on!
------------
From the case I could not however directly conclude that Michael thinks the task is too difficult for him. Instead I thought Michael thinks that he is too dumb for mathematics.
-------------
Therefore, I did not include something about that in my contribution.
Thread Structure Features
- depth (numeric): the depth in the thread where a message appears
- parent_child_similarity (numeric): semantic similarity (cosine similarity) between the current message segment and each of its parent message segments; the highest value is chosen
(Illustrated on the same Hans and Gerry example thread shown under Identifying Transactivity in Threaded Discussions.)
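The parent_child_similarity feature can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain word-count vectors and whitespace tokenization, and the function names are made up here; it is not the TagHelper implementation.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two text segments."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def parent_child_similarity(segment: str, parent_segments: list[str]) -> float:
    """Highest similarity between a segment and any of its parent segments."""
    return max((cosine_similarity(segment, p) for p in parent_segments), default=0.0)
```

A segment that quotes or echoes a parent message scores close to 1, while an unrelated reply scores close to 0.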
Evaluating Context-Based Features
Effective data representations make problems learnable… Remember! Know your data!!
Essential Reading
- Witten, I. H., Frank, E., Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques, third edition, Elsevier: San Francisco.
Automated Discourse Analysis
- Howley, I., Mayfield, E. & Rosé, C. P. (2013). Linguistic Analysis Methods for Studying Small Groups, in Cindy Hmelo-Silver, Angela O’Donnell, Carol Chan, & Clark Chinn (Eds.), International Handbook of Collaborative Learning, Taylor and Francis, Inc.
- Rosé, C. P., Wang, Y. C., Cui, Y., Arguello, J., Stegmann, K., Weinberger, A., Fischer, F. (2008). Analyzing Collaborative Learning Processes Automatically: Exploiting the Advances of Computational Linguistics in Computer-Supported Collaborative Learning, International Journal of Computer Supported Collaborative Learning 3(3), pp. 237-271.
- Mu, J., Stegmann, K., Mayfield, E., Rosé, C. P., Fischer, F. (2012). The ACODEA Framework: Developing Segmentation and Classification Schemes for Fully Automatic Analysis of Online Discussions, International Journal of Computer Supported Collaborative Learning 7(2), pp. 285-305.
- Gweon, G., Jain, M., McDonough, J., Raj, B., Rosé, C. P. (2013). Measuring Prevalence of Other-Oriented Transactive Contributions Using an Automated Measure of Speech Style Accommodation, International Journal of Computer Supported Collaborative Learning 8(2), pp. 245-265.
Applications to Learning Sciences Research
- Howley, I., Kumar, R., Mayfield, E., Dyke, G., & Rosé, C. P. (2013). Gaining Insights from Sociolinguistic Style Analysis for Redesign of Conversational Agent Based Support for Collaborative Learning, in Suthers, D., Lund, K., Rosé, C. P., Teplovs, C., Law, N. (Eds.), Productive Multivocality in the Analysis of Group Interactions, edited volume, Springer.
- Howley, I., Mayfield, E., Rosé, C. P., & Strijbos, J. W. (2013). A Multivocal Process Analysis of Social Positioning in Study Group Interactions, in Suthers, D., Lund, K., Rosé, C. P., Teplovs, C., Law, N. (Eds.), Productive Multivocality in the Analysis of Group Interactions, edited volume, Springer.
- Adamson, D., Dyke, G., Jang, H. J., Rosé, C. P. (2014). Towards an Agile Approach to Adapting Dynamic Collaboration Support to Student Needs, International Journal of AI in Education 24(1), pp. 91-121.
Text Teaser
Consider this simple example… Look for what distinguishes Questions and Statements in this dataset. What clues do you see?
What are good features for text categorization? What distinguishes Questions and Statements? Not all questions end in a question mark.
What are good features for text categorization? What distinguishes Questions and Statements? I versus you is not a reliable predictor
What are good features for text categorization? What distinguishes Questions and Statements? Not all WH words occur in questions
LightSIDE: A quick tour
Basic Text Feature Extraction
Represent text as a vector where each position corresponds to a term. This is called the “bag of words” approach.

                      Cheese  Cows  Eat  Hamsters  Make  Seeds
Cows make cheese.        1      1    0       0       1     0
Hamsters eat seeds.      0      0    1       1       0     1
Represent text as a vector where each position corresponds to a term. This is called the “bag of words” approach. But same representation for “Cheese makes cows.”!

                      Cheese  Cows  Eat  Hamsters  Make  Seeds
Cows make cheese.        1      1    0       0       1     0
Hamsters eat seeds.      0      0    1       1       0     1
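A minimal sketch of the bag-of-words vectors above. The tokenizer and the naive plural-stripping stemmer are assumptions added for illustration (they make "makes" and "make" fall on the same position); they are not LightSIDE's own preprocessing.

```python
import re


def stem(word: str) -> str:
    """Very naive stemmer (an assumption for this sketch): strip a final 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def tokens(text: str) -> list[str]:
    """Lowercase the text, keep alphabetic tokens only, and stem each one."""
    return [stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def bag_of_words(text: str, vocabulary: list[str]) -> str:
    """Binary vector as a string: 1 if the term occurs in the text, else 0."""
    present = set(tokens(text))
    return "".join("1" if term in present else "0" for term in vocabulary)


# Vocabulary in the slide's order: Cheese, Cows, Eat, Hamsters, Make, Seeds
vocabulary = tokens("cheese cows eat hamsters make seeds")

print(bag_of_words("Cows make cheese.", vocabulary))    # 110010
print(bag_of_words("Hamsters eat seeds.", vocabulary))  # 001101
print(bag_of_words("Cheese makes cows.", vocabulary))   # 110010
```

Word order is discarded, so the first and third sentences are indistinguishable in this representation.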
Examples from Gallup Poll Data
- Male from Virginia, age 30, negative: “I think it’ll increase costs for everyone.”
- Female from Illinois, unknown age, positive: “Because the cost of healthcare is just outta sight crazy”
- Male from Michigan, age 70, positive: “the cost”
The Gallup Poll Dataset
Basic Types of Features “Because the cost of healthcare is just outta sight crazy”
Basic Types of Features “the cost of healthcare” DT NN PRP NN
Part of Speech Tagging
http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition/subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund/present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd person singular present
32. VBZ Verb, 3rd person singular present
33. WDT wh-determiner
34. WP wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB wh-adverb
Basic Types of Features “the cost of healthcare” DT NN PRP NN
Basic Types of Features “the cost of healthcare” 4
Basic Types of Features “the cost of healthcare” YES
Basic Types of Features “the cost is too great. The cost is immense!” The value of the feature is the number of times it occurs, rather than 1 if it occurs or 0 otherwise, which is the default.
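The difference between binary features and count features can be sketched as below. The function name and the `use_counts` flag are hypothetical stand-ins for the corresponding checkbox, and punctuation is simply ignored in this sketch.

```python
import re
from collections import Counter


def unigram_features(text: str, use_counts: bool = False) -> dict[str, int]:
    """Unigram features: binary by default, occurrence counts if use_counts."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return dict(counts) if use_counts else {word: 1 for word in counts}


text = "the cost is too great. The cost is immense!"
print(unigram_features(text)["cost"])                   # 1
print(unigram_features(text, use_counts=True)["cost"])  # 2
```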
Basic Types of Features “the cost is too great. The cost is immense!” If you uncheck this, punctuation will be ignored and stripped out of the representation.
Basic Types of Features X X “the cost of healthcare”
Basic Types of Features “healthcare costs” “healthcare cost”
Clarification on Basic text feature extractor
- POS tagging happens before stemming or stopword removal
- POS bigrams are not affected by stopword removal: POS tags for stopwords will still be included
- On word n-grams, the only n-grams that will be dropped in the case of stopword removal are ones that consist only of stopwords
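The n-gram rule above can be illustrated with a short sketch. The stopword list and helper names here are assumptions for illustration, not LightSIDE's actual list or code.

```python
# Tiny illustrative stopword list (an assumption, not LightSIDE's list).
STOPWORDS = {"because", "the", "of", "is", "just"}


def word_ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous word n-grams of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def drop_stopword_only(ngrams: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """Keep an n-gram unless every word in it is a stopword."""
    return [g for g in ngrams if not all(w in STOPWORDS for w in g)]


tokens = "because the cost of healthcare is just outta sight crazy".split()
kept = drop_stopword_only(word_ngrams(tokens, 2))
print(kept)
```

Here "because the" and "is just" are dropped, while mixed bigrams such as "the cost" and "cost of" survive.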
Feature Space Customizations
- Feature Space Design
  - Think like a computer!
  - Machine learning algorithms look for features that are good predictors, not features that are necessarily meaningful
  - Look for approximations
    - If you want to find questions, you don’t need to do a complete syntactic analysis
    - Look for question marks
    - Look for wh-terms that occur immediately before an auxiliary verb
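The two question approximations can be sketched with regular expressions. The word lists below are short, assumed samples rather than complete inventories of wh-terms or auxiliary verbs.

```python
import re

# Short sample word lists (assumptions for this sketch, not exhaustive).
WH_TERMS = r"(?:who|what|when|where|why|which|how)"
AUXILIARIES = r"(?:is|are|was|were|do|does|did|can|could|will|would|should)"
WH_BEFORE_AUX = re.compile(rf"\b{WH_TERMS}\s+{AUXILIARIES}\b", re.IGNORECASE)


def question_features(text: str) -> dict[str, bool]:
    """Two cheap cues for questions, with no syntactic analysis."""
    return {
        "has_question_mark": "?" in text,
        "wh_before_aux": bool(WH_BEFORE_AUX.search(text)),
    }


print(question_features("What is machine learning"))
print(question_features("I know what you mean."))
```

The second sentence contains a wh-term, but it is not followed by an auxiliary verb, so neither cue fires.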
Effective Development and Evaluation Process in LightSIDE
If Outlook = sunny, no; else if Outlook = overcast, yes; else if Outlook = rainy and Windy = TRUE, no; else yes. Perfect on training data.
If Outlook = sunny, no; else if Outlook = overcast, yes; else if Outlook = rainy and Windy = TRUE, no; else yes. Not perfect on training data. Performance on testing data?
If Outlook = sunny, no; else if Outlook = overcast, yes; else if Outlook = rainy and Windy = TRUE, no; else yes. IMPORTANT! If you evaluate the performance of your rule on the same data you trained on, you won’t get an accurate estimate of how well it will do on new data.
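The rule can be written as a small function and scored on examples it was not derived from. The function names are hypothetical, and the two held-out examples below are made up purely to show the mechanics; they are not from the tutorial's dataset.

```python
def predict_play(outlook: str, windy: bool) -> str:
    """The hand-written rule from the slide."""
    if outlook == "sunny":
        return "no"
    elif outlook == "overcast":
        return "yes"
    elif outlook == "rainy" and windy:
        return "no"
    else:
        return "yes"


def accuracy(examples: list[tuple[str, bool, str]]) -> float:
    """Fraction of (outlook, windy, label) examples the rule gets right."""
    correct = sum(predict_play(o, w) == label for o, w, label in examples)
    return correct / len(examples)


# Made-up held-out examples, for illustration only.
held_out = [("sunny", False, "yes"), ("overcast", False, "yes")]
print(accuracy(held_out))  # 0.5
```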
Simple Cross Validation
Let’s say your data has attributes A, B, and C, and you want to train a rule to predict D. The data is divided into 7 folds; each fold takes one turn as TEST while the other six are TRAIN.
- Fold 1: train on 2, 3, 4, 5, 6, 7 and apply the trained model to 1. The result is Accuracy 1.
- Fold 2: train on 1, 3, 4, 5, 6, 7 and apply the trained model to 2. The result is Accuracy 2.
- Fold 3: train on 1, 2, 4, 5, 6, 7 and apply the trained model to 3. The result is Accuracy 3.
- Fold 4: train on 1, 2, 3, 5, 6, 7 and apply the trained model to 4. The result is Accuracy 4.
- Fold 5: train on 1, 2, 3, 4, 6, 7 and apply the trained model to 5. The result is Accuracy 5.
- Fold 6: train on 1, 2, 3, 4, 5, 7 and apply the trained model to 6. The result is Accuracy 6.
- Fold 7: train on 1, 2, 3, 4, 5, 6 and apply the trained model to 7. The result is Accuracy 7.
- Finally: average Accuracy 1 through Accuracy 7.
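The seven-fold procedure can be sketched in plain Python. The majority-class "learner" is a stand-in assumption so that the example is self-contained; any real learning algorithm would take its place.

```python
from collections import Counter


def k_fold_splits(n_items: int, k: int):
    """Yield (train_indices, test_indices) for each of k folds (needs n_items >= k)."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def train_majority(labels: list[str]) -> str:
    """Stand-in learner (an assumption): always predict the most common label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels: list[str], k: int = 7) -> float:
    """Average of the per-fold accuracies (Accuracy 1 through Accuracy k)."""
    accuracies = []
    for train, test in k_fold_splits(len(labels), k):
        model = train_majority([labels[i] for i in train])
        accuracies.append(sum(labels[i] == model for i in test) / len(test))
    return sum(accuracies) / len(accuracies)
```

Each instance is used for testing exactly once and is never part of the training data for the fold in which it is tested.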
Avoiding Overfitting!
- Separate data for evaluation from data for exploration
- We will refer to the exploration set as the Dev Set
- We will refer to the evaluation set as the cross-validation set
- You should also have a final test set you never look at until you think you are done!
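One way to set aside the three sets up front is sketched below. The 20% / 60% / 20% proportions, the fixed seed, and the function name are assumptions for illustration; the tutorial does not prescribe specific sizes.

```python
import random


def three_way_split(items, dev_frac: float = 0.2, test_frac: float = 0.2, seed: int = 0):
    """Shuffle once, then carve out dev, cross-validation, and final test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]                   # for exploration and error analysis
    test = shuffled[n_dev:n_dev + n_test]    # untouched until you think you are done
    cross_val = shuffled[n_dev + n_test:]    # for evaluating performance
    return dev, cross_val, test
```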
Remember!!!!
- Use your development data for:
  - Qualitative analysis before ML
  - Error analysis
  - Ideas for design of new features
- Use your cross validation data for:
  - Evaluating your performance
- Never include the data you are testing on in the data you do feature selection with!!!
Evaluation
Why is performance different?
- Men and women used language differently
- Different focus
  - Women had a more personal focus
  - Men had a more national/objective focus
Special Text Features
Stretchy Patterns in LightSIDE
Looking at sentiment_sentences.csv
Configuring Stretchy Patterns
- Longer patterns and longer gaps lead to larger numbers of features
- Categories are useful both for abstraction and for anchoring the patterns
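To see why longer patterns and longer gaps inflate the feature space, here is a rough sketch of gap-tolerant patterns. It assumes a pattern is a sequence of tokens in which any run of skipped tokens is collapsed into a single `[GAP]` marker; the function, its parameters, and the marker are illustrative assumptions rather than LightSIDE's implementation, and categories are not modeled.

```python
def stretchy_patterns(tokens: list[str], max_len: int = 3, max_gap: int = 2) -> set:
    """Enumerate token patterns of 2..max_len items, allowing gaps of up to
    max_gap skipped tokens between consecutive items."""
    patterns = set()

    def extend(pattern: list[str], last_index: int, n_items: int) -> None:
        if n_items >= 2:
            patterns.add(tuple(pattern))
        if n_items == max_len:
            return
        for skip in range(max_gap + 1):
            nxt = last_index + 1 + skip
            if nxt >= len(tokens):
                break
            gap = ["[GAP]"] if skip else []
            extend(pattern + gap + [tokens[nxt]], nxt, n_items + 1)

    for start in range(len(tokens)):
        extend([tokens[start]], start, 1)
    return patterns


tokens = "the cost is too great".split()
print(len(stretchy_patterns(tokens, max_len=2, max_gap=0)))  # 4
print(len(stretchy_patterns(tokens, max_len=2, max_gap=1)))  # 7
```

Even on a five-word sentence, allowing a one-token gap nearly doubles the number of two-item patterns.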
Regular Expressions
American Street Gangs
Predict gang affiliation from posts
- Crips, Bloods, Hoovers
  - Crips started in South Central LA
  - Pirus, Bloods, Hoovers from Crips
- Chicago based
  - People Nation: Vice Lords, Latin Kings, Stones
  - Folk Nation: Gangster Disciples
- Trinitarios
  - Hispanic gang based in NYC
Graffiti Based Style Features
Graffiti
- Social messages
- Stylistic writing
- Crossing out other gangs
On the board

Feature   Examples
ck        ckrab, ckome
cc        fucc, blocc
pk        pkut, ...
hk        whky, hkappens
bk        bk1, bkang
3         3ast
5         5hit
c^        c^rime, c^uh
Character N-grams
- Character bigrams can detect graffiti style features
- Could also be used to identify consistent endings on words (i.e., that indicate formality or gender)
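A minimal sketch of character bigram extraction, using example words from the graffiti style table. The function name and the choice to stay within whitespace-separated tokens are assumptions.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count character n-grams within each whitespace-separated token."""
    counts = Counter()
    for token in text.lower().split():
        counts.update(token[i:i + n] for i in range(len(token) - n + 1))
    return counts


features = char_ngrams("blocc ckrab bkang")
print(features["cc"], features["ck"], features["bk"])  # 1 1 1
```

The bigrams "cc", "ck", and "bk" surface directly as features, without any gang-specific rules being written by hand.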
Parse Features
- Word based features lose all structure and order within sentences
- Parse features can capture that
- But they are SLOW!!
Error Analysis
Error Analysis Process: High Level Overview
- Identify large error cells
- Make comparisons
  - Ask yourself how it is similar to the instances that were correctly classified with the same class (vertical comparison)
  - How it is different from those it was incorrectly not classified as (horizontal comparison)
Goal: We want to discover how to re-represent the data so that instances with the same class value look more similar to one another and instances with different class values look more different.
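Finding the large error cells amounts to counting (actual, predicted) pairs and ranking the off-diagonal ones. The sketch below uses made-up labels and hypothetical function names purely for illustration.

```python
from collections import Counter


def confusion_matrix(gold: list[str], predicted: list[str]) -> Counter:
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(gold, predicted))


def largest_error_cells(gold: list[str], predicted: list[str], top: int = 3):
    """Off-diagonal cells of the confusion matrix, largest first."""
    cells = confusion_matrix(gold, predicted)
    errors = {cell: n for cell, n in cells.items() if cell[0] != cell[1]}
    return sorted(errors.items(), key=lambda item: -item[1])[:top]


# Made-up labels, for illustration only.
gold = ["pos", "pos", "pos", "neg", "neg"]
predicted = ["neg", "neg", "pos", "neg", "pos"]
print(largest_error_cells(gold, predicted))
```

The instances in the largest cell are the ones to read through for the vertical and horizontal comparisons.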
* Testing bigrams as an alternative….
Heterogeneous Datasets
Datasets
Three datasets for age prediction:
- Blogs from blogger.com (targeted crawl by Schler et al., 2006; 9,600 training docs @ 13K tokens)
- Fisher corpus of telephone conversation transcripts (Cieri et al., 2004; 5,957 training docs @ 3K tokens)
- Online forum for breast cancer patients, breastcancer.org (2,330 training docs @ 23K tokens)
Datasets divided into training, development and test set
[Figure: Age distributions in datasets (axes: age, frequency)]
Feature Splitting (Daumé III, 2007)
[Diagram labels: General, Domain A, Domain B]
- Why is this nonlinear? It represents the interaction between each feature and the Domain variable.
- Now that the feature space represents the nonlinearity, the algorithm to train the weights can be linear.
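A minimal sketch of the feature-splitting idea: each feature is copied into a general version and a version specific to the instance's domain, so a linear learner can assign them different weights. The function name and the `general:` / `domain:` naming scheme are assumptions for illustration.

```python
def split_features(features: dict[str, float], domain: str) -> dict[str, float]:
    """Copy every feature into a general version and a domain-specific version."""
    augmented = {}
    for name, value in features.items():
        augmented[f"general:{name}"] = value   # shared across all domains
        augmented[f"{domain}:{name}"] = value  # active only for this domain
    return augmented


print(split_features({"cost": 1.0}, "A"))
# {'general:cost': 1.0, 'A:cost': 1.0}
```

An instance from Domain B would get `B:cost` instead of `A:cost`, so the domain-specific copies never overlap across domains.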
Leveraging Subpopulations through Multi-Level Modeling
Gang Alliances
Gangs Data
Feature Analysis
Style features that distinguish Allied from Opposing differ by dominant gang
- Crips:
  - Allied: bCaret
  - Opposing: CC, PK, cCaret
- Bloods:
  - Allied: XO, CC
  - Opposing: hCaret, BK
- Latin Kings:
  - Allied: CC, XO
  - Opposing: 5S
When the dominant gang is in an allied thread, we see style features that unite them against opposing gangs.
Feature Analysis
Style features that distinguish Allied from Opposing differ by dominant gang
- Crips:
  - Allied: bCaret
  - Opposing: CC, PK, cCaret
- Bloods:
  - Allied: XO, CC
  - Opposing: hCaret, BK
- Latin Kings:
  - Allied: CC, XO
  - Opposing: 5S
When the dominant gang is in an opposing thread, we also see features that unite the opposing gangs against them.
Feature Analysis
Unigram features that distinguish Allied from Opposing don’t differ by dominant gang as much as style features
- Universal:
  - Allied: lmao, you, crew
  - Opposing: forever, wtf, where
- Crips:
  - Allied: lol
  - Opposing: know, about
- Bloods:
  - Allied: niggas, the
  - Opposing: at
We see relationship words, but not gang identity words.
Subpopulations and Overfitting Example from gender prediction in blog data…
What is different in how men and women talk?
Confounded with other variables
- Men sound older and women sound younger (Argamon et al., 2007)
- Men sound more like non-fiction and women sound more like fiction (Argamon et al., 2003)
Why do low level features overfit?
- In a linear model, positive weights push the decision towards one class while negative weights push the decision towards the other class
- The magnitude of the weight indicates how much of a push that feature gives
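The push-and-pull of weights in a linear model can be sketched as a weighted sum. The feature names, weight values, and class names used in the test are made up for illustration.

```python
def linear_score(features: dict[str, float], weights: dict[str, float], bias: float = 0.0) -> float:
    """Weighted sum: positive weights push the score up, negative ones push it down."""
    return bias + sum(weights.get(name, 0.0) * value for name, value in features.items())


def classify(features, weights, bias=0.0, positive="A", negative="B"):
    """Predict the positive class when the score is above zero."""
    return positive if linear_score(features, weights, bias) > 0 else negative


# Made-up weights: "x" pushes strongly towards A, "y" pushes weakly towards B.
weights = {"x": 2.0, "y": -0.5}
print(linear_score({"x": 1, "y": 1}, weights))  # 1.5
print(classify({"y": 1}, weights))              # B
```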
Why do low level features overfit?
- What happens if the same feature predicts age, gender, and social class?
  - If you are predicting gender, then the average value for each feature assumes the mix of age and social class in the data set you trained on
    - The weights normalize for this mix
    - If the mix changes, then the normalization will be wrong
  - So the weights won’t predict gender correctly anymore on datasets where the mix of those other factors is different
Never saw MOH in train, so the trained model will overpredict the extent of swearing among males on the test set.
Train: MYL FYH MYL MOL MYL FYH FYH FOH MYH FOH MYL MOL FYL MOH FYL FOH MOH FOL MOL FOL MYH MOH FYL FOH FOL FYL MOH FOL FYL MYH MOL MYH FYH MOL MYH FOH FYH MOH FOH MOL MYH MYL MOL MOH FYL MOL MYH
Test: MYH FYL MOL MYH MOH
Evaluation of Domain Generality
- Contrast random CV and leave-one-occupation-out CV
- All feature space representations show significant drop between random CV and leave-one-occupation-out CV
- Only stretchy patterns remain significantly above random performance
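Leave-one-occupation-out CV differs from random CV only in how the folds are formed: every instance from one occupation is held out together. A minimal sketch, with a hypothetical function name and made-up occupation labels in the test:

```python
def leave_one_group_out(groups: list[str]):
    """Yield (held_out_group, train_indices, test_indices), one fold per group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test
```

Because the model never sees the held-out occupation during training, the resulting accuracy estimates how well the features generalize to a new subpopulation rather than to more data from the same ones.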