Network Analytics meets Text Mining for Social Media
- Slides: 41
Network Analytics meets Text Mining for Social Media Analysis Dr. Rosaria Silipo
Social Media Data: Water Everywhere, and not a drop to drink
Social Media Data: Water Everywhere, and not a drop to drink
What companies do with it:
• Download and keep
• Topic [Shift] Detection (email content routing, detecting market interest shifts, clinical studies, querying unstructured DBs, ...)
• Sentiment Analysis (marketing, polls, elections, ...)
• Connection Analysis (influencers, risk analysis, ...)
Social Media Data: Water Everywhere, and not a drop to drink
The Analysis Tools:
• Web Crawlers
• Visual Exploration
• Topic Detection (Text Mining, NLP, Ontologies)
• Sentiment Score (Text Mining, NLP)
• Influence Score (Network Analytics)
• Find Groups (Predictive Analytics)
Case Study Example: Slashdot Data
Basic Numbers:
• 24532 users
• 491 threads with
  • 15 – 843 responses
  • 12 – 507 users
• 113505 posts
• 60 main topics
• Selected Topic: Politics
[Figure labels: Post, Comments]
Case Study Example: Slashdot
• Very rich data sources about customers!
• We want to establish:
  • How users feel about the discussed topic (Sentiment Analysis)
  • Whether it matters how users feel (Network Analytics)
  • A more general abstraction of the results (Clustering)
Sentiment Analysis
Workflow steps:
• Remove anonymous users, group by PostID
• Words Tagging (MPQA Corpus: positive words, negative words)
• BoW, Entity Filter, Word Frequency, Attitude Calculation by Document
• Total Attitude by User
• Bins
• Word cloud for selected users
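The lexicon-based scoring on this slide can be sketched in a few lines of Python. This is only an illustration, not the KNIME workflow itself: the word lists are a tiny hypothetical stand-in for the MPQA lexicon, and the attitude formula (positive word count minus negative word count, summed per user) is an assumption, since the slide does not state the exact calculation.

```python
from collections import defaultdict

# Hypothetical stand-in word lists; the slides use the MPQA Subjectivity Lexicon.
POSITIVE = {"good", "great", "fair", "support"}
NEGATIVE = {"bad", "corrupt", "fail", "waste"}


def document_attitude(text):
    """Assumed attitude of one document: positive word count minus negative word count."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    return positive - negative


def user_attitudes(posts):
    """Sum document attitudes per user, skipping anonymous posts (user is None)."""
    totals = defaultdict(int)
    for user, text in posts:
        if user is None:
            continue
        totals[user] += document_attitude(text)
    return dict(totals)


def attitude_bin(score):
    """Bin a total attitude into positive, negative, or neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up posts as (user, text) pairs, for illustration only.
posts = [
    ("alice", "A good and fair law."),
    ("bob", "What a waste, a corrupt party!"),
    ("alice", "Great support from people."),
    (None, "Bad bad bad."),
    ("carol", "The market opened today."),
]
totals = user_attitudes(posts)
bins = {user: attitude_bin(score) for user, score in totals.items()}
```

A plain word count like this does no sentence splitting and ignores negation, so "not good" still counts as positive.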
Slashdot – Text Mining: Most Negative User pNutz
Slashdot – Text Mining: Most Positive User dada21
Slashdot – Sentiment Analysis
• 16016 positive users
• 7107 negative users
• Most positive user: dada21 (2838 positive/1725 negative words)
• Most negative user: pNutz (43 positive/109 negative words)
• Which topics do positive users have in common?
  – Government
  – People
  – Law/s
  – Money
  – Market
  – Parties
Network Creation
[Figure: network with nodes User 1, User 2, User 3, User 4, User 5, User 6]
Topic Graphs
Topic Graph: NASA
Topic Graph: Sci-Fi
Hubs & Authorities
• Hubs = Followers
• Authorities = Leaders
Workflow steps:
• Filtering anonymous users and creating network
• Centrality index to define hub weight and authority weight
• Users with hub and authority weights and other features
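Hub and authority weights are commonly computed with the HITS iteration. The sketch below is a plain-Python version under stated assumptions: the slides do not say which centrality implementation the workflow uses, the edge direction (responder pointing to the author being answered) is an assumption, and the user names and edges are made up.

```python
def hits(edges, iterations=50):
    """Return (hub, authority) weights for a directed edge list using HITS."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    authority = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub weights of all nodes pointing at it.
        authority = {node: 0.0 for node in nodes}
        for source, target in edges:
            authority[target] += hub[source]
        norm = sum(v * v for v in authority.values()) ** 0.5 or 1.0
        authority = {node: v / norm for node, v in authority.items()}
        # Hub update: sum the authority weights of all nodes it points at.
        hub = {node: 0.0 for node in nodes}
        for source, target in edges:
            hub[source] += authority[target]
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {node: v / norm for node, v in hub.items()}
    return hub, authority


# Assumed edge direction: (responder, author of the post being answered).
edges = [
    ("user2", "user1"),
    ("user3", "user1"),
    ("user4", "user1"),
    ("user4", "user5"),
]
hub, authority = hits(edges)
top_leader = max(authority, key=authority.get)
top_follower = max(hub, key=hub.get)
```

In this toy network user1 receives the most replies and gets the highest authority weight, while user4 replies to the most authors and gets the highest hub weight.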
Hubs & Authorities
[Figure: labeled users: dada21, Carl Bialik from the WSJ, Tube Steak, Doc Ruby, pNutz, 99BottlesOfBeerInMyF]
KNIME: Bringing it all together
• Network Analysis: users with hub and authority weights and other features
• Text Analysis: user bins (positive, negative, neutral)
[Figure: labeled users: dada21, Carl Bialik from the WSJ, WebHostingGuy, Catbeller, Tube Steak, Doc Ruby, 99BottlesOfBeerInMyF, pNutz]
What we have found...
- The positive leaders
- The neutral leaders
- The negative leaders
- The inactive users
What identifies each group? How do I identify a new user? How do I handle each user?
Why Clustering?
- No a priori knowledge (not even on a subset of users)
- Prediction and interpretation capabilities required
k-Means algorithm
Re-sampling the Training Set
k = 10
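For readers unfamiliar with k-Means, here is a minimal plain-Python sketch of the algorithm. It is not the KNIME k-Means node: the two features per user (attitude, authority weight), the data points, and the seeded random initialization are assumptions for illustration, and k is set to 2 because the toy data is far too small for the k = 10 used on the slides.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and center update until stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct data points as initial centers
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest(point, centers)].append(point)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(point, centers) for point in points]
    return centers, labels


# Made-up users as (attitude, authority weight) pairs.
points = [
    (0.9, 0.8), (0.8, 0.9), (1.0, 0.7),     # positive, active
    (-0.9, 0.1), (-0.8, 0.2), (-1.0, 0.0),  # negative, less active
]
centers, labels = k_means(points, k=2)
```

The returned centers are what makes the clusters interpretable: each center is the average profile of the users assigned to it.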
The k-Means Clusters
The k-Means Clusters
[Figure labels: Superfans, Neutral users, Fans, Negative users]
Additional Discoveries
• There are only very few real leaders!
• Authority and hub scores identify active participants rather than leaders.
• Superfans can be found in cluster_3.
• Negative and (sigh!) active users are collected in cluster_1.
• Neutral users are usually inactive (cluster_2, cluster_7, and cluster_8).
• Positive users with different degrees of activity are scattered across the remaining clusters.
The operational Workflow
• Pre-processing
• Cluster Extraction
• Assignment of new data
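The "assignment of new data" step amounts to finding the closest cluster center for a new user. A minimal sketch follows, assuming the centers come from an earlier clustering run. The center coordinates and the two features (attitude, activity) are hypothetical numbers chosen for illustration, not results from the Slashdot data.

```python
# Hypothetical cluster centers as (attitude, activity); values are made up.
CENTERS = {
    "cluster_1": (-0.8, 0.6),
    "cluster_2": (0.0, 0.05),
    "cluster_3": (0.9, 0.7),
}


def assign_to_cluster(features, centers):
    """Return the name of the center with the smallest squared Euclidean distance."""
    best_name, best_distance = None, float("inf")
    for name, center in centers.items():
        distance = sum((f - c) ** 2 for f, c in zip(features, center))
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name


# A new user with already pre-processed (attitude, activity) features.
new_user_cluster = assign_to_cluster((0.7, 0.8), CENTERS)
```

New users must go through the same pre-processing as the training data, otherwise their features are not comparable to the stored centers.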
Notes
• MPQA Corpus: publicly available Subjectivity Lexicon (http://www.cs.pitt.edu/mpqa/lexicons.html)
• User Characterization is Sum -> Mean
• NLP: No sentence splitting, no negation identification.
• For a more refined syntax-based sentiment analysis -> "External Tool" node
External Tool Node
The "External Tool" node executes any external program from the command line:
1. Writes input data to an input file
2. Calls the tool to run on the input file with command line options and to write results to an output file
3. Reads the output file and presents the data at the output port
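The same write, call, read pattern can be sketched outside KNIME with Python's standard library. This is not the node's implementation: the "tool" here is a hypothetical stand-in (a one-line Python program that upper-cases its input file), and the file names are arbitrary.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in tool: reads argv[1], writes its upper-cased content to argv[2].
TOOL_SOURCE = (
    "import sys; "
    "open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())"
)


def run_external_tool(rows):
    """Write rows to an input file, run the tool on it, and read back the output file."""
    with tempfile.TemporaryDirectory() as workdir:
        input_path = Path(workdir) / "input.txt"
        output_path = Path(workdir) / "output.txt"
        # 1. Write input data to an input file.
        input_path.write_text("\n".join(rows))
        # 2. Call the tool with the input and output file as command line arguments.
        subprocess.run(
            [sys.executable, "-c", TOOL_SOURCE, str(input_path), str(output_path)],
            check=True,  # raise if the tool exits with a non-zero status
        )
        # 3. Read the output file and return its rows.
        return output_path.read_text().splitlines()


result = run_external_tool(["good law", "bad market"])
```

This only works with tools that run non-interactively: the program must accept its input and output locations on the command line and exit on its own.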
Alternative Sentiment Analysis
• Free, non-interactive command line tools for sentiment analysis: none found
• SentiStrength v2.2 (still interactive)
• External Tool and Generic Web Service Client
Community Web Crawler Node
[Figure labels: Web Crawling Workflow, XML Parsing Nodes]
Next Steps
- Integrate topic information
- Integrate user demographic and behavioural information
- Discover [time series] patterns for early detection of negative users and superfans
- Try other techniques, maybe even on manually segmented data, to discover new user segments
Where do I find more?
Whitepaper: rosariasilipo@yahoo.com
Complete Workflows + Data: www.knime.com
- text mining
- network mining
- combined analysis
(note: the above 3 process huge data and require 16G memory)
- clustering
Open Source Software: KNIME www.knime.com
Next Appointment
User Day US Boston (free)
October 22nd 2013, 10:00-17:00
Microsoft New England R&D Center (NERD)
One Memorial Drive, Suite 100, Cambridge
http://www.knime.com/user-day-boston-2013
Hands-on Session
1. Download KNIME from www.knime.com
Hands-on Session
2. Install Extensions
Help -> Install New Software
Select:
• KNIME & Extensions
In KNIME Labs Extensions, select:
• KNIME Network Mining
• KNIME Textprocessing
Hands-on Session
3. Get workflows and Slashdot data
• Get workflows from USB stick (KNIMEBoston2013.zip)
  • Text Mining
  • Network Analytics
  • Text and Network Mining
  • Social Media Clustering
• Slashdot Raw Data is included in the downloaded workflows
• A smaller data set, Slashdot Reduced Data, is available for lower memory requirements
• Both data sets are available from the USB stick
Hands-on Session
3. Import Workflows
Hands-on Session
Memory Increase in knime.ini
-startup
plugins/org.eclipse.equinox.launcher_1.2.0.v20110502.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.win32.x86_64_1.1.100.v20110502
-vmargs
-Xmx2G
-XX:MaxPermSize=256m
-server
-Dsun.java2d.d3d=false
-Dosgi.classloader.lock=classname
-XX:+UnlockDiagnosticVMOptions
-XX:+UnsyncloadClass
-Dknime.enable.fastload=true
-Djava.library.path=C:\Users\rosy\Documents\R\win-library\2.15\rJava\jri\x64
Hands-on Session
5. Improve Workflows: Text Mining
[Workflow labels: Data Reading, Data Tagging, Preprocessing, Words Reading, Tag Corpus, Scoring and Tag Cloud, BoW]
Hands-on Session
6. Improve Workflows: Network Analytics
[Workflow labels: Data Reading and preprocessing, Create Network Object, Visualize Network, Clean up Network]
zoomba
nahdude812