Data Mining, Integration and Analysis Karin Becker


+ Data Mining, Integration and Analysis
Research Areas
• Knowledge Discovery
• Web and Text Mining
• Data Science
• Recommendation Systems
• Scalability and Performance
• Reproducibility
Faculty
• Ana Lucia Cetertich Bazzan
• Joao Luiz Dihl Comba
• Karin Becker
• Leandro Krug Wives
• Lucas Mello Schnorr
• Mara Abel
• Renata De Matos Galante
• Viviane Pereira Moreira

+ Knowledge Discovery
• What do we do?

+ Knowledge Discovery
• Data Collection
• Data Integration
• Data Preprocessing
• Data Mining
• Data Analysis

+ Karin Becker

+ Extract Knowledge from Social Media
• Semantic enrichment framework for event-related tweet identification (Simone Romero)
• No assumptions about event properties
• Contextual knowledge from the semantic web and external documents
• Improved mainly recall
Simone Romero, Karin Becker. A framework for event classification in tweets based on hybrid semantic enrichment. Expert Systems with Applications 118: 522-538 (2019)

+ Extract Knowledge from Social Media
• Identification of stance in tweets (Marcelo Dias)
• No threads of argumentation
• Unsupervised and weakly supervised* frameworks (*runner-up)
• Target and stance expression depend on the domain
Marcelo Dias, Karin Becker. An Heuristics-based, Weakly-Supervised Approach for Classification of Stance in Tweets. Proc. of Web Intelligence, 2016.

+ Extract Knowledge from Social Media
• Identification of stance in tweets
• Unsupervised framework
• Excellent performance on straightforward targets (Hillary, Clinton)
Marcelo Dias, Karin Becker. An Heuristics-based, Weakly-Supervised Approach for Classification of Stance in Tweets. Proc. of Web Intelligence, 2016.
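As an illustration of how weak supervision can produce stance labels without manual annotation, the sketch below assigns noisy labels from hashtag cues and abstains when cues are missing or conflict. The hashtag sets and the `weak_label` function are hypothetical and are not taken from the cited paper, whose actual heuristics may differ.

```python
# Hypothetical hashtag cues for one target; not taken from the cited paper.
FAVOR_TAGS = {"#imwithher", "#hillary2016"}
AGAINST_TAGS = {"#neverhillary", "#crookedhillary"}


def weak_label(tweet):
    """Return a noisy stance label, or None when the heuristic abstains."""
    tokens = set(tweet.lower().split())
    has_favor = bool(tokens & FAVOR_TAGS)
    has_against = bool(tokens & AGAINST_TAGS)
    if has_favor and not has_against:
        return "FAVOR"
    if has_against and not has_favor:
        return "AGAINST"
    # No cue, or conflicting cues: leave the tweet unlabeled.
    return None


def build_training_set(tweets):
    """Keep only the tweets that the heuristic was able to label."""
    labeled = [(t, weak_label(t)) for t in tweets]
    return [(t, y) for t, y in labeled if y is not None]
```

Tweets labeled this way could then serve as a training corpus for a standard supervised classifier, which is the usual motivation for weak supervision.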

+ Extracting Knowledge from Social Media
• Analyze the emotions people express about terrorism events on Twitter using demographics (Jonathas Harb)
• Automatic emotion classification (4 terrorism events)
• Tested deep learning with different seeding strategies
• Demographic analysis (Face++, Profile Location)
Jonathas Harb, Karin Becker. Emotion Analysis of Reaction to Terrorism on Twitter. Proc. of Workshop on Big Social Data and Urban Computing, 2018.

Analysis
● Q2: Do different terrorism events raise the same emotional reaction? NO
● Gender? Age? Location?
● Our hypothesis: it depends on how people relate to the event

+ Extracting Knowledge from Social Media
• Compare engagement of Twitter users in the Pink October and Blue November campaigns (Roberto Walter)
• 5 different countries
• Demographic analysis (Face++, Profile Location)
• Tweet topic categorization
Roberto Walter, Karin Becker. Caracterização e Comparação das Campanhas do Outubro Rosa e Novembro Azul no Twitter. SBBD 2018: 133-144

+ Extracting Knowledge from Social Media
• Topic discovery and drift analysis

+ Extracting Knowledge from Social Interaction
• Relating conversational topics and toxic behavior effects in a MOBA game (Joaquim Mesquita)
• MOBA Games (LoL)
• Effects of toxic behavior on other players
• Behavioral patterns based on online chats
Joaquim A. M. Neto, Karin Becker: Relating conversational topics and toxic behavior effects in a MOBA game. Entertainment Computing 26: 10-29 (2018)


+ Extracting Knowledge from Medical Data
• Machine translation for biomedical texts, parallel corpus (Felipe Soares)
• Hierarchical classifier for non-invasive colorectal cancer screening
• Plasma fluorescence data
• Cancer, No findings, Further investigation
Felipe Soares, Karin Becker, Michel J. Anzanello: A hierarchical classifier based on human blood plasma fluorescence for non-invasive colorectal cancer screening. Artificial Intelligence in Medicine 82: 1-10 (2017)

+ Extracting Knowledge from Medical Data
• Relating mental states using social media (Vanessa Borba)
  • Characterization of mental states (verbal cues, emotions and sentiments, behavioral and social patterns)
  • Analysis of temporal evolution of mental states (e.g. anxiety – depression – suicide)
• Detecting Anomalies in Health Provision Records (Cristiano Sulzbach)
  • Lack of parameters of “normality”
  • Discovery of groups of data
  • Analysis of closeness
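One way to read "discovery of groups" plus "analysis of closeness" when no normality thresholds exist is sketched below: records are linked when they lie within a distance `eps` of each other, and records that end up in very small groups are flagged. The function names, the two-feature records, and the `eps` and `min_size` values are illustrative assumptions, not the method used in the project.

```python
from math import dist


def group_by_closeness(points, eps):
    """Group points into connected components of the 'closer than eps' graph."""
    unvisited = set(range(len(points)))
    groups = []
    while unvisited:
        seed = unvisited.pop()
        group, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            # Collect every still-unvisited point close enough to point i.
            near = [j for j in unvisited if dist(points[i], points[j]) <= eps]
            for j in near:
                unvisited.remove(j)
            group.extend(near)
            frontier.extend(near)
        groups.append(sorted(group))
    return sorted(groups)


def flag_anomalies(points, eps, min_size):
    """Return indices of records that belong to groups smaller than min_size."""
    groups = group_by_closeness(points, eps)
    return sorted(i for g in groups if len(g) < min_size for i in g)


# Hypothetical records described by two numeric features.
RECORDS = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
           (5.0, 5.0), (5.1, 4.9), (5.2, 5.1),
           (9.0, 1.0)]
```

With these sample records, `flag_anomalies(RECORDS, eps=0.5, min_size=2)` isolates the last record, which has no close neighbors. In practice the features would need scaling so that no single attribute dominates the distance.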

+ A final word on Software Engineering
• Strong background in software engineering
• Industry experience
• Agile Methods
• Sentiment analysis on software artifacts
• Satisfaction of IT users (sentiment analysis on IT tickets, Blaz, 2016)
• Analysis of assertiveness of user stories and development productivity and quality metrics (Guilherme Dias, 2018)
• Using gamification in SCRUM for self-improvement (Camilla Schmidt, ongoing)

Data Integration, Data Analysis
Renata Galante
galante@inf.ufrgs.br

Raul Barth (master's)
Passenger density and flow analysis, and classification of city zones and bus stops, for public bus service management

Framework
• DMBSM – Data Mining Framework for Bus Service Management
• Input: GPS, bus stop and smart card data
• Extracting passengers’ density and flow information
• Bus stop segmentation based on travel purposes
• Finding the real bus service demand
• Enabling decision-making
• Based on the Lambda Architecture, using Big Data for parallel processing
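To make the density and flow extraction step concrete, here is a minimal sketch over smart card tap-in records. The record layout, the sample data, and both function names are hypothetical; the flow estimate also assumes that a card's next boarding stop approximates where the previous trip ended, which is a common simplification and not necessarily what DMBSM does.

```python
from collections import Counter
from datetime import datetime

# Hypothetical smart card tap-in records: (card_id, stop_id, ISO timestamp).
TAPS = [
    ("c1", "A", "2018-03-05T07:10:00"),
    ("c2", "A", "2018-03-05T07:40:00"),
    ("c1", "B", "2018-03-05T17:30:00"),
    ("c3", "B", "2018-03-05T07:55:00"),
    ("c2", "C", "2018-03-05T18:05:00"),
]


def boarding_density(taps):
    """Count boardings per (stop, hour of day)."""
    density = Counter()
    for _card, stop, ts in taps:
        hour = datetime.fromisoformat(ts).hour
        density[(stop, hour)] += 1
    return density


def stop_to_stop_flow(taps):
    """Count moves between consecutive boarding stops of the same card."""
    stops_by_card = {}
    # ISO timestamps in a uniform format sort chronologically as strings.
    for card, stop, _ts in sorted(taps, key=lambda t: (t[0], t[2])):
        stops_by_card.setdefault(card, []).append(stop)
    flow = Counter()
    for stops in stops_by_card.values():
        for origin, nxt in zip(stops, stops[1:]):
            if origin != nxt:
                flow[(origin, nxt)] += 1
    return flow
```

Both functions are simple aggregations over independent records, so they map naturally onto the batch layer of a Lambda Architecture, where the same counts would be computed in parallel over partitions of the data.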

Framework – Architecture and Results







Drunk Text Identification
Marcos Grzeça, Karin Becker, Renata Galante (UFRGS)

Drunk Text Identification
Detection of texts written by intoxicated people
Marcos Grzeça, Karin Becker, Renata Galante (UFRGS)
Romero & Becker (2019)
