DIFFERENT APPROACHES TO COMMUNITY EVOLUTION PREDICTION IN BLOGOSPHERE
Bogdan Gliwa, Piotr Bródka, Anna Zygmunt, Stanisław Saganowski, Przemysław Kazienko, Jarosław Kolak
OUTLINE
• Introduction and motivation
• Methods of event identification in group evolution: SGCI and GED
• Predicting group evolution in the social network
• Dataset and experiment setup
• Classifiers – reminder
• Results: for each method, a comparison between different classifiers
• Conclusion
GENERAL IDEA
Predicting the future direction of community evolution makes it possible to determine which characteristics of a community matter for its future behavior.
MOTIVATION
• Supports decisions about investing in contact with members of a given community and about taking actions to achieve a key position in it.
• Makes it possible to determine effective ways of forming opinions.
• Makes it possible to protect group participants against such activities.
INTRODUCTION – PREDICTION
Link prediction (the best investigated problem): predicting the existence of a link (relation) between two nodes (users) within a social network.
• Liben-Nowell focused on paths and common neighbours between a pair of nodes.
• Lichtenwalter considered node degrees and the mutual information between them.
INTRODUCTION – PREDICTION
Link sign prediction: "sign" here means that the predicted relation between users may be positive or negative.
• Symeonidis looked at paths between the node pair and used the notion of similarity to predict the sign.
• Leskovec used degree and mutual information between a pair of nodes for link prediction, and drew on the theories of balance and status to predict the link sign.
Churn prediction: Richter and Wai-Ho addressed the important task of predicting churn (the number of individuals moving out of a collective over a specific period of time).
• Richter presented a new approach that predicts churn based on an analysis of group behavior.
• This approach touches on another aspect that is not yet well studied: predicting the evolution of the whole group, i.e. which event will come next in the group's lifetime.
PREDICTION OF THE GROUP EVOLUTION
What is a group? A set of vertices that communicate with each other more frequently than with vertices outside the group.
A new method for future event prediction has been developed, based on the stable group changes identification (SGCI) algorithm. Prediction in this method is based on:
• previous events in the group's lifetime, extracted by SGCI
• the group profile, described by group size, cohesion, leadership and density
METHODS OF EVENT IDENTIFICATION IN GROUP EVOLUTION
SGCI ALGORITHM
Stable group changes identification
Step 1. Identification of fugitive (short-lived) groups in the separate time frames.
• The whole network is divided into time frames.
• In each time frame, a community detection method is applied.
Step 2. Identification of group continuation: assigning transitions between groups in neighboring time frames.
• After the communities have been extracted in each time frame, the communities from neighboring time frames are matched and the algorithm assigns transitions between them (from a group in time frame t to a group in time frame t+1).
SGCI ALGORITHM
Step 2 in detail: for each pair of non-empty groups A, B from neighboring time frames we calculate:
• MJ(A, B): the Modified Jaccard measure
• ds(A, B): the difference in size
If MJ(A, B) is above a defined threshold and ds(A, B) is no more than a specified limit, the algorithm creates a transition between these groups.
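The matching rule can be sketched in Python. This is a minimal sketch, not the authors' implementation. The slide does not show the formulas, so the forms used here, MJ(A, B) = max(|A∩B|/|A|, |A∩B|/|B|) and ds(A, B) = max(|A|/|B|, |B|/|A|), are assumptions. The function names are hypothetical, and the default thresholds reuse the MJ=0.5 and ds=50 values from the experiment setup.

```python
from itertools import product


def modified_jaccard(a: set, b: set) -> float:
    """Assumed form of MJ: the overlap relative to the smaller group."""
    if not a or not b:
        return 0.0
    common = len(a & b)
    return max(common / len(a), common / len(b))


def size_difference(a: set, b: set) -> float:
    """Assumed form of ds: ratio of the larger group size to the smaller one."""
    return max(len(a) / len(b), len(b) / len(a))


def find_transitions(groups_t, groups_t1, mj_threshold=0.5, ds_threshold=50):
    """Return index pairs (i, j) linking group i in frame t to group j in frame t+1."""
    transitions = []
    for (i, a), (j, b) in product(enumerate(groups_t), enumerate(groups_t1)):
        if not a or not b:
            continue  # only non-empty groups are compared
        if modified_jaccard(a, b) > mj_threshold and size_difference(a, b) <= ds_threshold:
            transitions.append((i, j))
    return transitions


# Toy data (hypothetical user ids): the first group keeps 4 of its 5 members.
frame_t = [{"u1", "u2", "u3", "u4", "u5"}, {"u6", "u7", "u8", "u9", "u10"}]
frame_t1 = [{"u1", "u2", "u3", "u4", "u11"}, {"u12", "u13", "u14", "u15", "u16"}]
transitions = find_transitions(frame_t, frame_t1)
```

In the toy data only the first group of frame t is linked to a group in frame t+1; the second group has no successor.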
SGCI ALGORITHM
Step 3. Separation of the stable groups (those lasting for at least a required number of subsequent time frames). In this step, the stable groups are retrieved.
Step 4. Identification of types of group changes. Events describing the change of the group's state are assigned to each transition between stable groups from neighboring time frames.
Several types of group changes can be defined, where A and B are the groups from the first and the second time frame of a transition, respectively, and sh and dh are thresholds.
SGCI ALGORITHM
• addition: a small group attaches to a large one.
• deletion: a small group detaches from a large one.
• merge: many groups in one time frame form a new, larger group in the next time frame.
• split: a group divides into several smaller groups in the next time frame.
• split_merge: a group divides into at least 2 groups in the next time frame, and one of these groups is the result of merging with another group from the previous time frame.
• constancy: a simple transition without a significant change of the group size (judged against the threshold dh).
• change size: a simple transition with a change of the group size (judged against the threshold dh).
• decay: the group does not exist in the next time frame.
SGCI ALGORITHM
A given group can be assigned more than one event for its transitions to groups in the next time frame. Some events can coexist; others cannot:
• The constancy event cannot coexist with the change size, merge or split events.
• The constancy event can coexist with the addition or deletion events.
• The addition and deletion events can coexist with every event type except decay.
• The decay event is always the only event for the group.
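The coexistence rules can be written as a small validity check. The sketch below encodes only the rules stated on this slide; combinations the slide does not mention (for example split together with merge) are treated as allowed, which is an assumption. The function name is hypothetical.

```python
# Events that the slide says cannot appear together with constancy.
CONFLICTS_WITH_CONSTANCY = {"change size", "merge", "split"}


def is_allowed_combination(events: set) -> bool:
    """Check a set of event names against the stated coexistence rules."""
    # The decay event is always the only event for the group.
    if "decay" in events and len(events) > 1:
        return False
    # Constancy cannot coexist with change size, merge or split.
    if "constancy" in events and events & CONFLICTS_WITH_CONSTANCY:
        return False
    # Everything else is assumed to be allowed.
    return True


checks = {
    "constancy + addition": is_allowed_combination({"constancy", "addition"}),
    "constancy + split": is_allowed_combination({"constancy", "split"}),
    "decay + addition": is_allowed_combination({"decay", "addition"}),
    "change size + addition": is_allowed_combination({"change size", "addition"}),
}
```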
GED: GROUP EVOLUTION DISCOVERY
The GED method relies on an inclusion measure, which evaluates how much of one group is included in another. The inclusion of group G1 in group G2 is the product of group quantity and group quality:
$$I(G_1, G_2) = \underbrace{\frac{|G_1 \cap G_2|}{|G_1|}}_{\text{group quantity}} \cdot \underbrace{\frac{\sum_{x \in G_1 \cap G_2} NI_{G_1}(x)}{\sum_{x \in G_1} NI_{G_1}(x)}}_{\text{group quality}}$$
where NI_{G1}(x) is the importance of node x in group G1.
The GED method thus takes into account both the quantity and the quality of the group members. Quality can be expressed by any user importance measure, e.g. degree centrality, betweenness centrality, PageRank, social position, etc.
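A minimal Python sketch of the inclusion measure, assuming that group quantity is the share of G1 members also present in G2 and that group quality is the share of G1's total node importance carried by those shared members. The function name, the toy groups and the importance values are hypothetical.

```python
def inclusion(g1: set, g2: set, importance: dict) -> float:
    """Inclusion of g1 in g2 as (group quantity) * (group quality).

    `importance` maps each node of g1 to its importance NI within g1.
    """
    if not g1:
        return 0.0
    total_importance = sum(importance[x] for x in g1)
    if total_importance == 0:
        return 0.0
    common = g1 & g2
    quantity = len(common) / len(g1)
    quality = sum(importance[x] for x in common) / total_importance
    return quantity * quality


# Toy example: half of G1's members, carrying 70% of its importance, are in G2.
g1 = {"a", "b", "c", "d"}
g2 = {"a", "b", "e"}
ni_g1 = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
value = inclusion(g1, g2, ni_g1)  # about 0.35 (0.5 * 0.7)
```

Note that the measure is asymmetric: the inclusion of G2 in G1 needs the importance values of G2 and generally gives a different number.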
PREDICTING GROUP EVOLUTION IN THE SOCIAL NETWORK
PREDICTING GROUP EVOLUTION USING SGCI RESULTS
This approach to predicting future group events employs a classifier.
Input structure: sequences of 3 group states (the present time frame and the two previous ones).
PREDICTING GROUP EVOLUTION USING SGCI RESULTS
Measures for the state of each group:
• leadership: a measure describing centralization in a graph or group (the largest value is obtained for a star network). It is computed from d_max, the maximum degree in the group, and n, the number of nodes in the group.
• density: a measure expressing how many connections between nodes are present in relation to all possible connections between them [16]. It is computed from a(i, j), where a(i, j) = 1 when there is a connection from node i to node j.
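A hedged Python sketch of the two measures. The slide's formulas are not shown, so the code assumes a degree-centralization form for leadership, sum over nodes of (d_max − d_i) divided by (n−1)(n−2), and a directed-graph form for density, sum of a(i, j) divided by n(n−1). Both forms and all function names are assumptions of this sketch.

```python
def leadership(degrees):
    """Assumed leadership: sum(d_max - d_i) / ((n - 1) * (n - 2))."""
    n = len(degrees)
    if n < 3:
        return 0.0  # the assumed formula is undefined here; 0.0 is this sketch's choice
    d_max = max(degrees)
    return sum(d_max - d for d in degrees) / ((n - 1) * (n - 2))


def density(nodes, edges):
    """Assumed density: directed connections inside the group / (n * (n - 1))."""
    n = len(nodes)
    if n < 2:
        return 0.0
    inside = sum(1 for (i, j) in edges if i in nodes and j in nodes and i != j)
    return inside / (n * (n - 1))


# Star network with 5 nodes: the centre has degree 4, each leaf has degree 1.
star_leadership = leadership([4, 1, 1, 1, 1])  # 12 / 12 = 1.0

# Three nodes with 3 of the 6 possible directed connections.
group_density = density({1, 2, 3}, {(1, 2), (2, 1), (2, 3)})  # 3 / 6 = 0.5
```

Under the assumed leadership form a star network reaches the maximum value of 1, which agrees with the remark on the slide.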
PREDICTING GROUP EVOLUTION USING SGCI RESULTS
Measures for the state of each group (cont.):
• cohesion: a measure characterizing the strength of connections inside the group in relation to connections from group members to nodes outside the group. Here w is a function assigning a weight to a pair of nodes, G is the group, n is the number of nodes in the group and N is the number of nodes in the network.
• group size: the number of nodes in the group.
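Cohesion can be sketched along the same lines. The formula is not shown on the slide, so the code assumes that cohesion is the average weight of connections inside the group, normalized by n(n−1), divided by the average weight of connections from group members to outsiders, normalized by n(N−n). The function name and the toy weights are hypothetical.

```python
def cohesion(group, weights, network_size):
    """Assumed cohesion: mean inside weight divided by mean outgoing weight.

    `weights` maps directed node pairs (i, j) to the weight w(i, j).
    """
    n = len(group)
    if n < 2 or network_size <= n:
        return 0.0  # not defined by the assumed formula; 0.0 is this sketch's choice
    inside = sum(w for (i, j), w in weights.items()
                 if i in group and j in group and i != j)
    outside = sum(w for (i, j), w in weights.items()
                  if i in group and j not in group)
    if outside == 0:
        return float("inf")  # no connections leave the group
    return (inside / (n * (n - 1))) / (outside / (n * (network_size - n)))


# Network of N = 4 nodes; group {a, b} talks mostly to itself.
w = {("a", "b"): 2, ("b", "a"): 2, ("a", "c"): 1}
value = cohesion({"a", "b"}, w, network_size=4)  # (4 / 2) / (1 / 4) = 8.0
```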
PREDICTING GROUP EVOLUTION USING SGCI RESULTS
The described sequence of group states is the input for the classifier. The predicted variable is the dominating next event for the last group in the sequence.
PREDICTING GROUP EVOLUTION USING SGCI RESULTS
Dominating event: the one event chosen from those assigned to a given group, namely the event with the highest priority. We use the following order of events (from the highest priority to the lowest): constancy, change size, split, merge, addition, deletion, split_merge, decay.
PREDICTING GROUP EVOLUTION USING SGCI RESULTS
Example: the group G_{n,1} has two assigned events, change size and addition, so its dominating event is change size, because this event has the higher priority.
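Choosing the dominating event is a lookup in the priority order. The sketch below uses the order given in the slides; the constant and function names are hypothetical.

```python
# Priority order from the slides, highest priority first.
EVENT_PRIORITY = [
    "constancy", "change size", "split", "merge",
    "addition", "deletion", "split_merge", "decay",
]


def dominating_event(events):
    """Return the assigned event that comes first in the priority order."""
    # list.index raises ValueError for names outside the known event types,
    # and min raises ValueError when no events are given.
    return min(events, key=EVENT_PRIORITY.index)


# The example from the slide: change size and addition are both assigned.
result = dominating_event({"change size", "addition"})  # "change size"
```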
PREDICTING GROUP EVOLUTION USING GED RESULTS
The idea is to use a simple sequence as the input for the classifier: the profiles of preceding groups and the events between them. The learnt model is expected to produce very good results even with simple classifiers.
• The sequences of group sizes and events between time frames can be extracted from the GED results.
• For each event, four group profiles from the four previous time frames, together with the three associated events, are identified as the input for the classification model, separately for each group.
• A single group in a given time frame (Tn) is a case (instance) for classification, for which the event TnTn+1 (between time frames Tn and Tn+1) is predicted.
PREDICTING GROUP EVOLUTION USING GED RESULTS
The sequence presented in Figure 2 is used as the input for classification. The first part of the sequence provides the input features (variables): the group profiles per time frame and the event types between them. The goal of classification is to predict (classify) the type of event TnTn+1 out of six possible classes: growing, continuing, shrinking, dissolving, merging and splitting. Forming was excluded since it can only start a sequence.
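One way to picture a classification instance is as a flat row of features. The sketch below assumes that each profile holds the four measures used elsewhere in the deck (size, density, cohesion, leadership) and that the four frames are labelled Tn-3 to Tn; the field names, labels and values are hypothetical and are not taken from the study's data files.

```python
PROFILE_FIELDS = ("size", "density", "cohesion", "leadership")
FRAME_LABELS = ("Tn-3", "Tn-2", "Tn-1", "Tn")


def build_instance(profiles, events, target=None):
    """Flatten 4 group profiles and the 3 events between them into one row."""
    if len(profiles) != 4 or len(events) != 3:
        raise ValueError("expected 4 profiles and 3 events")
    row = {}
    for label, profile in zip(FRAME_LABELS, profiles):
        for field in PROFILE_FIELDS:
            row[f"{label}_{field}"] = profile[field]
    for k, event in enumerate(events):
        # Event between two consecutive frames, e.g. "event_Tn-3_Tn-2".
        row[f"event_{FRAME_LABELS[k]}_{FRAME_LABELS[k + 1]}"] = event
    row["target_event"] = target  # the event Tn -> Tn+1 to be predicted
    return row


profiles = [
    {"size": 5, "density": 0.43, "cohesion": 0.28, "leadership": 0.85},
    {"size": 6, "density": 0.48, "cohesion": 0.30, "leadership": 0.71},
    {"size": 9, "density": 0.18, "cohesion": 0.21, "leadership": 0.88},
    {"size": 12, "density": 0.27, "cohesion": 0.35, "leadership": 0.28},
]
instance = build_instance(profiles, ["growing", "growing", "growing"], target="splitting")
```

Each row then has 16 numeric features, 3 categorical event features and one class label.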
DATASET AND EXPERIMENT SETUP
Dataset description:
• Data from www.salon24.pl, which contains many blogs (mainly political)
• For tests we use half of the data set: 04/04/2010 – 31/03/2012
• 26,722 users
• 285,532 posts
• 4,173,457 comments
• Each time frame lasts 7 days
• Time frames overlap each other by 4 days
• Yields a total of 182 time frames
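Overlapping time frames can be generated with a sliding window. The sketch below is one possible reading of "7 days long, overlapping by 4 days": each frame starts 3 days after the previous one. How the study handles frame boundaries is not stated on the slide, so this sketch need not reproduce the reported number of frames; the function name is hypothetical.

```python
from datetime import date, timedelta


def time_frames(start, end, length_days=7, overlap_days=4):
    """Return (first_day, last_day) pairs of overlapping frames within [start, end]."""
    if not 0 <= overlap_days < length_days:
        raise ValueError("overlap must be shorter than the frame length")
    step = timedelta(days=length_days - overlap_days)
    span = timedelta(days=length_days - 1)  # distance from first to last day
    frames = []
    frame_start = start
    while frame_start + span <= end:
        frames.append((frame_start, frame_start + span))
        frame_start += step
    return frames


# Toy period of 17 days starting on the first day of the analysed data.
frames = time_frames(date(2010, 4, 4), date(2010, 4, 20))
```

In the toy period the first frame covers 4 to 10 April and the second covers 7 to 13 April, so the two share four days.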
DATASET AND EXPERIMENT SETUP
Group extraction:
• After the data was divided into time frames, groups were extracted in each time frame.
• Extraction used the CPM method (CPMd version) from the CFinder tool (http://www.cfinder.org/) with k=5.
• CFinder is a tool for finding and visualizing overlapping dense groups of nodes in networks, based on the Clique Percolation Method (CPM).
DATASET AND EXPERIMENT SETUP
Group sizes: as Figure 3 shows, there are many small groups, and groups of size 5 outnumber all others.
DATASET AND EXPERIMENT SETUP
Experiment setup:
• SGCI experiments were conducted with the following parameters: MJ=0.5, ds=50, sh=10 and dh=0.05.
• The GED method was run on the dataset with all combinations of GED parameters from the set:
  Quantity: {50%, 60%, 70%, 80%, 90%, 100%}
  Quality (node importance): the social position measure (similar to PageRank)
• Reminder: inclusion = group quantity × group quality.
DATASET AND EXPERIMENT SETUP
Experiment setup:
• The group profile was described by its size, density, cohesion and leadership.
• Seven different classifiers were used with default settings.
• All classifiers were used for both approaches: SGCI and GED.
DATASET AND EXPERIMENT SETUP
Classifiers – reminder:
What is a classifier? An adaptive system that learns to perform the best action given its input: identifying which of a set of categories (subpopulations) a new observation belongs to.
What is multiclass classification? Each training point belongs to one of N different classes. The goal is to construct a function which, given a new data point, correctly predicts the class to which the new point belongs.
DATASET AND EXPERIMENT SETUP
Multi-classification, direct approaches:
• Nearest Neighbor
• Generative approach & Naïve Bayes
• Linear classification
Types of multi-label classification:
• Nested/Hierarchical: Is it edible? Is it sweet? Is it a fruit? Is it a banana?
• Exclusive/Multi-class: Is it a banana? Is it an apple? Is it an orange? Is it a pineapple?
• General/Structured: Is it yellow? Is it sweet? Is it round?
DATASET AND EXPERIMENT SETUP
Multi-classification, real-world examples (number of classes):
• Digit recognition: 10
• Phoneme recognition: 50
• Object recognition: 100
• Automated protein classification: 300-600
DATASET AND EXPERIMENT SETUP
A simple idea: One-vs-All classification
• Pick a good technique for building binary classifiers.
• Build N different binary classifiers.
• For the i-th classifier, let the positive examples be all the points in class i, and let the negative examples be all the points not in class i.
• Let f_i be the i-th classifier. Classify with $f(x) = \arg\max_i f_i(x)$.
In short, a single classifier is trained per class to distinguish that class from all other classes.
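The one-vs-all recipe can be sketched in plain Python. The binary technique here is a deliberately simple centroid-based scorer chosen for this illustration; it is not one of the classifiers used in the experiments, and all names and data points are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def train_binary(positives, negatives):
    """Return a scoring function f(x); larger values mean 'more likely positive'."""
    pos_c, neg_c = centroid(positives), centroid(negatives)
    return lambda x: math.dist(x, neg_c) - math.dist(x, pos_c)


def train_one_vs_all(samples, labels):
    """Train one binary scorer per class: that class against all the others."""
    classifiers = {}
    for cls in sorted(set(labels)):
        positives = [x for x, y in zip(samples, labels) if y == cls]
        negatives = [x for x, y in zip(samples, labels) if y != cls]
        classifiers[cls] = train_binary(positives, negatives)
    return classifiers


def predict(classifiers, x):
    """Pick the class whose binary scorer gives the highest score."""
    return max(classifiers, key=lambda cls: classifiers[cls](x))


samples = [(0, 0), (1, 0), (5, 5), (6, 5), (0, 5), (0, 6)]
labels = ["A", "A", "B", "B", "C", "C"]
models = train_one_vs_all(samples, labels)
predicted = [predict(models, p) for p in [(0.5, 0.2), (5.5, 5.2), (0.2, 5.4)]]
```

Any binary technique that returns a comparable score could replace `train_binary` without changing the rest of the scheme.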
DATASET AND EXPERIMENT SETUP
Decision tree example, training data:

Leadership  Density  Cohesion  Group size  Group
0.85        0.43     0.28      5           A
0.71        0.48     0.30      6           A
0.65        0.89     0.99      13          B
0.97        0.53     0.62      18          A
0.47        0.56     0.75      4           C
0.21        0.12     0.93      5           B
0.35        0.39     0.92      6           B
0.37        0.42     0.48      9           C
0.88        0.18     0.21      9           A
0.46        0.72     0.84      6           B
0.12        0.85     0.87      14          B
0.28        0.27     0.35      12          A

Resulting decision tree, built one split at a time:

LEADERSHIP
  > 0.7: A
  < 0.7: DENSITY
    < 0.2: B
    > 0.2: COHESION
      > 0.8: B
      < 0.8: GROUP SIZE
        < 10: C
        > 10: A
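The finished tree can be written as nested conditions and checked against the twelve training rows. This is a hand-coded sketch of the tree from the slides, not the output of a tree-learning algorithm. The slides do not say how values exactly on a threshold are handled, so the boundary behaviour here (for example a group size of exactly 10 going to A) is an assumption; the function name is hypothetical.

```python
# Training rows from the slides: leadership, density, cohesion, group size, group.
ROWS = [
    (0.85, 0.43, 0.28, 5, "A"), (0.71, 0.48, 0.30, 6, "A"),
    (0.65, 0.89, 0.99, 13, "B"), (0.97, 0.53, 0.62, 18, "A"),
    (0.47, 0.56, 0.75, 4, "C"), (0.21, 0.12, 0.93, 5, "B"),
    (0.35, 0.39, 0.92, 6, "B"), (0.37, 0.42, 0.48, 9, "C"),
    (0.88, 0.18, 0.21, 9, "A"), (0.46, 0.72, 0.84, 6, "B"),
    (0.12, 0.85, 0.87, 14, "B"), (0.28, 0.27, 0.35, 12, "A"),
]


def classify(leadership, density, cohesion, group_size):
    """Walk the tree: leadership, then density, then cohesion, then group size."""
    if leadership > 0.7:
        return "A"
    if density < 0.2:
        return "B"
    if cohesion > 0.8:
        return "B"
    return "C" if group_size < 10 else "A"


predictions = [classify(*row[:4]) for row in ROWS]
correct = sum(p == row[4] for p, row in zip(predictions, ROWS))
```

On this small table the tree labels all twelve rows correctly, which is expected because the tree was built from exactly these rows.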
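EXPERIMENTS
Predicting Group Evolution Using SGCI Results
The selected measure is the F-measure (also known as the F1-measure), which summarizes the accuracy of the results by combining precision and recall. For a given event class, with TP true positives, FP false positives and FN false negatives:
$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$$
The F-measure is:
$$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$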
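A short Python sketch of the F-measure for a single event class, using the standard definitions of precision and recall in terms of true positives, false positives and false negatives. The function names and the counts in the example are hypothetical; returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def precision(tp, fp):
    """Share of predicted events of this class that were correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of actual events of this class that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts for one class: 8 hits, 2 false alarms, 4 misses.
score = f_measure(tp=8, fp=2, fn=4)  # equals 16 / 22, about 0.727
```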
EXPERIMENTS
Predicting Group Evolution Using SGCI Results
Results of event prediction for different classifiers: the tree classifiers (J48, Random Forest and Simple CART) and Decision Table (a rule classifier) achieved the best results. Naive Bayes and IBk performed notably worse.
EXPERIMENTS
Predicting Group Evolution Using SGCI Results (cont.)
Results of classification for the 3 tree classifiers: the results are very similar across the three. The biggest difference is for the decay event, which appears harder to classify; the other events are classified well.
[Figure: Results of event classification for decision tree classifiers]
EXPERIMENTS
Predicting Group Evolution Using SGCI Results (cont.)
Results obtained by the probabilistic classifiers: BayesNet achieved quite good results, but NaiveBayes did much worse.
Explanation: Naive Bayes assumes that the features used for classification are independent. This requirement is not met, because values of one measure are correlated with values of another; e.g. density is generally higher for smaller groups.
[Figure: Results of event classification for probabilistic classifiers]
EXPERIMENTS
Predicting Group Evolution Using SGCI Results (cont.)
Results for the other tested classifiers: as before, the decay event is classified significantly worse than the other events. The IBk classifier achieved worse prediction results than DecisionTable. For IBk, the hardest event to classify appears to be constancy.
[Figure: Results of event classification for other classifiers]
EXPERIMENTS
Predicting Group Evolution Using SGCI Results (cont.)
The most frequent event is addition: there are significantly more events of this type than of any other. This is why it is classified very well by every tested classifier.
[Figure: The percentage of events in the dataset]
EXPERIMENTS
Predicting Group Evolution Using GED Results
F-measure comparison for all event types (classes) and all classifiers:
• The 3 tree classifiers achieved the best results (their worst F-measure value is 0.57, for continuing).
• Among the rest, Decision Table also achieved quite good results.
[Figure: F-measure for each event type (class) and each classifier]
EXPERIMENTS
Predicting Group Evolution Using GED Results (cont.)
[Figure: Results of event classification for decision tree classifiers]
[Figure: Results of event classification for probabilistic classifiers]
[Figure: Results of event classification for other classifiers]
EXPERIMENTS
Predicting Group Evolution Using GED Results (cont.)
Each classifier achieves its best results for the splitting, merging and dissolving events and its worst for continuing, shrinking and growing. Why? Because instances of the different event types are unevenly distributed.
EXPERIMENTS
The number of splitting events is much higher than that of the other events, probably because the time frame is too short for most communities: they continuously split and merge as service users migrate from one topic to another. For the merging and dissolving events, most classifiers produce very good results even though these events constitute only a small fraction of all events.
DISCUSSION, CONCLUSIONS AND FUTURE WORK
• A new method for future event prediction based on the SGCI algorithm was presented and compared with the method based on the GED algorithm.
• A high level of prediction quality was obtained with both methods.
DISCUSSION, CONCLUSIONS AND FUTURE WORK
• The best results: for both methods, the best results were obtained with decision tree classifiers.
• The worst results: for the SGCI method, with the Naive Bayes classifier; for the GED method, with the Naive Bayes and Bayes Network classifiers.
THE END