Social network analysis & friend network in blogosphere
吳邦一, Shu-Te University
Social network
- Node: actor (person, group, organization)
- Arc (edge): social relation or tie, such as friendship, collaboration, message transmission…
  - Directed or undirected (unidirectional or bidirectional)
- Friend network:
  - Node: person
  - Arc: friend relationship
  - In the blogosphere: a node is a blog
Social Network
Friend relation in blogosphere
- By data mining
  - Similar hyperlinking
  - Similar interests
  - Comments, cross-posting
- From the explicit friend lists maintained by bloggers themselves
The hubs in Wretch
Balance theory
- People tend to maintain balanced relationships:
  - Reciprocity: bidirectional tie (symmetric, undirected)
  - Transitivity: a friend's friend tends to be a friend
- Bloggers would like to know, but find it hard to know:
  - Who has added me as a friend
  - Friends at distance more than 2
人緣列表 (list of people who added me as a friend)
- Available in only a few blog systems (in Taiwan)
  - MSN Live Spaces, Pixnet: friend links need confirmation
  - Yam (天空部落) provides a 人緣列表
- Other blog systems in Taiwan
  - Wretch, PCHome, Xuite, Blogger, Yahoo, Sina, …
  - Wretch has only recently started providing the service.
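Why crawl the friend network
- Academic research
  - Social network analysis:
    - Traditionally limited to small communities because of data acquisition
    - Online data: a chance to analyze large friend networks
      - Newman (01): Scientific collaboration networks
      - Ahn (07): Cyworld, more than ten million users, the largest blog system in Korea
- Providing a lookup service for bloggers
  - A search engine for interpersonal relationships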
WARM: blog friend relationship search service
http://warm.stu.edu.tw
System scale
Blog      Users        Links
Wretch    2,948,702    43,939,230
Yam       177,929      1,438,857
Pixnet    49,849       21,867
Xuite     62,257       159,891
Media coverage: TVBS
The performance
The difficulty of blog friend network analysis
- Blog friend relations differ from real ones
- Data incompleteness
  - A problem for all social network analyses
- Hub effect
  - Only for unidirectional relationships
- How to verify
  - Traditional method
  - Network reconstruction
  - Good metrics need to be defined
Relationship search: all shortest paths
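A minimal sketch of how an all-shortest-paths query between two bloggers could be answered with one breadth-first search. The adjacency-dict representation (`adj[u]` lists the blogs that `u` names as friends, with no duplicate entries) and the function name are assumptions for illustration, not the WARM implementation.

```python
from collections import deque

def all_shortest_paths(adj, source, target):
    """Return every shortest directed path from source to target."""
    dist = {source: 0}
    preds = {source: []}          # predecessors on shortest paths
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break                 # every node one level closer was already expanded
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)
    if target not in dist:
        return []
    # Walk the predecessor lists backwards to enumerate the paths.
    paths, stack = [], [[target]]
    while stack:
        partial = stack.pop()
        head = partial[-1]
        if head == source:
            paths.append(partial[::-1])
        else:
            for p in preds[head]:
                stack.append(partial + [p])
    return paths

friends = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(sorted(all_shortest_paths(friends, "a", "d")))
```

The number of shortest paths can grow quickly around hubs, so a real service would probably cap how many paths it returns.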
Average distance
How to compute
- BFS
  - O(mn) is too time-consuming
  - Random sampling (100 nodes is enough)
- Is diameter a good metric?
  - Usually not strongly connected
  - Effective diameter (90th percentile)
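The sampling idea can be sketched as follows: run BFS from a random sample of source nodes and summarize the distances to the nodes each source can reach. The function names, the fixed seed, and the convention that every node appears as a key of `adj` are assumptions; the effective diameter is taken here as the 90th percentile of the sampled finite distances.

```python
import math
import random
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def sampled_distance_stats(adj, samples=100, percentile=0.9, seed=0):
    """Estimate (average distance, effective diameter) from sampled sources."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    sources = rng.sample(nodes, min(samples, len(nodes)))
    lengths = []
    for s in sources:
        lengths.extend(d for t, d in bfs_distances(adj, s).items() if t != s)
    if not lengths:
        return None, None
    lengths.sort()
    average = sum(lengths) / len(lengths)
    # Smallest distance that covers at least `percentile` of the sampled pairs.
    index = max(0, math.ceil(percentile * len(lengths)) - 1)
    return average, lengths[index]

ring = {0: [1], 1: [2], 2: [3], 3: [0]}
print(sampled_distance_stats(ring))
```

Unreachable pairs are simply skipped, which is why the percentile is more robust than the true diameter on a network that is not strongly connected.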
Degree distribution (log-log scale)
- Power law with two slopes
- Big tail
- Three membership levels
Clustering coefficient
- For nodes of degree k, the probability that their friends are linked to each other (big tail)
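One way this per-degree quantity could be computed is sketched below. The slide does not say how direction is handled, so this sketch assumes that degree means out-degree and that two friends count as linked if either lists the other; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def clustering_by_degree(adj):
    """Map out-degree k to the mean fraction of linked friend pairs."""
    out = {u: set(vs) - {u} for u, vs in adj.items()}   # drop self-loops
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u, friends in out.items():
        k = len(friends)
        if k < 2:
            continue                                     # no friend pairs to test
        linked = sum(
            1
            for v, w in combinations(friends, 2)
            if w in out.get(v, ()) or v in out.get(w, ())
        )
        sums[k] += linked / (k * (k - 1) / 2)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

friends = {"a": ["b", "c"], "b": [], "c": [], "d": ["a", "b", "c"]}
print(clustering_by_degree(friends))
```

The pair loop is quadratic in a node's degree, so for hub nodes one would likely sample friend pairs instead of enumerating them.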
The friend group
- Define a friend group as a clique in the transitive extension
- Find the max-clique in the extension
- Density analysis
2-clique
- d(u, v) <= 2 for all u and v
- Even a 2-clique is too sparse
  - May have a density as small as 2/n
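A star is a simple example that reaches the 2/n density. The worked calculation below is an illustration and is not taken from the slides.

```latex
% Undirected star on n nodes: one center joined to n-1 leaves.
% Any two leaves are at distance 2 via the center, so the star is a 2-clique.
\[
  \text{density}
  = \frac{\text{edges}}{\binom{n}{2}}
  = \frac{n-1}{n(n-1)/2}
  = \frac{2}{n}
\]
% Directed case: making every center-leaf tie bidirectional gives
% 2(n-1) arcs out of n(n-1) possible arcs, which is again 2/n.
```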
3/2-clique
- We define the 3/2-clique
  - d(u, v) + d(v, u) <= 3
  - Each pair is on a 3-cycle or is a pair of bidirectional friends
  - The density is at least 1/2.
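The 1/2 bound follows because both distances are at least 1, so a sum of at most 3 forces at least one of them to equal 1: every pair has an arc in at least one direction, giving at least n(n-1)/2 of the n(n-1) possible arcs. The sketch below checks the condition for a candidate group and measures its density; the function names are hypothetical, and it assumes that the return path of length 2 may pass through any node of the network, not only group members.

```python
from itertools import combinations

def short_distance(out, u, v):
    """Return 1 or 2 if v is within two hops of u, else None."""
    if v in out.get(u, ()):
        return 1
    if any(v in out.get(w, ()) for w in out.get(u, ())):
        return 2
    return None

def is_three_halves_clique(adj, group):
    """True if d(u, v) + d(v, u) <= 3 for every pair in group."""
    out = {u: set(vs) for u, vs in adj.items()}
    for u, v in combinations(group, 2):
        d_uv = short_distance(out, u, v)
        d_vu = short_distance(out, v, u)
        if d_uv is None or d_vu is None or d_uv + d_vu > 3:
            return False
    return True

def density(adj, group):
    """Arcs inside the group divided by the n(n-1) possible arcs."""
    members = set(group)
    n = len(members)
    if n < 2:
        return 0.0
    arcs = sum(
        1
        for u in members
        for v in set(adj.get(u, ()))
        if v in members and v != u
    )
    return arcs / (n * (n - 1))

cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(is_three_halves_clique(cycle, ["a", "b", "c"]), density(cycle, ["a", "b", "c"]))
```

A directed 3-cycle meets the bound exactly, which is why it is used as the example.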
- The 3/2-cliques are much denser than the theoretical lower bound
  - A well-structured network, not random at all
- A good method to find friend groups in a blogosphere with unidirectional friend relationships
Degree of balance
- Reciprocity
  - The probability that an edge is bidirectional = the ratio of bidirectional edges
  - 0.51 for Wretch
- Transitivity degree
  - The probability that a friend's friend is also a direct friend
  - 0.0337 for Wretch (almost independent of degree)
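Both measures can be sketched directly from their definitions. The function names are hypothetical, and the transitivity count assumes that a "friend's friend" is any w reached by u -> v -> w with w different from u.

```python
def _out_sets(adj):
    """Out-neighbor sets with self-loops removed."""
    return {u: set(vs) - {u} for u, vs in adj.items()}

def reciprocity(adj):
    """Fraction of arcs u -> v for which v -> u also exists."""
    out = _out_sets(adj)
    arcs = [(u, v) for u, vs in out.items() for v in vs]
    if not arcs:
        return 0.0
    mutual = sum(1 for u, v in arcs if u in out.get(v, ()))
    return mutual / len(arcs)

def transitivity_degree(adj):
    """Fraction of two-step paths u -> v -> w that are closed by u -> w."""
    out = _out_sets(adj)
    two_paths = closed = 0
    for u, friends in out.items():
        for v in friends:
            for w in out.get(v, ()):
                if w == u:
                    continue          # going back to u is not a friend's friend
                two_paths += 1
                if w in friends:
                    closed += 1
    return closed / two_paths if two_paths else 0.0

friends = {"a": ["b", "c"], "b": ["a", "c"], "c": ["d"], "d": []}
print(reciprocity(friends), transitivity_degree(friends))
```

The transitivity loop visits every two-step path, so on a network with tens of millions of links it would probably need sampling.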
Betweenness
- The number of shortest paths passing through a node (or an edge)
  - Large for inter-cluster nodes
  - Small for intra-cluster nodes
- Used to find communities
  - Girvan-Newman algorithm
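Node betweenness on an unweighted directed graph can be sketched with Brandes' accumulation scheme, one BFS per source. Note that this standard formulation sums, for each source-target pair, the fraction of shortest paths through a node, which is a swapped-in refinement of the raw path count described above. The function name is hypothetical and `adj` is assumed to contain no duplicate arcs.

```python
from collections import deque

def betweenness(adj):
    """Directed node betweenness (Brandes-style accumulation)."""
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        order = []                          # nodes in BFS visiting order
        preds = {v: [] for v in nodes}      # shortest-path predecessors
        sigma = dict.fromkeys(nodes, 0)     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Back-propagate dependencies from the farthest nodes to the source.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(betweenness(diamond))
```

Each source costs one BFS, so a full pass is O(mn) on an unweighted graph.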
Betweenness
- Not good for large networks
  - Friends at distance > 2 have less influence
- Hard to compute
  - The GN algorithm takes O(m^2 n) time
- Maybe we should try to define betweenness with limited distances
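Remarks
- Social computing: still on the rise
- Social network analysis for the blogosphere or WWW
  - Computational problems remain to be solved
  - Evaluation models remain to be defined
  - The truth remains to be discovered
- Huge opportunity and demand, unlimited business potential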
The End Thank you
- Slides: 67