Discovery of Aggregate Usage Profiles for Web Personalization
WebKDD 2000. Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa, Yuqing Sun, Jim Wiltshire
System Architecture
Data Abstractions
• Drafts from the W3C Web Characterization Activity (WCA)
TERM: DEFINITION
– user: A single individual who accesses files from one or more Web servers through a browser.
– pageview: Every file that contributes to the display on a user's browser at one time. It is usually associated with a single user action.
– clickstream: A sequential series of pageview requests.
– user session: The clickstream of pageviews for a single user across the entire Web.
– server session: The set of pageviews in a user session for a particular Web site.
– episode: Any semantically meaningful subset of a user or server session.
Typical Web Usage Mining Preprocessing
An Example
[Figure: site graph of pageviews labeled A through T, with example navigation paths for USER 1, USER 2, and USER 3]
Usage Mining
• After preprocessing, we have
– A set of n pageview records, $P = \{p_1, p_2, \ldots, p_n\}$
– A set of m user transactions, $T = \{t_1, t_2, \ldots, t_m\}$
• Each transaction can be viewed as an n-dimensional vector $t = \langle w(p_1, t), w(p_2, t), \ldots, w(p_n, t) \rangle$, where $w(p_i, t)$ is the weight of pageview $p_i$ in transaction $t$
• Goal of Usage Mining
– Aggregate usage profiles representing groups of different user behaviors.
– Each item in a usage profile is a URL representing a relevant pageview object, and can have an associated weight representing its significance within the profile.
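The vector view of a transaction can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `build_transaction_vectors` is hypothetical, and the binary weighting (1 if the pageview occurs in the transaction, 0 otherwise) is an assumption, since the slide does not say how w(p, t) is computed.

```python
def build_transaction_vectors(pageviews, transactions):
    """Return one n-dimensional weight vector per transaction.

    Assumption: binary weights, w(p, t) = 1.0 if pageview p occurs in
    transaction t and 0.0 otherwise.
    """
    index = {p: i for i, p in enumerate(pageviews)}
    vectors = []
    for t in transactions:
        vec = [0.0] * len(pageviews)
        for p in t:
            vec[index[p]] = 1.0  # repeated visits still count once
        vectors.append(vec)
    return vectors


# Toy data, made up for illustration only.
pageviews = ["A", "B", "C", "D"]
transactions = [["A", "B"], ["B", "C", "D"], ["A", "D", "A"]]
vectors = build_transaction_vectors(pageviews, transactions)
```

Other weightings, such as time spent on a page, would fit the same structure by changing the value assigned inside the loop.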
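Transaction Clustering
• Use the k-means algorithm to partition the transactions, viewed as vectors in this pageview space, into clusters.
• PACT (Profile Aggregations based on Clustering Transactions): given a transaction cluster c, construct a usage profile
$$pr_c = \{ \langle p, \mathrm{weight}(p, pr_c) \rangle \mid p \in P,\ \mathrm{weight}(p, pr_c) \geq \mu \}$$
$$\mathrm{weight}(p, pr_c) = \frac{1}{|c|} \sum_{t \in c} w(p, t)$$
where $\mu$ is a minimum weight threshold.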
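The profile construction can be illustrated with a short Python sketch. It assumes the cluster has already been produced (for example by k-means) and is given as a list of weight vectors. The function name `pact_profile` and the threshold parameter `mu` with its default of 0.5 are assumptions made for this example.

```python
def pact_profile(cluster, pageviews, mu=0.5):
    """Build a usage profile from one transaction cluster.

    The weight of a pageview is its mean weight over the cluster's
    transactions. Pageviews whose weight falls below mu are dropped
    (mu and its default value are assumptions of this sketch).
    """
    size = len(cluster)
    profile = {}
    for i, p in enumerate(pageviews):
        weight = sum(t[i] for t in cluster) / size
        if weight >= mu:
            profile[p] = weight
    return profile


# Toy cluster of four binary transaction vectors over pageviews A..D.
pageviews = ["A", "B", "C", "D"]
cluster = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
]
profile = pact_profile(cluster, pageviews, mu=0.5)
```

With these toy vectors, A appears in every transaction and B in three of four, so only those two pageviews pass the 0.5 threshold.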
Pageview Clustering (1/2)
• Use the Apriori algorithm to find frequent itemsets.
• Use Association Rule Hypergraph Partitioning (ARHP) to find aggregate profiles.
• Hypergraph H = (V, E)
– V: the set of pageviews
– E: the frequent itemsets, each weighted by average confidence
[Figure: example hypergraph over pageviews A through R, with hyperedge weights 0.6, 0.4, 0.7, and 0.6]
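The frequent-itemset step can be sketched as a small level-wise search in the Apriori style. This is a simplified illustration rather than an optimized implementation. The function name `frequent_itemsets` and the absolute support count `min_count` are assumptions, and the hypergraph partitioning step of ARHP is not shown.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Return {itemset: count} for itemsets occurring in at least
    min_count transactions, found level by level (Apriori style)."""
    sets = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(1 for t in sets if itemset <= t)

    items = sorted({p for t in sets for p in t})
    current = [frozenset([p]) for p in items]
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates of size k that are frequent.
        level = {c: count(c) for c in current}
        level = {c: n for c, n in level.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into candidates of size k, then
        # prune candidates that have an infrequent (k-1)-subset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


# Toy transactions, made up for illustration only.
transactions = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"], ["A", "B", "C"]]
itemsets = frequent_itemsets(transactions, min_count=3)
```

Each frequent itemset found this way would become one hyperedge of H, with the pageviews it contains as its vertices.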
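Pageview Clustering (2/2)
$$\mathrm{Fitness}(C) = \frac{\sum_{e \subseteq C} \mathrm{Weight}(e)}{\sum_{|e \cap C| > 0} \mathrm{Weight}(e)}$$
$$\mathrm{Connectivity}(v) = \frac{|\{ e \mid e \subseteq C,\ v \in e \}|}{|\{ e \mid e \subseteq C \}|}$$
[Figure: the example hypergraph over pageviews A through R, with hyperedge weights 0.6, 0.4, 0.7, and 0.6, and a cluster C marked on it]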
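The two measures can be sketched in Python as follows. The sketch assumes that fitness is the total weight of hyperedges lying entirely inside the cluster divided by the total weight of hyperedges that touch the cluster, and that connectivity of a vertex is the fraction of inside hyperedges that contain it. The representation of hyperedges as `(vertices, weight)` pairs and the function names are choices made for this example.

```python
def fitness(cluster, hyperedges):
    """Weight of hyperedges fully inside the cluster, divided by the
    weight of hyperedges sharing at least one vertex with it."""
    cluster = set(cluster)
    inside = sum(w for e, w in hyperedges if set(e) <= cluster)
    touching = sum(w for e, w in hyperedges if set(e) & cluster)
    return inside / touching if touching else 0.0


def connectivity(v, cluster, hyperedges):
    """Fraction of the hyperedges inside the cluster that contain v."""
    cluster = set(cluster)
    inside = [e for e, _ in hyperedges if set(e) <= cluster]
    if not inside:
        return 0.0
    return sum(1 for e in inside if v in e) / len(inside)


# Toy hypergraph, made up for illustration only.
hyperedges = [
    (("A", "B"), 0.6),
    (("A", "B", "C"), 0.4),
    (("C", "D"), 0.7),
]
cluster = ["A", "B", "C"]
cluster_fitness = fitness(cluster, hyperedges)
conn_a = connectivity("A", cluster, hyperedges)
conn_c = connectivity("C", cluster, hyperedges)
```

In this toy case the edge (C, D) crosses the cluster boundary, so it lowers the fitness without counting toward connectivity.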
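Recommendation
• Given a usage profile C, we can represent C as a vector $C = \langle w_1^C, w_2^C, \ldots, w_n^C \rangle$, where
$$w_i^C = \begin{cases} \mathrm{weight}(p_i, C), & \text{if } p_i \in C \\ 0, & \text{otherwise} \end{cases}$$
• Given the current active session S, $S = \langle s_1, s_2, \ldots, s_n \rangle$
$$\mathrm{match}(S, C) = \frac{\sum_k w_k^C \, s_k}{\sqrt{\sum_k (s_k)^2 \times \sum_k (w_k^C)^2}}$$
$$\mathrm{Rec}(S, p) = \sqrt{\mathrm{weight}(p, C) \cdot \mathrm{match}(S, C)}$$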
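The matching and scoring step can be sketched as below. The sketch assumes that match(S, C) is the normalized dot product (cosine similarity) of the session and profile vectors, and that the recommendation score is the square root of the pageview's profile weight times the match score. The function names and the use of a vector index `i` to identify a pageview are choices made for this example.

```python
import math


def match(session, profile):
    """Cosine similarity between a session vector and a profile vector."""
    dot = sum(w * s for w, s in zip(profile, session))
    norm = math.sqrt(sum(s * s for s in session) * sum(w * w for w in profile))
    return dot / norm if norm else 0.0


def rec(session, profile, i):
    """Recommendation score of the pageview at index i of the profile."""
    return math.sqrt(profile[i] * match(session, profile))


# Toy vectors over four pageviews, made up for illustration only.
profile = [1.0, 0.75, 0.0, 0.5]   # weights of the pageviews in profile C
session = [1, 1, 0, 0]            # the active session visited the first two
score = rec(session, profile, 3)  # score for the unvisited fourth pageview
```

A pageview with zero weight in the profile always scores zero, so only pageviews that belong to a matching profile can be recommended.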