Trace-driven Context-aware Mobile Networks Towards Mobile Social Networks Ahmed Helmy Computer and Information Science and Engineering (CISE) Dept Mobile Networking Lab (NOMADS group) University of Florida helmy@ufl.edu http://nile.cise.ufl.edu/MobiLib
Birds-Eye View: Mobile & Networking Lab • Architecture & Protocol Design – Protocol Independent Multicast (PIM) – Robust Geographic Wireless Services (Geo-Routing, Geocast, Rendezvous) – Query Resolution in Wireless Networks (ACQUIRE & Contacts) – Gradient Routing (RUGGED) – Multicast-based Mobility (M&M) – Worms, Traceback in Mobile Networks – Mobility-Assisted Protocols (MAID) – Socially-Aware Networks • Methodology & Tools – Network Simulator (NS-2) – Test Synthesis (STRESS) – Protocol Block Analysis (BRICS) – Mobility Modeling (IMPORTANT, TVC) – Behavioral Analysis in Wireless Networks (IMPACT & MobiLib)
Introduction & Problem Scope • Future network devices are mobile & ‘personal’ – Very tight coupling between devices & humans • Network performance is significantly affected by (and affects) users' behavior: – Movement, grouping, on-line activity, trust, cooperation… • How do users behave in mobile societies? • What kinds of protocols/networks survive & perform well in highly mobile societies?
Paradigm Shift in Protocol Design • Used to: – Design general purpose protocols – Evaluate using models (random mobility, traffic, …) – Modify to improve performance and fix failures for specific context – May end up with suboptimal performance or failures due to lack of context in the design • Propose to: – Analyze, model deployment context – Design ‘application class’-specific parameterized protocols – Utilize insights from context analysis to fine-tune protocol parameters
Problem Statement • How to gain insight into deployment context? • How to utilize insight to design future services? Approach • Extensive trace-based analysis to identify dominant trends & characteristics • Analyze user behavioral patterns – Individual user behavior and mobility – Collective user behavior: grouping, encounters • Integrate findings in modeling and protocol design – I. User mobility modeling – II. Behavioral grouping – III. Information dissemination in mobile societies, profile-cast
The TRACE framework (MobiLib): Trace, Characterize (Cluster), Represent, Analyze, Employ (Modeling & Protocol Design)
Vision: Community-wide Wireless/Mobility Library • Library of – Measurements from Universities, vehicular networks – Realistic models of behavior (mobility, traffic, friendship, encounters) – Benchmarks for simulation and evaluation – Tools for trace data mining • Use insights to design future context-aware protocols? • http://nile.cise.ufl.edu/MobiLib Trace
Libraries of Wireless Traces • Multi-campus (community-wide) traces: – MobiLib (USC (04-06), now @ UFL) • nile.cise.ufl.edu/MobiLib • 15+ traces from: USC, Dartmouth, MIT, UCSD, UCSB, UNC, UMass, GATech, Cambridge, UFL, … • Tools for mobility modeling (IMPORTANT, TVC), data mining – CRAWDAD (Dartmouth) • Types of traces: – University Campus (mainly WLANs) – Conference AP and encounter traces – Municipal (off-campus) wireless – Bus & vehicular wireless networks – Others … (on going) Trace
Wireless Networks and Mobility Measurements • In our case studies we use WLAN traces – From University campuses & corporate networks (4 universities, 1 corporate network) – The largest data sets about wireless network users available to date (# users / lengths) – No bias: not “special-purpose”, data from all users in the network • We also analyze – Vehicular movement trace (Cab-spotting) – Human encounter trace (at Infocom Conf) Trace
Case study I – Individual mobility
Case Study I: Goal • To understand the mobility/usage pattern of individual wireless network users • To observe how environments/user type/trace-collection techniques impact the observations • To propose a realistic mobility model based on empirical observations – That is mathematically tractable – That is capable of characterizing multiple classes of mobility scenarios
IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis* • 4 major campuses: 30-day traces studied from 2+ years of traces • Total users > 12,000 • Total Access Points > 1,300
Trace source | Trace duration | User type | Environment | Collection method | Analyzed part
MIT | 7/20/02 – 8/17/02 | Generic | 3 corporate buildings | Polling | Whole trace
Dartmouth | 4/01/01 – 6/30/04 | Generic w/ subgroup | University campus | Event-based | July ’03, April ’04
UCSD | 9/22/02 – 12/8/02 | PDA only | University campus | Polling | 09/22/02 – 10/21/02
USC | 4/20/05 – 3/31/06 | Generic | University campus | Event-based | 04/20/05 – 05/19/05 (Bldg)
• Understand changes of user association behavior w.r.t. time, environment, device type, and trace collection method
* W. Hsu, A. Helmy, “IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis”, two papers at IEEE Wireless Networks Measurements (WiNMee), April 2006
Metrics for Individual Mobility Analysis • What kind of spatial preference do users exhibit? – The percentile of time spent at the most frequently visited locations • What kind of temporal repetition do users exhibit? – The probability of re-appearance • How often are the nodes present? – Percentage of “online” time Represent
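As a rough illustration of the first and third metrics, the sketch below computes the share of online time spent at a user's top-k locations and the fraction of the trace period the user is online. The input layout (a list of `(ap_id, duration)` pairs for one user), the function names, and the assumption that sessions do not overlap are illustrative choices, not the format of any particular trace.

```python
from collections import defaultdict


def top_k_time_fraction(sessions, k=5):
    """Fraction of a user's online time spent at the k most-visited APs.

    sessions: iterable of (ap_id, duration_seconds) for ONE user
    (hypothetical layout; real traces need parsing first).
    """
    time_per_ap = defaultdict(float)
    for ap_id, duration in sessions:
        time_per_ap[ap_id] += duration
    total = sum(time_per_ap.values())
    if total == 0:
        return 0.0
    top = sorted(time_per_ap.values(), reverse=True)[:k]
    return sum(top) / total


def online_fraction(sessions, trace_length_seconds):
    """Share of the trace period during which the user was associated.

    Assumes the sessions do not overlap in time.
    """
    online = sum(duration for _, duration in sessions)
    return online / trace_length_seconds
```

Calling `top_k_time_fraction(sessions, k=5)` for every user and averaging the results gives a per-trace figure comparable to the "top 5 most visited APs" statistic.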
Observations: Visited Access Points (APs) • Individual users access only a very small portion of APs in the network. • On average a user spends more than 95% of time at its top 5 most visited APs. • Long-term mobility is highly skewed in terms of time associated with each AP. • Users exhibit “on”/“off” behavior that needs to be modeled. [Figures: CCDF of coverage of users (percentage of visited APs), y-axis Prob.(coverage > x); average fraction of online time a MN associates with each AP]
Repetitive Behavior • Clear repetitive patterns of association in wireless network users. • Typically, user association patterns show the strongest repetitive pattern at time gap of one day/one week.
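One simple way to quantify this repetition is to estimate, for a given time gap, how often a user seen at a location is seen at the same location again after that gap. The sketch below works at day granularity and is only an assumed reading of "probability of re-appearance"; the `(day_index, ap_id)` input layout and the function name are hypothetical.

```python
def reappearance_probability(visits, gap_days):
    """Estimate P(user is at the same AP again after gap_days).

    visits: set of (day_index, ap_id) pairs for one user.
    Visits too close to the end of the trace to be re-observed
    are excluded from the estimate.
    """
    if not visits:
        return 0.0
    last_day = max(day for day, _ in visits)
    eligible = hits = 0
    for day, ap in visits:
        if day + gap_days > last_day:
            continue  # the trace ends before the gap elapses
        eligible += 1
        if (day + gap_days, ap) in visits:
            hits += 1
    return hits / eligible if eligible else 0.0
```

Plotting this value for gaps of 1, 2, 3, ... days would be expected to show peaks at one day and seven days if the pattern on the slide holds for the trace at hand.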
Mobility Characteristics from WLANs • Simple existing models are very different from the characteristics in WLAN [Figure panels: on/off activity pattern, y-axis Prob.(online time fraction > x); skewed location preference; periodic re-appearance] Characterize
Mobility Models • Mobility models are of crucial importance for the evaluation of wireless mobile networks [IMP 03] • Requirements for mobility models – Realism (detailed behavior from traces) – Parameterized, tunable behavior – Mathematical tractability • Related work on mobility modeling – Random models (Random walk/waypoint): inadequate for human mobility – Improved synthetic models (pathway model, RPGM, WWP, FWY, MH) – more realistic, difficult to analyze – Trace-based model (T/T++): trace-specific, not general
Time-variant Community (TVC) Model (W. Hsu, T. Spyropoulos, K. Psounis, A. Helmy, “Modeling Time-variant User Mobility in Wireless Mobile Networks”, IEEE INFOCOM, 2007, Trans. on Networking) • Skewed location visiting preference – Create “communities” to be the preferred area of movement – Each node can have its own community • Node moves with two different epoch types – Local or roaming – Each epoch is a random-direction, straight-line movement – Local epochs in the community – Roaming epochs around the whole simulation area [Figure labels: 75%, 25%] Employ
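The two epoch types can be sketched in a few lines. This is a minimal single-community, single-period illustration and not the authors' implementation: the square community, the wrap-around boundary handling, the jump back into the community before a local epoch, and every name and default value are assumptions. The default `p_local=0.75` mirrors the 75%/25% labels on the slide, on the assumption that they denote the local/roaming split.

```python
import math
import random


def tvc_step(pos, community, area, p_local, max_len, rng):
    """Advance one epoch; return (new_position, epoch_was_local).

    community: (x0, y0, side) of a square community (assumed shape).
    area: side length of the square simulation area.
    """
    x, y = pos
    cx, cy, side = community
    local = rng.random() < p_local
    if local:
        # Simplification: a local epoch that starts outside the community
        # restarts from a uniformly random point inside it.
        if not (cx <= x < cx + side and cy <= y < cy + side):
            x, y = cx + rng.random() * side, cy + rng.random() * side
        ox, oy, span = cx, cy, side
    else:
        ox, oy, span = 0.0, 0.0, area
    angle = rng.uniform(0.0, 2.0 * math.pi)   # random direction
    length = rng.uniform(0.0, max_len)        # straight-line epoch length
    # Move in a straight line, wrapping around inside the active region.
    nx = ox + (x - ox + length * math.cos(angle)) % span
    ny = oy + (y - oy + length * math.sin(angle)) % span
    return (nx, ny), local


def simulate(epochs, community=(40.0, 40.0, 20.0), area=100.0,
             p_local=0.75, max_len=10.0, seed=1):
    """Return the fraction of epochs that end inside the community."""
    rng = random.Random(seed)
    cx, cy, side = community
    pos = (cx + side / 2.0, cy + side / 2.0)
    inside = 0
    for _ in range(epochs):
        pos, _ = tvc_step(pos, community, area, p_local, max_len, rng)
        if cx <= pos[0] < cx + side and cy <= pos[1] < cy + side:
            inside += 1
    return inside / epochs
```

Even this stripped-down version reproduces a skewed location preference: the node ends most epochs inside a community that covers only a small part of the simulation area.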
Tiered Time-variant Community (TVC) Model • Periodical re-appearance – Create structure in time – Periods – Node moves with different parameters in periods to capture time-dependent mobility – Repetitive structure • Finer granularity in space & time – Multi-tier communities – Multiple time periods Employ
Using the TVC Model – Reproducing Mobility Characteristics • (STEP 1) Identify the popular locations; assign communities • (STEP 2) Assign parameters to the communities according to stats • (STEP 3) Add user on-off patterns (e.g., in WLAN, users are usually off when moving)
Using the TVC Model – Reproducing Mobility Characteristics • WLAN trace (example: MIT trace) [Figure panels: skewed location visiting preference; periodic re-appearance] * Model-simplified: single community per node. Model-complex: multiple communities ** Similar matches achieved for USC and Dartmouth traces
Using the TVC Model – Reproducing Mobility Characteristics • Vehicular trace (Cab-spotting)
Using the TVC Model – Reproducing Mobility Characteristics • Human encounter trace at a conference [Figure: timeline of node A encountering node B, marking encounter duration and inter-meeting time]
Case study II – Groups in WLAN
Case Study II: Goal • Identify similar users (in terms of long run mobility preferences) from the diverse WLAN user population – Understand the constituents of the population – Identify potential groups for group-aware service • In this case study we classify users based on their mobility trends (or location-visiting preferences) – We consider semester-long USC trace (spring 2006, 94 days) and quarter-long Dartmouth trace (spring 2004, 61 days)
Representation of User Association Patterns • We choose to represent a summary of user association for each day by a single vector – a = {aj : fraction of online time user i spends at APj on day d} – Example day: Office, 10 AM – 12 PM; Library, 3 PM – 4 PM; Class, 6 PM – 8 PM – Association vector: (library, office, class) = (0.2, 0.4, 0.4) • Summarize the long-run mobility in an “association matrix” Represent
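The example day on this slide (two hours at the office, one at the library, two in class) can be turned into an association vector with a short helper. The session layout `(ap_id, start_hour, end_hour)` and the function name are assumptions made for illustration.

```python
def association_vector(sessions, ap_order):
    """Fraction of a user's online time spent at each AP on one day.

    sessions: list of (ap_id, start_hour, end_hour) for one user-day.
    ap_order: fixed ordering of APs that defines the vector layout.
    """
    online = {ap: 0.0 for ap in ap_order}
    for ap, start, end in sessions:
        online[ap] += end - start
    total = sum(online.values())
    if total == 0:
        return [0.0] * len(ap_order)  # zero vector for a day spent offline
    return [online[ap] / total for ap in ap_order]


day = [("office", 10, 12), ("library", 15, 16), ("class", 18, 20)]
vector = association_vector(day, ["library", "office", "class"])
print(vector)  # -> [0.2, 0.4, 0.4]
```

Stacking one such vector per day, row by row, gives the association matrix for a user.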
Eigen-behavior • Eigen-behaviors: the vectors that describe the maximum remaining power in the association matrix (obtained through Singular Value Decomposition), each with quantifiable importance • Eigen-behavior distance measures the similarity of users by weighted inner products of their eigen-behaviors • Association patterns can be reconstructed with low rank and low error • Benefits: reduced computation and noise
Similarity-based User Classification • With the distance between users U and V defined as 1 - Sim(U, V), we use hierarchical clustering to find similar user groups. [Figure panels: USC; Dartmouth] * AMVD = Average Minimum Vector Distance
Validation of User Groups • Significance of the groups – users in the same group are indeed much more similar to each other than randomly formed groups (0.93 vs. 0.46 for USC, 0.91 vs. 0.42 for Dartmouth) • Uniqueness of the groups – the most important group eigen-behavior is important for its own group but not other groups
Significance score of top eigen-behavior for | USC | Dartmouth
Its own group | 0.779 | 0.727
Other groups | 0.005 | 0.004
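The grouping step can be illustrated with a small agglomerative clustering routine that works directly on a matrix of pairwise distances 1 - Sim(U, V). The slide does not say which linkage rule or stopping criterion was used, so average linkage and a fixed distance threshold are assumptions here, as are the function and parameter names.

```python
def average_linkage_clusters(dist, threshold):
    """Group users by agglomerative clustering on a distance matrix.

    dist: symmetric list-of-lists, dist[i][j] = 1 - Sim(user i, user j).
    Repeatedly merges the two closest clusters (average linkage) until
    the closest remaining pair is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best[0] > threshold:
            break
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]
```

This naive version recomputes every linkage in each round, which is fine for a toy example but would need a proper library routine for thousands of users.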
User Groups in WLAN - Observations • Identified hundreds of distinct groups of similar users • Skewed group size distribution – the largest 10 groups account for more than 30% of population on campus. Power-law distributed group sizes. • Most groups can be described by a list of locations with a clear ordering of importance • We also observe groups visiting multiple locations with similar importance – taking the most important location for each user is not sufficient
Case study III – Encounter Patterns
Case Study III: Goal • Understand inter-node encounter patterns from a global perspective – How do we represent encounter patterns? – How do the encounter patterns influence network connectivity and communication protocols? • Encounter definition: – In WLAN: When two mobile nodes access the same AP at the same time they have an ‘encounter’ – In DTN: When two mobile nodes move within communication range they have an ‘encounter’
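The WLAN encounter definition above translates directly into an interval-overlap check per access point. The session tuple layout `(node, ap, start, end)` and the function name below are hypothetical; real traces would first need to be parsed into this form.

```python
from collections import defaultdict
from itertools import combinations


def find_encounters(sessions):
    """Detect WLAN encounters: two nodes on the same AP at the same time.

    sessions: list of (node, ap, start, end) association records.
    Returns a list of (node_a, node_b, ap, overlap_start, overlap_end),
    sorted by the time the encounter begins.
    """
    by_ap = defaultdict(list)
    for node, ap, start, end in sessions:
        by_ap[ap].append((node, start, end))

    encounters = []
    for ap, items in by_ap.items():
        for (n1, s1, e1), (n2, s2, e2) in combinations(items, 2):
            if n1 == n2:
                continue  # two sessions of the same node are not an encounter
            start, end = max(s1, s2), min(e1, e2)
            if start < end:  # the two sessions overlap in time
                a, b = sorted((n1, n2))
                encounters.append((a, b, ap, start, end))
    return sorted(encounters, key=lambda e: e[3])
```

The pairwise comparison is quadratic in the number of sessions per AP; a sweep over sorted start times would scale better on large traces.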
Observations: Encounters • In all the traces, the MNs encounter a small fraction of the user population. • A user encounters 1.8%-6% on average of the user population (except UCSD) • The number of total encounters for the users follows a BiPareto distribution. [Figures: CCDF of unique encounter count, y-axis Prob.(unique encounter fraction > x); CCDF of total encounter count, y-axis Prob.(total encounter events > x)]
Encounter-Relationship (ER) graph • Draw a link to connect a pair of nodes if they ever encounter each other … Analyze the graph properties? [Figure: groups of good friends form cliques, with random links joining them] Represent
Small Worlds of Encounters • Encounter graph: nodes as vertices and edges link all vertices that encounter • The encounter graph is a Small World graph (high CC, low PL) • Even for short time period (1 day) its metrics (CC, PL) almost saturate [Figure: normalized Clustering Coefficient (CC) and Average Path Length (PL), ranging from regular graph through Small World to random graph]
Background: Delay Tolerant Networks (DTN) • DTNs are mobile networks with sparse, intermittent nodal connectivity • Encounter events provide the communication opportunities among nodes • Messages are stored and moved across the network with nodal mobility [Figure: nodes A, B, C]
Information Diffusion in DTNs via Encounters • Epidemic routing (spatio-temporal broadcast) achieves almost complete delivery • Robust to the removal of short encounters • Robust to selfish nodes (up to ~40%) [Figure: unreachable ratio, USC trace, trace duration = 15 days]
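Both Small World metrics can be computed from an adjacency map of the encounter graph. The sketch below returns raw (un-normalized) values, whereas the plot on the slide is labeled as normalized; the adjacency-dict representation and function names are assumptions.

```python
from collections import deque


def clustering_coefficient(adj):
    """Average local clustering coefficient.

    adj: dict mapping each node to the set of nodes it has encountered.
    Nodes with fewer than two neighbors contribute 0.
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        nbrs = list(nbrs)
        # Count links that exist among this node's neighbors.
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean hop count over all connected ordered node pairs (BFS per node)."""
    dist_sum = pairs = 0
    for src in adj:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist_sum += sum(seen.values())
        pairs += len(seen) - 1
    return dist_sum / pairs if pairs else 0.0
```

A graph counts as Small World when its clustering coefficient is close to that of a regular graph while its path length is close to that of a random graph with the same size and degree.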
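Epidemic routing over an encounter trace can be simulated by replaying encounters in time order and copying the message whenever exactly one of the two nodes already holds it. This sketch ignores buffer limits, transfer time, and encounter duration, which are simplifying assumptions; the `(time, node_a, node_b)` layout and function names are hypothetical.

```python
def epidemic_spread(encounters, source, start_time=0):
    """Replay encounters and record when each node first gets the message.

    encounters: iterable of (time, node_a, node_b), in any order.
    Returns a dict mapping node -> time of first reception.
    """
    received = {source: start_time}
    for time, a, b in sorted(encounters):
        if time < start_time:
            continue  # the message does not exist yet
        if a in received and b not in received:
            received[b] = time
        elif b in received and a not in received:
            received[a] = time
    return received


def unreachable_ratio(encounters, source, nodes):
    """Fraction of `nodes` that never receive a message sent by `source`."""
    reached = epidemic_spread(encounters, source)
    return 1.0 - len(reached) / len(nodes)
```

Robustness experiments like those listed on the slide could be approximated by filtering the encounter list (for example dropping short encounters, or all encounters involving nodes marked selfish) before calling `unreachable_ratio`.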
Encounter-graphs using Friends • Distribution for friendship index FI is exponential for all the traces • Friendship between MNs is highly asymmetric • Among all node pairs: < 5% with FI > 0. 01, and <1% with FI > 0. 4 • Top-ranked friends form cliques and low-ranked friends are key to provide random links (short cuts) to reduce the degree of separation in encounter graph.
Profile-cast (W. Hsu, D. Dutta, A. Helmy, Mobicom 2007) • Sending messages to others with similar behavior, without knowing their identity – Announcements to users with specific behavior V – Interest-based ads, similarity resource discovery • Assuming DTN-like environment [Figure: nodes A to E; at each encounter the question is whether B, C/D, or E is similar to V]
Profile-cast Use Cases • Mobility-based profile-cast – Targeting a group of users who move in a particular pattern (lost-and-found, context-aware messages, moviegoers) – Approach: use “similarity metric” between users • Mobility-independent profile-cast – Targeting people with certain characteristics independent of mobility (classical music lovers) – Approach: use “Small World” encounter patterns
Mobility-based Profile-cast [Figure: scoped message spread in the mobility space among nodes S, N, and D, with a “Forward?” decision at each hop]
Profile-cast Operation 1. Profiling • Profiling user mobility – The mobility of a node is represented by an association matrix – Each row represents an association vector for a time slot – An entry represents the percentage of online time during time slot i at location j – Singular value decomposition provides a summary of the matrix (a few eigen-behavior vectors are sufficient, e.g. for 99% of users at most 7 vectors describe 90% of power in the association matrix) [Figure: nodes S and N; association matrix and its summary vectors]
Profile-cast Operation 1. Profiling 2. Forwarding decision • Determining user similarity – S sends eigen-behaviors for the virtual profile to N – N evaluates the similarity by weighted inner products of eigen-behaviors – Message forwarded if Sim(U, V) is high (the goal is to deliver messages to nodes with similar profile) – Privacy preserving: N and S do not send information about their own behavior [Figure: nodes S and N]
Profile-cast Evaluation • Epidemic: near perfect delivery ratio, low delay, high overhead • Centralized: near perfect delivery ratio, low overhead, a bit extra delay • Decentralized: provides a tradeoff between delivery and overhead • Random: poor delivery ratio • Decentralized I-cast achieves: > 50% reduction in overhead relative to Epidemic; > 30% increase in delivery relative to Random * Results presented as the ratio to epidemic routing [Figure series: Epidemic, Decentral, Random]
Evaluation - Result (metrics: success rate, delay, overhead) • Centralized: excellent success rate with only 3% overhead • Similarity-based: (1) 61% success rate at low overhead, 92% success rate at more overhead (45%) (2) A flexible success rate vs. overhead tradeoff • RTx with infinite TTL: much more overhead under similar success rate • Short RTx with many copies: good success rate/overhead, but delay is still long
Profile-cast Initial Results • Adjustable overhead/delivery rate tradeoff – 61% delivery rate of flooding with 3% overhead – 92% delivery rate with 45% overhead • Better than single random walk in terms of delay, delivery rate • Multiple short random walks also work well in this case
Future Work • Sending to a mobility profile specified by the sender – Gradient ascend followed by similarity comparison (in the mobility space) • Mobility independent profile-cast – The encounter pattern provides a network in which most nodes are reachable – We don’t want to flood – How to leverage the Small World encounter pattern to reach the “neighborhood” of most nodes efficiently?
Future Work – One-copy-per-clique in the “mobility space” – We expect this to work because similarity in mobility leads to frequent encounters
Future Directions (Applications) • Detect abnormal user behavior & access patterns based on previous profiles • Behavior aware push/caching services (targeted ads, events of interest, announcements) • Caching based on behavioral prediction • Can/should we extend this paradigm to include social aspects (trust, friendship, …)? • Privacy issues and mobile k-anonymity
On Mobility & Predictability of VoIP & WLAN Users J. Kim, Y. Du, M. Chen, A. Helmy, Crawdad 2007 Work in progress [Figures: Markov O(2) predictor accuracy; VoIP user prediction accuracy] • VoIP users are highly mobile and behave dramatically differently from WLAN users • Prediction accuracy drops from ~62% on average for WLAN users to below 25% for VoIP users • Motivates – Revisiting mobility modeling – Revisiting mobility prediction
Gender-based feature analysis in Campus-wide WLANs U. Kumar, N. Yadav, A. Helmy, Mobicom 2007, Crawdad 2007 [Figure labels: males, females, fraternity, sorority, visitors; university campus traces] • Able to classify users by gender using knowledge of campus map • Users exhibit distinct on-line behavior, preference of device and mobility based on gender • On-going work – How much more can we know? – What is the “information-privacy trade-off”?
The Next Generation (Boundless) Classroom [Figure: students and instructor linked through an embedded sensor network, sensor-adhoc and WLAN/adhoc links, multi-party conference and tele-collaboration tools; real world group experiments (structural health monitoring)] • Challenges – Integration of wired Internet, WLANs, Adhoc Mobile and Sensor Networks – Will this paradigm provide better learning experience for the students?
Future Directions: Technology-Human Interaction • The Next Generation Classroom • Emerging Wireless & Multimedia Technologies: Protocols, Applications, Services • Human Behavior: Mobility, Load Dynamics
Engineering Multi-Disciplinary Research Human Computer Interaction (HCI) & User Interface Social Sciences Cognitive Sciences Application Development Service Provisioning Emerging Wireless & Multimedia Technologies Mobility Models Traffic Models Psychology How to Capture? Protocols, Applications, Services Protocol Design Measurements Education Context-aware Networking How to Design? Human Behavior Educational/ Learning Experience How to Evaluate? Mobility, Load Dynamics
Disaster Relief (Self-Configuring) Networks [Figure: deployed sensor nodes]
On-going and Future Directions Utilizing mobility – Controlled mobility scenarios • DakNet, Message Ferries, Info Station – Mobility-Assisted protocols • Mobility-assisted information diffusion: EASE, FRESH, DTN, $100 laptop – Context-aware Networking • Mobility-aware protocols: self-configuring, mobility-adaptive protocols • Socially-aware protocols: security, trust, friendship, associations, small worlds – On-going Projects • Next Generation (Boundless) Classroom • Disaster Relief Self-configuring Survivable Networks
Thank you! Ahmed Helmy helmy@ufl.edu URL: www.cise.ufl.edu/~helmy MobiLib: nile.cise.ufl.edu/MobiLib
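For readers unfamiliar with the predictor named on this slide, an order-2 Markov predictor guesses a user's next location from the two most recent ones by counting which location most often followed that pair in the past. The class below is a generic textbook-style sketch, not the code used in the cited study; the class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class Order2MarkovPredictor:
    """Predict the next location from the two most recent locations."""

    def __init__(self):
        # (location_two_back, previous_location) -> Counter of next locations
        self.counts = defaultdict(Counter)

    def train(self, sequence):
        """Count every (prev2, prev1) -> next transition in a location list."""
        for i in range(len(sequence) - 2):
            context = (sequence[i], sequence[i + 1])
            self.counts[context][sequence[i + 2]] += 1

    def predict(self, prev2, prev1):
        """Most frequent successor of the context, or None if never seen."""
        options = self.counts.get((prev2, prev1))
        if not options:
            return None
        return options.most_common(1)[0][0]


def prediction_accuracy(predictor, sequence):
    """Share of positions in `sequence` whose location is predicted correctly."""
    total = len(sequence) - 2
    if total <= 0:
        return 0.0
    correct = sum(
        1 for i in range(total)
        if predictor.predict(sequence[i], sequence[i + 1]) == sequence[i + 2]
    )
    return correct / total
```

A highly mobile user visits many distinct location pairs, so most contexts have few or no observations, which is one plausible reason accuracy falls for such users.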
Emerging Wireless Communication • Opportunities • Challenges – Dynamic network structure – Decentralized service paradigm – Tight coupling between the devices and individuals
Outline Complete case Detailed behavior analysis Future Work
Trace Sets • Available information from WLAN traces – MAC addresses of the devices as identifiers – Location/Time of users (our main focus) • Sample records for node e0_12_29_fc_ba_8c – Association start time: 2197745, 2230200, 2257917, 2285119, 2297134, 2304287 – Location_ID: 172.16.8.244_11009, 172.16.8.244_11023 – Duration: 4433, 13320, 643, 1017, 7153, 6744 Trace
Summary (Case Study I) • We observe some omnipresent mobility characteristics from WLANs. • These characteristics are not captured by existing synthetic mobility models (i.e., those models are not realistic) • We propose the Time-variant Community (TVC) model, which is realistic, theoretically tractable, and flexible
Theoretical Tractability • For the TVC model, we can derive – Nodal spatial distribution – the demographic profile of the mobility model – Average node degree – important for cluster maintenance and geographic routing – Hitting time/ Meeting time – important for routing performance analysis • With low error when the communication range is small compared to the community sizes (communication disk < 25% of community)
Theoretical Tractability
Theory Derivation – Hitting Time • Hitting time – the time for a node to move into the communication range of a randomly chosen target coordinate, starting from the stationary distribution (hit)
Theory Derivation – Hitting Time 1. Weighted average conditioned on the relative location of the ‘target’ 2. Calculate the unit-time hitting probability for each scenario 3. Calculate hitting probability for the whole time period 4. Calculate the conditional hitting time
Application II: Trace-based Mobility Modeling • Skewed location preference – Nodes spend 95% of time at top 5 preferred locations. – Heavily visited “preferred spots” • Repetitive behavior – Nodes show up repeatedly at the same location after integer multiples of days. – Periodical “daily/weekly schedules”
Similarity-based User Classification: Association-based Representation* • Example day: AP 1 (office), 10 AM – 12 PM; AP 2 (library), 1:30 PM – 2:30 PM; AP 3 (class), 6 PM – 8 PM – Association vector: (AP 1, AP 2, AP 3) = (0.4, 0.2, 0.4) • For a given day d, the user association vector is defined by an n-element vector – a = {aj : fraction of online time user i spends at APj on day d} – ‘n’ is the number of APs – Use zero vector for off-line users • Vector elements quantify relative attraction of AP to user • User Association Consistency – User i is ‘consistent’ if daily association vectors can be grouped into few clusters – Use clustering with Manhattan distance measure * W. Hsu, D. Dutta, A. Helmy, Mobicom 2007
Summarizing user associations • Association matrix: concatenate user association vectors (one daily association vector per row) for all days into a matrix. • To summarize, perform SVD and store the top-k eigen-values/vectors. • What value of k do we need for a good representation of the matrix? – Captured matrix power: the share of the matrix's total power retained by the top-k components • How large is the reconstruction error? – Matrix norms ||X-Xk||p/||X||p, where Xk is the rank-k reconstruction of X
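The expression for captured power did not survive on this slide. Under the usual SVD conventions, and assuming "power" means the sum of squared singular values (an assumption, since the slide does not define it), the quantities discussed above can be written as follows, where X is the association matrix, r its rank, and the symbols sigma_i, u_i, v_i are the singular values and vectors:

```latex
\[
X = U \Sigma V^{T}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r), \qquad
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0
\]
\[
X_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{T}, \qquad
\text{captured power}(k) = \frac{\sum_{i=1}^{k} \sigma_i^{2}}{\sum_{i=1}^{r} \sigma_i^{2}}, \qquad
\text{relative error}(k) = \frac{\lVert X - X_k \rVert_p}{\lVert X \rVert_p}
\]
```

With this reading, choosing k means picking the smallest rank whose captured power exceeds a target such as 90%.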
Summarizing user associations
Trace | % users | # vectors (rank) | power
USC (Bldg) | 95% | 6 | 90%
Dartmouth (AP) | 92% | 6 | 90%
Dartmouth (Bldg) | 94% | 4 | 90%
* Matrix reconstruction error < 5% • Association patterns can be reconstructed with low rank and low error
Clustering Users with Similar Behavior • Exhaustive comparison of association vectors: – Find average of |ajd - aid| over all days d for all i, j pairs – Drawback: O(nd^2) for each pair • Compare similarity of eigen-vectors obtained from SVD • Use weighted inner products of the eigen-vectors of users U and V to compute Sim(U, V) – wui = proportion of power of SV – D(U, V) = 1 - Sim(U, V) – Correlation > 91% with the exhaustive comparison • Can achieve very good clustering efficiently using distributed computation • A handful of eigen-vectors can capture most of the behavior power
Encounter Events • Derived from simultaneous associations to the same locations • How many other nodes does a node encounter? On average only 2%~7% of the population [Figure: CCDF, y-axis Prob.(unique encounter fraction > x)]
Encounter-Relationship graph • Surprisingly, the fraction of disconnected node pairs is low [Figure: disconnected ratio (%)]
Summary (Case Study II) • We use SVD to obtain eigen-behaviors of individual users. • We use the eigen-behavior distances and hierarchical clustering to classify WLAN users into similar groups. • This finding is useful for mobility modeling (identifying group sizes and their frequently visited locations), network management, abnormality detection, and group-aware protocols (i.e., profile-cast, our future work)
Summary (Case Study III) • The distribution of encounters in real WLAN traces is very different from synthetic models • The encounter-relationship graph displays Small World characteristics • Despite a low encounter ratio of the whole population, the encounter events lead to a robust, reachable network (with long delay).
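The weighted inner-product comparison can be sketched as below. The exact formula is not legible on the slide, so the form used here, a double sum of weight products times the absolute inner product of each eigen-vector pair, is an assumed reading of "weighted inner products"; the profile layout and function names are also hypothetical.

```python
def eigen_similarity(profile_u, profile_v):
    """Similarity of two users from their eigen-behavior summaries.

    Each profile is a list of (weight, unit_vector) pairs, where weight is
    the proportion of power carried by that eigen-vector.
    Assumed form: sum over all pairs of w_u * w_v * |u . v|.
    """
    sim = 0.0
    for wu, u in profile_u:
        for wv, v in profile_v:
            dot = sum(a * b for a, b in zip(u, v))
            sim += wu * wv * abs(dot)
    return sim


def eigen_distance(profile_u, profile_v):
    """Distance used for clustering: D(U, V) = 1 - Sim(U, V)."""
    return 1.0 - eigen_similarity(profile_u, profile_v)
```

Because each user is described by only a handful of vectors, comparing two users costs a few inner products instead of a pass over every pair of daily association vectors.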
Future Work – Profile-cast
Goal • To send messages to a group of nodes within the general population – The group is defined by the intrinsic behavior patterns of the nodes (CISE students, library visitors, moviegoers) – The sender does not know the network identities (addresses) of the destinations • Different from multi-cast: No join/leave, no group maintenance
Largest number of female users is in social sciences and is much higher than the male WLAN users in those buildings. Female users are surprisingly high (vs males) in the first 2 samples. WLAN activity was down Feb 07 due to lower enrollment in Spring and potential changes in the network.
Females in social, economic, admin and comm/journalism generally have longer session durations than males in those majors. In Engineering, music and chemistry the opposite is true. Session durations are decreasing indicating potential increase in mobility.
Apple is consistently more popular among females than males. Intel (PCs) is more popular among males than females. Use of Apple and Intel is increasing in general, while other brands are declining.
Mobility Profile-cast (intra-group) [Figure panels: Goal; Flooding; Flood-sim; Single long random walk; Multiple short random walks; S marks the sending node]
Mobility Profile-cast (inter-group) [Figure panels: Goal; Flooding; Gradient-ascend; Single long random walk; Flooding_sim; Multiple short random walks; S marks the sending node, T.P. the target]
Performance Comparison • Gradient ascend helps to overcome the difficult case, when the source is far from T.P. • A few long RWs are better when S is far from T.P., but many short RWs are better when S is close to T.P.
Performance Comparison • Gradient ascend helps to overcome the difficult case, when the source is far from T.P. • Gradient ascend has some extra delay compared with flooding • A few long RWs are better when S is far from T.P., but many short RWs are better when S is close to T.P.
Mobility Independent Profile-cast [Figure panels: Goal; Flooding; Small World-based; Single long random walk; Multiple short random walks; S marks the sending node]