STARTING EXPLORING MOBILE PHONE DATA IN THE SANDBOX
Pilar Rey del Castillo
Mobile phone data in the Sandbox
• Special case: only since October 2014
• Limited information provided in the dataset
• Still very interesting to analyse
  – Sensors of human and social behaviour (location...)
  – Example of the requirements of the exploratory step, compared with other types of data in the Sandbox
  – Aim: describe the initial steps in attempting to produce meaningful results for statistical purposes
Location or positioning data
• Concept in the context of mobile phones & statistics
• User assigned to a number of neighbouring antennas for load balancing reasons
• Types
  – Active
  – Passive: Call Detail Records (CDRs)...
Passive location: occasional samples of the approximate locations of the phone's user
Mobile phones datasets (1)
• D4D Challenge: Orange's “Data for development” in Ivory Coast
• Anonymised Call Detail Records (CDRs) of outgoing phone calls & SMS exchanges
  – Orange’s customers in Ivory Coast
  – Between December 1, 2011 and April 28, 2012 (150 days, 5 months)
• Sandbox IT infrastructure: perfect
Mobile phones datasets (2)
• Total antenna-to-antenna traffic on an hourly basis (5 million customers)
• Individual trajectories for 50,000 customers over two-week time windows
Literature exploiting location
• Supplementary information at the micro level (ground truth)
  – Lausanne Data Collection Campaign (Nokia 2009-2011)
  – Reality Mining Project (MIT 2004-2005)
  – Ad hoc experiments, conducting surveys…: Isaacman et al. (2011), De Oliveira et al. (2011)
  – …
• Just CDRs: assumptions on the users' behaviour…
  – Orange Data Challenges (Ivory Coast, Senegal)
  – Järv et al. (Estonia, 2012)
  – Kung et al. (Portugal, Ivory Coast, Saudi Arabia, Boston, Milan, 2014)
  – …
Ivory Coast data
• Positioning data, our aim: human home -> work commuting figures
• Way to proceed: obtain results under certain assumptions and compare
• First assumptions
  – Orange's customers represent the population (96 subscriptions per 100 inhabitants, 2013)
  – Behaviour of the 50,000-customer sample is representative of mobility behaviour (to be assessed later)
2nd step: model to draw meaningful information
• Problem of oscillations: antenna aggregation by section = county × urbanization (157 sections)
• Problem of giving a meaning to the user's location: daily & weekly patterns of use as discriminative features
  – Isaacman et al. (2011):
    • home: weekends + weekdays between 7 pm & 7 am
    • work: weekdays between 1 pm & 5 pm
  – Kung et al. (2014):
    • home: weekdays between 8 pm & 8 am
    • work: weekdays between 8 am & 8 pm
Apart from other sophisticated filtering…
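The two ideas on this slide (aggregating antennas into sections, then using time-of-use windows to label locations) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run in the Sandbox: the record layout `(user_id, timestamp, antenna_id)`, the `ANTENNA_TO_SECTION` lookup and the function names are hypothetical, the windows are those quoted from Isaacman et al. (2011), and "most frequent section in the window" is a simplifying rule chosen for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical antenna -> section lookup (section = county x urbanization).
# Aggregating to sections dampens oscillation between neighbouring antennas.
ANTENNA_TO_SECTION = {101: "A-urban", 102: "A-urban", 201: "B-rural"}


def window(ts):
    """Label a timestamp 'home', 'work' or None (Isaacman et al. 2011 windows)."""
    weekend = ts.weekday() >= 5          # Saturday = 5, Sunday = 6
    if weekend or ts.hour >= 19 or ts.hour < 7:
        return "home"                    # weekends + weekdays 7 pm to 7 am
    if 13 <= ts.hour < 17:
        return "work"                    # weekdays 1 pm to 5 pm
    return None                          # outside both windows: ignored


def home_work(records):
    """Return {user: (home_section, work_section)} from (user, datetime, antenna) records.

    Assumption for this sketch: the home (work) section is the section seen
    most often in the home (work) window; None if the user has no such record.
    """
    counts = defaultdict(lambda: {"home": Counter(), "work": Counter()})
    for user, ts, antenna in records:
        label = window(ts)
        section = ANTENNA_TO_SECTION.get(antenna)
        if label is None or section is None:
            continue
        counts[user][label][section] += 1

    result = {}
    for user, c in counts.items():
        home = c["home"].most_common(1)[0][0] if c["home"] else None
        work = c["work"].most_common(1)[0][0] if c["work"] else None
        result[user] = (home, work)
    return result
```

Switching to the Kung et al. (2014) windows would only require changing the hour thresholds in `window`; the other filtering steps mentioned on the slide are not covered here.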
Commuting in Ivory Coast
• Sample of 50,000 customers: 51% cluster 1, 28% cluster 2, 21% cluster 3
• Home -> work located for almost 50% of the sample
Estimate: cross-tabulation of commuting between Ivory Coast sections
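A cross-tabulation of this kind can be sketched as follows. The input format (a dictionary mapping each user to a home section and a work section, with `None` when one could not be located) and the function name `commuting_table` are assumptions made for the example; the slides do not say how the table was actually computed.

```python
from collections import Counter


def commuting_table(home_work):
    """Cross-tabulate home -> work sections.

    home_work: dict user -> (home_section, work_section); None = not located.
    Returns (flows, shares, located_share):
      flows          Counter of (home, work) pairs among located users
      shares         each pair as a percentage of located users
      located_share  percentage of all users with both home and work located
    """
    located = [(h, w) for h, w in home_work.values()
               if h is not None and w is not None]
    flows = Counter(located)
    total = len(located)
    shares = {pair: 100.0 * n / total for pair, n in flows.items()} if total else {}
    located_share = 100.0 * total / len(home_work) if home_work else 0.0
    return flows, shares, located_share
```

Pairs with different home and work sections are the commutes between sections; pairs with the same section are users who live and work in the same section.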
Main commutes (%) home -> work between sections
Final remarks
• CDRs are a useful tool to learn and test new methods (although no reliable figures were produced)
• Only a portion of the possible ways to exploit CDRs was explored: a promising source (more research needed)
• Another possible research strand: develop an "Official Statistics" app for smartphones to gather ground truth
References
• de Oliveira, R., Karatzoglou, A., Cerezo, P. C., de Vicuña, A. A. L. and Oliver, N. (2011), “Towards a psychographic user model from mobile phone usage”, in Desney S. Tan, Saleema Amershi, Bo Begole, Wendy A. Kellogg & Manas Tungare, ed., 'CHI Extended Abstracts', ACM.
• Isaacman, S., Becker, R., Cáceres, R., Kobourov, S., Martonosi, M., Rowland, J. and Varshavsky, A. (2011), “Identifying Important Places in People’s Lives from Cellular Network Data”, Lecture Notes in Computer Science Vol. 6696, pp. 133-151.
• Järv, O., Ahas, R., Saluveer, E., Derudder, B. and Witlox, F. (2012), “Mobile Phones in a Traffic Flow: A Geographical Perspective to Evening Rush Hour Traffic Analysis Using Call Detail Records”, PLoS ONE 7(11), http://dx.plos.org/10.1371/journal.pone.0049171
• Kung, K. S., Greco, K., Sobolevsky, S. and Ratti, C. (2014), “Exploring Universal Patterns in Human Home-Work Commuting from Mobile Phone Data”, PLoS ONE 9(6): e96180. doi: 10.1371/journal.pone.0096180