mPlane: Building an Intelligent Measurement Plane for the Internet
Alessandro Finamore, Politecnico di Torino <alessandro.finamore@polito.it>
International Computer Science Institute (ICSI), February 6th, 2014
Outline
1. mPlane introduction
2. Monitoring CDN
The Internet is nowadays a complicated technology…
The Internet is a key infrastructure where different technologies are combined to offer a plethora of services. It is horribly complicated. We sorely lack the technology to understand what is happening in the network and to optimize its performance and utilization.
mPlane goals
https://www.ict-mplane.eu
About the design and demonstration of a "measurement plane for the Internet"
- Large scale
  - Vantage points on a worldwide scale
  - Integrate multiple measurement technologies
- Intelligent
  - Automate/simplify the process of "cooking" raw data
  - Provide root-cause-analysis capabilities
- Flexible
  - Offers APIs to enable integration
  - Not strictly bound to specific "use cases"
mPlane consortium
- 16 partners
  - 3 operators
  - 6 research centers
  - 5 universities
  - 2 small enterprises
- Coordinator and work-package leads (WP1 to WP7), as named on the slide:
  - Marco Mellia (POLITO)
  - Ernst Biersack (Eurecom)
  - Brian Trammell (ETH)
  - Dario Rossi (ENST)
  - Guy Leduc (Univ. Liege)
  - Tivadar Szemethy (NetVisor)
  - Andrea Fregosi (Fastweb)
  - Dina Papagiannaki (Telefonica)
  - Saverio Nicolini (NEC)
  - Fabrizio Invernizzi (Telecom Italia)
  - Pedro Casas (FTW)
  - Pietro Michiardi (Eurecom)
mPlane components
[Figure: active probes and passive probes, with data and control flows]
mPlane WPs' organization
- WP1: Use Cases, Requirements and Architecture
- WP2: Programmable Probes (Measurement Layer)
- WP3: Large-scale Data Analysis (Repository and Analysis Layer)
- WP4: mPlane Supervisor: Iterative and Adaptive Analysis (Supervision Layer)
- WP5: Integration, Deployment, Data Collection, Evaluation
- WP6: Demonstration
- WP7: Dissemination, Exploitation and Standardization
- WP8: Project Management
mPlane layers
[Figure: Measurement Layer (WP2). Components: mProbe 1 … mProbe N and legacyProbe 1 … legacyProbe N, with mInterfaces and raw data output.]
mPlane layers
[Figure: Repository and Analysis Layer (WP3), data collection & processing. Components: legacyDB 1 … legacyDB N, mPlane Repository, DBStream, Blockmon. Below it, the Measurement Layer (WP2): mProbe 1 … mProbe N and legacyProbe 1 … legacyProbe N, with mInterfaces and raw data output.]
mPlane layers
[Figure: Supervisor (WP4): Coordination, Intelligent Reasoner, Analysis Modules (Module 1 … Module N). Below it, the Repository and Analysis Layer (WP3), data collection & processing: legacyDB 1 … legacyDB N, mPlane Repository, DBStream, Blockmon. Below it, the Measurement Layer (WP2): mProbe 1 … mProbe N and legacyProbe 1 … legacyProbe N, with mInterfaces and raw data output.]
Iterative analysis
[Figure: Supervisor, Repository, raw data; labels "Alarm!" and "Found"]
Set up the system to monitor a service (e.g., quality of YouTube streaming)
- A passive probe reports an anomaly
- Start RCA
  1. Crosscheck on other passive probes
  2. Crosscheck with a larger time scale
  3. Crosscheck with active probing
  4. Is it because of
     a. DNS
     b. Routing
     c. Others?
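
The iterative RCA loop on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `iterative_rca`, `RcaResult`, and the idea of modelling each crosscheck and each hypothesis as a boolean callable are hypothetical and are not part of any mPlane API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical model: a check is a callable returning True when it
# confirms the anomaly (crosscheck) or explains it (hypothesis).
Check = Callable[[], bool]


@dataclass
class RcaResult:
    confirmed_by: List[str] = field(default_factory=list)
    root_cause: Optional[str] = None


def iterative_rca(
    crosschecks: List[Tuple[str, Check]],
    hypotheses: List[Tuple[str, Check]],
) -> RcaResult:
    """Run the crosschecks in order, then test candidate causes in order."""
    result = RcaResult()
    # Steps 1-3: record every viewpoint that confirms the anomaly.
    for name, check in crosschecks:
        if check():
            result.confirmed_by.append(name)
    # Step 4: the first hypothesis that holds is reported as the cause;
    # if none holds, fall back to "others".
    for cause, test in hypotheses:
        if test():
            result.root_cause = cause
            break
    else:
        result.root_cause = "others"
    return result


if __name__ == "__main__":
    outcome = iterative_rca(
        crosschecks=[
            ("other passive probes", lambda: True),
            ("larger time scale", lambda: True),
            ("active probing", lambda: False),
        ],
        hypotheses=[("DNS", lambda: False), ("Routing", lambda: True)],
    )
    print(outcome)
```

In a real deployment each callable would presumably trigger a query to the repository or a new measurement on a probe; here they are stubs so the control flow stays visible.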
Some of the mPlane use cases
- Anomaly detection and root cause analysis in large-scale networks (Polito + FTW) (FOCUS)
- Quality of Experience for web browsing (Eurecom)
- Mobile network performance issues (Telefonica)
- Verification and certification of service-level agreements (FUB)
- Content popularity and caching strategies
- Etc.
1. The Internet is used by different entities (end-users, operators, content providers, regulation agencies, etc.)
2. WP6 (Demonstration) is about showing the actual usage of mPlane (at least) for the defined use cases
Other ongoing efforts for measurement frameworks
- FP7 European projects
  - Integrated Project (IP)
    - 3 years (2 left), 16 partners, 11.2 M€
  - Specific Targeted Research Project (STReP): "From global measurements to local management"
    - 3 years (2 left), 10 partners, 3.8 M€
    - Build a measurement framework out of probes
    - … is like an "mPlane use case"
  - Strong similarities in the architecture core
- IETF, Large-Scale Measurement of Broadband Performance (lmap)
  - Standardization effort on how to do broadband measurements
    - Defining the components, protocols, rules, etc.
  - It does not specifically target adding "a brain" to the system
  - Brian Trammell (ETH)
Outline
1. mPlane introduction
2. Monitoring CDN
   "Continuous analytics for traffic monitoring and applications to CDN", A. Bar, A. Finamore, I. Bermudez, L. Golab, M. Mellia, P. Casas. Submitted to IFIP Networking 2014
CDNs make things complicated
- Focusing on a vantage point of ~20k ADSL customers
- 1 week of HTTP logs (May 2012)
  - Content served by the Akamai CDN
  - The ISP hosts an Akamai "preferred cache" (a specific /25 subnet)
[Figure annotated with "???"]
Reasoning about the problem
- Q1: Is this affecting specific services?
- Q2: Are the variations due to "faulty" servers?
- Q3: Was this triggered by CDN performance issues?
- Etc…
How to automate/simplify this reasoning?
DBStream:
- Continuous big data analytics
- Flexible processing language
- Full SQL processing capabilities
- Processing in small batches
- Storage for post-mortem analysis
Q1: Is this affecting a specific service? NO
- Select the top 500 Fully Qualified Domain Names (FQDNs) served by Akamai
- Check if they are served by the preferred cache
- Repeat every 5 min
The anomaly is not related to individual services.
[Figure labels: "Services not served by the preferred cache"; "Services hosted by the preferred cache, except during the anomaly"]
- The two sets of FQDNs are "not orthogonal"
- Same results when extending to more than 500 FQDNs
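
The Q1 check can be sketched in Python as follows. The talk runs this kind of query in DBStream; the code below is only an illustrative stand-in. The record layout `(timestamp, fqdn, server_ip)`, the function name, the ranking of FQDNs by request count, and the subnet `192.0.2.0/25` (a documentation address range) are all assumptions, not details from the slides.

```python
import ipaddress
from collections import Counter, defaultdict

BIN_SECONDS = 300  # 5-minute bins, as on the slide
# Hypothetical stand-in for the ISP's Akamai preferred-cache /25 subnet.
PREFERRED = ipaddress.ip_network("192.0.2.0/25")


def preferred_share_per_bin(records, top_n=500):
    """Return {(bin_start, fqdn): share of requests served by PREFERRED}.

    records: iterable of (timestamp_seconds, fqdn, server_ip) tuples.
    Only the top_n FQDNs by total request count (an assumption about
    how "top" is defined) are considered.
    """
    records = list(records)
    counts = Counter(fqdn for _, fqdn, _ in records)
    top = {fqdn for fqdn, _ in counts.most_common(top_n)}

    hits = defaultdict(lambda: [0, 0])  # key -> [preferred, total]
    for ts, fqdn, ip in records:
        if fqdn not in top:
            continue
        key = (int(ts) // BIN_SECONDS * BIN_SECONDS, fqdn)
        hits[key][1] += 1
        if ipaddress.ip_address(ip) in PREFERRED:
            hits[key][0] += 1
    return {key: pref / total for key, (pref, total) in hits.items()}
```

A service whose share drops to zero only during the anomaly window would match the "hosted by the preferred cache, except during the anomaly" group described on the slide.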
Q2: Are the variations due to "faulty" servers? NO
- Compute the traffic volume per IP address
- Check which IPs are active during the disruption
- Repeat every 5 min
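
A minimal Python sketch of the Q2 check, assuming flow records of the form `(timestamp, server_ip, bytes)`. The record layout and both function names are hypothetical; the slides do not show the actual query.

```python
from collections import defaultdict


def volume_per_ip(records, bin_seconds=300):
    """Sum bytes per (5-minute bin, server IP).

    records: iterable of (timestamp_seconds, server_ip, n_bytes) tuples.
    """
    volumes = defaultdict(int)
    for ts, ip, n_bytes in records:
        bin_start = int(ts) // bin_seconds * bin_seconds
        volumes[(bin_start, ip)] += n_bytes
    return dict(volumes)


def active_ips(volumes, start, end):
    """Server IPs that carried traffic in bins with start <= bin < end."""
    return {
        ip
        for (bin_start, ip), n_bytes in volumes.items()
        if start <= bin_start < end and n_bytes > 0
    }
```

Comparing `active_ips` for the disruption window against a normal window shows whether a subset of servers dropped out, which is the "faulty servers" hypothesis the slide rules out.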
Q3: Was this triggered by CDN performance issues? YES!! NO!!
- Compute the distribution of the server elaboration time
  - It is the time between the TCP ACK of the HTTP GET and the reception of the first byte of the reply
- Focus on the traffic of the /25 preferred subnet
- Compare the quartiles every 5 min
Performance decreases right before the anomaly at 6 pm.
[Figure: TCP exchange between client and server as seen by the passive probe: SYN, SYN+ACK, ACK, GET, ACK, query processing time, DATA]
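
The elaboration-time comparison can be sketched in Python as below. The input layout, a pair `(t_get_ack, t_first_byte)` of timestamps per HTTP transaction as observed by the passive probe, and the function name are assumptions made for illustration. Bins with fewer than two samples are skipped, because `statistics.quantiles` needs at least two data points on older Python versions.

```python
import statistics
from collections import defaultdict


def elaboration_quartiles(samples, bin_seconds=300):
    """Quartiles of server elaboration time per 5-minute bin.

    samples: iterable of (t_get_ack, t_first_byte) timestamps in seconds,
    where t_get_ack is the TCP ACK of the HTTP GET and t_first_byte is
    the first byte of the reply. Elaboration time is their difference.
    Returns {bin_start: (q1, median, q3)}.
    """
    per_bin = defaultdict(list)
    for t_get_ack, t_first_byte in samples:
        bin_start = int(t_get_ack) // bin_seconds * bin_seconds
        per_bin[bin_start].append(t_first_byte - t_get_ack)

    return {
        bin_start: tuple(statistics.quantiles(values, n=4, method="inclusive"))
        for bin_start, values in sorted(per_bin.items())
        if len(values) >= 2  # too few samples for meaningful quartiles
    }
```

Restricting `samples` to servers in the /25 preferred subnet before calling the function reproduces the "focus on the preferred subnet" step; a rise in the quartiles in the bins before 6 pm would correspond to the performance decrease noted on the slide.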
Reasoning about the problem
- Q1: Is this affecting only specific services? NO
- Q2: Are the variations due to "faulty" servers? NO
- Q3: Was this triggered by CDN performance issues? NO
What else?
- Do other vantage points report the same problem? YES!
- What about extending the time period?
  - The anomaly is present along the whole period we considered
  - Ongoing extension of the analysis to more recent data sets (possibly also exposing other effects/anomalies)
- Routing? TODO: RouteViews
- DNS mapping? TODO: RIPE Atlas + ISP active probing infrastructure
Other suggestions are welcome.
With the mPlane hat on…
- Probes
  - Passive monitoring at the edge (i.e., residential customers)
  - Passive monitoring at the core (i.e., peering links)
  - Active monitoring (e.g., DNS mapping, network paths, etc.)
  - End-user reports (e.g., browser plugins)
- Other data sources:
  - Routing tables
  - MaxMind Orgname DB / whois
- Methodologies:
  - Anomaly detection algorithms
  - Geolocation
Conclusions
- mPlane aims to simplify network monitoring practices
  - First SW libraries will be released within the first half of the year
- Open for collaborations
  - Collaboration Institutions (CI)
    - CAIDA, M-Lab, Orange Lab Poland, Endace, etc.
  - Other (less formal) ways are welcome as well
?? || ##
Alessandro Finamore, Politecnico di Torino <alessandro.finamore@polito.it>