INTERNET SIMULATOR Jelena Mirkovic USC Information Sciences Institute sunshine@isi.edu

MAIN IDEAS
• Two pieces of the puzzle
  – Internet map – different sources
  – Computational models – simulation engine
• Open-source, modular
  – Users can replace pieces
• Simulation at customizable granularity
  – Speed vs scale, problem-dependent, zoom-in function
• My experience
  – Custom Internet simulators for IP spoofing and worm research

CASE 1: SPOOFING DEFENSE
• Evaluate all spoofing filtering defenses
• These associate a source IP with some feature to detect spoofing
  – Previous hop, secret key, marks placed by routers …
• Model the Internet, calculate the number of all (source, target, spoofed address) triples that are filtered
• No need to model attacks
• That depends:
  – Many hosts may have the same previous hop to filter
  – Route diversity, route changes and host distribution
  – Route depends on filter
[Slide labels: diversity, location, topology]
[1] “Comparative Evaluation of Spoofing Defenses,” J. Mirkovic and E. Kissel, IEEE Transactions on Dependable and Secure Computing, 2009.
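The triple-counting evaluation on this slide can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from [1]: the toy three-node topology, BFS shortest-path routing, and a simplified previous-hop check in which a filter drops a packet when it arrives from a neighbor other than the one the claimed source would use. The names `shortest_path`, `is_filtered` and `filtered_fraction` are hypothetical.

```python
from collections import deque
from itertools import permutations


def shortest_path(graph, src, dst):
    """BFS shortest path; neighbors are visited in sorted order so ties break deterministically.

    Assumes the graph is connected (returns None otherwise).
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in sorted(graph[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if dst not in parent:
        return None
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def previous_hop(path, node):
    """Node that precedes `node` on `path`, or None if `node` is the origin."""
    i = path.index(node)
    return path[i - 1] if i > 0 else None


def is_filtered(graph, filters, attacker, target, spoofed):
    """True if some filter on the attacker's real route sees an unexpected previous hop."""
    real = shortest_path(graph, attacker, target)
    claimed = shortest_path(graph, spoofed, target)
    for node in real[1:]:
        if node not in filters:
            continue
        # Simplified check: the claimed source must reach this filter via the same neighbor.
        if node not in claimed or previous_hop(claimed, node) != previous_hop(real, node):
            return True
    return False


def filtered_fraction(graph, filters):
    """Fraction of all (attacker, target, spoofed address) triples that are filtered."""
    triples = list(permutations(sorted(graph), 3))
    hits = sum(is_filtered(graph, filters, a, t, s) for a, t, s in triples)
    return hits / len(triples)


# Hypothetical toy topology: a three-node line A - B - C.
toy_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
no_filters = filtered_fraction(toy_graph, set())
filter_at_b = filtered_fraction(toy_graph, {"B"})
```

Swapping in a different topology, routing function or filter placement changes the resulting fraction, which is the sensitivity the slide points to.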

CASE 1: SPOOFING DEFENSE
• How accurate must a model be?
  – Assuming random topology/routing/filter placement, uniform host distribution = extremely low effectiveness
• Realistic sources produce much better results
• 6 months for the eval framework, 3 more years to get the sources right
  - 6x the overhead

CASE 2: WORM SIMULATION
• We wanted to study collaborative worm defenses:
  – When I’m infected I tell my friends what to filter
• We decided to model the following in PAWS [2]:
  – Internet topology at the AS level – connectivity and AS size, routing
  – Log-normal distribution of vulnerable hosts
  – Worm-specific scanning strategy (RNG, scan size)
  – Limited link bandwidth (inter-AS, access)
  – Legitimate cross-traffic and its response to congestion (TCP)
• This required a lot of data collection/aggregation and guesswork
  - Especially for routing, host distribution, link bandwidth and legitimate traffic
[2] “A Realistic Simulation of Internet-Scale Events,” S. Wei and J. Mirkovic, Proc. of the Create-Net International Conference on Performance Evaluation Methodologies and Tools (VALUETOOLS), 2006.
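Two of the ingredients listed for PAWS, a log-normal distribution of vulnerable hosts across ASes and a random-scanning worm, can be sketched in a few lines. This is a rough illustration under stated assumptions, not the PAWS implementation: vulnerable hosts are laid out at the low end of a flat address space, every infected host sends a fixed number of uniformly random scans per tick, and link bandwidth, routing and cross-traffic are ignored. The function names and parameters are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def make_population(num_as, mu, sigma, rng):
    """Draw the number of vulnerable hosts per AS from a log-normal distribution."""
    return [max(1, int(rng.lognormvariate(mu, sigma))) for _ in range(num_as)]


def simulate_worm(vulnerable, address_space, scans_per_tick, ticks, rng):
    """Random-scanning worm; returns (infected count per tick, infected count per AS)."""
    total = sum(vulnerable)
    if not 0 < total <= address_space:
        raise ValueError("need 0 < vulnerable hosts <= address space")
    # Assumption: AS i owns the contiguous block of vulnerable addresses starting at starts[i].
    starts = [0] + list(accumulate(vulnerable))[:-1]
    infected = {0}  # patient zero is the first vulnerable host of AS 0
    history = [len(infected)]
    for _ in range(ticks):
        newly = set()
        for _ in range(len(infected) * scans_per_tick):
            addr = rng.randrange(address_space)  # uniformly random scan target
            if addr < total and addr not in infected:
                newly.add(addr)
        infected |= newly
        history.append(len(infected))
    per_as = [0] * len(vulnerable)
    for addr in infected:
        per_as[bisect_right(starts, addr) - 1] += 1
    return history, per_as


rng = random.Random(2006)
population = make_population(num_as=20, mu=2.0, sigma=1.0, rng=rng)
history, per_as = simulate_worm(
    population, address_space=4 * sum(population), scans_per_tick=5, ticks=15, rng=rng
)
```

Adding bandwidth limits or legitimate traffic would mean tracking scans per link rather than per host, which is where most of the data collection and guesswork mentioned above comes in.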

CASE 2: WORM SIMULATION
• How accurate must a model be?
• Lessons learned from our experience:
  - 2 years to do the simulation right and the field has moved on – we never developed the worm defense

GOAL DEFINITION
• Internet map + simulator
• There’s a lot of data about the Internet
  – Multiple sources, multiple data types and resolutions
• We need a way to pull this data into ONE model of the Internet where the user can select
  - “Age” of the Internet – time of data collection
  - Sources to be combined
  - What to do for missing data
• This model should either interface with popular simulators or be coupled with a simulator of its own
  - For this scale, the simulator must be distributed and its level of detail must be customizable
  - Standardization of evaluation, easy portability

CHALLENGES
• Data about Internet is incomplete in each dimension
  – Each data source should have some way to estimate how far it is from reality and possibly compensate for it
• Data about Internet is huge
  - Distributed simulation
  - Emulation at this scale is too hard and unnecessary
• Missing data
  - Tolerate or guess? Users should choose
  - E.g., extrapolate from existing data
• Anonymized data
  - Guess, users should choose how
  - E.g., random guess, multiple trials
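One way to picture the "users should choose" idea is a merge step followed by a user-selected missing-data policy. The sketch below is purely illustrative: the data layout (dictionaries keyed by AS pair, holding a link bandwidth or `None`), the policy names `tolerate`, `extrapolate` and `random`, and the functions `merge_sources` and `fill_missing` are all assumptions, not part of any existing tool.

```python
import random
import statistics


def merge_sources(sources):
    """Combine sources in priority order; an earlier source wins unless its value is missing."""
    merged = {}
    for source in sources:
        for key, value in source.items():
            if merged.get(key) is None:
                merged[key] = value
    return merged


def fill_missing(merged, policy, rng=None):
    """Apply a user-chosen policy to entries whose value is None."""
    known = [v for v in merged.values() if v is not None]
    if policy == "tolerate":
        return dict(merged)  # leave the gaps for the simulator to handle
    if policy == "extrapolate":
        guess = statistics.median(known)  # raises StatisticsError if nothing is known
        return {k: guess if v is None else v for k, v in merged.items()}
    if policy == "random":
        rng = rng or random.Random()
        # One random guess per gap; repeat with different seeds for multiple trials.
        return {k: rng.choice(known) if v is None else v for k, v in merged.items()}
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical link-bandwidth data (Mbps) from two sources; None marks a missing value.
source_a = {("AS1", "AS2"): 100, ("AS2", "AS3"): None}
source_b = {("AS1", "AS2"): 999, ("AS2", "AS3"): 40, ("AS3", "AS4"): None}

merged = merge_sources([source_a, source_b])
extrapolated = fill_missing(merged, "extrapolate")
trials = [fill_missing(merged, "random", random.Random(seed)) for seed in range(3)]
```

Keeping the policy as an explicit argument makes each experiment's guesswork visible and repeatable, which fits the goal of standardized evaluation.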

HOW TO SIMULATE?
• Simulating only necessary events leads to great speedup:
  - In our work: 136 CPU cores running PDNS and GTNetS were matched by 8 common PCs
  - Fidelity/details are specific to research questions
  - May be able to come up with a standard for an area of research
  - Users must be able to customize this
  - Interaction between simulation and the model must also be specified
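A minimal sketch of "simulating only necessary events": a discrete-event loop in which the user names the event kinds to simulate and everything else is dropped at scheduling time. The `Simulator` class, the event kinds `scan` and `hop`, and the demo workload are hypothetical and are not drawn from PAWS, PDNS, GTNetS or ns-3.

```python
import heapq


class Simulator:
    """Discrete-event loop that only simulates the event kinds the user enabled."""

    def __init__(self, enabled_kinds):
        self.enabled = set(enabled_kinds)
        self.queue = []
        self.now = 0.0
        self.seq = 0        # tie-breaker so the heap never compares handlers
        self.processed = 0
        self.skipped = 0

    def schedule(self, delay, kind, handler):
        if kind not in self.enabled:
            self.skipped += 1  # granularity choice: this event is never simulated
            return
        heapq.heappush(self.queue, (self.now + delay, self.seq, kind, handler))
        self.seq += 1

    def run(self, until):
        while self.queue and self.queue[0][0] <= until:
            self.now, _, _kind, handler = heapq.heappop(self.queue)
            handler(self)
            self.processed += 1


def scan(sim):
    """A worm scan that would normally trigger three per-hop forwarding events."""
    for hop in range(3):
        sim.schedule(0.001 * (hop + 1), "hop", lambda s: None)


def run_demo(enabled_kinds):
    sim = Simulator(enabled_kinds)
    for i in range(5):
        sim.schedule(float(i), "scan", scan)
    sim.run(until=10.0)
    return sim


coarse = run_demo({"scan"})         # scans only
fine = run_demo({"scan", "hop"})    # scans plus per-hop detail
```

The coarse run processes a quarter of the events of the fine run in this toy workload; whether that loss of detail is acceptable depends on the research question.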

WHICH SIMULATOR?
• ns-3 seems promising:
  - It’s better to start from a widely used simulator than to develop one from scratch
  - ns-3 corrects lots of deficiencies found in ns-2
  - Easily extensible, detailed simulation of network events, lots of tools for post-processing
• Work needed:
  - Significant revision to support “variable granularity” simulations and Internet model interaction – must be able to drop pieces of code selectively
  - A separate, large chunk of code to model the Internet
• Downsides:
  - Two-language code base, sometimes hell to debug

WHERE DOES SIMULATOR RUN?
• Need a set of distributed machines:
  - Research testbeds such as Emulab and DETER seem like a good choice
  - Usually have a shared file system, useful for initialization files
  - Usually have dedicated high-capacity, low-delay links between nodes
  - Any institution can build its own mini-testbed, all code for this is open-source
• Each user needs their own copy of the code:
  - So they can customize it accordingly
  - There could be a centralized repository for the map where the most popular data sources are integrated

WHO’S GOING TO DEVELOP THIS?
• The core of the code should be developed by a single team:
  - Ability to create unified maps from disparate sources
  - Ability to selectively simulate events at chosen granularity
  - Interaction between the simulator and the map
  - Some straightforward “guesswork” techniques
• Customizations of map/simulator for specific problems should be contributed by the user community:
  - Open-source model, like ns-2
  - Code contributed by experts for specific problems

THANK YOU I’d love to hear your questions and comments! Jelena Mirkovic, sunshine@isi.edu