Challenges and solutions for levees monitoring and flood decision support systems

Marian Bubak
Department of Computer Science, AGH University of Science and Technology, Kraków, Poland
SKG 2017, Beijing, China, August 14-16, 2017

Thanks to
• Bartosz Balis, Marek Kasztelnik, Daniel Harezlak, Maciej Malawski, Piotr Nowakowski, Maciej Pawlik, Bartosz Wilk, Tomasz Gubala, Jan Meizner
• Robert Meijer
• Adam Belloum
• Lera Krzhizhanovskaya
• Krzysztof Zielinski, Robert Brzoza-Woch, Marek Konieczny, Bartosz Kwolek, Piotr Nawrocki, Tomasz Szydlo, Andrzej Lesniak, Aleksandra Borecka, Barbara Bukowska-Bieliniak, Klaudia Sekula, Zbigniew Olszamowski, Daniel Kessler
… and to other colleagues from CrossGrid, K-Wf Grid, UrbanFlood, and ISMOP

DICE Team: our area of research
• Investigation of methods for complex scientific collaborative applications
• Elaboration of environments and tools for eScience
• Integration of large-scale distributed computing infrastructures
• Knowledge-based approach to services, components, and their composition

DICE Team skillset and EU projects
• Interactive compute- and data-intensive applications, knowledge-based workflow composition, programming models: CrossGrid, K-Wf Grid, CoreGRID (2002-2008)
• Script-based composition of applications, GridSpace Virtual Laboratory: ViroLab, GREDIA (2006-2009)
• Federating cloud resources for VPH compute- and data-intensive applications, DataNet metadata models: VPH-Share, PLGrid (2009-2015)
• Common Information Space for Early Warning Systems, big data storage and access, analysis tools: UrbanFlood, ISMOP (2009-2016)
• Computational strategies, software and services for distributed multiscale simulations: MAPPER (2010-2013), PROCESS (2017-2020)
• Executable Papers; 1st prize in Elsevier competition at ICCS 2011: Collage, Elsevier follow-up project (2011-2013)
• Optimization of workflow applications on cloud resources: PaaSage (2013-2016)
• Infrastructure for large-scale simulations in medicine: EurValve (2016-2019)
• Toward Centre of Excellence for personalised medicine: Teaming - CECM (2017-2018)
http://dice.cyfronet.pl/ ; http://dice-cyfronet.github.io/#history

This talk will address
§ Methods of building complex distributed systems and applications (decision support systems)
§ Coordination of execution (workflow)
§ Monitoring and management of services
§ Optimal allocation of resources to services, accommodation to "spiky" behavior (variable workload)

Outline
• Motivation
• Interactive system (person in a loop)
• Exploitation of knowledge
• Need for a decision support system
• Building early warning systems
• IT system for smart levee monitoring and flood decision support
• Analysis of an urgent computing scenario (on cloud)
• Holistic approach to urgent computing management
• In quest of the operational database
• Summary

Motivation: Krakow, May 2010

Flood - CrossGrid (2002-5)
CrossGrid: Development of Grid Environment for Interactive Applications
http://cordis.europa.eu/project/rcn/63588_en.html
ftp://ftp.cordis.europa.eu/pub/ist/docs/grids/crossgrid_achievement.pdf
L. Hluchy, V. D. Tran, O. Habala, B. Simo, E. Gatial, J. Astalos, M. Dobrucky: Flood Forecasting in CrossGrid Project, in Marios D. Dikaiakos (Ed.): Grid Computing, Second European AcrossGrids Conference, AxGrids 2004, Nicosia, Cyprus, January 28-30, 2004, Revised Papers, LNCS 3165, 5160, 2004
This paper presents a prototype of flood forecasting system based on Grid technologies. The system consists of workflow system for executing simulation cascade of meteorological, hydrological and hydraulic models, data management system for storing and accessing different computed and measured data, and web portals as user interfaces. The whole system is tied together by Grid technology and is used to support a virtual organization of experts, developers and users.

Flood - K-Wf Grid (2004-7)
K-Wf Grid: Knowledge-based workflow system for Grid applications
http://cordis.europa.eu/publication/rcn/9410_en.html
ftp://ftp.cordis.europa.eu/pub/ist/docs/grids/k-wf-grid-interim-sheet_en.pdf
Ladislav Hluchý, Ondrej Habala, Martin Maliska, Branislav Simo, Viet D. Tran, Ján Astalos, Marian Babik: Grid Based Flood Prediction Virtual Organization, e-Science 2006, 4-6 December 2006, Amsterdam
This paper describes evolution of a flood prediction application. The application consists of a set of simulation models, visualization tools, and various support components. During six years it has evolved from a simple hydraulic modeling scenario into a sophisticated cascade of simulations, using state-of-the art grid, workflow and knowledge management technologies, and is one of the first applications of the SOKU concept in the field of computer simulations.

From IJkdijk to UrbanFlood (2008)
http://www.floodcontrolijkdijk.nl/en/
Robert Meijer, TNO ICT Groningen and University of Amsterdam
The IJkdijk consortium turns to FP7 to organize research on the development of
• GeoSensing technology
• Sensor network telecommunication systems
• Sensor data processing facilities
• Smartness in sensors (sensor plug and play, data acquisition)

Monitoring and decisions
Stand-by mode
• Monitoring data collection (low frequency)
• Initial on-line analysis (trends, deviations in sensor readings)
• Presentation of external info: weather prediction, flood wave prediction, etc.
Threat assessment mode
• Increased frequency of sensor data collection
• Resource-intensive threat level evaluation
Alert mode
• Prediction of levee behavior
• Notification of authorities
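The three operating modes can be read as a small state machine. The sketch below is only an illustration: the transition rule (escalate one step at a time, return to stand-by when no anomaly is present) and the sampling intervals are assumptions, since the slides describe the modes but not the exact switching conditions.

```python
from enum import Enum


class Mode(Enum):
    STANDBY = "stand-by"
    THREAT_ASSESSMENT = "threat assessment"
    ALERT = "alert"


# Hypothetical sampling intervals in seconds; the slides only say
# "low frequency" for stand-by and "increased frequency" afterwards.
SAMPLING_INTERVAL_S = {
    Mode.STANDBY: 900,
    Mode.THREAT_ASSESSMENT: 60,
    Mode.ALERT: 60,
}


def next_mode(current: Mode, anomaly_detected: bool, threat_confirmed: bool) -> Mode:
    """Assumed rule: escalate one step at a time, drop to stand-by without anomalies."""
    if not anomaly_detected:
        return Mode.STANDBY
    if current is Mode.STANDBY:
        return Mode.THREAT_ASSESSMENT
    if threat_confirmed:
        return Mode.ALERT
    return Mode.THREAT_ASSESSMENT
```

Escalating one step at a time keeps the resource-intensive threat evaluation between the cheap on-line analysis and the alert, which mirrors the order of the modes listed above.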

Decision support workflow
Multiple tiers of data analysis, activated as alert level rises
• Anomaly detection: first indication of a potential problem
• Threat estimation: more in-depth analysis of anomalous sections in order to assess the threat of levee failure (e.g. levee breach simulations)
• Risk assessment: estimate the impact of a levee breach (e.g. simulations of inundation, crowd behavior)

Common Information Space
A platform facilitating development, deployment and execution of Early Warning Systems (EWS)
• EWS development
  – EWS reference model
  – EWS development framework
• EWS deployment
  – EWS blueprints
  – EWS-factory-as-a-service
• EWS execution
  – CIS runtime services for resource allocation, self-monitoring, self-healing, mission-critical operation, and urgent computing

Common Information Space
• Basic Services: domain resources exposed as Basic Services
  – Data, sensors, apps wrapped as appliances and deployed onto clouds, …
• Composite Services (Parts): building blocks for EWSs
  – Orchestrate domain resources towards complex application scenarios (e.g. area flood simulation)
• Early Warning System
  – A number of Parts deployed, connected, and configured for a specific setting (e.g. a dike section)

Flood EWS with CIS

CIS as a system factory
• On-demand resource provisioning (local resources, clouds)
• Horizontal scaling of infrastructure (more instances)
• Load balancing with lazy evaluation
• On-line availability monitoring
• Notifications about problems
• Automatic restart of failed components

Polish ISMOP Project - Motivation
• Predict levee failures using smart levees
• Typical flood scenario in Lesser Poland region: flood threat due to passing water wave
• High water levels lasting up to several weeks
• Need for urgent computing in large-scale scenarios

Towards a levee monitoring system
ISMOP: elaboration of a system for smart levee monitoring and flood decision support
• Construction of an experimental levee
• Design and installation of sensors
• Innovative telemetry system
• Model- and data-driven modeling of levee behavior
• Monitoring and decision support system

ISMOP experimental smart levee
• Size: 200 m x 50 m x 4 m, 4 types of material
• About 1500 temperature and pore pressure sensors
• About 2000 sensors in two optic fibre sensors

Experimental levee in construction (14.08.2015)

ISMOP Consortium
• Department of Computer Science AGH
• Department of Hydrogeology and Engineering Geology AGH
• Department of Geoinformatics and Applied Computer Science AGH
• NeoSentio, Kraków
• Sweco Hydroprojekt Kraków
in collaboration with the Czernichów Community
http://www.ismop.edu.pl/

Assessment of levee breach threat via scenario matching

ISMOP IT system for smart levee monitoring and flood decision support

ISMOP central system use cases
• Support for experiments on the artificial levee
  – Controlled flooding of the artificial levee and on-line data collection
  – Validation of models of levees
• Elaboration of a decision support system
  – Continuous monitoring of levees
  – Automated data-driven and model-driven analyses
  – Prediction of breaches

ISMOP Decision Support System

Challenges: visualization and decision support
• Interoperability with external systems (e.g. ISOK, regional flood protection agencies)
  – Solution: leveraging open standards (OGC, INSPIRE) for data and metadata models
• Visualization of relevant information to effectively support the decision-making process
  – Solution: visualisation system of DSS
• Adaptability to other domains (e.g. monitoring of communication infrastructure)
  – Solution: open domain-agnostic design (metadata and public API design are crucial)

Challenges: execution management
• Scale up to 100s or 1000s of kilometers of levees
  – Solution: monitored area divided into sections, managed by multiple instances of a Monitoring Application, dynamically deployed on demand
• Highly variable resource demands: from very low in standby mode to high in threat assessment mode
  – Solution: dynamic provisioning of resources from private or public clouds; autoscaling algorithms and policies

Challenges: data management
• Diverse data sets (spatial, time series, binary, metadata) and data usage patterns
  – Solution: multiple data stores and models to address diverse needs
• Data-intensive processing. Threat level evaluation scenario: up to 130 GB of data to search per 1 km of a levee
  – Solution: big data infrastructure, Map-Reduce data search

Urgent computing scenario
Goal: assess flood risk for a large set of levees and provide levee health maps for all levee sections by a specified deadline
Solution: dynamic provisioning of cloud resources
• The user specifies:
  – Target area for flood threat assessment
  – Time window size for current measurements
  – Deadline to get results
• The system:
  – Generates a workflow representing all required computations and data dependencies
  – Plans the workflow execution so as to meet the deadline
  – Runs the workflow
  – Monitors its execution and reconfigures resource allocation if needed

Levee breach threat assessment

Implementation of urgent computing

Resource provisioning model
• Bag-of-tasks model
  – Selection of dominating tasks
  – Uniform task runtimes
• Performance model: T = f(v, d, s, …)
  – T: total computing time
  – v: number of VMs
  – d: time window in days
  – s: number of tasks (sections)
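A performance model of this kind can drive deadline-aware provisioning: given a predictor for T, pick the smallest number of VMs that still meets the deadline. The sketch below is not the ISMOP model. The function `bag_of_tasks_time` and its `secs_per_task_day` constant are hypothetical stand-ins that only follow the stated assumptions (a bag of uniform tasks, runtime growing with the time window d).

```python
import math
from typing import Callable, Optional

# Signature of a performance model T = f(v, d, s) returning seconds.
Model = Callable[[int, int, int], float]


def bag_of_tasks_time(v: int, d: int, s: int, secs_per_task_day: float = 10.0) -> float:
    """Hypothetical model: s uniform tasks run in waves of v, each costing d * secs_per_task_day."""
    waves = math.ceil(s / v)
    return waves * d * secs_per_task_day


def min_vms_for_deadline(model: Model, d: int, s: int,
                         deadline_s: float, max_vms: int) -> Optional[int]:
    """Smallest v in 1..max_vms whose predicted time meets the deadline, else None."""
    for v in range(1, max_vms + 1):
        if model(v, d, s) <= deadline_s:
            return v
    return None


# Usage: 1024 sections, 7-day window, results needed within one hour.
vms = min_vms_for_deadline(bag_of_tasks_time, d=7, s=1024, deadline_s=3600, max_vms=64)
```

Passing the model as a callable keeps the planner independent of the fitted formula, so a measured model can replace the stand-in without changing the search.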

Resource provisioning model

Simulations
• Setup: private cloud infrastructure
  – A node with 8 cores (Xeon E5-2650)
  – Virtual machines (1 VCPU, 512 MB RAM)
  – Data for simulated scenarios (244 MB total) on local disks
• Simulations:
  – 1-1024 sections
  – 1-16 VMs
  – 1-7 days time window
• Warmup tasks:

Resource provisioning - results
• Warmup tasks clearly separated as outliers
• Linear functions
• Parameters a, b, c determined using non-linear fit
• The model fits the data well
Fitted parameters: a = 6.53, b = 9.41, c = 31.71
Plot panels: 1024 sections, 128 sections

Clouds for urgent computing (1/2)
• Elasticity
  – On-demand provisioning of VMs
  – Job prioritization and preemption
• Reliability
  – Public cloud services are specifically designed to support systems with high availability demands
  – Amazon: only five major outages in the years 2010-2013 (only one for more than 6 h)
• Safety
  – A serious natural disaster may damage a local computing infrastructure
  – Public clouds as an emergency computing infrastructure
  – Data safety: public clouds as a reliable storage infrastructure for important but not sensitive data (example: pre-simulated scenario data sets)

Clouds for urgent computing (2/2)
• Cost-effectiveness
  – Decision support systems for natural disasters generate 'spiky' workloads: perfect cloud use case
  – Cheaper than maintenance of a dedicated infrastructure
  – Day-to-day operation can be handled by a relatively small, low-cost on-premises infrastructure
• Performance?
  – Bag-of-tasks applications such as scenario identification perfectly fit the cloud
  – What about CPU- and communication-intensive tightly-coupled simulations? HPC-in-the-Cloud is an emerging trend.

Holistic approach to urgent computing management
(Figure: system reconfiguration loop; isolated approach vs. holistic approach)

System management
• SLA (service-level agreement): required level of quality of service the system has to deliver
• SLAs become optimization objectives
• SLAs vary depending on system mode
  – Normal mode: conserve energy, save costs
  – Urgent mode: high performance, high accuracy of data measurement and analysis, deadlines

IT system for smart levees: reference architecture and objective functions
• OPC (Operating Cost)
• SLT (System Lifetime)
• EE (Energy Efficiency)
• DPI (Data Processing Interval)
• DPT (Data Processing Time)
• DAT (Data Access Time)
• DTI (Data Transmission Interval)
• DMI (Data Measurement Interval)

Configurable properties

Service profiles in normal and urgent modes
Service profile defines:
• SLAs: functional constraints imposed upon the objective functions
• Trade-offs: which objectives should be preferred?

Objective functions models
• Based on experimental research and literature review, we have created approximate models of three objective functions:
  – Function q1: OPC (Operating Cost)
  – Function q2: EE (Energy Efficiency)
  – Function q3: TML (Timeliness: aggregated measure of system performance and responsiveness)
where qi: (s, c) → [0, 1]
  s: system configuration (a vector of configuration options for all system components)
  c: system context (e.g. weather conditions)
  0: minimal possible value, 1: maximal possible value
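With several objectives normalized to [0, 1], candidate configurations can be compared by Pareto dominance, which is how the Pareto-optimal results later in the talk are framed. The following is a minimal sketch and not the algorithm used in the project: the optimization directions (lower OPC is better, higher EE and TML are better) and the candidate names and scores are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (OPC, EE, TML) scores for one configuration, each in [0, 1].
Scores = Tuple[float, float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every objective and strictly better on one.

    Assumed directions: minimize OPC, maximize EE and TML.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1] and a[2] >= b[2]
    better = a[0] < b[0] or a[1] > b[1] or a[2] > b[2]
    return no_worse and better


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Names of configurations that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    ]


# Hypothetical configurations with made-up scores.
candidates = {
    "low_power": (0.2, 0.9, 0.3),
    "balanced": (0.5, 0.6, 0.6),
    "wasteful": (0.6, 0.5, 0.5),
    "fast": (0.9, 0.2, 0.95),
}
front = pareto_front(candidates)
```

A service profile would then select one point from the front, for example by applying its SLA constraints first and its stated trade-off preference second.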

Decision support workflow: implementation

Holistic urgent computing management: algorithm

Results Pareto-optimal solutions, normal mode

Results Pareto-optimal solutions, urgent mode

In quest of the operational database
• Research context: DSS, urgent computing
• Methodology: data sets and workloads
• Models for time series data for 4 databases: MongoDB, PostgreSQL, Redis, and InfluxDB
• Experimental evaluation:
  – Performance of different models
  – Mixed workload performance

Challenges for data management
• Diverse data sets (spatial, time series, binary, metadata) and data usage patterns
  – Solution: multiple data stores and models to address diverse needs
• Data-intensive processing. Threat level evaluation scenario: 130 GB and more of data to search per 1 km of a levee
  – Solution: appropriate data infrastructure

Motivation and objectives
• Environmental monitoring and decision support systems need to process massive sensor data streams in real time
• Simultaneous reads and writes
• Objective of the study: evaluate four different DBs and corresponding data models: MongoDB, PostgreSQL, Redis, InfluxDB
  – How to best represent time series data?
  – What are the limits of the evaluated DBs?

Research context Data management in large-scale environmental monitoring, early warning, and decision support systems

Applied methodology
Four database technologies chosen: MongoDB, PostgreSQL, Redis, InfluxDB
Research questions:
• How best to implement time series data in a given data model and database?
• How do alternative models perform for different queries?
• What are the reasonable volume limits for an operational database?
• What are the performance limits of the alternative approaches and what factors influence them?

Methodology: test data sets
• Time series records consisting of (time series id, time stamp, value)
• Generated data sets representing measurements from 10,000 sensors
• For experiments, databases were populated with 10 M to 1 B records

Methodology: test workloads
Read workload: three test queries
• Query 1: random access. Return 1000 records for random time series IDs and time stamps. This represents a query which is difficult to optimize. It may occur in certain types of visualizations spanning many sensors.
• Query 2: recent measurements. Return 10 latest records for 100 random time series IDs.
• Query 3: downsampling. Return 100 records for 100 random time series IDs, where the returned records are selected by downsampling the latest n*100 records for each of the time series IDs.
Write workload:
• 10,000 new records per second
• Written to DBs in batches of 1000-100,000
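Query 3 can be illustrated in a few lines. The sketch below shows one possible reading of the downsampling step, assuming that "selected by downsampling" means keeping every n-th record of the latest n*100 records of a series; other schemes such as averaging each group of n records would fit the description as well. The function name and record layout are illustrative.

```python
from typing import List, Tuple

# One record of a single time series: (time stamp, value).
Record = Tuple[int, float]


def downsample_latest(records: List[Record], n: int, out_count: int = 100) -> List[Record]:
    """Keep every n-th record among the latest n * out_count records.

    `records` must be sorted by time stamp in ascending order.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    window = records[-n * out_count:]
    return window[::n]


# Usage: 1000 records, downsampling factor 5, so 100 records come back.
series = [(t, float(t)) for t in range(1000)]
sample = downsample_latest(series, n=5)
```

In the benchmark this selection would run once per time series ID, for 100 randomly chosen IDs.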

Databases
• MongoDB: document database
• PostgreSQL: relational database
• Redis: in-memory dictionary data server
• InfluxDB: native time series database

Data models: MongoDB
• Model 1: one record = one document
  – customId and timestamp should be indexed to improve query performance
• Model 2: single document = multiple records (e.g. 1 hour of records)
  – Fewer documents in the database
  – Documents can be pre-created
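The difference between the two document models is easiest to see on the documents themselves. The sketch below builds them as plain Python dictionaries without a database connection. Only `customId` and `timestamp` are named in the slides; the `value`, `hourStart`, and `records` fields and the one-hour bucket size are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (time series id, time stamp in seconds, value)
Record = Tuple[int, int, float]


def to_model1(series_id: int, ts: int, value: float) -> dict:
    """Model 1: one document per record."""
    return {"customId": series_id, "timestamp": ts, "value": value}


def to_model2(records: List[Record]) -> List[dict]:
    """Model 2: one document per series per hour, holding all records of that hour."""
    buckets: Dict[Tuple[int, int], List[dict]] = defaultdict(list)
    for series_id, ts, value in records:
        hour_start = ts - ts % 3600  # assumed bucket size: one hour
        buckets[(series_id, hour_start)].append({"timestamp": ts, "value": value})
    return [
        {"customId": sid, "hourStart": hour, "records": recs}
        for (sid, hour), recs in sorted(buckets.items())
    ]


# Usage: four records collapse into three hourly documents.
docs = to_model2([(1, 0, 1.0), (1, 1800, 2.0), (1, 3600, 3.0), (2, 10, 4.0)])
```

Fetching the latest records of a series then touches a handful of bucket documents instead of many single-record documents, which is one plausible reason for the advantage of Model 2 reported in the query results.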

Data models: PostgreSQL
• Model 1: single monolithic table with three columns (id, time stamp, value)
• Model 2: partitioned table
  – Time series ID as the partition key
  – Increases DB scalability but introduces write overhead
• Model 3: multi-column table (not implemented)
  – One row: (id, time stamp, multiple values)
  – Smaller tables
  – Queries more difficult to implement

Data models: Redis
• Model 1: one record = one Redis HASH
• Model 2: one record = one Redis STRING
• Model 3: SORTED SETS
  – Elements of the set = values
  – Name of the set = time series ID
  – Score associated with values = time stamp
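Model 3 maps naturally onto the sorted-set commands. The sketch below assumes a redis-py style client exposing `zadd(name, mapping)` and `zrevrange(name, start, end)`. It is an illustration rather than the benchmarked implementation, and it deviates from the slide in one labeled way: members of a sorted set are unique, so two equal values in one series would collapse into a single element, and the sketch therefore prefixes each member with its time stamp.

```python
from typing import Any, List, Tuple


def write_point(client: Any, series_id: str, ts: int, value: float) -> None:
    """Add one measurement to the sorted set named after the time series."""
    # Assumption: "<ts>:<value>" members keep equal values distinct.
    member = f"{ts}:{value}"
    client.zadd(series_id, {member: ts})  # score = time stamp


def latest_points(client: Any, series_id: str, count: int = 10) -> List[Tuple[int, float]]:
    """Return the `count` most recent (time stamp, value) pairs, newest first."""
    raw = client.zrevrange(series_id, 0, count - 1)
    points = []
    for member in raw:
        # Clients may return bytes or str depending on their configuration.
        text = member.decode() if isinstance(member, bytes) else member
        ts_text, value_text = text.split(":", 1)
        points.append((int(ts_text), float(value_text)))
    return points
```

With the time stamp as score, the "recent measurements" query becomes a single reverse range read per series, which fits the strong result reported for this model on every query except random access.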

Data models: InfluxDB
• Only one model: InfluxDB's native representation
  – One record = one point in a time series tagged with time series ID (indexed)

Write performance and disk usage
• Redis achieves the best write throughput, InfluxDB is almost as good
• InfluxDB disk space optimization is excellent
• Redis consumes a surprisingly high amount of memory (10 M records on a 4 GB machine)

Query execution
• Clear advantage of MongoDB M2 over M1
• Partitioning induces overhead for small DB sizes, but for large DBs the performance gain is larger
• Redis M3 performs best, but not for Q1
• InfluxDB has excellent performance and exceptional scalability

Mixed workload performance
• InfluxDB performs almost as well for 1 B records as Redis does for 10 M
• MongoDB response times of 8-12 s may not be sufficient
• Mixed workload affects PostgreSQL the most due to complex table locking and index updates (ACID compliance)

Database quest - conclusion
• The native time series database InfluxDB outperforms the competition
• However, there are factors other than performance in choosing a technology
• Sometimes one chooses "boring technology" (*) because it is more predictable, easier to operate and maintain
• One may choose to store time series data in the same DB as metadata to keep the technology stack small
• For less demanding use cases even a good old RDB may prove sufficient
(*) http://mcfunley.com/choose-boring-technology

Threat detection with data analysis
(Figure panels: trend, anomaly)

Threat detection via numerical modeling
• Prediction of levee response to certain water levels in current external conditions
• Numerical models require geotechnical and geophysical levee examination

Controlled flooding experiments

Summary
• Environmental models result in complex applications
  – collaborative
  – time-critical
  – with data- and resource-intensive scenarios
• Composition of resources into complex scenarios requires appropriate optimization methods
• Holistic urgent computing management results in a better optimized system configuration than the isolated approach
• Proper time series representation is crucial for performance

Related publications (1/2)
• B. Balis, M. Kasztelnik, M. Bubak, T. Bartynski, T. Gubala, P. Nowakowski, J. Broekhuijsen, The UrbanFlood Common Information Space for Early Warning Systems, ICCS 2011: 96-105
• B. Balis, R. Brzoza-Woch, M. Bubak, M. Kasztelnik, B. Kwolek, P. Nawrocki, P. Nowakowski, T. Szydlo, K. Zielinski, Holistic approach to management of IT infrastructure for environmental monitoring and decision support systems with urgent computing capabilities, FGCS, accepted 13 August 2016, http://dx.doi.org/10.1016/j.future.2016.08.007
• R. Brzoza-Woch, M. Konieczny, P. Nawrocki, T. Szydlo, K. Zielinski, Embedded systems in the application of fog computing - Levee monitoring use case, SIES 2016: 238-243

Related publications (2/2)
• B. Baliś, T. Bartynski, M. Bubak, D. Harezlak, M. Kasztelnik, M. Malawski, P. Nowakowski, M. Pawlik, and B. Wilk, Smart levee monitoring and flood decision support system: reference architecture and urgent computing management, Procedia Computer Science 108C (2017) 2220-2229, 10.1016/j.procs.2017.05.192
• B. Baliś, M. Bubak, D. Harezlak, P. Nowakowski, M. Pawlik, and B. Wilk, Towards an operational database for real-time environmental monitoring and early warning systems, Procedia Computer Science 108C (2017) 2250-2259, 10.1016/j.procs.2017.05.193
• Two sessions dedicated to the ISMOP results at CGW'16, October 2017, http://www.cyfronet.krakow.pl/cgw16/programme.html

More at
http://www.urbanflood.eu
http://www.ismop.edu.pl
http://dice.cyfronet.pl
bubak@agh.edu.pl

Acknowledgement
This research was supported by the National Centre for Research and Development (NCBiR) under Grant No. PBS1/B9/18/2013.