Flexible and Adaptive Processing of Earth Observation Data over HPC Architectures
Dorian Gorgan
Computer Science Department, Technical University of Cluj-Napoca
http://users.utcluj.ro/~gorgan
dorian.gorgan@cs.utcluj.ro
Contents
o Big data and Earth Observation data
o Satellite image data processing types
o BigEarth platform
o Process Description Languages
o WorDeL language
o WorDeL vs. MOML
o WorDeL vs. Python Scripting
o Experiments on WorDeL
o KEOPS - Kernel Operators
o BigEarth project
International Conference and Exhibition on Satellite, 7-19 August, 2015, Houston, USA
Big Data
o Huge volumes of data are globally available
  - 2012: 2.8 x 10^21 bytes (2.8 ZB) at global level, 10 times the 2007 volume
  - 2020: 40 ZB (about 14 times the 2012 volume)
o Only 3% of the data is marked/annotated, and 0.5% is analyzed
o Big Data: volume, variety, velocity, variability, veracity
o Earth Observation Data (EO Data)
o High costs of data management
o Increase data value by using the data instead of just storing it
o Data -> knowledge -> information
o High Performance Computation resources + Analytics
Data, knowledge and information
o Flexible description and adaptive processing
Satellite Image Data Processing Types
o Custom processing within the same input image
  - Earth data processing for retrieving information about specific geographic areas that are enclosed within the same image extent
  - Entire image vs. selected areas (bounding box, shapefiles)
o Indexed items processing
  - Information is defined based on structured models, such as matrices or arrays, where each element has a known position
o Unindexed items processing
Processing within Custom Area of Interest
o Specify one area of interest vs. multiple areas of interest
  - E.g. compare the vegetation growth over the years in these regions by computing the Normalized Difference Vegetation Index (NDVI) for each area
  - E.g. change tracking, urban evolution over the years
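The per-area NDVI comparison can be sketched as follows. This is a minimal illustration, not the BigEarth implementation: it uses the standard definition NDVI = (NIR - Red) / (NIR + Red), represents each band as a plain list of rows, and describes each area of interest as a hypothetical bounding box (row_start, row_end, col_start, col_end). The band values and area names are invented for the example.

```python
def ndvi(nir, red):
    """NDVI for one pixel; returns 0.0 where both bands are zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def mean_ndvi(nir_band, red_band, bbox):
    """Mean NDVI inside bbox = (row_start, row_end, col_start, col_end), ends exclusive."""
    r0, r1, c0, c1 = bbox
    values = [
        ndvi(nir_band[r][c], red_band[r][c])
        for r in range(r0, r1)
        for c in range(c0, c1)
    ]
    return sum(values) / len(values)


# Hypothetical 2x4 reflectance bands (not real satellite data).
nir = [[0.8, 0.8, 0.3, 0.3],
       [0.8, 0.8, 0.3, 0.3]]
red = [[0.2, 0.2, 0.3, 0.3],
       [0.2, 0.2, 0.3, 0.3]]

# Two areas of interest enclosed in the same image extent.
areas = {"forest": (0, 2, 0, 2), "urban": (0, 2, 2, 4)}
report = {name: mean_ndvi(nir, red, box) for name, box in areas.items()}
```

Running the same function on images from different years would give the per-area values needed for the vegetation growth comparison.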
Indexed Items Processing
o Order a set of items by indexing them (e.g. list of satellite images, satellite data)
o Decompose the satellite image into equal inner areas
o Sub-areas are processed individually to extract information
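As a rough sketch of the decomposition step, the code below splits an image into an indexed grid of equal tiles and processes each tile on its own. The function names, the requirement that the image size be divisible by the grid size, and the per-tile operation (a plain sum) are assumptions made for illustration.

```python
def split_into_tiles(height, width, rows, cols):
    """Map each tile index (i, j) to its extent (row_start, row_end, col_start, col_end)."""
    if height % rows or width % cols:
        raise ValueError("image size must be divisible by the grid size")
    tile_h, tile_w = height // rows, width // cols
    return {
        (i, j): (i * tile_h, (i + 1) * tile_h, j * tile_w, (j + 1) * tile_w)
        for i in range(rows)
        for j in range(cols)
    }


def tile_sums(image, rows, cols):
    """Process every tile independently; here the 'processing' is a pixel sum."""
    tiles = split_into_tiles(len(image), len(image[0]), rows, cols)
    return {
        index: sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
        for index, (r0, r1, c0, c1) in tiles.items()
    }


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
sums = tile_sums(image, 2, 2)
```

Because every tile has a known index and extent, the tiles can be handed to separate workers without any coordination between them.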
Unindexed Items Processing
o Items with unknown order are computed on the fly
o E.g. different processing types that collect information from remote data feeders providing access to unindexed resources
BigEarth Platform
BigEarth Architecture
o WorDeL Parser
  - Communication between user and system
  - Interprets the process description
  - Input and output description
o KEOPS Repository
  - KEOPS (Kernel Operators) is a collection of all operators that can be accessed by the process
BigEarth Architecture
o Planner
  - Interprets data provided by the user
  - Identifies relationships between atomic operators
  - Identifies the parallel and sequential tasks
o Executor
  - Prepares the inputs and launches the tasks into execution
  - Manages pools of execution machines
  - Identifies the parallel and sequential tasks
  - Launches into execution parts of the processing chain given by the user
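One way to picture how a planner might separate parallel from sequential tasks is to group them into stages by their dependencies. The sketch below is an assumption about the general technique (level-by-level topological grouping), not the actual BigEarth Planner; the task names are hypothetical.

```python
def plan_stages(dependencies):
    """Group tasks into stages.

    dependencies maps each task name to the set of tasks it waits for.
    Tasks in the same stage do not depend on each other, so they may run
    in parallel; the stages themselves run one after another.
    """
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    done, stages = set(), []
    while remaining:
        ready = sorted(t for t, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic or unresolved dependency")
        stages.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return stages


# Hypothetical processing chain.
workflow = {
    "load_red": set(),
    "load_nir": set(),
    "ndvi": {"load_red", "load_nir"},
    "classify": {"ndvi"},
}
stages = plan_stages(workflow)
```

Here the two load tasks land in the same stage and could be dispatched to different machines, while `ndvi` and `classify` must wait for their inputs.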
Process Description Languages
o BPEL, WS-BPEL, BPEL4WS (Business Process Execution Language for Web Services)
  - BPEL is an OASIS standard (OASIS: Organization for the Advancement of Structured Information Standards)
  - Used for linking up web services, creating more advanced entities dubbed business processes
  - XML-based format
o MOML (Modeling Markup Language)
  - XML-based format
  - Allows defining relationships between entities without making any assumptions about their meaning and interpretation
WorDeL Language
o Workflow Description Language (WorDeL)
o Describes the processing algorithms
  - Compact format
  - Intuitive
  - Allows the identification of parallelizable algorithm sections
  - Flexibility: easy to use, increases efficiency
WorDeL Language
Operators and Connections
o Operator
  - Self-contained piece of software developed for a specific functionality
  - Input and output ports (interface)
o Connection
  - Link between two or more operators
  - One to many operators
  - Data type compatibility
[ a, b ] SUM: op1 [ r ]
[ a, r ] DIFF: op2 [ result ]
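The two WorDeL lines above connect the output port `r` of the SUM operator to an input port of the DIFF operator. A small Python model of that idea is sketched below. The `Operator` class and `run_workflow` function are hypothetical, and the operand order of DIFF (first input minus second) is an assumption, since the slide does not define it.

```python
class Operator:
    """A black-box operator with named input ports and one output port."""

    def __init__(self, name, func, inputs, output):
        self.name, self.func = name, func
        self.inputs, self.output = inputs, output

    def run(self, values):
        # Read the input ports, then write the output port.
        args = [values[port] for port in self.inputs]
        values[self.output] = self.func(*args)


def run_workflow(operators, values):
    """Run operators in the listed order; each input must already be available."""
    values = dict(values)
    for op in operators:
        op.run(values)
    return values


# [ a, b ] SUM: op1 [ r ]
op1 = Operator("SUM", lambda x, y: x + y, ["a", "b"], "r")
# [ a, r ] DIFF: op2 [ result ]   (assumed to compute a - r)
op2 = Operator("DIFF", lambda x, y: x - y, ["a", "r"], "result")

out = run_workflow([op1, op2], {"a": 5, "b": 3})
```

The shared port name `r` is what forms the connection: op2 can only run after op1 has produced it.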
Workflow
o Network of interconnected operators
o Process (algorithm) description
o Hyperstructure
Data Types
o Basic types: String, Integer, Float, Boolean, File
  - File type: checked only at execution (e.g. satellite images such as Landsat, MODIS, etc.)
o Aggregate types: List, Tuple
  - List: homogeneous collection of data
    E.g. List ( <element_type> )
  - Tuple: heterogeneous collection of data
    E.g. Tuple ( Integer, Float, String )
         Tuple ( String, List ( Integer ) )
         Tuple ( List…. . ) )
Special Constructs
o Decisional constructs
  - Switch
  - Join
o Example:
  res = a+b, if a<b
  res = a-b, if a>b
Special Constructs - Example
o Example:
  res = a+b, if a<b
  res = a-b, if a>b
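The example can be read as a switch that routes the inputs to one of two branches, followed by a join that forwards whichever branch produced a result. The Python sketch below models that reading. The function names are hypothetical and it is not WorDeL syntax; the case a = b is not covered by the example, so this sketch returns None for it.

```python
def switch(a, b):
    """Select exactly one branch based on the condition."""
    if a < b:
        return "sum"
    if a > b:
        return "diff"
    return None  # a == b is not covered by the example


def join(branch_results):
    """Merge the branches: forward the single result that was produced."""
    produced = [value for value in branch_results.values() if value is not None]
    return produced[0] if produced else None


def compute(a, b):
    branch = switch(a, b)
    results = {
        "sum": a + b if branch == "sum" else None,    # res = a+b, if a<b
        "diff": a - b if branch == "diff" else None,  # res = a-b, if a>b
    }
    return join(results)
```

For instance, `compute(2, 5)` takes the sum branch and `compute(5, 2)` takes the difference branch.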
Special Constructs
o Repetitive constructs: For-each
  - Repetitive processing on different data
[ a : list1, b : list2 ] foreach ( ) [ r : result ]
  [ a, b ] SUM: op1 [ r ]
end
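A plausible reading of the for-each construct is that the body runs once per position, taking the i-th element of each input list and collecting the outputs into a result list. The sketch below models that reading in Python. The element-wise pairing and the equal-length requirement are assumptions, as the slide does not spell them out.

```python
def foreach(operator, *lists):
    """Apply operator to the i-th elements of every input list."""
    lengths = {len(items) for items in lists}
    if len(lengths) > 1:
        raise ValueError("input lists must have the same length")
    return [operator(*args) for args in zip(*lists)]


def sum_op(a, b):
    """Stand-in for the SUM operator used in the loop body."""
    return a + b


list1 = [1, 2, 3]
list2 = [10, 20, 30]
result = foreach(sum_op, list1, list2)
```

Each iteration is independent of the others, which is what makes such a loop a natural candidate for parallel execution.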
Program Structure
o Include section: includes external sections/files
o Definition section: workflow definition
o Process section: processing tasks
[Figure: annotated WorDeL program; labels: Include section, Interface definition, Workflow body, Workflow Description section, Processing section]
WorDeL vs. MOML
o Example: sum of two integers
o Compact form of WorDeL
[Figure: the same example written in MOML and in WorDeL]
WorDeL vs. Python Scripting
o Compute: E = (a+b) + (a-b)
o Use executable programs for each arithmetic operation; store each result in a given file passed as a command-line parameter
[Figure: the same computation written in Python and in WorDeL]
WorDeL vs. Python Scripting
o Python:
  - Provides a programming-centered approach for defining data processing algorithms
  - Users have full responsibility for managing the intermediate files
o WorDeL:
  - Designed specifically for describing a network of interconnected entities; has a straightforward, compact format based on a "black-box" approach
  - The entities might well be Python scripts themselves
  - WorDeL looks less like a programming language and more like a format for representing workflows
  - Users don't need to manage files or invoke the operators
Experiments on WorDeL
o Integrating WorDeL within the BigEarth platform
o Computation of: E = (a+b) + (a-b) + a*b + a/b
o Experimental considerations:
  - a, b, E: integers
  - Computation time estimated as: "+" (2 sec), "-" (2 sec), "*" (3 sec), "/" (4 sec)
o Objectives:
  - Functionality
  - Computation time
  - Parallelism
Experiments: Process Workflow
E = ((a+b) + (a-b)) + ((a*b) + (a/b))
Experiments: WorDeL Description
E = (a+b) + (a-b) + a*b + a/b
Experiments: Task Execution Diagram
E = (a+b) + (a-b) + a*b + a/b
Parallel execution took 8,730 ms vs. 17,716 ms for sequential execution.
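The measured times can be compared with a back-of-the-envelope estimate based on the per-operator costs given in the experimental setup ("+" and "-" 2 s, "*" 3 s, "/" 4 s) and the grouping E = ((a+b) + (a-b)) + ((a*b) + (a/b)). The sketch below assumes an unlimited number of workers and no scheduling or data transfer overhead; the tree encoding and function names are made up for the illustration.

```python
# Seconds per operator, as stated in the experimental considerations.
COST = {"+": 2, "-": 2, "*": 3, "/": 4}

# E = ((a+b) + (a-b)) + ((a*b) + (a/b)) as nested (operator, left, right) tuples.
TREE = ("+",
        ("+", ("+", "a", "b"), ("-", "a", "b")),
        ("+", ("*", "a", "b"), ("/", "a", "b")))


def sequential_time(node):
    """Total cost when every operator runs one after another."""
    if isinstance(node, str):  # an input value, no cost
        return 0
    op, left, right = node
    return COST[op] + sequential_time(left) + sequential_time(right)


def parallel_time(node):
    """Cost of the longest dependency chain (both operands computed at once)."""
    if isinstance(node, str):
        return 0
    op, left, right = node
    return COST[op] + max(parallel_time(left), parallel_time(right))


print(sequential_time(TREE), parallel_time(TREE))  # prints "17 8"
```

The estimates of 17 s and 8 s are in line with the measured 17,716 ms and 8,730 ms; the remaining difference of roughly 0.7 s in each case is presumably execution overhead.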
Conclusions on WorDeL Language
o Compact
o Intuitive
o Flexible
o Integration within the BigEarth platform
o Simple way of defining data processing algorithms
o Distribution and execution within an HPC infrastructure
o Efficient processing of big geospatial data
KEOPS - Kernel Operators
o Collection of atomic functionalities used to develop complex EO-oriented processing
  - Algorithms for computing the greenness of specific geographic locations, arithmetic operations applied to EO data (e.g. adding a constant to a satellite image), operators for managing satellite bands (e.g. mosaic, crop, etc.)
o KEOPS is based on GRASS (Geographic Resources Analysis Support System)
o KEOPS was initially developed within the GreenLand platform for geospatial data management, satellite image processing, interactive visualization, etc.
o Built-in system within the BigEarth platform
Experiments on KEOPS and WorDeL
o Combines two operators: NDVI (Normalized Difference Vegetation Index) and Density Slicing
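To make the combination concrete, the sketch below chains the two steps in plain Python: NDVI is computed per pixel, then density slicing maps each continuous NDVI value to a discrete class by comparing it against a list of break points. This is not the KEOPS code; the break values, band values, and function names are hypothetical.

```python
def ndvi(nir, red):
    """Standard NDVI; returns 0.0 where both bands are zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def density_slice(value, breaks):
    """Return the index of the interval that contains value.

    breaks is an ascending list of upper bounds; values at or above the
    last bound fall into the final class.
    """
    for index, upper in enumerate(breaks):
        if value < upper:
            return index
    return len(breaks)


# Hypothetical thresholds giving four classes: <0.0, <0.3, <0.6, >=0.6.
BREAKS = [0.0, 0.3, 0.6]


def classify(nir_band, red_band):
    """Chain the two operators: NDVI first, then density slicing."""
    return [
        [density_slice(ndvi(n, r), BREAKS) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


classes = classify([[0.1, 0.5], [0.6, 0.9]],
                   [[0.4, 0.4], [0.3, 0.1]])
```

In a workflow description the same chain would be two connected operators, with the NDVI output port feeding the density slicing input port.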
BigEarth Project
Funded by ROSA (Romanian Space Agency) and ESA (European Space Agency)
BigEarth project, http://cgis.utcluj.ro/bigearth/
o Description
  - Explore techniques and methodologies to develop and execute analytics on Big Earth Data
  - Provide flexible and interactive description of HPC (High Performance Computation) processing
  - Adaptive HPC-based computation
o Focus on
  - Access to massive data, knowledge, and information
  - User access to simple and complex processing algorithms
  - Processing scheduling
  - Execution performance
o HPC platforms
  - Grid, Cloud, Multicore, GPU Clusters
Publications
1. Gorgan D., Mihon D., Bacu V., Rodila D., Stefanut T., Colceriu V., Allenbach K., Balcik F., Giuliani G., Ray N., Lehmann A., Flexible Description of Earth Data Processing over HPC Architectures, Big Data From Space Symposium, Frascati, Italy, 5-7 June, Abstract Book, pp. 39 (2013).
2. Nandra C. I., Gorgan D., Workflow Description Language for defining Big Earth Data Processing Tasks, Proceedings of the ICCP-2015 Conference (in press).
3. Gorgan D., Giuliani G., Ray N., Cau P., Abbaspour K., Charvat K., Jonoski A., Lehmann A., Black Sea Catchment Observation System as a Portal for GEOSS Community, in International Journal of Advanced Computer Science and Applications (IJACSA), pp. 9-18 (2013).
4. Mihon D., Bacu V., Colceriu V., Gorgan D., Modeling of Earth Observation Use Cases through KEOPS System, Proceedings of the ICCP-2015 Conference (in press).
5. Bacu V., Stefanut T., Gorgan D., Adaptive Processing of Earth Observation Data on Cloud Infrastructures Based on Workflow Description, Proceedings of the ICCP-2015 Conference (in press).
ACKNOWLEDGMENTS
o This research is supported by ROSA (Romanian Space Agency) through Contract CDI-STAR 106/2013, BIGEARTH - Flexible Processing of Big Earth Data over High Performance Computing Architectures.
o The scientific consultancy and technology transfer have been supported by MEN-UEFISCDI through Contract no. 344/2014, PECSA - Experimental High Performance Computation Platform for Scientific Research and Entrepreneurial Development.
Many thanks for your attention!
Dorian Gorgan
Computer Science Department, Technical University of Cluj-Napoca
http://users.utcluj.ro/~gorgan
dorian.gorgan@cs.utcluj.ro