Developing open source GIS: what are the challenges?

Gilberto Câmara, INPE, Brasil (www.terralib.org)
Institute for Geoinformation, TU Wien, 16 June 2004

The Promise of Open Source
  • When an OSS project reaches a "critical size" we obtain many benefits
  • Robustness
      - "Given enough eyeballs, all bugs are shallow."
  • Cooperation
      - "Somebody finds the problem and somebody else understands it" (Linus Torvalds)
  • Continuous Improvement
      - "Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging"

Naïve view of open source projects
  • Software
      - Product of an individual or small group (peer pressure)
      - Based on a "kernel" with "plausible promise"
  • Development network
      - Large number of developers, single repository
  • Open source products
      - Viewed as complex, innovative systems (Linux)
  • Incentives to participate
      - Operate at an individual level ("self-esteem")
      - Wild-west libertarian ("John Waynes of the modern era")

Idealized model of OS software
  • Networks of committed individuals

The Reality of Open Source
  • Previous existence of conceptual designs of similar products (the potential for reverse engineering)
      - Design is the hardest part of software (Fred Brooks)
  • Problem granularity (the potential for distributed development)
      - Effective peer production requires high granularity

Potential for Reverse Engineering
  • Post-mature
      - A private company develops a software product.
      - The product becomes popular and becomes part of the "public commons".
      - Others develop a public domain equivalent (e.g., OpenOffice).
  • Standards-led
      - Standards consolidate a technology.
      - They allow compatible solutions to compete in the marketplace.
      - SQL database standard (e.g., MySQL and PostgreSQL)
      - POSIX standard (guidance to Linux)
      - OpenGIS specifications (e.g., deegree, MapServer, GeoServer)

Potential for Distributed Development
  • Parts of a software product
      - kernel and additional functions that use it (its periphery).
  • Operating systems (Linux)
      - well-defined kernel for process control
      - periphery consisting of programs such as device drivers, applications, compilers and network tools.
  • Database management systems
      - strong kernel of highly integrated functions (such as the parser, scheduler, and optimizer)
      - much smaller periphery.

Potential for Distributed Development (continued)
  • Each type of software product has its own periphery/kernel ratio
      - the ratio constrains the potential for distributed development
  • Kernel
      - a tightly organized and highly skilled programming team.
  • Periphery
      - a more widespread set of programmers with various skills
  • Example
      - Out of more than 400 developers, the top 15 programmers of the Apache web server contribute 88% of added lines [Mockus, 2002].

Four Types of Open Source Software
  • High reverse engineering, high distribution potential
  • High reverse engineering, low distribution potential
  • Low reverse engineering, high distribution potential
  • Low reverse engineering, low distribution potential

Type 1 – High-High
  • High reverse engineering, high distribution potential
  • Archetypical open source projects
      - The "Linux" model.
  • Developers
      - May have a separate job
      - Time allocated in agreement with their employer.
  • Community-led projects.

Type 2 – High-Low
  • High reverse engineering, low distribution potential
  • Large number of projects
      - products similar to market leaders.
  • Large presence of private companies
      - reduced risk in reverse engineering.
      - main design decisions take place within the institution
  • Examples
      - Databases, office automation tools, web services.
      - MySQL and PostgreSQL DBMS, GNOME from Ximian
  • Corporation-led projects.

Type 3 – Low-High
  • Low reverse engineering, high distribution potential
  • Stable kernel, innovative periphery
      - usually there is no commercial counterpart
      - share a relatively simple software kernel
  • Origin
      - academic environments
  • Examples
      - GRASS GIS software and the R suite of statistical tools.
  • Collaborative projects.

Type 4 – Low-Low
  • Low reverse engineering, low distribution potential
  • Innovative kernel, small periphery
  • Small teams under a public R&D contract
      - addressing specific requirements
      - aiming to demonstrate novel scientific work.
  • High mortality rate
      - most of them are restricted to the lifetime of a research grant.
  • Innovative products.

[Quadrant diagram. Vertical axis: potential for reverse engineering. Horizontal axis: potential for distributed development.]
  • High-Low: MySQL, OpenOffice, PostgreSQL
  • High-High: Linux, Perl, Apache
  • Low-Low: Postgres, NCSA browser
  • Low-High: GRASS, R

[Quadrant diagram. Vertical axis: potential for reverse engineering. Horizontal axis: potential for distributed development.]
  • High-Low: corporate
  • High-High: community-led
  • Low-Low: innovative
  • Low-High: collaborative
  • Challenges?

Lessons from Open Source Projects
  • "It's fairly clear that one cannot code from the ground up in bazaar style. One can test, debug and improve in bazaar style, but it would be very hard to originate a project in bazaar mode. Linus didn't try it. Your nascent developer community needs to have something runnable and testable to play with" (Eric Raymond)

Moving from the Low-Low Quadrant
  • Software in the "Low-Low" quadrant
      - Unsustainable in the long run
  • Moving from an innovative to a collaborative project
      - Sharing innovation
      - Transforming a crude prototype into a modular, well-designed system
  • How do you build innovation into a modular design?

Moving from the Low-Low Quadrant (continued)
  • "Perfection in design is achieved not when there is nothing more to add, but rather when there is nothing more to take away." (Saint-Exupéry)
  • How do you achieve perfection in information science?
      - Good scientific foundation
      - Usually, sound mathematical abstractions
  • What is the situation in GIS?

Do we have a solid foundation for GIS?
  • Relational databases
      - Relations: tables with columns such as id, name, year
      - Relational algebra: selection, projection, cartesian product, union, difference
      - SQL query language: SELECT name FROM faculty WHERE year > 1960
  • GIS
      - Spatio-temporal data types
      - Spatial algebra: operations on ST types
      - GIS language: ?
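
The chain from relations to relational algebra to SQL can be made concrete with a small sketch. It assumes a hypothetical `faculty` relation with invented rows; only the column names and the query come from the slide. The helpers `select` and `project` are illustrative names, and unlike strict relational algebra they do not remove duplicate tuples.

```python
import sqlite3

# Hypothetical rows for the "faculty" relation (invented for illustration).
faculty = [
    {"id": 1, "name": "Ana", "year": 1955},
    {"id": 2, "name": "Bruno", "year": 1962},
    {"id": 3, "name": "Clara", "year": 1971},
]


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes (duplicates are not removed)."""
    return [{a: row[a] for a in attributes} for row in relation]


# Algebra expression: projection on name of the selection year > 1960.
algebra_result = project(select(faculty, lambda r: r["year"] > 1960), ["name"])

# The same query written in SQL, run against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faculty (id INTEGER, name TEXT, year INTEGER)")
conn.executemany("INSERT INTO faculty VALUES (:id, :name, :year)", faculty)
sql_result = [
    {"name": name}
    for (name,) in conn.execute(
        "SELECT name FROM faculty WHERE year > 1960 ORDER BY id"
    )
]
conn.close()
```

Both routes give the same answer because the SQL statement is a direct rendering of the algebra expression. The slide asks what plays the role of this algebra for spatio-temporal types.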

Challenges for geoinformation
  • [Figure] Source: Ghassem Asrar (NASA)

The Road Ahead: Smart Sensors
  • SMART DUST: autonomous sensing and communication in a cubic millimeter
  • Source: Univ. Berkeley, SmartDust project

Knowledge gap for spatial data
  • [Figure] Source: John McDonald (MDA)

What's the Current Status of Open Source GIS?
  • High-Low products
      - Standards-based
      - Spatial DBMS: MySQL, PostgreSQL
      - OpenGIS + Web: MapServer, deegree
  • Low-High products
      - Stable kernel, innovation at the periphery
      - GRASS and R
  • What about GIScience challenges?
      - spatio-temporal data models, geographical ontologies, spatial statistics and spatial econometrics, dynamic modelling and cellular automata, environmental modelling, neural networks for spatial data

TerraLib: Open source GIS library
  • Data management
      - All data (spatial + attributes) is in the database
  • Functions
      - Spatial statistics, image processing, map algebra
  • Innovation
      - Based on state-of-the-art techniques
      - Same timing as similar commercial products
  • Web-based co-operative development
      - http://www.terralib.org

Operational Vision of TerraLib
  • [Diagram labels] Geographic Application; TerraLib; API for Spatial Operations; Spatial Operations; DBMS
  • DBMS: Access, Oracle Spatial, MySQL, PostgreSQL
  • TerraLib: MapObjects + ArcSDE + cell spaces + spatio-temporal models

TerraLib applications
  • Cadastral Mapping
      - Improving urban management of large Brazilian cities
  • Public Health
      - Spatial statistical tools for epidemiology and health services
  • Social Exclusion
      - Indicators of social exclusion in inner-city areas
  • Land-use change modelling
      - Spatio-temporal models of deforestation in Amazonia
  • Emergency action planning
      - Oil refineries and pipelines (Petrobras)

TerraCrime

Palm-top

Examples of Web Products

TerraLib Structure
  • [Architecture diagram labels, in order] Java Interface; COM Interface; OGIS Services; C++ Interface; Functions; kernel; Visualization Controls; Spatio-Temporal Data Structures; File and DBMS Access; I/O Drivers; External Files; DBMS

Spatio-Temporal Data Types

Events
  • Near in space, near in time?
  • [Figure: events plotted against the axes x, y and time]
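
One simple way to make the question "near in space, near in time?" operational is to count pairs of events that are close on both criteria. The sketch below is an illustration only: the slide does not name a method, and the function name `close_pairs`, the thresholds and the sample events are all assumptions.

```python
from itertools import combinations
from math import hypot


def close_pairs(events, max_dist, max_dt):
    """Count pairs of (x, y, t) events within max_dist in space and max_dt in time."""
    count = 0
    for (x1, y1, t1), (x2, y2, t2) in combinations(events, 2):
        near_in_space = hypot(x2 - x1, y2 - y1) <= max_dist
        near_in_time = abs(t2 - t1) <= max_dt
        if near_in_space and near_in_time:
            count += 1
    return count


# Invented sample: three events clustered in space, one far away.
events = [(0, 0, 0), (1, 0, 1), (0, 1, 10), (10, 10, 1)]
tight_window = close_pairs(events, max_dist=2, max_dt=2)
wide_window = close_pairs(events, max_dist=2, max_dt=10)
```

Comparing the count under a narrow and a wide time window shows how much of the spatial clustering also holds in time.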

Dynamical Spatial Model
  • f(I(t)) -F-> f(I(t+1)) -F-> f(I(t+2)) -F-> ... f(I(tn))
  • "A dynamical spatial model is a mathematical representation of a real-world process when a location changes in response to external forces" (Burrough)
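
The chain above, in which a transition F is applied repeatedly to the state at time t to obtain the state at t+1, can be sketched on a small cell space. The transition rule used here (a cell changes to 1 when any of its four neighbours is 1) is an invented example, not a model from the talk, and the sketch applies the transition directly to the state grid for simplicity.

```python
def spread(grid):
    """One application of F: a 0 cell becomes 1 if any 4-neighbour is 1."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]  # copy, so the input state is untouched
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] == 1:
                    new_grid[r][c] = 1
                    break
    return new_grid


def simulate(initial, transition, n_steps):
    """Return the list of states I(t), I(t+1), ..., I(t+n_steps)."""
    states = [initial]
    for _ in range(n_steps):
        states.append(transition(states[-1]))
    return states


initial = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
states = simulate(initial, spread, n_steps=2)
```

Because `simulate` takes the transition as a parameter, the same driver can run any rule that maps one state to the next.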

Spatial Simulation
  • [Figure panels: S2; Reality (Bauru in 1988); S3]

Cell Spaces: Old Wine, New Bottle

Regression with Spatial Data: Understanding Deforestation in Amazonia

Future Deforestation Scenarios
  • [Figure: hot-spots map for new deforestation; labelled regions: Terra do Meio, South of Amazonas State]

Modelling anisotropic space
  • Spatial relations in Amazonia are not isotropic!

Designing for Extensibility
  • Algorithms
      - the basic core of most successful GIS
      - a large number of them do not depend on a particular implementation of a data structure
      - they are based on a few fundamental semantic properties of the structure
      - such properties can be, for example, the ability to get from one element of the data structure to the next, and to compare two elements of the data structure.
  • Spatial analysis algorithms
      - can be abstracted away from a particular data structure and described only in terms of its properties.

Same Algorithm, Different Geometries

Generic GIS Programming
  • How to decouple algorithms from data structures?
      - Idea: iterators ("intelligent pointers")
      - Algorithms are not classes!
      - "Decide which algorithms you want; parametrize them so they work for a variety of suitable types and data structures"
  • [Diagram: Algorithms, Iterators, Geometries]
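
A minimal sketch of this decoupling follows, written in Python rather than in TerraLib's C++. The algorithm is a plain function that only steps through coordinates and compares them, so it works on any geometry that can be iterated. The classes `PointSet`, `Polygon` and `CellGrid` and the function `bounding_box` are hypothetical names chosen for the example and are not part of the TerraLib API.

```python
from typing import Iterable, Iterator, List, Tuple

Coord = Tuple[float, float]


def bounding_box(coords: Iterable[Coord]) -> Tuple[float, float, float, float]:
    """Generic algorithm: needs only 'next element' and 'compare two values'."""
    it = iter(coords)
    try:
        x, y = next(it)
    except StopIteration:
        raise ValueError("geometry has no coordinates") from None
    xmin = xmax = x
    ymin = ymax = y
    for x, y in it:
        xmin, xmax = min(xmin, x), max(xmax, x)
        ymin, ymax = min(ymin, y), max(ymax, y)
    return (xmin, ymin, xmax, ymax)


class PointSet:
    """Geometry stored as a flat list of points."""

    def __init__(self, points: Iterable[Coord]) -> None:
        self._points: List[Coord] = list(points)

    def __iter__(self) -> Iterator[Coord]:
        return iter(self._points)


class Polygon:
    """Geometry stored as a list of rings, each ring a list of vertices."""

    def __init__(self, rings: List[List[Coord]]) -> None:
        self._rings = rings

    def __iter__(self) -> Iterator[Coord]:
        for ring in self._rings:
            yield from ring


class CellGrid:
    """Regular cell space; iteration yields the centre of each cell."""

    def __init__(self, x0: float, y0: float, size: float, rows: int, cols: int) -> None:
        self.x0, self.y0, self.size = x0, y0, size
        self.rows, self.cols = rows, cols

    def __iter__(self) -> Iterator[Coord]:
        for r in range(self.rows):
            for c in range(self.cols):
                yield (self.x0 + (c + 0.5) * self.size,
                       self.y0 + (r + 0.5) * self.size)


# One algorithm, three different data structures.
points_box = bounding_box(PointSet([(1, 2), (3, 0), (-1, 5)]))
polygon_box = bounding_box(Polygon([[(0, 0), (4, 0), (4, 3), (0, 3)]]))
grid_box = bounding_box(CellGrid(0.0, 0.0, 1.0, rows=2, cols=3))
```

Adding a new geometry type requires only an iterator over its coordinates; `bounding_box` itself does not change.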

Scientific Challenges for Innovation in GIS
  • How can we design an algebra for ST types?
      - What are the spatio-temporal data types?
  • How do we design a language for spatial modelling?
      - Requires a characterization of measurements
      - Cognitively meaningful interfaces
  • Representation of Space
      - How do we represent anisotropic space?
  • Extensibility of Models and Algorithms
      - How do we design for extensibility?

Why am I here today at TU Wien?
  • Innovation in GISystems
      - Requires addressing challenges in GIScience
  • Cooperation with Prof. Andrew Frank
      - Generic GIS Programming
      - Semantics of Geographical Measurements
      - Spatio-Temporal Types and Algebras
      - Methods for Representation of Anisotropic Space

Result of Sound Scientific Work
[Quadrant diagram. Vertical axis: potential for reverse engineering. Horizontal axis: potential for distributed development.]
  • High-Low: MySQL, OpenOffice, PostgreSQL
  • High-High: Linux, Perl, Apache
  • Low-Low: Postgres, NCSA browser
  • Low-High: GRASS, R
  • TerraLib: shown in the lower (low reverse engineering) half of the diagram

Conclusions
  • Open Source software model
      - The Linux example is not applicable to all situations
      - Moving from the individual level to the organization level
  • Geoinformation
      - Innovative open source GIS software has a large role
      - Sound research is needed to support innovation
  • Cooperation in GIScience is fundamental
      - The problem is enormous... it requires a combination of R&D
      - We are few R&D groups
      - Cooperation is the only way to ensure a future for GIScience