Developing open source GIS: what are the challenges?
Gilberto Câmara, INPE, Brazil (www.terralib.org)
Institute for Geoinformation, TU Wien, 16 June 2004
The Promise of Open Source
- When an OSS project reaches a “critical size” we obtain many benefits
- Robustness
  - “Given enough eyeballs, all bugs are shallow.”
- Cooperation
  - “Somebody finds the problem and somebody else understands it” (Linus Torvalds)
- Continuous improvement
  - “Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging”
Naïve view of open source projects
- Software
  - Product of an individual or small group (peer pressure)
  - Based on a “kernel” with “plausible promise”
- Development network
  - Large number of developers, single repository
- Open source products
  - Viewed as complex, innovative systems (Linux)
- Incentives to participate
  - Operate at an individual level (“self-esteem”)
  - Wild-west libertarian (“John Waynes of the modern era”)
Idealized model of OS software
- Networks of committed individuals
The Reality of Open Source
- Previous existence of conceptual designs of similar products (the potential for reverse engineering)
  - Design is the hardest part of software (Fred Brooks)
- Problem granularity (the potential for distributed development)
  - Effective peer production requires high granularity
Potential for Reverse Engineering
- Post-mature
  - A private company develops a software product
  - The product becomes popular and becomes part of the “public commons”
  - Others develop a public domain equivalent (e.g., OpenOffice)
- Standards-led
  - Standards consolidate a technology
  - They allow compatible solutions to compete in the marketplace
  - SQL database standard (e.g., MySQL and PostgreSQL)
  - POSIX standard (guidance to Linux)
  - OpenGIS specifications (e.g., deegree, MapServer, GeoServer)
Potential for Distributed Development
- Parts of a software product
  - kernel and additional functions that use it (its periphery)
- Operating systems (Linux)
  - well-defined kernel for process control
  - periphery consisting of programs such as device drivers, applications, compilers and network tools
- Database management systems
  - strong kernel of highly integrated functions (such as the parser, scheduler, and optimizer)
  - much smaller periphery
Potential for Distributed Development
- Each type of software product has its own periphery/kernel ratio
  - this ratio constrains the potential for distributed development
- Kernel
  - a tightly organized and highly skilled programming team
- Periphery
  - more widespread programmers of various skills
- Example
  - Out of more than 400 developers, the top 15 programmers of the Apache web server contribute 88% of added lines [Mockus, 2002].
Four Types of Open Source Software
- High reverse engineering, high distribution potential
- High reverse engineering, low distribution potential
- Low reverse engineering, high distribution potential
- Low reverse engineering, low distribution potential
Type 1 – High-High
- High reverse engineering, high distribution potential
- Archetypical open source projects
  - The “Linux” model
- Developers
  - May have a separate job
  - Time allocated in agreement with their employer
- Community-led projects
Type 2 – High-Low
- High reverse engineering, low distribution potential
- Large number of projects
  - products similar to market leaders
- Large presence of private companies
  - reduced risk in reverse engineering
  - main design decisions take place within the institution
- Examples
  - Databases, office automation tools, web services
  - MySQL and PostgreSQL DBMS, GNOME from Ximian
- Corporation-led projects
Type 3 – Low-High
- Low reverse engineering, high distribution potential
- Stable kernel, innovative periphery
  - usually there is no commercial counterpart
  - share a relatively simple software kernel
- Origin
  - academic environments
- Examples
  - GRASS GIS software and the R suite of statistical tools
- Collaborative projects
Type 4 – Low-Low
- Low reverse engineering, low distribution potential
- Innovative kernel, small periphery
- Small teams under a public R&D contract
  - addressing specific requirements
  - aiming to demonstrate novel scientific work
- High mortality rate
  - most of them are restricted to the lifetime of a research grant
- Innovative products
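[Figure: quadrant chart of open source projects. Axes: potential for reverse engineering, potential for distributed development. Quadrants: High-Low, High-High, Low-Low, Low-High. Projects plotted: MySQL, OpenOffice, Linux, PostgreSQL, Perl, Apache, Postgres, NCSA browser, GRASS, R.]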
[Figure: the same quadrant chart labelled by project type. High-Low: corporate; High-High: community-led; Low-Low: innovative; Low-High: collaborative. Annotation: “Challenges?”]
Lessons from Open Source Projects
- “It's fairly clear that one cannot code from the ground up in bazaar style. One can test, debug and improve in bazaar style, but it would be very hard to originate a project in bazaar mode. Linus didn't try it. Your nascent developer community needs to have something runnable and testable to play with.” (Eric Raymond)
Moving from the Low-Low Quadrant
- Software in the “Low-Low” quadrant
  - Unsustainable in the long run
- Moving from an innovative to a collaborative project
  - Sharing innovation
  - Transforming a crude prototype into a modular, well-designed system
- How do you build innovation into a modular design?
Moving from the Low-Low Quadrant
- “Perfection in design is achieved not when there is nothing more to add, but rather when there is nothing more to take away.” (Saint-Exupéry)
- How do you achieve perfection in information science?
  - Good scientific foundation
  - Usually, sound mathematical abstractions
- What is the situation in GIS?
Do we have a solid foundation for GIS?
- Relational databases
  - Relations (a table with columns id, name, year)
  - Relational algebra: selection, projection, Cartesian product, union, difference
  - SQL query language: SELECT name FROM faculty WHERE year > 1960
- GIS
  - Spatio-temporal data types
  - Spatial algebra: operations on ST types
  - GIS language: ?
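The relational side of this analogy can be made concrete. The sketch below uses Python's standard `sqlite3` module to run the slide's SQL query, then expresses the same result as a selection followed by a projection. The table rows and the helper names `select` and `project` are hypothetical, and an `ORDER BY id` clause is added to the slide's query only to make the row order deterministic.

```python
import sqlite3

# Hypothetical rows for the slide's "faculty" relation: (id, name, year).
FACULTY = [
    (1, "Ana", 1955),
    (2, "Bruno", 1962),
    (3, "Carla", 1971),
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, *columns):
    """Relational projection: keep only the given column positions."""
    return [tuple(row[c] for c in columns) for row in relation]


def query_with_sql(rows):
    """Run the slide's query against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE faculty (id INTEGER, name TEXT, year INTEGER)")
    conn.executemany("INSERT INTO faculty VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT name FROM faculty WHERE year > 1960 ORDER BY id"
    ).fetchall()
    conn.close()
    return result


def query_with_algebra(rows):
    """The same query written as projection(selection(faculty))."""
    return project(select(rows, lambda r: r[2] > 1960), 1)


sql_result = query_with_sql(FACULTY)
algebra_result = query_with_algebra(FACULTY)
print(sql_result)
print(algebra_result)
```

The open question on the slide is what would play the role of `select` and `project` for spatio-temporal types.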
Challenges for geoinformation (Source: Ghassem Asrar, NASA)
The Road Ahead: Smart Sensors
- Smart Dust: autonomous sensing and communication in a cubic millimeter
- Source: Univ. of California, Berkeley, Smart Dust project
Knowledge gap for spatial data (Source: John McDonald, MDA)
What’s the Current Status of Open Source GIS?
- High-Low products
  - Standards-based
  - Spatial DBMS: MySQL, PostgreSQL
  - OpenGIS + Web: MapServer, deegree
- Low-High products
  - Stable kernel, innovation at the periphery
  - GRASS and R
- What about GIScience challenges?
  - spatio-temporal data models, geographical ontologies, spatial statistics and spatial econometrics, dynamic modelling and cellular automata, environmental modelling, neural networks for spatial data
TerraLib: Open source GIS library
- Data management
  - All data (spatial + attributes) is in the database
- Functions
  - Spatial statistics, image processing, map algebra
- Innovation
  - Based on state-of-the-art techniques
  - Same timing as similar commercial products
- Web-based cooperative development
  - http://www.terralib.org
Operational Vision of TerraLib
[Figure: architecture diagram. Labels: Geographic Application; API for Spatial Operations; TerraLib; Spatial Operations; DBMS (Access, Oracle Spatial, MySQL, PostgreSQL). Caption: TerraLib: MapObjects + ArcSDE + cell spaces + spatio-temporal models.]
TerraLib applications
- Cadastral mapping
  - Improving urban management of large Brazilian cities
- Public health
  - Spatial statistical tools for epidemiology and health services
- Social exclusion
  - Indicators of social exclusion in inner-city areas
- Land-use change modelling
  - Spatio-temporal models of deforestation in Amazonia
- Emergency action planning
  - Oil refineries and pipelines (Petrobras)
TerraCrime
Palm-top
Examples of Web Products
TerraLib Structure
[Figure: layered architecture diagram. Labels: Java Interface; COM Interface; OGIS Services; C++ Interface; Functions; Kernel; Visualization Controls; Spatio-Temporal Data Structures; File and DBMS Access; I/O Drivers; External Files; DBMS.]
Spatio-Temporal Data Types
Events: near in space, near in time?
[Figure: events plotted along x, y and time axes.]
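One simple way to put the question on this slide to data is to count pairs of events that are close in both space and time. The sketch below is only an illustration under assumed inputs: the event tuples `(x, y, t)`, the thresholds `max_dist` and `max_dt`, and the function name `close_pairs` are all hypothetical, and nothing here is taken from TerraLib.

```python
from itertools import combinations
from math import hypot


def close_pairs(events, max_dist, max_dt):
    """Return index pairs (i, j) of events that are near in space AND in time.

    Each event is a tuple (x, y, t). Two events are "near" when their
    Euclidean distance is at most max_dist and their time gap is at most max_dt.
    """
    pairs = []
    for (i, (x1, y1, t1)), (j, (x2, y2, t2)) in combinations(enumerate(events), 2):
        near_in_space = hypot(x2 - x1, y2 - y1) <= max_dist
        near_in_time = abs(t2 - t1) <= max_dt
        if near_in_space and near_in_time:
            pairs.append((i, j))
    return pairs


# Hypothetical events: (x, y, day of occurrence).
events = [
    (0.0, 0.0, 0),
    (1.0, 0.0, 1),
    (0.5, 0.5, 30),   # near the first two in space, but far in time
    (10.0, 10.0, 2),  # near in time, but far in space
]
pairs = close_pairs(events, max_dist=2.0, max_dt=5)
print(pairs)
```

A large number of such pairs relative to what chance would produce hints at space-time clustering; judging that properly needs a statistical test, which this sketch does not attempt.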
Dynamical Spatial Model
- $f(I(t)) \xrightarrow{F} f(I(t+1)) \xrightarrow{F} f(I(t+2)) \cdots f(I(t_n))$
- “A dynamical spatial model is a mathematical representation of a real-world process when a location changes in response to external forces” (Burrough)
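The chain of states on this slide can be sketched as the repeated application of a transition function to a grid of cells. The example below is a minimal illustration, assuming a 4-neighbour grid and a made-up "spread" rule; the names `step`, `simulate` and `spread_rule` are hypothetical, and the slide's outer function f is left out so that the code models only I(t+1) = F(I(t)).

```python
def step(grid, rule):
    """Apply the transition function F once: I(t) -> I(t+1)."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the states of the 4-neighbours that lie inside the grid.
            neighbours = [
                grid[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            new_grid[r][c] = rule(grid[r][c], neighbours)
    return new_grid


def spread_rule(state, neighbours):
    """Hypothetical rule: a cell becomes 1 if it or any 4-neighbour is 1."""
    return 1 if state == 1 or any(n == 1 for n in neighbours) else 0


def simulate(grid, rule, n_steps):
    """Return the list of states I(t), I(t+1), ..., I(t+n_steps)."""
    history = [grid]
    for _ in range(n_steps):
        history.append(step(history[-1], rule))
    return history


initial = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
history = simulate(initial, spread_rule, 2)
for state in history:
    print(state)
```

Because the rule is passed in as an argument, the same stepping code can run any local transition function, which is one way to read the slide's separation between the state I(t) and the operator F.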
Spatial Simulation
[Figure: map panels labelled S2, S3 and “Reality: Bauru in 1988”.]
Cell Spaces: Old Wine, New Bottle
Regression with Spatial Data: Understanding Deforestation in Amazonia
Future Deforestation Scenarios
[Figure: hot-spots map for new deforestation. Labelled regions: Terra do Meio, south of Amazonas State.]
Modelling anisotropic space
- Spatial relations in Amazonia are not isotropic!
Designing for Extensibility
- Algorithms
  - the basic core of most successful GIS
  - a large number of them do not depend on a particular implementation of a data structure
  - they depend only on a few fundamental semantic properties of the structure
  - such properties can be, for example, the ability to get from one element of the data structure to the next, and to compare two elements of the data structure
- Spatial analysis algorithms
  - can be abstracted away from a particular data structure and described only in terms of its properties
Same Algorithm, Different Geometries
Generic GIS Programming
- How to decouple algorithms from data structures?
  - Idea: iterators (“intelligent pointers”)
  - Algorithms are not classes!
  - “Decide which algorithms you want; parametrize them so they work for a variety of suitable types and data structures”
[Figure: Algorithms, Iterators, Geometries.]
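The idea of writing an algorithm once and reaching different geometries through iterators can be sketched as follows. TerraLib itself exposes a C++ interface; Python is used here only for brevity, and the classes `PointSet`, `Polygon` and `CellSpace` and the function `bounding_box` are hypothetical illustrations, not TerraLib names.

```python
class PointSet:
    """A bag of (x, y) points."""

    def __init__(self, points):
        self._points = list(points)

    def __iter__(self):
        return iter(self._points)


class Polygon:
    """A polygon stored as a single ring of (x, y) vertices."""

    def __init__(self, ring):
        self._ring = list(ring)

    def __iter__(self):
        return iter(self._ring)


class CellSpace:
    """A regular grid; iterating yields the centre of each cell."""

    def __init__(self, rows, cols, cell_size):
        self.rows, self.cols, self.cell_size = rows, cols, cell_size

    def __iter__(self):
        for r in range(self.rows):
            for c in range(self.cols):
                yield ((c + 0.5) * self.cell_size, (r + 0.5) * self.cell_size)


def bounding_box(coords):
    """Generic algorithm: needs only 'get the next coordinate' and comparison.

    It is a plain function, not a method of any geometry class, and works
    for anything that can be iterated as (x, y) pairs.
    """
    it = iter(coords)
    try:
        x, y = next(it)
    except StopIteration:
        raise ValueError("cannot compute the bounding box of an empty geometry") from None
    min_x = max_x = x
    min_y = max_y = y
    for x, y in it:
        min_x, max_x = min(min_x, x), max(max_x, x)
        min_y, max_y = min(min_y, y), max(max_y, y)
    return (min_x, min_y, max_x, max_y)


points_box = bounding_box(PointSet([(0, 0), (2, 3), (-1, 1)]))
polygon_box = bounding_box(Polygon([(0, 0), (4, 0), (4, 2), (0, 2)]))
cells_box = bounding_box(CellSpace(rows=2, cols=2, cell_size=1.0))
print(points_box, polygon_box, cells_box)
```

Adding a new geometry type then means writing its iterator, not rewriting `bounding_box`, which is the extensibility argument the slides make.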
Scientific Challenges for Innovation in GIS
- How can we design an algebra for ST types?
  - What are the spatio-temporal data types?
- How do we design a language for spatial modelling?
  - Requires a characterization of measurements
  - Cognitively meaningful interfaces
- Representation of space
  - How do we represent anisotropic space?
- Extensibility of models and algorithms
  - How do we design for extensibility?
Why am I here today at TU Wien?
- Innovation in GISystems
  - Requires addressing challenges in GIScience
- Cooperation with Prof. Andrew Frank
  - Generic GIS programming
  - Semantics of geographical measurements
  - Spatio-temporal types and algebras
  - Methods for representation of anisotropic space
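Result of Sound Scientific Work
[Figure: quadrant chart of open source projects with TerraLib added. Axes: potential for reverse engineering, potential for distributed development. Quadrants: High-Low, High-High, Low-Low, Low-High. Projects plotted: MySQL, OpenOffice, Linux, PostgreSQL, Perl, Apache, Postgres, NCSA browser, GRASS, R, TerraLib.]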
Conclusions
- Open source software model
  - The Linux example is not applicable to all situations
  - Moving from the individual level to the organization level
- Geoinformation
  - Innovative open source GIS software has a large role
  - Sound research is needed to support innovation
- Cooperation in GIScience is fundamental
  - The problem is enormous and requires a combination of R&D
  - We are few R&D groups
  - Cooperation is the only way to ensure a future for GIScience