Quick Lesson on Databases Relational databases are key
Quick Lesson on Databases • Relational databases are key to managing complex data • You’ve been using relational databases with “Joins” and “Relates” in Arc. GIS • Geo. Databases are relational databases • Structured Query Language (SQL) is the primary language for relational databases • You’ve been using SQL statements in Arc. GIS to query data
Relational Databases • Need to represent data with a complex structure Plot Species Tree
Database Tables • What you’ve seen in Arc. GIS only more flexible • Tables are made up of “fields” (columns) and “records” (rows) • Queries are used to combine and subset tables into new tables • Each table should have a unique, integer, ID, referred to as a primary key – Greatly improves query performance
Field Data Types • Numeric – Float or integer – Auto numbered, use for primary keys • Dates – YYYY-MM-DD HH: MM: SS. SS – 2013 -04 -05 14: 23: 12. 34 • Text – Specified width – “Variant” width • Binary Large Objects (BLOB)
What’s Wrong With This? Tree Query LAT LON MEASYEAR MEASMON MEASDAY COMMON_NAME HT 45. 446392 -122. 236107 1995 6 22 Douglas-fir 49 45. 446392 -122. 236107 1995 6 22 Douglas-fir 27 45. 446392 -122. 236107 1995 6 22 Douglas-fir 95 45. 446392 -122. 236107 1995 6 22 Douglas-fir 66 45. 446392 -122. 236107 1995 6 22 Douglas-fir 118 45. 446392 -122. 236107 1995 6 22 Douglas-fir 76 45. 446392 -122. 236107 1995 6 22 Douglas-fir 147 45. 456116 -122. 397774 1995 6 22 Douglas-fir 185 45. 456116 -122. 397774 1995 6 22 Douglas-fir 105 45. 456116 -122. 397774 1995 6 22 Douglas-fir 89 45. 193054 -122. 51667 1996 6 23 Douglas-fir 90 45. 193054 -122. 51667 1996 6 23 Douglas-fir 95 45. 193054 -122. 51667 1996 6 23 Douglas-fir 96 45. 193054 -122. 51667 1996 6 23 Douglas-fir 99
Relational Databases • Allow us to “relate” tables to: – Reduce the overall amount of data • Removes duplicates – Makes updates much easier – Improves search speeds
Entity-Relationship Diagram • ERD – Unified Markup Language (UML) Relationship Types One to one One to many Many to many Entities Plot Relationships Species Tree
Plot ID Lat Lon Year Month Day 1 45. 446392 -122. 236107 1995 6 22 2 45. 193054 -122. 51667 1995 6 22 Tree Species ID Common Name ID Plot. ID Species. ID Height 1 Douglas-fir 1 1 1 49 2 Ponderosa Pine 2 1 1 27 3 1 1 95 4 1 1 66 5 1 1 118 … 1 … … 12 2 1 90 13 2 1 95 Primary Key Foreign Key
Database Normalization 1. Eliminate duplicate columns from the same table 2. Move fields that have “duplicate” row entries and move them to a related table 3. All field entries should be dependent on the primary key 4. There should be only one primary key in each table
Database Dictionary • Defines each of the tables and fields in a database • A database forms the basis for data management behind many GIS projects, web sites, and organizations • Proper documentation is key to long term success! – Database design (including ERDs) – Database Dictionary
Geospatial Databases • Not required to store spatial data! • Provide: – Field types for spatial data: point, polyline, polygon, etc. – Spatial operations: union, intersect, etc. – Spatial queries: return records that overlap with a polygon, etc. – Some provide spatial reference control
What we really want • What we need from a database: – Distributed, concurrent access (concurrency) – Automatic Backup – Version control – Unlimited amounts of data – Quick data access – Inexpensive – Broad OS Support – File-level copying – Geo. Spatial queries, operations, data types
Relational Databases • Enterprise-Level – SQL Server – Postgre. SQL – My. SQL – Oracle – Sybase • File-Level – Geodatabase – MS-Access
What we have SQL Server Postgre. SQL ESRI Geodatabase MS-Access Concurrency Yes No No Automatic backup Yes No No Versioning No No Data Size 100 s of millions 100, 000? Performance Fast Good Poor Cost $600 per CPU Free ~$10, 000 w/Arc. GIS ~$400 OS Windows Any Windows File-level copy No No Yes Spatial Queries Yes Yes No Spatial data types Yes Yes No Spatial operations Yes Yes No
Structured Query Language (SQL) • Comes from the database industry • “INSERT”, “DELETE”, and “SELECT” rows in tables • Very rich syntax • Portions of “SELECT” grammar used heavily in Arc. GIS: – Selecting attributes – Raster calculator – Geodatabases
Transaction SQL • “SQL” is a subset of T-SQL • T-SQL allows full management of a database: – Create & drop: • Tables, fields/columns, relationships, indexes, views, etc. – Administrative functions • Varies some between databases
Using SQL • All Databases have “query editors” that allow us to write, save, edit, and use SQL queries • Use programming languages to “write” queries and “fetch” records from the database
SQL: SELECT Field 1, Field 2 FROM Table. Name INNER JOIN Table. Name 2 ON Table. Name 2. FK=Table. Name. PK WHERE Filter 1 AND Filter 2 GROUP BY Field 1, Field 2 ORDER BY Field 1 [DESC], Field 2 [DESC] FK=Foreign Key, PK=Primary Key
Selecting Fields • SELECT * – Returns all fields as new table • SELECT Field 1, Field 2 • SELECT Table 1. Field 1, Table 2. Field 1 – Return specified fields • SELECT Table 1. Field 1 AS New. Name – Avoids name collisions
Selecting Tables • FROM Table 1 – Returns contents of one table • FROM Table 1 INNER JOIN Table 2 ON Table 2. Foreign. Key=Table 1. Primary. Key – Returns records from Table 2 that match primary keys in Table 1 – Does not return all rows in Table 1
Selecting Tables (con’t) • FROM Table 1 OUTER JOIN Table 2 ON Table 2. Foreign. Key=Table 1. Primary. Key – Returns all matches between Table 1 and Table 2 and any records in Table 1 that don’t match records in Table 2 – Missing values are NULL
Filters or “WHERE” clauses SELECT * FROM Table 1 WHERE (Field 1 Operator Value 1) Boolean. Operator (Field 1 Operator Field 2)
Filter Examples • WHERE: – ID = 1 – Area < 10000 – Area <= 10000 – Name = “Crater Lake” (case dependent) – Name LIKE “Crater Lake” (ignores case) • Notice: – String values have double quotes – Syntax for strings vary some between databases
SQL Comparisons • • Equals: = Greater than: > Less than: < Greater than or equal: >= Less than or equal: <= Not equal: <> Like: case independent string comparison with wild cards (%)
Boolean Operators A B A AND B A OR B NOT A NOT B T T F F T F T F T T F F F T T
More Complex Filter Examples • WHERE: – Name LIKE “Hawaii” AND Area < 10000 – Species LIKE “Ponderosa” AND DBH > 1
ORDER BY SELECT * FROM Table 1 ORDER BY Last. Name DESC, First. Name DESC • Careful with performance on large datasets and string fields
GROUP BY • Aggregates data SELECT Species , AVG(Height) FROM Trees GROUP BY Species • Only aggregated fields can appear in SELECT list
SQL INSERT • INSERT INTO Table. Name (Field 1, Field 2) VALUES (Value 1, ”Value 2”) • String values must be in quotes – Other values can also be in quotes • If the table has an “auto numbered” ID field, it will be added automatically • Otherwise, very difficult to set the ID field
SQL DELETE FROM Table. Name WHERE ID=Value - Deletes one row DELETE FROM Plot WHERE Plot. ID=12 - Deletes all rows with Plot. ID=12 DELETE FROM Table. Name - Deletes everything in Table. Name!
Database Performance h c ar e fa e D S t l u rch ea Indexed S Primary Key Search
Indexes • Added to a table – Typically for one field • Adds overhead to INSERT and DELETEs • Important for: – Large tables – Complex queries – Especially text searches!
Maintaining Performance • Always use integer, auto numbered primary keys • Avoid iterative or hierarchical queries • Sometimes code is faster: – Do simple query, load into RAM and sort • With REALLY big data, don’t use SQL – No. SQL, accessing data directly, without the use of a relational database package – There are “No. SQL” products in the works • Avoid text searches and sorts
Rasters and Databases • Don’t put rasters into a database! – Makes it impossible to backup and restore the database – Put a file path to the rasters in the database
- Slides: 34