Data Quality

• Data quality – a measure of how well the GIS data represents the target domain
• Related terms:
  – data uncertainty
  – data error
  – data accuracy

• Micro-level components – factors that pertain to the individual data elements
• Macro-level components – factors that pertain to the data set as a whole

• Micro-level components:
  – positional accuracy
  – attribute accuracy
  – logical consistency
  – resolution

• Positional (spatial) accuracy – the difference between the location of an object as described in the data and its actual location
  – bias: systematic error
  – precision: standard deviation of the error
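
A minimal Python sketch of the two measures, assuming the data locations and the ‘truth’ locations are matched point by point in the same projected coordinate system. Following the wording of these slides, bias is taken as the mean error vector and precision as the standard deviation of the error distances; the function names and check points are hypothetical.

```python
import math
from statistics import mean, pstdev

def positional_errors(observed, truth):
    """Per-point error vectors: location in the data minus 'truth' location."""
    return [(ox - tx, oy - ty) for (ox, oy), (tx, ty) in zip(observed, truth)]

def bias_and_precision(observed, truth):
    """Return (bias, spread).

    bias:   mean error vector (dx, dy), the systematic part of the error
    spread: population standard deviation of the error distances;
            a SMALLER value means HIGHER precision
    """
    errors = positional_errors(observed, truth)
    bias = (mean(dx for dx, _ in errors), mean(dy for _, dy in errors))
    distances = [math.hypot(dx, dy) for dx, dy in errors]
    return bias, pstdev(distances)

# Hypothetical check points (metres); every data point is shifted 2 m east.
truth = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
shifted = [(x + 2.0, y) for x, y in truth]
print(bias_and_precision(shifted, truth))  # ((2.0, 0.0), 0.0)
```

Every point is off by the same 2 m, so the bias is large while the spread is zero: high bias, high precision.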

Figure: data points compared with ‘truth’ data
• High bias: a systematic error
• Low bias: a random error

Figure: data points compared with ‘truth’ data
• High precision: all errors are about the same distance
• Low precision: errors vary greatly in distance
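
Bias and precision together determine overall positional accuracy. A common summary, not named on these slides, is the root mean square error (RMSE); along one axis, with population statistics, RMSE² = bias² + standard deviation². The sketch below checks this on two hypothetical sets of east-west errors, one dominated by a systematic shift and one by random error.

```python
import math
from statistics import mean, pvariance

def rmse_components(errors):
    """Return (bias, sd, rmse) for errors along one axis.

    With population statistics, mean(e**2) == mean(e)**2 + pvariance(e),
    so rmse**2 == bias**2 + sd**2.
    """
    bias = mean(errors)
    sd = math.sqrt(pvariance(errors))
    rmse = math.sqrt(mean([e * e for e in errors]))
    return bias, sd, rmse

# Hypothetical east-west errors (metres) against 'truth' data.
high_bias = [2.1, 1.9, 2.0, 2.0]   # systematic: every point about 2 m east
low_bias = [1.5, -1.5, 0.5, -0.5]  # random: errors cancel out on average

for name, errs in [("high bias", high_bias), ("low bias", low_bias)]:
    bias, sd, rmse = rmse_components(errs)
    print(f"{name}: bias={bias:.2f} sd={sd:.2f} rmse={rmse:.2f}")
# high bias: bias=2.00 sd=0.07 rmse=2.00
# low bias: bias=0.00 sd=1.12 rmse=1.12
```

In the first set almost all of the RMSE comes from bias; in the second it comes entirely from the spread of the errors.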

• Attribute accuracy – are geographic objects identified correctly?
  – invalid attribute values
  – missing attribute values
  – ‘mixed-up’ attribute values
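
Missing and invalid values can be caught mechanically by comparing each value with the attribute's permitted domain; ‘mixed-up’ values are valid codes attached to the wrong object, so they only show up against independent reference data. A minimal sketch, assuming a hypothetical land-use field, domain and set of field-checked values:

```python
# Hypothetical domain of permitted land-use codes for this sketch.
VALID_LANDUSE = {"agriculture", "forest", "urban", "water"}

def check_attributes(records, reference=None, field="landuse", domain=VALID_LANDUSE):
    """Sort record ids into missing, invalid and mismatched attribute values.

    reference: optional {id: verified value} from an independent source
    (e.g. field checks); values that are valid but disagree with it are
    reported as 'mismatched' (the 'mixed-up' case).
    """
    reference = reference or {}
    report = {"missing": [], "invalid": [], "mismatched": []}
    for rec in records:
        value = rec.get(field)
        if value in (None, ""):
            report["missing"].append(rec["id"])
        elif value not in domain:
            report["invalid"].append(rec["id"])
        elif rec["id"] in reference and reference[rec["id"]] != value:
            report["mismatched"].append(rec["id"])
    return report

parcels = [
    {"id": 1, "landuse": "forest"},
    {"id": 2, "landuse": ""},         # missing value
    {"id": 3, "landuse": "forrest"},  # invalid value (typo)
    {"id": 4},                        # field absent: also missing
    {"id": 5, "landuse": "urban"},    # valid code, but wrong for this parcel
]
field_checks = {1: "forest", 5: "agriculture"}
print(check_attributes(parcels, field_checks))
# {'missing': [2, 4], 'invalid': [3], 'mismatched': [5]}
```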

• Logical consistency – logically consistent relationships between geographic objects
  – e.g. if a lake edge forms the boundary of a state, the lake boundary line should be identical to the state boundary line
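
One simple way to test the lake/state example is to compare the two boundary lines vertex by vertex, within a tolerance and in either digitizing direction. This is an illustrative simplification with hypothetical coordinates: two lines can coincide geometrically while having different vertices, and GIS software usually enforces this kind of rule through topology (a single shared edge) rather than by comparing copies.

```python
import math

def same_line(a, b, tol=0.0):
    """True if polylines a and b have matching vertices (within tol map units),
    in the same or the reverse order."""
    if len(a) != len(b):
        return False

    def vertices_match(p, q):
        return all(math.dist(u, v) <= tol for u, v in zip(p, q))

    return vertices_match(a, b) or vertices_match(a, b[::-1])

# Hypothetical coordinates: the state line was digitized in the opposite direction.
lake_edge = [(0.0, 0.0), (5.0, 1.0), (10.0, 0.0)]
state_boundary = [(10.0, 0.0), (5.0, 1.0), (0.0, 0.0)]
generalized_boundary = [(10.0, 0.0), (0.0, 0.0)]  # simplified at a smaller scale

print(same_line(lake_edge, state_boundary))        # True
print(same_line(lake_edge, generalized_boundary))  # False
```

The second case is the kind of inconsistency that generalization produces: the state layer no longer follows the lake edge.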

Figure: logical inconsistency among different data sets, due to generalization (legend: Water, Not PA, Land, PA)

• Resolution
  – raster: length of a side of a grid cell, in real-world units
  – vector: size of the smallest geographic object represented (minimum mapping unit)
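
Both definitions reduce to simple arithmetic. The sketch below derives a raster cell size from a hypothetical extent and column count, and flags vector polygons smaller than a hypothetical minimum mapping unit, using the shoelace formula for area (coordinates are assumed to be in a projected system measured in metres).

```python
def raster_cell_size(xmin, xmax, ncols):
    """Length of one side of a square grid cell, in the extent's real-world units."""
    return (xmax - xmin) / ncols

def polygon_area(coords):
    """Planar polygon area via the shoelace formula (coords in map units)."""
    n = len(coords)
    s = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def below_mmu(polygons, mmu_area):
    """Ids of polygons smaller than the minimum mapping unit (MMU)."""
    return [pid for pid, ring in polygons.items() if polygon_area(ring) < mmu_area]

print(raster_cell_size(500000.0, 530000.0, 1000))  # 30.0 (metres per cell)

polys = {"a": [(0, 0), (100, 0), (100, 100), (0, 100)],  # 10,000 m²
         "b": [(0, 0), (20, 0), (20, 20), (0, 20)]}      # 400 m²
print(below_mmu(polys, mmu_area=1000))  # ['b']
```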

• Macro-level components:
  – completeness
  – time
  – lineage

• Completeness
  – coverage: proportion of data available for the area of interest
  – classification: how well the classification is able to represent the data
  – verification: amount and distribution of field measurements or other independent sources of information that were used to develop the data

• Completeness – coverage: proportion of data available for the area of interest
Figure: area of data availability
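
Coverage can be computed by overlaying a mask of the area of interest with a mask of where data exist. A minimal sketch on hypothetical 0/1 grids:

```python
def coverage(aoi_mask, data_mask):
    """Proportion of area-of-interest cells that also have data.

    Both masks are same-shaped grids of 0/1 (or bool) values.
    """
    aoi_cells = cells_with_data = 0
    for aoi_row, data_row in zip(aoi_mask, data_mask):
        for in_aoi, has_data in zip(aoi_row, data_row):
            if in_aoi:
                aoi_cells += 1
                cells_with_data += 1 if has_data else 0
    return cells_with_data / aoi_cells if aoi_cells else 0.0

# Hypothetical 4 x 4 grid: the whole grid is the area of interest,
# but data are only available for the three western columns.
aoi = [[1, 1, 1, 1] for _ in range(4)]
data = [[1, 1, 1, 0] for _ in range(4)]
print(coverage(aoi, data))  # 0.75
```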

• Completeness – classification: how well the classification is able to represent the data
Figure: example classification hierarchy
  – Agriculture: Grains, Orchards
  – Forest: Deciduous, Coniferous
  – Urban
  – Water
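
One way to quantify this, assumed here for illustration rather than taken from the slides, is the share of observed (e.g. field-checked) labels for which the scheme has a class. The two-level hierarchy is read from the figure's labels; the grouping of Grains/Orchards under Agriculture and Deciduous/Coniferous under Forest is an assumption, as are the field labels.

```python
# Two-level scheme reconstructed from the figure; the grouping is assumed.
SCHEME = {
    "Agriculture": ["Grains", "Orchards"],
    "Forest": ["Deciduous", "Coniferous"],
    "Urban": [],
    "Water": [],
}

def representable(label, scheme=SCHEME):
    """True if the label is a top-level class or one of its subclasses."""
    return label in scheme or any(label in subs for subs in scheme.values())

def classification_completeness(labels, scheme=SCHEME):
    """Share of observed labels for which the scheme has a class (hypothetical measure)."""
    return sum(representable(lab, scheme) for lab in labels) / len(labels)

# Hypothetical labels recorded at field check sites.
field_labels = ["Grains", "Coniferous", "Urban", "Wetland", "Water"]
print(classification_completeness(field_labels))  # 0.8: 'Wetland' has no class
```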

• Time
  – commonly, the date of the source material used to create the data
  – some data do not change significantly over the period in which the data are used (e.g. elevation data)
  – other data can change rapidly (e.g. demographic data and land use)

• Lineage – the history of a data set: the source data and processing steps used to produce the data
  – each data source and processing step introduces some error into the final data product
  – lineage should be recorded in documentation detailing how the data were produced and by whom
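
Lineage can be kept as a structured record of the source and of each processing step, including who performed it. The sketch below also estimates how the per-step errors add up, using the root-sum-square rule, which holds only if the step errors are independent; the data set, steps, performers and error values are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    description: str
    performed_by: str
    error_m: float  # estimated positional error this step adds, in metres

@dataclass
class Lineage:
    source: str
    steps: List[Step] = field(default_factory=list)

    def add(self, description, performed_by, error_m):
        self.steps.append(Step(description, performed_by, error_m))

    def combined_error_m(self):
        """Root-sum-square of the step errors.

        Only valid if the errors of the steps are independent of each other.
        """
        return math.sqrt(sum(s.error_m ** 2 for s in self.steps))

# Hypothetical data set, performers and error estimates.
streams = Lineage(source="1:24,000 paper map")
streams.add("source map positional error", "map publisher", 3.0)
streams.add("manual digitizing", "GIS technician", 4.0)
for step in streams.steps:
    print(f"{step.description} ({step.performed_by}): {step.error_m} m")
print(streams.combined_error_m())  # 5.0
```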

• Sources of error
  – error in spatial data cannot be completely eliminated, but it can be managed
  – there is a trade-off between the cost of creating and maintaining data and the level of error

• Sources of error:
  – data collection
  – data input
  – data storage
  – data manipulation
  – data output
  – use of results

• Sources of error – data collection
  – error in field data collection
  – errors in existing maps used for digital data creation

• Sources of error – data input
  – inaccuracies in digitizing (operator and equipment)
  – discretization of geographic entities (e.g. vector digitizing of a forest ‘edge’)
  – error in attribute entry

• Sources of error – data storage
  – numerical precision
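
Numerical precision matters most for large projected coordinates. A 32-bit float keeps only about 7 significant decimal digits, so a UTM northing of several million metres cannot keep its centimetres. A minimal sketch that uses Python's standard `struct` module to simulate 32-bit storage (the coordinate is hypothetical):

```python
import struct

def as_float32(x):
    """Round-trip a Python float (64-bit) through 32-bit IEEE 754 storage."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

northing = 4500000.123  # hypothetical UTM northing, metres
stored = as_float32(northing)
print(stored)                       # 4500000.0
print(round(stored - northing, 3))  # -0.123: the centimetres are lost
```

Between 2^22 and 2^23 (about 4.2 to 8.4 million), adjacent 32-bit floats are 0.5 apart, so values in that range are stored to the nearest half metre; 64-bit doubles are far finer at this magnitude.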

• Sources of error – data manipulation
  – error propagation in multiple overlay operations
  – ‘sliver’ polygons
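
A simple model of error propagation through overlay, assumed here for illustration: if each input layer is correct at a given location with probability p_i and the layers' errors are independent, a location is correct in the overlay result only if it is correct in every layer, which happens with probability equal to the product of all p_i. Real layer errors are often correlated, so this is a rough guide rather than a prediction.

```python
from math import prod

def overlay_accuracy(layer_accuracies):
    """Probability that a location is correct in the overlay output,
    assuming it must be correct in every input layer and that the
    layers' errors are independent."""
    return prod(layer_accuracies)

# Four hypothetical input layers, each 90% accurate.
print(round(overlay_accuracy([0.9, 0.9, 0.9, 0.9]), 4))  # 0.6561
```

Each additional 90%-accurate layer multiplies the result by 0.9 again, which is why long chains of overlays can end up much less reliable than any single input.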

Figure: data manipulation, sliver polygons (legend: Water, Not PA, Land, PA)
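
Slivers are long, thin polygons, so one common screening heuristic is a shape index such as 4πA/P² (area A, perimeter P), which equals 1 for a circle and approaches 0 for thin shapes. A minimal sketch with hypothetical polygons and a hypothetical threshold of 0.1; in practice a threshold would be tuned to the data and often combined with an area limit.

```python
import math

def area(ring):
    """Planar polygon area via the shoelace formula."""
    n = len(ring)
    return abs(sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
                   for i in range(n))) / 2.0

def perimeter(ring):
    """Sum of edge lengths of a closed ring (last vertex joins the first)."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))

def thinness(ring):
    """4*pi*A / P**2: 1.0 for a circle, close to 0 for a long, thin polygon."""
    return 4 * math.pi * area(ring) / perimeter(ring) ** 2

def find_slivers(polygons, max_thinness=0.1):
    """Ids of polygons thinner than the (hypothetical) threshold."""
    return [pid for pid, ring in polygons.items() if thinness(ring) < max_thinness]

polygons = {
    "parcel": [(0, 0), (100, 0), (100, 100), (0, 100)],   # thinness ~0.785
    "sliver": [(0, 0), (1000, 0), (1000, 0.5), (0, 0.5)],  # thinness ~0.0016
}
print(find_slivers(polygons))  # ['sliver']
```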

Figure: vector to raster conversion
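
The figure is not reproduced here, but the effect can be sketched with one common conversion rule: a cell is assigned to the polygon if the cell centre falls inside it (other rules, such as majority area, also exist). The triangle, 2 m cells and grid extent below are hypothetical; the cell-by-cell decisions make the rasterized area differ from the true area.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: count edge crossings of a ray running in +x from (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(ring, xmin, ymin, cell, ncols, nrows):
    """1 where the cell centre falls inside the polygon, else 0 (row 0 = bottom)."""
    return [[1 if point_in_polygon(xmin + (c + 0.5) * cell, ymin + (r + 0.5) * cell, ring) else 0
             for c in range(ncols)]
            for r in range(nrows)]

triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 9.0)]  # true area = 0.5 * 10 * 9 = 45
grid = rasterize(triangle, 0.0, 0.0, 2.0, 5, 5)
raster_area = sum(map(sum, grid)) * 2.0 ** 2
print(raster_area, "vs true area", 45.0)  # 40.0 vs true area 45.0
```

Smaller cells reduce this discretization error but cannot remove it.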

Figure: raster to vector conversion (panel: original)

• Sources of error – data output
  – scaling inaccuracies (printer dpi)
  – instability of the medium
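
Output resolution limits what a printed map can show: one printer dot covers 25.4/dpi millimetres on paper, which corresponds to a ground distance at the map scale. A minimal sketch, with a hypothetical 1:24,000 map printed at 300 dpi:

```python
MM_PER_INCH = 25.4

def ground_size_of_dot(scale_denominator, dpi):
    """Ground distance in metres covered by one printed dot at map scale 1:scale_denominator."""
    dot_mm = MM_PER_INCH / dpi                  # dot size on paper, millimetres
    return dot_mm / 1000.0 * scale_denominator  # paper metres -> ground metres

print(round(ground_size_of_dot(24000, 300), 2))  # 2.03
```

Detail or positional differences smaller than about 2 m on the ground therefore cannot be drawn distinctly on that print.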

• Sources of error – use of results
  – misinterpretation of data
  – no acknowledgement of data uncertainty

• Metadata – data about data
  – data quality is described in the metadata
  – standards for metadata and data sharing were developed by the National Committee for Digital Cartographic Data Standards (NCDCDS) and, currently, by the Federal Geographic Data Committee (FGDC): www.fgdc.gov
  – PASDA example

• Metadata
  – digital spatial data derived from USGS paper maps conform to the National Map Accuracy Standards (NMAS): http://rockyweb.cr.usgs.gov/nmpstds/nmas.html
  – in place since the 1940s
  – a set of accuracy ‘requirements’ that all published maps conform to
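
As commonly summarized, the NMAS horizontal test requires that no more than 10 percent of the well-defined points tested be in error by more than 1/30 inch at map scale for publication scales larger than 1:20,000, or 1/50 inch for 1:20,000 and smaller. The sketch below encodes that rule only; the list of measured errors is hypothetical, and the standard itself is the authority for the full test procedure.

```python
INCH_M = 0.0254

def nmas_horizontal_tolerance_m(scale_denominator):
    """Ground tolerance: 1/30 inch at scales larger than 1:20,000, otherwise 1/50 inch."""
    inches = 1 / 30 if scale_denominator < 20000 else 1 / 50
    return inches * INCH_M * scale_denominator

def meets_nmas_horizontal(errors_m, scale_denominator):
    """True if no more than 10% of the tested points exceed the tolerance."""
    tol = nmas_horizontal_tolerance_m(scale_denominator)
    exceeding = sum(1 for e in errors_m if e > tol)
    return exceeding * 10 <= len(errors_m)  # integer form of "exceeding <= 10%"

print(round(nmas_horizontal_tolerance_m(24000), 3))  # 12.192 (40 ft)

# Hypothetical horizontal errors (metres) at 20 well-defined test points.
errors = [3.0] * 18 + [15.0] * 2
print(meets_nmas_horizontal(errors, 24000))  # True: 2 of 20 exceed
```

At 1:24,000 the tolerance is 1/50 inch × 24,000 = 480 inches, i.e. 40 feet (about 12.2 m) on the ground.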