Metadata workshop 15 December 2003 Durham University Metadata
Metadata Workshop Timetable
- Project overview
- Metadata and geo-spatial datasets
- HFE Metadata Application Profile and guidelines
- Metadata tools
- Benefits of creating metadata
- A Go-Geo! Portal overview and ‘hands on’ evaluation session
Aims of the Workshop
- Introduce geo-spatial metadata concepts and available resources, with the intention of establishing a new mindset amongst data developers and users in academia
- Encourage metadata creation and publication
- Seek your feedback on the design and functionality of the Go-Geo! portal
Project Overview
The project was driven by the recognition that the academic community needed a data sharing and management solution to cope with the growing number of geo-spatial datasets that academics and students were creating with GI systems, database technologies and conventional means. Portal technology and metadata were identified as the means of delivering these capabilities to the academic community, in particular by using a portal to publicise existing datasets and deliver them to a range of users. This led to the development of …
The Go-Geo! Portal
A simple interface designed to run queries that discover geo-spatial datasets. The portal supports searching by free text, date, resource type and geographic location. Geo-spatial datasets refer to data...
Statistical Account of Scotland
NUMBER XIII. PARISH OF CULLEN. (COUNTY OF BANFF, SYNOD OF ABERDEEN, PRESBYTERY OF FORDYCE.) By the Rev. Mr. ROBERT GRANT.
Royalty, Extent, Climate, etc. CULLEN, as appears from old charters, was originally called Inverculan, because it stands upon the bank of the Burn of Cullen, which, at the N. end of the town, falls into the sea: but now it is known by the name of Cullen only. Cullen is a royal burgh, formerly a constabulary, of which the Earl of Findlater was hereditary constable. The set, as it is called, of the council, consists of 19, in which number are included the Earl of Findlater, hereditary preses, 3 bailies, a treasurer, a dean-of-guild, and 13 counsellors. The parish extends from the sea southward, about 2 English miles in length.
Geo-spatial dataset: “data that have some form of spatial or geographic reference that enables them to be located in two- or three-dimensional space”
Project History
Phase I - Scoping Study
- 10-month project phase (Aug 2000 - June 2001), JISC funded
- undertaken by EDINA and the History Data Service (now UKDA), involving other key players, e.g. JISC, MIMAS, ADS, UKDA
- feasibility study to:
  - understand requirements and demand for a portal and browser
  - explore options and investigate technical and organisational issues
- activities included:
  - undertaking requirements analysis
  - reviewing metadata standards v. the needs of HFE
  - identifying geo-spatial resources and assessing
Phase II - Portal Demonstrator
Running from July 2002 to June 2003, portal-related activities included:
- the logo, name and design, hence the Go-Geo! Portal
- portal Help pages
- the development of a demonstrator portal with simple query interfaces allowing search by subject, date, resource type and geographic location
- further development to demonstrate cross-searching of:
  - a database local to the portal
  - an existing, remote, structured geo-spatial data directory service to find geo-spatial data [HDS database]
  - an existing resource catalogue containing geo-related resources [GE:source]
[Figure: Go-Geo! portal architecture. Components shown: Go-Geo! portal; GE:source; Geo-data Gateway (NGDF/GIgateway); Network; other IE content providers; local go-geo database; metadata or resource servers; Geo-data Network (proposed).]
Phase II Go-Geo! Content
- GI-related resources, tied together by location, which include:
  - software, learning resources, courses and training, etc.
  - information about studies and projects, articles, reports, organisations, personal contacts, mailing lists, conferences
  - guidance and reference documents for understanding and creating geo-spatial metadata
- a workshop was held at the University of Essex in January 2003 to introduce the Go-Geo! Portal demonstrator to stakeholders as part of an effort to
Phase II – Metadata Activities
- amended and finalised the HFE Metadata Application Profile, derived in large part from the NGDF Discovery Metadata Guidelines and the mandatory ISO 19115 elements
- produced a 150-page guideline document for the HFE Profile to support metadata creation
- cross-mapped between the HFE Profile, the ISO 19115 Metadata Standard, the FGDC Standard and the NGDF
- created 100 metadata records for portal content and demonstration purposes. This included converting 25 records from the Archaeology Data Service (ADS), the History Data Service (HDS) and the Manchester Information & Associated Services (MIMAS), plus 75 records created by EDINA
- reviewed and selected potential sources for geo
Phase III Go-Geo! Portal Trial Service and Metadata Initiative
Running from August 2003 to July 2004, Phase III will involve:
- running portal evaluation sessions at selected metadata workshops and creating an on-site questionnaire. Both efforts are meant to encourage feedback that will lead to improvements in portal functionality and design, in preparation for rolling the portal out as a full service
- a sister project, the JISC-funded Metadata Initiative, which will promote geo-spatial metadata through workshops and presentations organised and presented at up to 18 universities across the UK. The workshops, such as this one, will provide an introduction to geo-spatial metadata, the HFE Metadata Application Profile, the supporting guidelines and a metadata tool.
Metadata is information that describes something, in this case a geo-spatial dataset. The details include the What, Where, When, Who and Why of the dataset, plus the means to access and use it. A metadata record may answer the following questions about a dataset:
- What is the purpose of the dataset?
- Where did the dataset originate?
- What attribute information does it contain?
- What processes or algorithms were employed to create it?
- What spatial reference system does the dataset use?
- What is the granularity of the data?
- When was the dataset …?
- What geographic area or extent does it cover?
- Whom do I contact for more information or access to the dataset?
- What are the access and use restrictions, and how much will it cost?
- Who is responsible for creating the metadata record for the dataset?
- What time period does the dataset content cover?
Metadata reveal information that isn’t apparent when looking at geo-spatial dataset files in a directory: the details of each dataset file are revealed in its metadata record.
A geo-spatial dataset file opened in a GIS software package doesn’t always reveal detailed information without further investigation:
- What do these polygons represent?
- Which application?
- What are the attributes?
- Where is this study area?
- Which projection and co-ordinate system?
- What is the spatial accuracy?
- When were the data captured and processed?
Think also of metadata in terms of food product labelling. Labels provide specific information about the ingredients in these tins. Remove the labels and decide which tin to open: one contains tuna-flavoured cat food and the other tuna fish. Would you select Tin ‘A’ or ‘B’?
Metadata Standards
Metadata standards are precise specifications for documenting information, applied to enforce consistency and interoperability. They are organised as a hierarchy of compound elements (entities) and data elements that define the information content used to document a set of data. Metadata standards also assign structure and conditions to elements and entities. These include element and entity definitions and identifiers, obligations, data type and domain. Obligation refers to whether or not a value must be entered for the element; data type defines the format of the value entered, such as character string, date, numeric or a
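To make obligation, data type and domain concrete, here is a minimal Python sketch of one element definition and a check against it. The `ElementDef` class and `check_value` function are hypothetical illustrations, not part of any standard; the two example elements (Title and Presentation Type) are taken from the HFE Profile slides in this workshop.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of one element definition; the attribute names are
# illustrative and not taken from any published schema.
@dataclass(frozen=True)
class ElementDef:
    identifier: str                      # element identifier, e.g. "title"
    definition: str                      # human-readable definition
    obligation: str                      # "mandatory" or "optional"
    data_type: type                      # expected Python type of the value
    domain: Optional[frozenset] = None   # allowed values; None means free text


def check_value(elem: ElementDef, value) -> list:
    """Return a list of problems found for one element value (empty if valid)."""
    if value is None or value == "":
        # Obligation: a missing value is only a problem for mandatory elements.
        if elem.obligation == "mandatory":
            return [f"{elem.identifier}: mandatory value missing"]
        return []
    if not isinstance(value, elem.data_type):
        return [f"{elem.identifier}: expected {elem.data_type.__name__}"]
    if elem.domain is not None and value not in elem.domain:
        return [f"{elem.identifier}: {value!r} not in domain"]
    return []


title = ElementDef("title", "The name by which the dataset is known.", "mandatory", str)
presentation = ElementDef(
    "presentation_type", "Form in which the dataset is available.", "optional", str,
    frozenset({"Image", "Graphic", "Map", "Numeric", "Text", "Other"}),
)

print(check_value(title, ""))               # ['title: mandatory value missing']
print(check_value(presentation, "Map"))     # []
print(check_value(presentation, "Video"))   # ["presentation_type: 'Video' not in domain"]
```

Keeping the domain optional lets the same structure describe both free-text elements and elements whose values come from a list.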
Metadata Standard Initiatives
Perhaps the best-known metadata initiative is the Dublin Core. The Dublin Core element set defines 15 metadata elements for simple resource discovery. It also serves as an intermediary between the numerous community-specific formats.
1) Title
2) Creator
3) Subject and Keywords
4) Description
5) Publisher
6) Contributor
7) Date
8) Resource Type
9) Format
10) Resource Identifier
11) Source
12) Language
13) …
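The slide's list breaks off at item 13. For reference, the Dublin Core Metadata Element Set, version 1.1, names its 15 elements as in the Python sketch below. The sketch also illustrates the "intermediary" role by mapping a record in a community-specific format onto Dublin Core. The `HFE_TO_DC` crosswalk is a hypothetical, partial mapping chosen for illustration, not the project's published cross-mapping.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical, partial crosswalk from HFE Profile element names to
# Dublin Core; a real crosswalk would be agreed and documented.
HFE_TO_DC = {
    "Title": "title",
    "Creator": "creator",
    "Controlled Keywords": "subject",
    "Description": "description",
    "Language": "language",
    "Dataset Format": "format",
}


def to_dublin_core(hfe_record: dict) -> dict:
    """Map an HFE-style record onto Dublin Core, dropping unmapped elements.

    Values are collected in lists because Dublin Core elements are repeatable.
    """
    dc = {}
    for hfe_name, value in hfe_record.items():
        dc_name = HFE_TO_DC.get(hfe_name)
        if dc_name is not None:
            dc.setdefault(dc_name, []).append(value)
    return dc


record = {"Title": "Example parish boundaries", "Edition": "1", "Language": "English"}
print(to_dublin_core(record))
# {'title': ['Example parish boundaries'], 'language': ['English']}
```

Elements with no Dublin Core counterpart (here, Edition) are simply dropped, which is the usual trade-off of mapping a richer format onto a simpler one.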
Initiatives
The Federal Geographic Data Committee’s Content Standard for Digital Geo-spatial Metadata (CSDGM) contains 334 elements. It was produced in the mid-1990s to document geo-spatial datasets.
The National Geo-spatial Data Framework (NGDF)/GIgateway Metadata Guidelines are based on the FGDC standard. They form an application profile created for the UK geo-spatial community and the GIgateway web service.
Application Profiles
The geo-spatial metadata standards contain too many elements for most needs, so many organisations develop application profiles instead:
- each organisation selects a significantly reduced number of entities and elements from the standards
- this allows the selection of the specific elements best suited to specific applications. The NGDF/GIgateway Metadata Guidelines contain 42 entities and elements, selected to meet the needs of the UK geo-spatial community
- additional elements that aren’t part of a standard can also be added, though this reduces cross-searching capabilities across a wider network and other
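An application profile can be modelled as a selection from a standard's element set, optionally extended with local elements. In the Python sketch below, `STANDARD_ELEMENTS` is a hypothetical, heavily abbreviated stand-in for a real standard and `build_profile` is an illustrative helper. Extension elements are kept separate because other services will not recognise them when cross-searching.

```python
# Hypothetical, heavily abbreviated stand-in for a full standard's element set.
STANDARD_ELEMENTS = frozenset({
    "title", "abstract", "topic", "west_bound", "east_bound",
    "north_bound", "south_bound", "lineage", "purpose", "contact_name",
})


def build_profile(selected: set, extensions: set = frozenset()) -> dict:
    """Build a profile from standard elements plus optional local extensions."""
    unknown = set(selected) - STANDARD_ELEMENTS
    if unknown:
        raise ValueError(f"not in the standard: {sorted(unknown)}")
    local_only = set(extensions) - STANDARD_ELEMENTS
    return {
        "elements": set(selected) | local_only,
        # Only standard elements can be matched by other services.
        "cross_searchable": set(selected),
        # Local additions are usable in-house but invisible to cross-searching.
        "local_only": local_only,
    }


profile = build_profile({"title", "topic", "west_bound", "east_bound"},
                        extensions={"department_code"})
print(sorted(profile["local_only"]))   # ['department_code']
print(len(profile["elements"]))        # 5
```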
The HFE Metadata Application Profile and Guidelines
- derived from the NGDF Metadata Guidelines and the ISO 19115 Metadata Standard, the HFE Metadata Application Profile was created to support the needs of the UK academic community
- it contains 71 elements, categorised under eight entity groups
- it has 27 mandatory elements, of which 12 are used for contact details. With the exception of one element (Description), the remaining 15 require only short answers or the selection of appropriate term(s) from lists
- the Guidelines are embedded in the Go-Geo! Portal and contain 150 pages of support material and examples
The HFE Metadata Application Profile: Eight Groups (Entities)
G1 Citation
G2 Identification Information (What)
G3 Data Capture Period (When)
G4 Time Period Covered by Dataset (When)
G5 Spatial Extent of Dataset (Where)
G6 Custodian (Who)
G7 Distributor (Access)
G8 Metadata Creator/Record Creator
… and subgroup entities
G2 Identification Information
  G2.sg1 Spatial Reference System
  G2.sg2 Level of Spatial Detail
G5 Spatial Extent of Dataset
  G5.sg1 Spatial Referencing using Geographic Co-ordinates
    G5.sg1-a Spatial Referencing using Co-ordinates of a Bounding Rectangle
    G5.sg1-b Spatial Referencing using Co-ordinates of a Bounding Polygon
  G5.sg2 Spatial Referencing using Geographic Identifiers
G7 Distributor
  G7.sg1 Access and Use Constraints
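The group structure can be cross-checked against the numbered elements listed on the slides that follow. In this Python sketch, the element-number range for each group is read off those slides; `GROUPS` and `group_of` are illustrative names, not part of the Profile. Summing the ranges reproduces the Profile's stated total of 71 elements.

```python
# Illustrative lookup table: element-number ranges per group, read off the
# numbered elements on the Profile slides (range end is exclusive).
GROUPS = {
    "G1": ("Citation", range(1, 6)),
    "G2": ("Identification Information (What)", range(6, 23)),
    "G3": ("Data Capture Period (When)", range(23, 28)),
    "G4": ("Time Period Covered by Dataset (When)", range(28, 30)),
    "G5": ("Spatial Extent of Dataset (Where)", range(30, 39)),
    "G6": ("Custodian (Who)", range(39, 49)),
    "G7": ("Distributor (Access)", range(49, 64)),
    "G8": ("Metadata Creator/Record Creator", range(64, 72)),
}


def group_of(element_number: int) -> str:
    """Return the group code (e.g. 'G5') that contains an element number."""
    for code, (_, numbers) in GROUPS.items():
        if element_number in numbers:
            return code
    raise KeyError(element_number)


total = sum(len(numbers) for _, numbers in GROUPS.values())
print(total)          # 71
print(group_of(33))   # G5
```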
G1 Citation
1. Title (Mandatory) (1): The name by which the dataset is known.
2. Alternative Title (Optional): Short name, other name, acronym or alternative-language title.
3. Creator (Mandatory) (2): Organisation or person that developed the dataset and has primary responsibility for its intellectual content.
4. Identifier (Optional): A unique string or number used to identify the dataset.
5. Edition (Mandatory) (3): The number of the edition of the dataset.
G2 Identification Information (What)
6. Topic (Mandatory) (4): Main theme(s) of the dataset.
1) Farming
2) Biota
3) Boundaries
4) Climatology/Meteorology/Atmosphere
5) Economy
6) Elevation
7) Environment
8) Geo-scientific Information
9) Health
10) Imagery/Base Maps/Earth Cover
11) Intelligence/Military
12) Inland Waters
13) Location
14) Oceans
15) Planning/Cadastre
16) Society
17) Structure
18) Transportation
19) Utilities/Communication
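Because Topic is mandatory and takes one or more values from a fixed list (the 19 labels correspond to the ISO 19115 topic categories), checking an entry is a simple membership test. A minimal Python sketch, assuming topics are supplied as the labels shown above; `check_topics` is a hypothetical helper.

```python
# The 19 Topic labels as listed on the slide.
TOPICS = (
    "Farming", "Biota", "Boundaries", "Climatology/Meteorology/Atmosphere",
    "Economy", "Elevation", "Environment", "Geo-scientific Information",
    "Health", "Imagery/Base Maps/Earth Cover", "Intelligence/Military",
    "Inland Waters", "Location", "Oceans", "Planning/Cadastre",
    "Society", "Structure", "Transportation", "Utilities/Communication",
)


def check_topics(selected: list) -> list:
    """Hypothetical check for element 6: mandatory, one or more listed values."""
    if not selected:
        return ["Topic is mandatory: select at least one theme"]
    return [f"unknown topic: {t!r}" for t in selected if t not in TOPICS]


print(check_topics(["Boundaries", "Society"]))  # []
print(check_topics(["Geology"]))                # ["unknown topic: 'Geology'"]
```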
7. Controlled Vocabulary (Mandatory) (5): Name of the controlled vocabulary used as a source for the controlled keywords.
- UNESCO Thesaurus (United Nations Educational, Scientific and Cultural Organization)
- GEMET (GEneral Multilingual Environmental Thesaurus)
- HASSET (Humanities and Social Science Electronic Thesaurus)
8. Controlled Keywords (Mandatory) (6): Keywords taken from a controlled vocabulary summarising the subject of the dataset.
9. Other Keywords (Optional)
10. Controlled Place Name Vocabulary (Optional): Name of the controlled vocabulary used as a source for the controlled place name keywords.
- Getty Thesaurus of Geographic Names
- Ordnance Survey 1:50000 Gazetteer
- geoXwalk
11. Controlled Place Name Keywords (Optional): The geographic name(s) of the location(s) covered by the dataset.
12. Description (Mandatory) (7): A brief description of the dataset. This should include some explanation of why the dataset was produced and how it has been used since its creation.
13. Quality (Optional): A general assessment of the quality of the dataset, for determining its fitness for use. Quality is stated in terms of accuracy, completeness and consistency, for both the data and the dataset.
14. Language (Mandatory) (8): The language(s) used within the dataset.
15. Further Information (Optional): Source of further information about the dataset.
16. Related Datasets (Optional): Information about other related datasets, of a similar theme or derived from a common source, which may be of interest to the user.
G2.sg1 Spatial Reference System
17. Co-ordinate System (Conditional Mandatory) (9): Name or description of the co-ordinate-based spatial referencing system used within the dataset, e.g. British National Grid, Irish National Grid, latitude and longitude.
18. Geographic Identifiers (Conditional Mandatory) (10): Name or description of the spatial referencing system used within the dataset that is based on geographic identifiers, e.g.
G2.sg2 Level of Spatial Detail
19. Source Scale Denominator (Optional): Denominator of the representative fraction on the source map(s) (e.g. on a 1:50000 scale map, the source scale denominator is 50000). If no source map was used, enter 0. If multiple source map scales were used, enter the Source Scale Denominator of the smallest-scale map (largest denominator).
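The rule for element 19 is easy to get backwards, since a smaller-scale map has a larger denominator. A minimal Python sketch of the rule as stated above; `source_scale_denominator` is a hypothetical helper that takes the denominators of the source maps used.

```python
def source_scale_denominator(map_scales: list) -> int:
    """Hypothetical helper implementing the stated rule for element 19.

    map_scales holds the denominators of the source maps, e.g. 50000 for a
    1:50000 map. No source map gives 0; several give the smallest-scale map,
    which is the one with the largest denominator.
    """
    if not map_scales:
        return 0
    if any(d <= 0 for d in map_scales):
        raise ValueError("scale denominators must be positive")
    return max(map_scales)


print(source_scale_denominator([]))              # 0
print(source_scale_denominator([10000, 50000]))  # 50000
```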
20. Imagery or Grid Raster Cell or Pixel Size X-Value (Optional): The column width of a raster cell, expressed in distance units of measure.
21. Imagery or Grid Raster Cell or Pixel Size Y-Value (Optional): The row height of a raster cell, expressed in distance units of measure.
22. Smallest Administrative Unit (Optional): The smallest representative unit associated with disaggregated statistical data.
G3 Data Capture Period (When)
23. Status of the Start Date for Dataset Capture (Optional): Declaration on the status of the starting date for data capture. Values: Known – Not Applicable
24. Start Date of Dataset Capture Process (Optional): Date on which data for the dataset were first collected. Example: 20031215
25. Status of the Completion Date for Dataset Capture (Optional): Declaration on the status of the completion date for data capture. Values: Known – Not Applicable – Ongoing
26. Completion Date of Dataset Capture Process (Optional): Date on which data for the dataset were last collected. Example: 20031215
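The example dates are written as YYYYMMDD (20031215 is 15 December 2003). The Python sketch below checks that format and pairs a completion date with its status element. It assumes, as an interpretation rather than a stated rule, that a date is expected only when the status is "Known"; the function names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional

COMPLETION_STATUSES = {"Known", "Not Applicable", "Ongoing"}


def parse_yyyymmdd(value: str) -> date:
    """Parse an 8-digit YYYYMMDD string such as '20031215'."""
    # strptime alone would also accept shorter strings such as '2003121',
    # so insist on exactly eight digits first.
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {value!r}")
    return datetime.strptime(value, "%Y%m%d").date()


def check_completion(status: str, value: Optional[str]) -> list:
    """Check elements 25 and 26 together; returns a list of problems."""
    if status not in COMPLETION_STATUSES:
        return [f"unknown status: {status!r}"]
    # Assumption: a date is only expected when the status is "Known".
    if status == "Known":
        if not value:
            return ["status is Known but no completion date is given"]
        try:
            parse_yyyymmdd(value)
        except ValueError as exc:
            return [str(exc)]
    return []


print(parse_yyyymmdd("20031215"))             # 2003-12-15
print(check_completion("Ongoing", None))      # []
print(check_completion("Known", "20031315"))  # a one-item list describing the bad date
```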
27. Update Frequency (Optional): The frequency with which revisions and updates are made to the dataset after its initial completion. Values: Hourly – Daily – Weekly – Fortnightly – Monthly – Quarterly – Biannually – Annually – Biennially – Triennially – Quinquennially – Decennially – Continuous – Irregular – Never – Not Known – Other
G4 Time Period Covered by Dataset (When)
28. Start Date for Time Period Covered by Dataset (Optional): The start date of the actual time period the dataset covers.
29. End Date for Time Period Covered by Dataset (Optional): The end date of the actual time period the dataset covers.
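Elements 28 and 29 are both optional, but when both are present they should describe a valid interval, which may lie long before the capture period (for example, a dataset digitised recently from an eighteenth-century source). A minimal Python sketch, assuming these dates use the same YYYYMMDD form as the Profile's other date examples; `check_time_period` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional


def check_time_period(start: Optional[str], end: Optional[str]) -> list:
    """Hypothetical check for elements 28-29: start must not follow end."""
    if not (start and end):
        return []  # both elements are optional, so a missing date is not an error
    fmt = "%Y%m%d"  # assumed YYYYMMDD format, as in the Profile's other dates
    if datetime.strptime(start, fmt) > datetime.strptime(end, fmt):
        return [f"start {start} is after end {end}"]
    return []


print(check_time_period("17910101", "17991231"))  # []
print(check_time_period("20031215", "20030101"))  # ['start 20031215 is after end 20030101']
```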
G5 Spatial Extent of Dataset (Where)
G5.sg1 Spatial Referencing using Geographic Co-ordinates
30. System of Spatial Referencing by Co-ordinates (Mandatory) (11): Name of the spatial reference system used for the geographic co-ordinates. Values: British National Grid – Irish Grid – Latitude and Longitude
G5.sg1-a Spatial Referencing using Co-ordinates of a Bounding Rectangle
31. West Bounding Co-ordinate (Mandatory) (12): Westernmost co-ordinate of the bounding rectangle (grid value/longitude).
32. East Bounding Co-ordinate (Mandatory) (13): Easternmost co-ordinate of the bounding rectangle (grid value/longitude).
33. North Bounding Co-ordinate (Mandatory) (14): Northernmost co-ordinate of the bounding rectangle (grid value/latitude).
34. South Bounding Co-ordinate (Mandatory) (15): Southernmost co-ordinate of the bounding rectangle (grid value/latitude).
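Since all four bounding co-ordinates are mandatory, a few sanity checks catch common entry mistakes such as swapped West/East values. A minimal Python sketch; `check_bounding_rectangle` is a hypothetical helper, and it assumes the rectangle does not cross the 180-degree meridian, which holds for datasets covering the British Isles. The numbers in the usage lines are illustrative only.

```python
def check_bounding_rectangle(west: float, east: float, north: float,
                             south: float, system: str) -> list:
    """Hypothetical sanity check for elements 31-34."""
    problems = []
    # Assumes the rectangle does not cross the 180-degree meridian.
    # Equal values are allowed so that a single point can be described.
    if west > east:
        problems.append("West bounding co-ordinate must not exceed East")
    if south > north:
        problems.append("South bounding co-ordinate must not exceed North")
    if system == "Latitude and Longitude":
        if not all(-180 <= x <= 180 for x in (west, east)):
            problems.append("longitudes must lie between -180 and 180")
        if not all(-90 <= y <= 90 for y in (north, south)):
            problems.append("latitudes must lie between -90 and 90")
    return problems


# Illustrative values only.
print(check_bounding_rectangle(350000, 360000, 870000, 860000, "British National Grid"))  # []
print(check_bounding_rectangle(-2.7, -2.9, 57.7, 57.6, "Latitude and Longitude"))
# ['West bounding co-ordinate must not exceed East']
```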
G5.sg1-b Spatial Referencing using Co-ordinates of a Bounding Polygon
35. Spatial Referencing using Co-ordinates of the Bounding Polygon (Optional): The set of x and y co-ordinates (first number = easting of a point, second number = northing of a point) that make up the bounding polygon.
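When a bounding polygon is recorded, the mandatory bounding-rectangle values can be derived from it rather than typed in separately. A minimal Python sketch, assuming vertices are (easting, northing) pairs as the element definition states; `bounding_rectangle` is a hypothetical helper and the vertices in the usage lines are illustrative.

```python
def bounding_rectangle(polygon: list) -> dict:
    """Hypothetical helper: derive elements 31-34 from polygon vertices (element 35).

    Each vertex is an (easting, northing) pair, following the Profile's
    'first number = easting, second number = northing' convention.
    """
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least three vertices")
    eastings = [x for x, _ in polygon]
    northings = [y for _, y in polygon]
    return {"west": min(eastings), "east": max(eastings),
            "north": max(northings), "south": min(northings)}


# Illustrative British National Grid vertices.
outline = [(350000, 860000), (358000, 861000), (360000, 869000), (352000, 870000)]
print(bounding_rectangle(outline))
# {'west': 350000, 'east': 360000, 'north': 870000, 'south': 860000}
```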
G5.sg2 Spatial Referencing by Geographic Identifiers
36. Nations (Optional): Geographic coverage expressed in terms of nations within the British Isles. Values: England – Northern Ireland – Scotland – Wales – Isle of Man – Channel Islands – United Kingdom – Republic of Ireland
37. Administrative Areas (Optional): Geographic coverage expressed in terms of administrative areas.
38. Postcode Districts (Optional): Geographic coverage expressed in terms of postcode districts.
G6 Custodian (Who)
39. Name of Custodian (Mandatory) (16): The name of the organisation or person responsible for the maintenance of the dataset.
40. Postal Street Address of Custodian (Mandatory) (17)
41. Postal City of Custodian (Mandatory) (18)
42. Postal County of Custodian (Optional)
43. Postal Code of Custodian (Mandatory) (19)
44. Postal Country of Custodian (Mandatory) (20)
45. Telephone Number of Custodian (Optional)
46. Facsimile Number of Custodian (Optional)
47. Email Address of Custodian (Optional)
48. Web Address of Custodian (Optional)
G7 Distributor (Access)
49. Name of Distributor (Mandatory) (21): The name of the organisation or person from whom the dataset may be obtained.
50. Full Postal Street Address of Distributor (Mandatory) (22)
51. Postal Code of Distributor (Mandatory) (23)
52. Telephone Number of Distributor (Optional)
53. Facsimile Number of Distributor (Optional)
54. Email Address of Distributor (Optional)
56. Presentation Type (Optional): Form in which the dataset is available. Values: Image – Graphic – Map – Numeric – Text – Other
57. Dataset Format (Optional): Format in which the digital data can be provided (e.g. DXF, DLG, MapInfo, IDRISI, ARC/INFO, ERDAS, DBF).
58. Supply Media (Optional): Media format in which the dataset can be supplied. Values: Paper – Magnetic – Optical – Online – Other
59. Sample (Optional): A sample of the dataset and its approximate file size (Megabytes).
60. Online Linkage (Optional): The name of the World Wide Web site or other online source that contains the dataset.
G7.sg1 Access and Use Constraints
61. Access Constraints (Optional): Restrictions and legal prerequisites for accessing the dataset. These include any access constraints applied to assure the protection of privacy or intellectual property, and any special restrictions or limitations on obtaining the dataset. Values: Financial – Legal – Other – Not Known – None
62. Access Details (Optional): Description of the restrictions and legal prerequisites for accessing the dataset.
63. Use Constraints (Optional): Restrictions and legal prerequisites on using the dataset after access is granted. These include any constraints applied to assure the protection of privacy or intellectual property, and any special restrictions or limitations on using the dataset.
G8 Metadata Creator/Record Creator
64. Name of Metadata Creator (Mandatory) (24): The name of the organisation or person responsible for the metadata updates.
65. Full Postal Street Address of Metadata Creator (Mandatory) (25)
66. Postal Code of Metadata Creator (Mandatory) (26)
67. Telephone Number of Metadata Creator (Optional)
68. Facsimile Number of Metadata Creator (Optional)
69. Email Address of Metadata Creator (Optional)
70. Web Address of Metadata Creator (Optional)
71. Metadata Last Updated (Mandatory) (27): Date on which the metadata (file) were created or last updated. Example: 20031215
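Most of the mandatory elements in groups G6 to G8 are contact details, so a completeness check over those fields catches most incomplete records. A minimal Python sketch; the role and field names in `MANDATORY_CONTACT` are hypothetical, but the mandatory fields per role follow the elements marked Mandatory on the G6, G7 and G8 slides. The example record uses placeholder values.

```python
# Mandatory contact fields per role, following the elements marked Mandatory
# among 39-44 (custodian), 49-51 (distributor) and 64-66 (metadata creator).
# The role and field names are hypothetical.
MANDATORY_CONTACT = {
    "custodian": ("name", "street_address", "city", "postcode", "country"),
    "distributor": ("name", "street_address", "postcode"),
    "metadata_creator": ("name", "street_address", "postcode"),
}


def missing_contact_fields(record: dict) -> list:
    """List mandatory contact fields that are absent or blank, as 'role.field'."""
    missing = []
    for role, fields in MANDATORY_CONTACT.items():
        contact = record.get(role, {})
        for field in fields:
            value = contact.get(field)
            if value is None or not str(value).strip():
                missing.append(f"{role}.{field}")
    return missing


record = {
    "custodian": {"name": "Example Department", "street_address": "1 Example Street",
                  "city": "Durham", "postcode": "XX1 1XX", "country": "United Kingdom"},
    "distributor": {"name": "Example Department", "street_address": "1 Example Street"},
}
print(missing_contact_fields(record))
# ['distributor.postcode', 'metadata_creator.name', 'metadata_creator.street_address', 'metadata_creator.postcode']
```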
Metadata Tool
- a temporary metadata tool designed in MS Access, which can be downloaded from the project web pages and used for creating metadata records
- includes all 71 elements of the HFE Metadata Application Profile
- metadata records can be saved as database files and sent to the UK Data Archive, where they will be validated and passed to EDINA for conversion into XML format and storage on the Go-Geo! Portal’s node. XML (Extensible Markup Language) is a system for marking up documents and data using tags that indicate or define structural elements
- the UKDA and EDINA are developing a Java-based internet metadata tool that will further simplify metadata creation and validation
- finally, some GIS software packages contain
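The conversion step described above turns each record into XML. The actual Go-Geo! XML schema is not described here, so the tag names in this Python sketch are hypothetical; it only illustrates how a flat record of element values can be serialised with the standard library's `xml.etree.ElementTree`, which also takes care of escaping characters such as `&`.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict, profile: str = "HFE") -> str:
    """Serialise a flat {tag: value} record as a small XML document.

    The root and child tag names are hypothetical, not the portal's schema.
    """
    root = ET.Element("metadataRecord", profile=profile)
    for tag, value in record.items():
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = record_to_xml({
    "title": "Example dataset",
    "edition": 1,
    "metadataLastUpdated": "20031215",
})
print(xml_text)
# <metadataRecord profile="HFE"><title>Example dataset</title><edition>1</edition><metadataLastUpdated>20031215</metadataLastUpdated></metadataRecord>
```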
in our brains, and we need to move it from here to the computer. Some day we might find solutions to this problem...
we may encounter beings from other worlds who could use telepathy to extract dataset information from our heads...
or we’ll discover a technological solution that extracts dataset information from our heads and transfers it to the computer. Until then, we’ll need to depend on available tools.
MS Access Metadata Tool Demonstration
Metadata creation:
- provides support to create a mindset and operational structure for managing and storing dataset information for departmental and intra-departmental use
- assures the integrity of existing and new datasets, using metadata as a tracking mechanism to monitor changes and edits to datasets
- maintains an inventory of datasets, reducing redundancy and the time required to reassess existing datasets for new and future applications
- eliminates or reduces the risk of redundant data collection or of deleting existing datasets
- reduces the effects of staff turnover and minimises their disruption
- protects investments of time and cost dedicated to
- assures that other organisations will not replicate data at added cost and time
- provides potential users with a dataset catalogue to view and select datasets that complement or augment their existing in-house datasets, so that they can be used together for other applications
- allows for more spontaneity amongst users as they browse the Go-Geo! portal and metadata: the discovery of a dataset may prompt a user to develop an idea for a new application
- metadata on the portal can be referenced and cited in project proposals
- the portal’s node can serve as a repository where organisations store and manage their metadata, using the portal as an internal resource to access and share datasets. This will save an organisation
- metadata and the portal can provide a quick, short-term solution for data developers to protect their intellectual property rights, using metadata to announce their data and applications on the portal
- some organisations and individuals may wish to advertise and sell their datasets to other interested parties in academia and in the private and public sectors
- the portal will be linked to other portals and the UK gateway, allowing advertisement to a large audience of data users
- the metadata and portal will complement and augment other UK academic portals and the UK’s GIgateway site.
GIgateway
Just a few words regarding GIgateway (www.gigateway.org.uk). It is a geo-data gateway site serving the UK’s geo-spatial community. The Go-Geo! Portal will be a service that complements the GIgateway site, focusing specifically on the needs of the academic community. Students moving on to employment in
Go-Geo! Portal Trial Service
- the trial of the Go-Geo! portal demonstrator service began on 17th November 2003
- an initial evaluation is being held to gather further feedback at the start of the trial
- a second evaluation will take place towards the end of the trial
- the trial Go-Geo! service and further information about the project can be found at: http://www.gogeo.ac.uk
Go-Geo! Portal Evaluation
The evaluation session will last for 45 minutes.
- try out the portal at http://www.gogeo.ac.uk
- complete and return the questionnaire