Integration Activities, Barriers and Challenges
Sheila O. Denn
School of Information and Library Science, University of North Carolina at Chapel Hill
Presentation to “Metadata and NICS: Joys, Sorrows, and Payoffs”, November 2, 2005
The GovStat Project: Find what you need, understand what you find.
Integration of Data and Interfaces to Enhance Human Understanding of Government Statistics: Toward The Statistical Knowledge Network
Integration is the Name of the Game
• In more and more contexts, there is a need to bring together data from disparate sources.
• Data Integration refers to activities that occur on the back end, such as aggregating selected data from distributed sources into a centralized repository.
• Information Integration refers to the cognitive processes an individual user undertakes to synthesize disparate bits of information into understanding.
The Relationship of Metadata to Integration
• Standard metadata schemas can enhance data integration by allowing crawlers or other kinds of agents to pull materials from sources based on the values of metadata elements.
• Standard metadata schemas can also enhance information integration by allowing users to group data according to the values of metadata elements.
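The two roles described on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records, the element names (`agency`, `geography`, `time_period`), and the helper functions `select` and `group_by` are hypothetical and are not drawn from the GovStat project or from any particular metadata standard.

```python
from collections import defaultdict

# Hypothetical table records. The element names and values are illustrative
# only; they do not come from any specific metadata schema.
TABLES = [
    {"title": "Unemployment rate by state", "agency": "BLS",
     "geography": "state", "time_period": "2004"},
    {"title": "Consumer Price Index, U.S. city average", "agency": "BLS",
     "geography": "national", "time_period": "2004"},
    {"title": "Population estimates by state", "agency": "Census Bureau",
     "geography": "state", "time_period": "2004"},
    {"title": "Energy consumption by state", "agency": "EIA",
     "geography": "state", "time_period": "2003"},
]


def select(records, **criteria):
    """Data integration: pull only the records whose metadata elements
    match every requested value, as a crawler or agent might."""
    return [
        r for r in records
        if all(r.get(element) == value for element, value in criteria.items())
    ]


def group_by(records, element):
    """Information integration: group record titles by the value of one
    metadata element so a user can view related tables together."""
    groups = defaultdict(list)
    for r in records:
        groups[r.get(element, "(missing)")].append(r["title"])
    return dict(groups)


if __name__ == "__main__":
    for record in select(TABLES, geography="state", time_period="2004"):
        print(record["title"])
    for value, titles in group_by(TABLES, "agency").items():
        print(value, "->", titles)
```

Both functions depend on every source using the same element names and value vocabularies, which is the practical argument for a shared schema.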
Hierarchy of Integration
Levels, listed from high integration (increasing amounts of metadata) to low integration (decreasing amounts of metadata):
• Linking of analysis units, universe statements, and concept definitions across documents and agencies
• Linking of row and column headings to underlying survey variables
• Linking of contextual information (such as footnotes) to tables, row/column headings, or data values
• Linking of data values to row and column headings
• Searchable table titles
The dividing line moves up and down depending on where the data producer defines the optimal balance of functionality vs. effort.
Goals of the GovStat Project
• To create an integrated model of user access to and use of US government statistical information (The Statistical Knowledge Network)
• To design and test prototype interface tools to support finding and using statistics
• To support integration (technical and intellectual) of statistical data
GovStat SKN Data Flow
[Diagram; only its labels are recoverable]
• SKN Consortium: Agencies
• SKN Registry
• Actions: Contribute, Find, Display, Annotate, Understand, Manipulate, Collaborate
• Objects: Reports (metadata), Tables (metadata), People (metadata), Glossary, Annotations, Ontology
• Private Work Spaces, each with Objects and Actions
• Rules & Constraints
What We Know About Metadata and Users of Statistical Information
Ongoing studies of users and metadata since 1998 (or so):
• Metadata usage to support variable choice (Hert and Bosley at BLS, 1997/8)
• Metadata requirements for understanding tables (Hert & Hernández, 1999)
• Metadata requirements in a variety of integration tasks (Denn, Haas, & Hert, 2003)
• Statistical comparisons, particularly the types of comparisons made and the rules experts employ during those comparison processes (Hert, 2004)
Users and Metadata
Some insights from the studies. Some types of metadata needed:
• Definitions
• Survey methodology
• Rationales and information on differences (what is the difference between concept 1 and concept 2?)
• Currency of information (what’s the latest data I can get, when will more data be available, etc.)
• Table structure
• Interface design
Supporting use requires significant amounts of metadata, including some not easily generated (automatically or otherwise).
Users, Integration, and Metadata
Key integration activities:
• Noting discrepancies
• Manipulating information
• Comparing
  • Across geographic units, units of time
  • When definitional differences exist among concepts/variables of interest
  • Across sources (websites vs. print)
  • Collection approaches
  • Index values over time
  • Terminology
  • Deflated vs. real dollars, corrected vs. uncorrected, preliminary vs. final release, seasonally adjusted vs. non-adjusted
  • Data with different confidence intervals
Barriers to Successful Integration
• Lack of definitions or source information
• Lack of user knowledge of
  • Appropriate strategies for using stats
  • Nature of index values and their use
  • Nature of survey/census purpose and approach
  • Domain and social science statistics processes/approaches/definitions
• Interface design problems, including interface inconsistency
• Inconsistent data across sources
• User inability to determine if statistics were available
• Terminology differences across sources
Challenges
• Need to achieve a better understanding of the tradeoffs between the costs of metadata creation and user/agency benefits in various tasks
• Need to assess the Hierarchy of Integration model for:
  • supporting agency decision making
  • supporting user understanding
• Need to explore the extent to which metadata usage is domain-specific, and when and how we can generalize about metadata usage and metadata system design
For More Information
Sheila O. Denn
denns@ils.unc.edu
http://ils.unc.edu/~denns/
http://ils.unc.edu/govstat/