PROCESS Landing Page Strawman

PROCESS Data Management:
Data Processing covers any data manipulation activity resulting in the alteration or integration of source data, including the preparation of data for preservation and sharing. Process components can support retrieval, filtering, screening, transformation, translation, classification, transfer, and integration, among others. Data Processing typically produces data ready for use, but can also result in graphs and reports.

A Process Can Exist Anywhere Within the Data Lifecycle
The Process “stage” of the data lifecycle is not limited to data preparation activities after Acquisition and before Analysis, but includes all data handling activities from obtaining data and initial storage, through basic data screening and preparation, iterating with data changes prompted during analysis, and culminating with actions that prepare data for long-term preservation and sharing. Processes may also be created for producing documentation, managing data quality, and data protection.

Types of Data Processing at USGS
There are several ways to classify Data Processing activities at USGS, and here are some of them.

The Importance of Standards to Data Processing
The use of data standards facilitates the creation of automated data processing procedures and scripts. For example, the use of common data models provides a structural consistency for creating and sharing reusable process components and tools to serve maintenance and analytical needs for multiple projects using the same kind of data.

ETL – Extract, Transform, and Load
ETL is a term representing the overall process of moving data from one form or environment to another. ETL integrates and chains together processes that (1) gather data from a source, (2) screen and transform it, and (3) load it into a target data store. ETL processes are usually automated to support data warehouses, online portals, and integrated data environments such as The National Map.
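The three ETL stages described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample readings, the column names, the screening rules, and the `discharge` table are hypothetical and do not represent any USGS system or dataset. It only uses the Python standard library.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a few stream-gauge readings, two of them unusable.
RAW_CSV = """site_id,timestamp,discharge_cfs
A001,2020-01-01T00:00,12.5
A001,2020-01-01T01:00,
A002,2020-01-01T00:00,-3.0
A002,2020-01-01T01:00,48.2
"""

CFS_TO_CMS = 0.028316846592  # cubic feet per second -> cubic metres per second


def extract(text):
    """(1) Gather rows from the source, here CSV text, as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """(2) Screen out missing or negative readings, then convert units."""
    cleaned = []
    for row in rows:
        value = row["discharge_cfs"].strip()
        if not value:
            continue  # screening rule (assumed): drop missing readings
        cfs = float(value)
        if cfs < 0:
            continue  # screening rule (assumed): drop negative discharge
        cleaned.append(
            (row["site_id"], row["timestamp"], round(cfs * CFS_TO_CMS, 4))
        )
    return cleaned


def load(records, conn):
    """(3) Load the records into the target store and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS discharge "
        "(site_id TEXT, timestamp TEXT, discharge_cms REAL)"
    )
    conn.executemany("INSERT INTO discharge VALUES (?, ?, ?)", records)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM discharge").fetchone()[0]


def run_etl(text, conn):
    """Chain the three stages into one repeatable process."""
    return load(transform(extract(text)), conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_etl(RAW_CSV, connection), "records loaded")
```

Keeping each stage as its own function is one possible design; it lets the screening rules or the target store change without touching the other stages.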

Process Documentation, Diagrams, and Workflow Tools
Capturing and communicating information about how data were processed is critical for reproducible science. In addition to descriptive metadata, the use of flow charts, data flow diagrams, and workflow tools can help.
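One lightweight way to capture how data were processed is to have a script record each step as it runs. The following Python sketch is a hypothetical illustration, not a USGS tool and not an FGDC or ISO metadata format: the `ProcessLog` class, its field names, and the two sample steps are all assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(rows):
    """Short SHA-256 digest of the data, so changes between steps are detectable."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


class ProcessLog:
    """Hypothetical helper that records each processing step as it is applied."""

    def __init__(self):
        self.steps = []

    def apply(self, description, func, rows):
        """Run func on rows and append a record describing what happened."""
        result = func(rows)
        self.steps.append({
            "step": len(self.steps) + 1,
            "description": description,
            "function": func.__name__,
            "rows_in": len(rows),
            "rows_out": len(result),
            "input_digest": fingerprint(rows),
            "output_digest": fingerprint(result),
            "run_at": datetime.now(timezone.utc).isoformat(),
        })
        return result

    def to_json(self):
        """Serialize the recorded steps so they can be stored next to the data."""
        return json.dumps(self.steps, indent=2)


def drop_missing(rows):
    """Sample step (assumed): remove readings with no value."""
    return [r for r in rows if r["value"] is not None]


def to_celsius(rows):
    """Sample step (assumed): convert Fahrenheit readings to Celsius."""
    return [{**r, "value": round((r["value"] - 32) * 5 / 9, 2)} for r in rows]


if __name__ == "__main__":
    readings = [{"site": "A", "value": 68.0}, {"site": "B", "value": None}]
    log = ProcessLog()
    data = log.apply("Remove readings with no value", drop_missing, readings)
    data = log.apply("Convert Fahrenheit to Celsius", to_celsius, data)
    print(log.to_json())
```

The output digest of one step matches the input digest of the next, which gives a reader a simple way to check that the recorded chain of steps is unbroken.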

PROCESS Landing Page Strawman Cont’d
PROCESS Data Management: Data Processing

Process Automation and Scripting
Data processing can range from a manual set of actions performed by a single person to meet specific research needs, to a fully automated operation using scripts or programs to ensure repeated production of high-quality datasets in a consistent and documented way. Automating even simple processes helps to provide documented consistency and repeatability, and to generate necessary documentation. [R projects]

ETL – Extract, Transform, and Load
ETL is a term used to represent a very common chain of integrated process activities. Extraction of data from one or more sources is followed by screening and transformation of the data into a form that is then loaded into a target data store. ETL processes are frequently automated and used to keep data current in online portals, data warehouses, and integrated data environments such as The National Map.

Process Component Library
Current best practices for coding promote the creation of reusable modular components to manipulate datasets and other objects in a consistent and documented way. USGS shares components via GitHub and other venues.

Examples of Data Processing
USGS produces extensive datasets and interpretive products using a variety of data processing techniques and methods. This section provides examples of data processing for satellite imagery, sensor networks (earthquakes, real-time stream data), telemetry from ocean-going vessels and wandering animals, and for the production of aggregate datasets in portals and data warehouse access points.

PROCESS Landing Page Strawman Cont’d
PROCESS Data Management: Data Processing

What the U.S. Geological Survey Manual Says:
Policies that apply to the Process stage largely deal with providing appropriate documentation of the methods and actions used to modify data from its raw form to the form used for research or produced for sharing. Metadata standards (FGDC, ISO) include sections for describing the ‘provenance’ of data, meaning that enough information is provided for the user to determine where data originated and what changes were made to get to the form being described.

The USGS Manual Chapter 502.2 - Fundamental Science Practices: Planning and Conducting Data Collection and Research discusses the requirements for data documentation:
"Documentation: Data collected for publication in databases or information products, regardless of the manner in which they are published (such as USGS reports, journal articles, and Web pages), must be documented to describe the methods or techniques used to collect, process, and analyze data (including computer modeling software and tools produced by USGS); the structure of the output; description of accuracy and precision; standards for metadata; and methods of quality assurance."
Further:
"Standard USGS methods are employed for distinct research activities that are conducted on a frequent or ongoing basis and for types of data that are produced in large quantities. Methods must be documented to describe the processes used and the quality-assurance procedures applied."

The USGS Manual Chapter 502.4 - Fundamental Science Practices: Review, Approval, and Release of Information Products covers the documentation of methodology:
"Methods used to collect data and produce results must be defensible and adequately documented."

Software Release: put a reference here that describes how scripts and software that perform data processing need to be fully documented, reviewed, and released.

PROCESS Landing Page Strawman Cont’d
PROCESS Data Management: Data Processing

Recommended Reading:
References:

1st Sublevel Page
PROCESS Sub Part

Definition
Key Points Bubble
More Defs
Best Practices
Etc
Special Call-out
Etc
Etc
What the Survey Manual Says
Recommended Reading
References