
DataForge: SourceForge for Datasets
Preserving and Sharing Experimental Data
Prabal Dutta
Jan 13, 2005

CACM – September 2004, Vol. 47, No. 9



Why bother preserving or sharing data?

  • Too often data “graduates” with a student (job security?)
  • Good experimental design and data collection
      – Is time-consuming
          • Days, weeks, months
      – Is overhead-laden
          • 80% in some experiments
      – Requires infrastructure and physical access
          • Creates a barrier to entry
  • End-to-end data provenance
      – Needed for traceability from source to interpretation
      – Includes experimental design, setup, collection, scrubbing, selection, fusion, analysis, and conclusions
      – Open science is good science
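The traceability chain on this slide (experimental design through conclusions) could be recorded alongside each dataset. The following is only a minimal sketch in Python under stated assumptions: the names `STAGES`, `ProvenanceStep`, `DatasetRecord`, `add_step`, and `missing_stages` are hypothetical and do not come from the presentation, and the stage list simply mirrors the bullet above.

```python
from dataclasses import dataclass, field
from typing import List

# Stage names mirror the slide's bullet; treating them as a fixed
# vocabulary is an assumption made for this sketch.
STAGES = [
    "design", "setup", "collection", "scrubbing",
    "selection", "fusion", "analysis", "conclusions",
]


@dataclass
class ProvenanceStep:
    """One documented step in a dataset's history (hypothetical type)."""
    stage: str
    description: str
    performed_by: str


@dataclass
class DatasetRecord:
    """A dataset plus the provenance steps recorded for it (hypothetical type)."""
    name: str
    owner: str
    steps: List[ProvenanceStep] = field(default_factory=list)

    def add_step(self, stage: str, description: str, performed_by: str) -> None:
        # Reject stage names outside the assumed vocabulary.
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        self.steps.append(ProvenanceStep(stage, description, performed_by))

    def missing_stages(self) -> List[str]:
        # Stages with no recorded step, in the order given by STAGES.
        recorded = {step.stage for step in self.steps}
        return [stage for stage in STAGES if stage not in recorded]


# Usage with made-up values: record two steps, then list what is undocumented.
record = DatasetRecord(name="example-dataset", owner="example-group")
record.add_step("design", "Planned sensor placement", "example-student")
record.add_step("collection", "Gathered raw readings", "example-student")
gaps = record.missing_stages()
```

Listing the undocumented stages is one way to see where traceability from source to interpretation breaks; a real archive would likely also need timestamps, file references, and licensing information.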


Where do we go from here?

  • Key Questions: How to
      – Establish data ownership, precedence, and provenance?
      – Motivate academic/industrial researchers to share data?
      – Recognize and reward the contributions of experimentalists?
      – Mitigate and manage conflicts of interest?
      – Disseminate knowledge of datasets?
      – Archive data in truly reusable form?
  • Potential Impacts
      – Dramatically improve research ROI through reuse
      – Lower barriers to entry for emerging groups
      – Capture and retain the products of scientific inquiry
  • The CENTS opportunity
      – Leadership role in driving adoption of “sensor data schema”
      – Distribute (BSD-style) open-source experimental datasets
      – Promote “sensor data-abstract services”