MANAGING RESEARCH DATA AT MIT: GROWING THE CURATION COMMUNITY ONE INSTITUTION AT A TIME
MacKenzie Smith, Associate Director for Technology, MIT Libraries; Science Commons Research Fellow, Creative Commons
December 2010 6th International Digital Curation Conference ©MIT
Chapter 1 Data Curation as a Metadiscipline
Media Ecology
1960s: McLuhan defines Media Ecology to describe “how our interaction with media facilitates or impedes our chances of survival. The word ecology implies the study of environments: their structure, content, and impact on people.” (Neil Postman, 1971)
The Nature of Information (as applied to research data)
Information {Data} Has Five Properties. Changes in:
• Form: new understanding
• Magnitude: information overload
• Velocity: instant feedback
• Direction / Access: new relationships
Nystrom, C. (1973). “Towards a Science of Media Ecology: The Formulation of Integrated Conceptual Paradigms for the Study of Human Communication Systems,” unpublished doctoral dissertation.
The Rise of Interdisciplinarity
“One of the consequences of the change [to our modern conception of knowledge] is a movement away from the rigidly compartmentalized, uncoordinated specialization in scientific inquiry which characterized the Newtonian world, and a movement toward increasing integration of both the physical and the social sciences.” Nystrom, C. (1973), ibid.
The Rise of Interdisciplinarity
“One of the symptoms of this trend is the proliferation, in recent years, of “compound” disciplines such as mathematical biochemistry, psychobiology, linguistic anthropology, psycholinguistics, and so on.”
The Rise of Interdisciplinarity
Media Ecology as a “Metadiscipline”
Is Data Curation driven by interdisciplinarity?
Data Ecology as a new metadiscipline?
Chapter 2 Goals and Challenges of Data Curation
What are the Goals of Data Curation?
To meet our obligation to our research community, our funders, the public
• Reproducing results
• Reusing data in new contexts (e.g. new tool)
• Aggregating data for new research
  - Compiled/derived databases
  - Computational sciences
  - Interdisciplinary research
What are the Functions of Data Curation?
• Finding it
• Making sense of it
• Using it
• Aggregating it
• Publishing it
• Referencing it
• Preserving it
Finding it: Centralized vs. Distributed Collections
Image credit: alexdecarvalho, http://www.flickr.com/photos/adc/
Making sense of it: Provenance
• Methodology
• Semantics
• Authorship
• Peer review
• Changes over time
Gap between ability and motivation to provide this
Working with it: Tools
• Analyze
• Visualize
• Reproduce results
Very little is being done to catalog, archive, and provide access to data tools
Aggregating it: Encoding Standards
• Domain conventions vary
  - FITS in astronomy; CIF in crystallography; EML (XML) in ecology; GO (RDF) in genetics
  - Difficult to integrate arbitrary data
• Web standards (e.g. RDF) as a syntax substrate?
• Who maps from domain to generic encoding?
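The question of who maps from a domain encoding to a generic one can be made concrete with a small sketch. The Python below is illustrative only: the field names, the `http://example.org/` URIs, and the helpers `record_to_triples` and `to_ntriples_line` are hypothetical and not taken from any real FITS, EML, or RDF tooling. It flattens one domain-style record into subject-predicate-object triples, the general shape that RDF uses.

```python
# Hypothetical sketch: flatten a domain-specific metadata record into
# generic (subject, predicate, object) triples, the shape RDF uses.
# All names and URIs here are illustrative assumptions.

def record_to_triples(subject_uri, record, vocab_base):
    """Return one triple per field, sorted by field name for stable output."""
    triples = []
    for field in sorted(record):
        predicate = vocab_base + field.lower()
        triples.append((subject_uri, predicate, str(record[field])))
    return triples


def to_ntriples_line(triple):
    """Serialize a triple in N-Triples style with a plain string literal."""
    subject, predicate, obj = triple
    escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


# A made-up astronomy-style header; real FITS headers carry many more keys.
header = {"TELESCOP": "example-scope", "EXPTIME": 53.9}
triples = record_to_triples(
    "http://example.org/image/1", header, "http://example.org/vocab/"
)
lines = [to_ntriples_line(t) for t in triples]
```

The serialization is the easy part. The open question on the slide is the vocabulary: someone has to decide what each domain field means in shared terms before records from different disciplines can be integrated.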
Aggregating it: Social and Policy Norms
• Intellectual Property Rights
  - Public Domain or Open Access vs.
  - Embargoed or Access Controlled
  - “right to preserve” principle from Blue Ribbon Task Force
• Citation and credit system for data
Who sets policy and socializes norms?
Aggregating it: Legal Interoperability
IPR and data licenses
• Much data not copyrightable since facts cannot be copyrighted (in the U.S.)
• UK, EU, Australia, other countries have sui generis data rights
• Laws not “interoperable” without explicit direction
• Big problem for international scientific collaborations and data re-purposing
Publishing it: Peer review
• Data Papers
  - citable entity not linked to a “dataset” (i.e. file)
• Enhanced publications containing or linking to data
• Nanopublications
These are just beginning to emerge
Referencing it: Credit
• Data Papers in WoS, Scopus, PubMed, etc.
• DOIs for Data (DataCite, CrossRef)
• The conundrum of handling LOD
  - Attribution via URI, e.g. ORCID?
  - How to handle attribution stacking?
Requires credit system (e.g. awards, tenure) to notice
Preserving it: Curation
• Bits (forensics)
• Semantics (interoperable)
• Pragmatic (immediately usable)
Technically hard, potentially expensive. Whose mission? Who pays?
Chapter 3 The Data Curation Ecology
Curation Ecology: technology view
1. Storage layer: iRODS, S3, Palimpsest
2. Data management layer: IRs, ICPSR, UK Data Archive
3. Linking (or Semantic) layer: SFX, Semantic Web
4. Discovery layer: Google/Google Scholar, ICPSR UI
5. Delivery layer: content interaction tools, e.g. Ajax widgets like MIT Exhibit
6. Social layer: myGrid/Taverna, Kepler, VREs, VIVO
Curation Ecology: functional view
1. Storage layer: bit-level persistence
2. Data management layer: metadata, policies, preservation strategies
3. Linking (or Semantic) layer: identifiers, RDF, ORE encoding
4. Discovery layer: library catalogs, Web search engines, federated search
5. Delivery layer: ebook readers, visualization tools, streaming media servers, security and ethics
6. Social layer: collaboration tools, social networking tools, VLEs and VREs
7. Business layer: cost recovery, legal/policy frameworks, virtual organizations
Curation Ecology: organizational view
1. Research Groups (individual faculty, labs, Labs and Centers): knowledge producers/consumers (social layer)
2. Professional Societies: knowledge aggregators (linking layer)
3. Data Centers: system, data storage expertise (storage, data management layers)
4. Libraries and archives: content/data management, data linking expertise (data management layer)
5. Businesses (Publishers, IT companies): discovery, delivery layers
6. Universities, Funders: business, policy layers
The Data Curation Ecology?
• Publishers: Springer, Nature, BMC, PLoS, WoS
• Scholarly Societies: APS, ACM, Sage Commons
• IT Companies: e.g. Microsoft, Oracle, Mendeley, Zotero
• Institutions: Libraries, IT Centers, Research Admin
• Research Groups
• Funders: Governments, Foundations
• Data Centers: institutional, disciplinary, commercial
Researcher’s Role: Provision
• Metadata (provenance)
• Rights (licenses, open data)
Society’s Role: Collection
e.g. Sage Commons
“Sage Commons is a novel information platform being built by an international partnership of researchers and stakeholders to define the molecular basis of disease and guide the development of effective human therapeutics and diagnostics.”
“The Sage Commons will be used to integrate diverse molecular mega-data sets, to build predictive bionetworks and to offer advanced tools proven to provide unique new insights into human disease biology. Users will also be contributors that advance the knowledge base and tools through their cumulative participation.”
Publisher’s Role: Accreditation
• Require data deposit to archives
• Publish data journals
• Manage peer review (quality control)
• Provide credit for data publishing (evolution of promotion & tenure system)
Data Center Role: Infrastructure
• HPC
• Large-scale storage
• Bit-level preservation
• Large-scale data operations
Funders’ Role: Policy
• Mandates
• Incentives
• Guidelines
IT Companies’ Role: Tools
• Analyze and visualize data
• Search/subset data
• Store/manage data references
• Data integration
Library’s Role: Stewardship
• Data organization and annotation, e.g. ontologies and metadata
• Data archiving and preservation, e.g. perpetual access
• Outreach and support to local researchers
Gabridge, Tracy. “The Last Mile: Liaison Roles in Curating Science and Engineering Data.” Research Library Issues, no. 265 (Aug 2009). http://www.arl.org/bm~doc/rli-265-gabridge.pdf
Chapter 4 Library case studies: lessons learned
Institutional/Library Data Curation
University of Chicago Case Study
Sloan Digital Sky Survey Data
• Managed by the Astrophysical Research Consortium (ARC)
• SDSS Project planned for data hand-off, recruited U of C Library
• $700k budget over 5 years; library costs were ~$150 (storage, servers, processing)
Sloan Digital Sky Survey Data
• Permanent archive of the data (DAS)
• Serving the data to the public (CAS)
• Help desk management
• Preserving administrative records
Digital Archive Server
• FITS images, spectra files, catalog tables
  - ~75 Tb of flat files
  - Support for lookup, download
• Preservation plan in development
  - Chicago and Johns Hopkins are mirror sites
  - Chicago committed to preserve in perpetuity
Catalog Archive Server
• Search access to processed data
• 20 Tb table data
  - SQL Server with Web UI
• Library hosts, minimal preservation plans
  - Requires significant domain expertise
  - Ongoing ARC community effort
User Support
• Help desk
  - Referral system to domain expert network
  - Question categories: CAS, DAS, Photometry, Astrometry, Spectroscopy, Publications permissions and policy, Education exercise
  - Partially automated, partially intermediated (goal is 100% automation)
Records Management
• Library’s Special Collections and Records Management
  - Appraised project records, selected print and digital project records, including email archives, websites, procedural manuals, documents, databases (e.g. GNATS db)
  - Transferred, accessioned and processed records using standard archival principles
MIT Case Study
Libraries and Data
Established curation for some data types:
• statistical (Harvard-MIT Data Center)
• geospatial (Geodata Repository)
• bioinformatics (via NLM/NCBI)
• digital media (e.g. images, videos)
• general datasets (IR digital archive)
Libraries and Data
Applies to both faculty-authored and externally-acquired data
• Consultation services (in-person, via Website)
• Liaise with domain data centers (e.g. ICPSR)
• Develop (meta)data standards (e.g. DDI)
• Manage and preserve data
Robotics Data in DSpace@MIT
The Library:
• Defined local taxonomy for metadata values
• Customized metadata records
• Adapted/simplified deposit workflow
• Loaded data from previous repository
• Added CC0 licenses
Review of new deposits done by researchers
Neuroimaging Case Study
Neuroimaging Case Study
Sources: Brain & Cognitive Science Department; McGovern Institute for Brain Research; Martinos Imaging Center; Research Lab of Electronics
Digital images (MRIs, DTIs, VBM, etc.) combined with phenotype and protocol data, genomic data, EEGs, etc.
• Large-scale: >10 Tb per year for one group of 4 faculty
• Expensive: each subject ~$1000 (1500/year, per machine)
• Hard to find, interpret: no standard way to annotate images for sharing, reuse
Biological Oceanography Case Study
Figure: Temperature versus salinity (T-S) relations for the North Pacific Subtropical Gyre at station ALOHA
Biological Oceanography Case Study
Sources: Civil Engineering; Biological Engineering; Earth, Atmospheric and Planetary Sciences
• Metagenomics data combined with biochemical sensor data (water chemistry, optical properties, physical data, e.g. location)
• Large-scale: Solexa sequencer produces 1 Tb per run × 2-3 runs/week
• Irreplaceable: time dependent, not fully analyzable today
• Need to collaborate: no integrated DB exists (e.g. GenBank only takes sequences)
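The sequencing rate quoted on this slide implies a yearly volume that is easy to estimate. The Python below is a back-of-envelope sketch only: it assumes 52 operating weeks per year (an assumption, not a figure from the slide) and keeps the slide's unit as written.

```python
# Back-of-envelope estimate from the slide's figures:
# 1 Tb per run, 2 to 3 runs per week.
# WEEKS_PER_YEAR is an assumption (continuous operation), not from the source.

TB_PER_RUN = 1
RUNS_PER_WEEK_LOW, RUNS_PER_WEEK_HIGH = 2, 3
WEEKS_PER_YEAR = 52


def yearly_volume(tb_per_run, runs_per_week, weeks_per_year=WEEKS_PER_YEAR):
    """Return one year's data volume, in the same unit as tb_per_run."""
    return tb_per_run * runs_per_week * weeks_per_year


low = yearly_volume(TB_PER_RUN, RUNS_PER_WEEK_LOW)    # 2 runs/week
high = yearly_volume(TB_PER_RUN, RUNS_PER_WEEK_HIGH)  # 3 runs/week
print(f"Estimated volume: {low} to {high} Tb per year")
```

Under that assumption a single sequencer would produce roughly 104 to 156 Tb per year, which is the scale any storage and preservation plan for this data would have to absorb.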
Chapter 5 The Data Ecology Revisited
The Data Curation Ecology?
• Publishers: Springer, Nature, BMC, PLoS, WoS
• Scholarly Societies: APS, ACM, Sage Commons
• IT Companies: e.g. Microsoft, Oracle, Mendeley, Zotero
• Institutions: Libraries, IT Centers, Research Admin
• Research Groups
• Funders: Governments, Foundations
• Data Centers: institutional, disciplinary, commercial
The Institution’s Role in the Data Ecology
• Policy
• Incentives
• Financial sustainability
The Library’s Role in the Data Ecology
• Key role in
  - Defining the service model for data archiving
  - Providing outreach and support to researchers
  - Modeling data and metadata “ontologies”
  - Preserving data over long time frames
• In collaboration with
  - central IT department to manage system, storage, technical support
  - domain experts for metadata and preservation goals
The Library’s Role in the Data Ecology
Provide Outreach, Support, Education
• Funder/university policy, legal issues
• File management (naming, backup, directories)
• Annotation (metadata, research protocol)
• Long-term archiving (digital preservation)
• Data sharing and citation
• Data integration support
• Data Management Plans
The Library’s Role in the Data Ecology
Why libraries?
• Libraries are interdisciplinary, large-scale
• Libraries know information
• Libraries like supporting researchers
Librarians are a keystone species in information ecologies
Nardi, Bonnie; O’Day, V. (1999). Information Ecology: Using Technology with Heart. Cambridge: MIT Press. pp. 288. http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/672/582
Lessons Learned (The End)
• Embrace interdisciplinarity
• Examine mission, strength of each “species” organization (in addition to roles, skill sets)
• Work towards a shared definition of the data curation ecology