Introduction to Biological Databases and Data Archiving: Creating Archive Requirements

DEPOSITION, ANNOTATION, AND RELEASE POLICIES

Deposition, Annotation, and Release Policies
• How does one go about developing requirements for what goes into a data archive?
• What data should be mandatory for every entry?
• What data can be optional, and who should decide?

What Data Should Be Included in a Data Archive?
• Consult domain experts with a deep understanding of the experiment and the relative importance of different data items
• Archivists evaluate the practicality of collecting the data and how best to organize them
• Convene a face-to-face workshop or task force to reach a consensus
  – Hold follow-up meetings to evaluate progress

Standard Annotation Procedures and Policies
• Goals of standardizing annotation procedures
  – Produce uniform quality
  – Maintain archival consistency
  – Set curation expectations
• Setting policies
  – To set boundaries: the scope of the data to be archived
  – To govern data privacy: hold and release
  – To set requirements: the minimal data for validation

Procedure for Setting Standard Annotation Procedures and Policies
• Weekly global wwPDB biocuration meeting
  – VTC, Skype, phone
  – Consensus
• Documentation
  – Draft for review
  – wwPDB directors review
  – Revise
  – Approval
• Post on the public website
• Public announcement
• Maintain and update documents as procedures and/or policies evolve

wwPDB Policies
• Policies evolve as the science evolves
  – Deposition of experimental data became mandatory as data quality assessment became more important
  – The strict size limit on peptides was loosened to accept biologically important small peptides
• Current policies: wwpdb.org/documentation/policy
  – PDB Entry Requirements
  – Entry Authorship
  – Release of PDB Entries
  – Assignment of PDB IDs and Ligand Codes
  – Changes to Entries

Case Example 1: PDB Entry Requirements
• What are the requirements for acceptance of an entry into the PDB?
  – Must have three-dimensional atomic coordinates
  – Must include information about the composition of the structure (sequence, chemistry, etc.), the experiment performed, details of the structure determination steps, and author contact information
  – Experimental data are required for X-ray and NMR structures

Case Example 1: PDB Entry Requirements (Cont’d)
• Which types of experimentally determined structures are accepted by the PDB?
  – Must be experimentally determined structures of biological macromolecules
  – Currently accepts coordinate sets produced by X-ray crystallography, NMR, electron microscopy, neutron diffraction, powder diffraction, and fiber diffraction
  – Purely in silico models are not accepted

Case Example 1: PDB Entry Requirements (Cont’d)
• What types of structures can be deposited to the PDB?
  – Polypeptides and oligopeptides
  – Polysaccharides and oligosaccharides
  – Polynucleotides and oligonucleotides

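The entry requirements in Case Example 1 can be read as a checklist. The following is a minimal sketch of such a checklist in Python, not the wwPDB deposition system: the `Deposition` record, its field names, and the `acceptance_problems` helper are hypothetical names introduced for illustration. Only the accepted methods, the molecule classes, and the required items come from the slides.

```python
from dataclasses import dataclass

# Experimental methods and molecule classes as listed on the slides.
ACCEPTED_METHODS = {
    "x-ray crystallography", "nmr", "electron microscopy",
    "neutron diffraction", "powder diffraction", "fiber diffraction",
}
ACCEPTED_MOLECULE_TYPES = {"peptide", "saccharide", "nucleotide"}
# Methods for which the slides say experimental data are required.
DATA_REQUIRED_METHODS = {"x-ray crystallography", "nmr"}


@dataclass
class Deposition:
    """Hypothetical deposition record (assumption, not a wwPDB data model)."""
    method: str
    molecule_types: set
    has_coordinates: bool = False
    has_composition: bool = False         # sequence, chemistry, etc.
    has_experiment_details: bool = False  # experiment and determination steps
    has_author_contact: bool = False
    has_experimental_data: bool = False


def acceptance_problems(dep):
    """Return the list of requirements from the slides that `dep` fails."""
    problems = []
    method = dep.method.strip().lower()
    if method not in ACCEPTED_METHODS:
        problems.append("method not accepted: " + dep.method)
    if not (dep.molecule_types & ACCEPTED_MOLECULE_TYPES):
        problems.append("no accepted macromolecule type")
    if not dep.has_coordinates:
        problems.append("missing 3D atomic coordinates")
    if not dep.has_composition:
        problems.append("missing composition (sequence, chemistry)")
    if not dep.has_experiment_details:
        problems.append("missing experiment / structure determination details")
    if not dep.has_author_contact:
        problems.append("missing author contact information")
    if method in DATA_REQUIRED_METHODS and not dep.has_experimental_data:
        problems.append("missing experimental data")
    return problems
```

An empty list means the record passes every check in this sketch. A purely in silico model fails on the method check alone.
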
Case Example 2: Release of PDB Entry
• A journal policy of release upon publication takes precedence over the 6-month or 1-year hold policy.
• Status codes for PDB entries
  – REL, HOLD, HPUB, WDRN, OBS
• Deadlines for requesting release of entries
  – Weekly release
    • Phase I: Every Saturday by 3:00 UTC, the sequence(s), InChI string(s), and crystallization pH value(s) of new entries are made available.
    • Phase II: Every Wednesday by 00:00 UTC, all new and modified data entries will be updated at each of the wwPDB FTP sites.
  – Public request of release: by 12:00 noon on Thursday (local time at processing site)

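The weekly release cycle is simple date arithmetic. The sketch below computes the next Phase I and Phase II deadlines from a given moment. The function names are hypothetical, and the glosses attached to the status codes are the usual reading of those abbreviations rather than text from the slide.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Status(Enum):
    """Status codes from the slide; the glosses are assumed readings."""
    REL = "released"
    HOLD = "on hold until a fixed date"
    HPUB = "on hold until publication"
    WDRN = "withdrawn"
    OBS = "obsoleted"


def next_weekly(now, weekday, hour):
    """Next UTC occurrence strictly after `now` of weekday (Monday=0) at `hour`.

    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


def next_phase_one(now):
    """Phase I deadline: Saturday 03:00 UTC."""
    return next_weekly(now, weekday=5, hour=3)


def next_phase_two(now):
    """Phase II deadline: Wednesday 00:00 UTC."""
    return next_weekly(now, weekday=2, hour=0)
```

For example, calling both functions with Thursday 4 January 2024, 12:00 UTC gives Saturday 6 January 03:00 UTC for Phase I and Wednesday 10 January 00:00 UTC for Phase II.
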
Case Example 2: Release of PDB Entry (Cont’d)
• Experimental data and coordinate files must be released at the same time.
• Email addresses of authors are not publicly available and will not be distributed.

Case Example 2: Release of PDB Entry (Cont’d)
• Who has access to unreleased data?
  – Only authors of the particular entry
  – Reviewers of the paper may not obtain unreleased coordinate sets from the PDB
  – A reviewer can contact the journal editor to obtain the validation report from the author
• What information is available for unreleased entries?
  – (Title), authorship, status, PDB ID, experimental data status, and (sequence)

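The list of information available for unreleased entries amounts to a field whitelist. A minimal sketch follows, assuming a plain dictionary per entry. The key names and the `public_view` helper are hypothetical. The slide puts the title and sequence in parentheses, and this sketch handles that only by passing through whichever whitelisted keys are present.

```python
# Fields the slide lists as available for an unreleased entry.
# Key names are illustrative assumptions, not wwPDB identifiers.
PUBLIC_FIELDS_UNRELEASED = (
    "title",
    "authors",
    "status",
    "pdb_id",
    "experimental_data_status",
    "sequence",
)


def public_view(entry):
    """Return only the whitelisted fields that are present in `entry`."""
    return {key: entry[key] for key in PUBLIC_FIELDS_UNRELEASED if key in entry}
```

Coordinates, experimental data, and author email addresses never appear in the result, because they are not on the whitelist.
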
Case Example 3: Changes to PDB Entry
• What changes can be made before release?
  – Coordinates
  – Experimental data
  – Related metadata

Case Example 3: Changes to PDB Entry (Cont’d)
• Changes that can be made after release without replacing the entry
  – Metadata such as citation, author's name, etc.
  – Format corrections or addition of a data set to the experimental data while the coordinates remain unchanged
  – Chain ID, residue numbering, atom name, ordering of molecules and/or ligands
  – Terminal sequence: add or remove a region that is unobserved in the coordinates

Case Example 3: Changes to PDB Entry (Cont’d)
• Major changes are those that alter the geometry or chemical composition of the entry
  – Coordinates (x, y, z)
  – Sequence of the polymer
  – Ligand identity
• Major changes require the entry to be obsoleted and superseded by a new deposition

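The change rules in Case Example 3 can be summarized as a small decision function. This is a sketch under stated assumptions: the field names, the `classify_change` helper, and the three outcome strings are hypothetical. The "needs curator review" fallback for unlisted fields is an addition of this sketch, not part of the policy on the slides.

```python
# Changes that alter geometry or chemical composition (per the slides).
MAJOR_FIELDS = {"coordinates", "polymer_sequence", "ligand_identity"}

# Changes allowed after release without replacing the entry (per the slides).
MINOR_FIELDS = {
    "citation", "author_names", "experimental_data_format",
    "added_experimental_data_set", "chain_id", "residue_numbering",
    "atom_name", "molecule_order", "unobserved_terminal_sequence",
}


def classify_change(changed_fields, released):
    """Decide how a requested change is handled in this sketch."""
    changed = set(changed_fields)
    if not released:
        # Before release, coordinates, data, and metadata may all change.
        return "update in place"
    if changed & MAJOR_FIELDS:
        return "obsolete and supersede"
    if changed <= MINOR_FIELDS:
        return "update in place"
    # Fallback for fields the slides do not mention (assumption).
    return "needs curator review"
```

A request that mixes a citation fix with new coordinates on a released entry is classified as "obsolete and supersede", since any major field decides the outcome.
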
Case Example 3: Changes to PDB Entry (Cont’d)
• wwPDB remediation
  – The wwPDB reviews the entire archive on a regular basis and remediates the data
  – The nature of the changes is described in a public document on the wwPDB site
  – Individual authors are not contacted for global remediation
  – A version number is assigned; a REMARK with this version number and date is in every file
  – The older version is maintained as a snapshot on the FTP site

This work is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. Funded by Grant R25 LM012286 from the National Library of Medicine of the National Institutes of Health.