Joshua Li¹, Yuhong Ning¹, Warren Hedley¹, Brian Saunders¹, Nicole Tindill², Yongsheng Chen¹, Timo Hannay², Robert Sinkovits¹, Ron Taussig³, Al Gilman³, Shankar Subramaniam¹

¹ Alliance for Cellular Signaling, San Diego Supercomputer Center and the University of California at San Diego
² Nature Publishing Group
³ University of Texas Southwestern Medical Center

Abstract

The “AfCS-NPG Molecule Pages” comprises a database and website currently being constructed at the San Diego Supercomputer Center facility on behalf of the Alliance for Cellular Signaling (AfCS) in association with the Nature Publishing Group (NPG). This website will present to the public comprehensive information about the set of signaling proteins that have been selected for study by the AfCS. Each protein has a dedicated “Molecule Page” where the public can go to obtain curated information about that protein, as well as summaries of information from, and links to, external database records that are related to that protein.

A reference amino acid sequence is defined for each signaling protein, and the majority of the information in a Molecule Page is expected to pertain to that sequence. The reference sequence will be a mouse sequence if one is known, but may be human or rat otherwise. Information for mutant and variant sequences will also be captured within each Molecule Page. Splice variants are considered different molecules, and have their own Molecule Pages.

Researchers who are experts on a molecule and its cellular function can volunteer to input data into each Molecule Page. They are expected to periodically update their Molecule Page as new information becomes available. The content of each Molecule Page undergoes an annual anonymous peer review administered by Nature Publishing Group, after which a new version is “published” (i.e., made available to the public).
Each version of a Molecule Page will be a fully fledged scientific publication, with entries in PubMed and CrossRef, and citable using digital object identifiers (DOIs).

The expert appointed to oversee each AfCS protein will use a private web interface to enter data into the Molecule Pages database. This author-entered data will include all information from the literature that the author believes to be correct and will be cross-referenced against the appropriate publications for the benefit of the public and the reviewers. Author-entered data will include information about a protein’s states, including interactions between the author’s protein and other protein or non-protein molecules, covalent modifications, and any localization information. “Signatures” are computed to identify each state at varying levels of resolution, allowing duplicate states to be resolved. Transitions between the states may also be input, as well as any functional consequences of each state (including kinetic parameters). Other types of information captured are mutant and variant forms, localized expression, and experimental data.

Capturing information about a protein’s states and interactions in a database will enable tools to extract and model signaling networks, providing us with a better understanding of how cellular signaling works; this is the primary motivation for the project.

In addition to the author-entered data, a Molecule Page will contain “automated data” extracted from the public databases or obtained via computation. Each author will be given a complete set of automated data to work with, which they can reference in their own work. New sets of automated data will be made available on a regular basis, and each time the author will have the option of selecting the new set of automated data, or staying with the set they are currently using.
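The state “signatures” mentioned earlier in this abstract are not specified in detail here, so the following is only a minimal Python sketch of how such a scheme could work. The `State` fields, the three resolution levels, and the `signature` and `find_duplicates` helpers are all assumptions made for illustration; they are not the Molecule Pages implementation, and the protein and ligand names are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical model of a protein state (field names are assumptions)."""
    proteins: FrozenSet[str]                    # protein components of the complex
    modifications: FrozenSet[Tuple[str, str]]   # (protein, covalent modification)
    ligands: FrozenSet[str]                     # bound non-protein partners
    location: str                               # subcellular location


def signature(state: State, level: int) -> str:
    """Build a canonical string for a state at an assumed resolution level.

    level 1: protein components only
    level 2: adds non-protein partners and covalent modifications
    level 3: adds subcellular location
    """
    # Sorting makes the signature independent of the order of data entry.
    parts = [",".join(sorted(state.proteins))]
    if level >= 2:
        parts.append(",".join(sorted(state.ligands)))
        parts.append(",".join(f"{p}:{m}" for p, m in sorted(state.modifications)))
    if level >= 3:
        parts.append(state.location)
    return "|".join(parts)


def find_duplicates(states: List[State], level: int = 3) -> List[List[State]]:
    """Group states that share a signature at the given level."""
    groups: Dict[str, List[State]] = {}
    for s in states:
        groups.setdefault(signature(s, level), []).append(s)
    return [group for group in groups.values() if len(group) > 1]
```

Under these assumptions, two states entered with their components in a different order produce the same signature and are flagged as duplicates, while a coarser level deliberately ignores detail such as localization.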
When the public views a Molecule Page, they will be able to see both the automated data on which the author based his/her work, as well as the very latest set of automated data associated with the protein.

Motivation

• Create a protein-centric database that will provide the basis for a signaling network database.
• Provide comprehensive literature- and AfCS experimental data-derived, curated information pertinent to each AfCS signaling protein.

What Makes A Molecule Page

• Expert-entered data about molecules, their states, inter-state transitions, and function.
• Peer review of the expert-entered data, with annual publications.
• Regularly updated “automated” annotation based on surveys of the public databases and computational analysis.
• Links to the relevant AfCS experimental data.

[Figure: diagram of Molecule Page contents. Labels: AfCS ID; Links To Experimental Information; Versioning & Review Information; Automated Information; Molecule Page Version; AfCS Experimental Data; Peer Review & Publication; Canonical Protein Sequence; Automated Annotation; Inter-state Transitions; Molecule States; State Function; Literature References; Quantitative Data; Expert Annotation; Curated Information]

Expert-Entered Data

• Experts write an abstract for their protein, which is divided into nine sections, each of which can have specific literature references associated with it.
• Experts define literature-characterized "functional states" for their protein, each consisting of a complex of one or more proteins, optionally covalently modified or bound to non-protein partners, in a specific subcellular location.
• Experts can associate function (e.g., enzyme, channel, or receptor) and experimental data (e.g., thermodynamic and kinetic parameters and detection methods) with each state.
• “Class Proteins” can be used in state creation to represent a number of homologous proteins that behave similarly. Individual states can then be quickly created from the resulting “class state” to add distinguishing experimental data.
• Experts define a network of transitions between protein states, which are broadly categorized as association, dissociation, modification, or localization, and can add information about catalysts.
• Graphical representations of the transition network can be generated at any time: one containing a network with all of the states defined for a protein, and the other containing just the states one transition away from a chosen starting state.

Graphic taken from the Molecule Page for Adenylyl cyclase type 5 by Carmen Dessauer, published 15 Dec 2003.

Molecule Pages Database

The AfCS Molecule Pages Database consists of over 200 tables that handle diverse requirements like:

• Online Review. All aspects of the peer review process must be captured in the database.
• Complex Access Control. A Molecule Page in progress is not visible to anyone but the author, reviewers, and editor.
• Versioning. Multiple publications of a single Molecule Page must co-exist in the database.

The Molecule Pages database is deployed on an Oracle 9i Database running on a multi-processor Sun server. The website is generated using a combination of servlets and Java Server Pages (JSPs) deployed on the Oracle Components 4 Java application server, which runs on another Sun server.

Automated Annotation

The AfCS Bioinformatics Laboratory provides automated annotation for each AfCS protein, including:

• Protein database links
• Genomic database links (Ensembl)
• Genbank information
• Blast, with ortholog prediction
• Swissprot information
• Motif prediction (Prints)
• LocusLink information
• Domain prediction (Pfam and Smart)

Peer Review

• Each Molecule Page goes through a "digital" peer review process similar to the peer review process applied to scientific papers.
• A team of AfCS-appointed experts checks submitted Molecule Pages for quality and quantity.
• Expert reviewers are invited by NPG staff to anonymously review each Molecule Page prior to publication.
• Reviewers' comments are stored in the database and can be edited by NPG staff before being forwarded to authors, who can revise their Molecule Page data.
• NPG staff are ultimately responsible for publication.

Links

http://www.signaling-gateway.org/

The Molecule Pages is just one component of the Signaling Gateway website, which also includes experimental data, information about AfCS activities, and selected articles and reviews relevant to cellular signaling from the relevant Nature journals. Visitors must create an account to view the Molecule Pages, but registration is quick and free!

Acknowledgements

The Alliance for Cellular Signaling is funded by a "Glue Grant" from The National Institute of General Medical Sciences, and with funding from:

• The National Institute of Allergy and Infectious Diseases
• The National Cancer Institute
• Eli Lilly and Company
• The Merck Genome Research Institute
• Aventis Pharmaceuticals
• Johnson and Johnson
• Novartis Pharma AG
• The Agouron Research Institute
• Anonymous Foundation, Dallas TX
• University of Texas Southwestern Medical Center

We would like to thank the PIs and other advisors from the UT Southwestern Medical Center for providing input and feedback to the Molecule Pages. We also acknowledge other former Molecule Pages team staff for their contributions.