ChemSpider as a Platform for Crowd Participation
- Slides: 63
ChemSpider as a Platform for Crowd Participation in Curating Chemistry § Antony Williams § IDCC, Chicago, December 2010
WARNING: Chemistry is Dangerous
Di-Hydrogen Monoxide § 2 H + 1 O § H2O § Water
It’s all on Wikipedia…
Chemistry on the Internet – Not All Bad § 100s of websites hosting chemistry-related data § Chemistry information is generally “compound-based” § Chemical “structures” § Identifiers, names and synonyms § Properties § Analytical data § How to synthesize § Articles, patents, safety information § Chemistry “language and dialects”
Dialects describing chemicals
A Pragmatic Vision “Build a Structure Centric Community” § Integrate chemistry across the internet based on “chemical structure” § A “structure-based hub” to information and data § Let chemists contribute their own data § Allow the community to curate & annotate data
www.chemspider.com
Answering Questions for Chemists § Questions a chemist might ask… § What is the melting point of n-heptanol? § What is the chemical structure of Xanax? § Chemically, what is phenolphthalein? § What are the stereocenters of cholesterol? § Where can I find publications about xylene? § What are the different trade names for Aspirin? § What is the NMR spectrum of Benzoic Acid? § What are the safety handling issues for toluene?
Search for a Chemical…by name
Available Information… § Linked to chemical vendors, safety data, toxicity, metabolism…
ChemSpider Today § Almost 25 million unique chemicals § Over 400 data sources § Grows daily through community and RSC depositions § Community annotation and curation § We curate, edit, change, enhance data daily
Three Years of Experience § Internet-based chemistry is a mess! § Public compound databases are contaminated § The annotation/curation of data online is difficult § Most database hosts are non-responsive to feedback – “We are a host/repository of data” § Who cares?
Linked Data on the Web
Where is chemistry online? § Encyclopedic articles (Wikipedia) § Chemical vendor databases § Metabolic pathway databases § Property databases § Patents with chemical structures § Drug discovery data § Scientific publications § Compound aggregators § Blogs/Wikis and Open Notebook Science
What is the Structure of Vitamin K?
MeSH – Medical Subject Headings § Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione) derived from plants, VITAMIN K2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K3 (menadione).
What is the Structure of Vitamin K1?
Chemical Abstracts “Common Chemistry” Database
Wikipedia WRONG
WRONG
Incorrect Structures WRONG
Lack of Stereochemistry WRONG
Does stereochemistry matter? § Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, Softenon, Thalidomide
WRONG
PubChem
WRONG
WRONG
What’s Methane?
What ELSE is Methane???
Internet-Based Chemistry is a Mess § Algorithms can only get you so far § Human curation is necessary § Only the crowds can help with big data… § ChemSpider is approaching 25 million compounds
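To illustrate where an algorithm stops and a human curator takes over, here is a minimal sketch that flags any name linked to more than one structure, as in the "Methane" example. It is not ChemSpider's actual method: the function name `flag_ambiguous_names`, the `(name, structure_id)` record layout, and the placeholder structure keys are all assumptions made for illustration.

```python
from collections import defaultdict


def flag_ambiguous_names(records):
    """Return names linked to more than one structure, for human review.

    `records` is an iterable of (name, structure_id) pairs (hypothetical
    layout). Names are compared case-insensitively after trimming spaces.
    """
    structures_by_name = defaultdict(set)
    for name, structure_id in records:
        structures_by_name[name.strip().lower()].add(structure_id)
    # Keep only the names that point at two or more distinct structures.
    return {
        name: sorted(ids)
        for name, ids in structures_by_name.items()
        if len(ids) > 1
    }


# Placeholder records, not real database content.
records = [
    ("Methane", "structure-A"),
    ("methane", "structure-B"),
    ("Water", "structure-C"),
    ("water", "structure-C"),
]
flagged = flag_ambiguous_names(records)
print(flagged)  # {'methane': ['structure-A', 'structure-B']}
```

The script can only report that "methane" is ambiguous. Deciding which structure is right is the part left to a curator.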
Search “Vitamin H”
“Curate” Identifiers
Crowd-sourcing Chemistry Curation § Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
“Curate” Identifiers § General curation activities § Remove incorrect names § Correct spellings § Add multilingual names § Add alternative names § In 3 years over 1 million structure-identifier relationships have been validated – robotically and manually § 130 people have participated in validation or annotation. “Crowds” can be quite small!
Crowdsourcing Works § The “crowd” has deposited data (structures, spectra, etc.) and participated in data curation § Curators at different levels check each other's work § Wikipedia is the primary modern example § Some curators are “madmen”… § The Oxford English Dictionary
Vancomycin – Curate This!!!
Vancomycin on ChemSpider § 1 compound, 3 days
Crowdsourced “Annotations” § Users can add § Descriptions/Syntheses/Commentaries § Links to articles § Spectral data § Photos § MP3 files § Videos
Multimedia Content Holder
Gaming for Curation of Spectra
ChemSpider Everywhere § Crowdsourced Curation of Spectra
Data Curation
True Curation of Data
ChemSpider SyntheticPages
Table: structures of marketed drugs as found in public databases § Databases compared: CAS Common Chemistry, ChEBI, ChemSpider, ChemIDplus, DailyMed, DrugBank, PubChem, Wikipedia § Drugs (drug name / generic name): Spiriva / Tiotropium Bromide; Depakote / Valproate semisodium; Basen / Voglibose; Symbicort / 1) Budesonide, 2) Formoterol; Vytorin / 1) Ezetimibe, 2) Simvastatin; Taxol / Paclitaxel; Thalidomide; Zocor / Simvastatin; Crestor / Rosuvastatin § Cell entries: ✓ (correct), ✗ or WRONG (incorrect), “No Hits”, “No Structure”, or a count such as 2/1, 4/0, 6/1, 8/1, 44/1
Sharing Our Activities § Presently defining approaches with other public compound databases to share results of curation activities § Member of large European project to link data from the Life Sciences. Sharing results of curation is essential § Making curation and contribution interfaces Mobile
Mobile ChemSpider
First request to Database Hosts! § Every public compound database host should add ONE feature – “Leave Comments”
Second request to Database Hosts! § Show Comments
Question Quality
Thank you § Email: williamsa@rsc.org § Twitter: ChemConnector § Blog: www.chemspider.com/blog § Personal Blog: www.chemconnector.com § Slides: www.slideshare.net/AntonyWilliams