Amplifying Data Curation Efforts to Improve the Quality
- Slides: 17
Amplifying Data Curation Efforts to Improve the Quality of Life Science Data
Mariam Alqasab, Suzanne Embury, Sandra Sampaio
Computer Science Department, University of Manchester
Introduction
The problem [diagram: several databases (DB) and their database providers]
Our suggestion [diagram: IQBot placed alongside the databases (DB) and their database providers]
- Find defects in data.
- Find corrections of defects.
- Infer the reason behind defect corrections.
How does IQBot work in general? [diagram: IQBot monitors the curated DB periodically]
- Extracting defects and finding their corrections by comparing two consecutive versions of the data.
- Finding out the reason behind the defects.
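The version-comparison step described on this slide can be sketched as follows. This is a minimal illustration, not IQBot's actual implementation: the `Change` record, the `extract_changes` function, the dictionary layout of a database version, and the entry identifiers and protein names are all assumptions made for the example.

```python
from typing import Dict, List, NamedTuple

# Hypothetical representation of one database version: entry id -> {field -> value}.
Version = Dict[str, Dict[str, str]]


class Change(NamedTuple):
    entry_id: str
    field: str
    old_value: str
    new_value: str


def extract_changes(old: Version, new: Version) -> List[Change]:
    """Return field-level differences for entries present in both versions."""
    changes = []
    for entry_id in sorted(old.keys() & new.keys()):
        old_fields, new_fields = old[entry_id], new[entry_id]
        for field in sorted(old_fields.keys() & new_fields.keys()):
            if old_fields[field] != new_fields[field]:
                changes.append(
                    Change(entry_id, field, old_fields[field], new_fields[field])
                )
    return changes


# Made-up entries standing in for two consecutive releases.
version_8 = {
    "P00001": {"protein_name": "Putative kinase"},
    "P00002": {"protein_name": "Actin"},
}
version_9 = {
    "P00001": {"protein_name": "Serine kinase"},
    "P00002": {"protein_name": "Actin"},
}

changes = extract_changes(version_8, version_9)
for change in changes:
    print(change)
```

Each reported change pairs the old value (a candidate defect) with the new value (its candidate correction). Entries that appear in only one version are ignored in this sketch.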
Challenges in finding a suitable DB
- It has to be a manually curated database.
- No clear vision of how providers curate their data.
- Free access to all versions of the DB.
Why is UniProt suitable?
More than 500,000 entries (manually curated); 77 million entries (automatically curated).
1. Manually curated by human experts.
2. Curated every 4 weeks.
3. Provides access to all its releases.
IQBot monitoring UniProt [diagram: IQBot monitors UniProt; timeline labels: Before Sep 2015, From Sep 2015]
- Extracting changes in data (protein name).
- Finding the reason for the change (domain specific).
Extracting Defect Corrections [figure: Version 8 compared with Version 9]
Finding the Reason for the Change [figure: Version 8 compared with Version 9]
Challenges in Using UniProt
Finding the reason behind a change was difficult for protein entries dated before September 2015, because we had to investigate each one to find it. After that date, an ECO code is provided when data has been curated. [callout: Information has been imported from another database]
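For entries that carry ECO codes, the reason lookup could be sketched as below. This is an assumption-laden illustration, not IQBot's method: the function names, the sample line, and its `EMBL:EXAMPLE` source are made up, and the label table is a small illustrative subset that should be checked against UniProt's evidence documentation before use.

```python
import re
from typing import List

# ECO identifiers are written as "ECO:" followed by seven digits.
ECO_PATTERN = re.compile(r"ECO:\d{7}")

# Illustrative subset of evidence codes with short labels (assumed wording).
REASON_LABELS = {
    "ECO:0000269": "experimental evidence",
    "ECO:0000305": "curator inference",
    "ECO:0000312": "information imported from another database",
}


def evidence_codes(line: str) -> List[str]:
    """Return every ECO code found in one line of an entry, in order."""
    return ECO_PATTERN.findall(line)


def reasons_for(line: str) -> List[str]:
    """Map each ECO code in the line to a label, or flag it as unknown."""
    return [
        REASON_LABELS.get(code, "unknown evidence code")
        for code in evidence_codes(line)
    ]


# Made-up description line in the style of a flat-file protein name record.
sample = "DE   RecName: Full=Example protein {ECO:0000312|EMBL:EXAMPLE};"
print(reasons_for(sample))
```

Lines without any ECO code return an empty list, which matches the pre-September 2015 case where the reason has to be investigated manually.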
Evaluation
We compared the results produced by IQBot (around 1,000 protein names, obtained by monitoring 249 proteins) with other databases. Only 1 out of 6 databases had the most up-to-date version of the protein names.
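A comparison of this kind could be sketched as follows. The slides do not say how the comparison was carried out, so the function `up_to_date_fraction`, the data layout, and the sample names are all hypothetical.

```python
from typing import Dict


def up_to_date_fraction(latest: Dict[str, str], other: Dict[str, str]) -> float:
    """Fraction of shared proteins whose name in `other` matches the latest name."""
    shared = latest.keys() & other.keys()
    if not shared:
        return 0.0
    matches = sum(1 for protein_id in shared if other[protein_id] == latest[protein_id])
    return matches / len(shared)


# Made-up latest names (as a monitoring tool might report them) and two other databases.
latest_names = {"P00001": "Serine kinase", "P00002": "Actin"}
other_databases = {
    "db_a": {"P00001": "Serine kinase", "P00002": "Actin"},
    "db_b": {"P00001": "Putative kinase", "P00002": "Actin"},
}

fractions = {
    name: up_to_date_fraction(latest_names, names)
    for name, names in other_databases.items()
}
fully_current = [name for name, fraction in fractions.items() if fraction == 1.0]
print(fractions, fully_current)
```

A database counts as fully current here only if every shared protein name matches. A real comparison would likely also need name normalization, which this sketch leaves out.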
Conclusion
So far, IQBot has shown that it can identify changes made by data curators in UniProt and assign the reasons behind these changes. We are looking for another curated database to work with. If you would like to share your data with us, please contact: Mariam.alqasab@postgrad.manchester.ac.uk