Amplifying Data Curation Efforts to Improve the Quality of Life Science Data

Mariam Alqasab, Suzanne Embury, Sandra Sampaio
Computer Science Department, University of Manchester

Introduction

The problem

[Diagram: databases (DB) and database providers]

Our suggestion

[Diagram: IQBot placed among the databases (DB) and database providers]

- Find defects in data,
- Find corrections of defects,
- Infer the reasons behind defect corrections.

How does IQBot work in general?

[Diagram: IQBot monitors the curated DB periodically]

- Extracts defects and finds their corrections by comparing two consecutive versions of the data.
- Finds out the reason behind the defects.
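The comparison of two consecutive versions can be illustrated with a minimal Python sketch. It is not IQBot's actual implementation: the `NameChange` type, the `extract_changes` function, the entry IDs, and the protein names are all hypothetical, and each release is assumed to be available as a simple mapping from entry ID to protein name.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NameChange:
    """One candidate defect correction: a value that differs between releases."""
    entry_id: str
    old_name: str
    new_name: str


def extract_changes(older: Dict[str, str], newer: Dict[str, str]) -> List[NameChange]:
    """Return entries present in both releases whose name changed."""
    changes = []
    # Only entries found in both releases can show a correction;
    # entries added or removed between releases are ignored here.
    for entry_id in sorted(older.keys() & newer.keys()):
        if older[entry_id] != newer[entry_id]:
            changes.append(NameChange(entry_id, older[entry_id], newer[entry_id]))
    return changes


# Hypothetical snapshots of two consecutive releases (entry ID -> protein name).
version_8 = {
    "X0001": "Putative kinase",
    "X0002": "Hemoglobin subunit alpha",
}
version_9 = {
    "X0001": "Serine/threonine-protein kinase",
    "X0002": "Hemoglobin subunit alpha",
    "X0003": "Newly added protein",
}

changes = extract_changes(version_8, version_9)
for change in changes:
    print(f"{change.entry_id}: {change.old_name!r} -> {change.new_name!r}")
```

Treating the old value as the defect and the new value as its correction is an assumption of this sketch; a change between releases may also reflect new knowledge rather than an error.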

Challenges in Finding a Suitable DB

- It must be a manually curated database.
- There is no clear view of how providers curate their data.
- It must offer free access to all versions of the DB.

Why is UniProt suitable?

More than 500,000 entries (manually curated) and 77 million entries (automatically curated).

1. Manually curated by human experts.
2. Curated every 4 weeks.
3. Provides access to all its releases.

IQBot Monitoring UniProt

[Diagram: IQBot monitors UniProt; the reason-finding step is labelled "Before Sep 2015" and "From Sep 2015"]

- Extracting changes in data (protein name)
- Finding the reason for the change (domain specific)

Extracting Defect Corrections

[Figure: Version 8 compared with Version 9]

Finding the Reason for the Change

[Figure: Version 8 compared with Version 9]

Challenges in Using UniProt

Finding the reason behind a change was difficult for protein entries dated before September 2015, as we needed to investigate to find it. From that date, however, an ECO code is provided when data is curated.

[Callout on slide: "Information has been imported from another database"]
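Where an ECO code accompanies a curated value, turning it into a reason could look roughly like the Python sketch below. This is an illustration only, not IQBot's code: the `reason_for_change` function and the `ECO_REASONS` table are hypothetical, the table holds just two example codes, and their labels should be checked against the Evidence and Conclusion Ontology before use.

```python
from typing import Dict, Optional

# Illustrative subset of evidence codes (assumed labels, verify against the ontology).
ECO_REASONS: Dict[str, str] = {
    "ECO:0000269": "experimental evidence used in manual assertion",
    "ECO:0000312": "imported information used in manual assertion",
}


def reason_for_change(eco_code: Optional[str]) -> str:
    """Map the evidence code attached to a changed value to a readable reason."""
    if eco_code is None:
        # Entries without an evidence code still need manual investigation.
        return "unknown (no evidence code; needs manual investigation)"
    return ECO_REASONS.get(eco_code, f"unrecognised evidence code {eco_code}")


print(reason_for_change("ECO:0000312"))
print(reason_for_change(None))
```

Returning an explicit "unknown" for entries without a code keeps the older entries visible as cases that still require investigation, instead of silently dropping them.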

Evaluation

Comparing the results produced by IQBot (around 1000 protein names, obtained by monitoring 249 proteins) against other databases showed that only 1 out of 6 databases had the most up-to-date protein names.

Conclusion

IQBot has so far shown that it can identify changes made by data curators in UniProt and assign the reasons behind those changes. We are looking for another curated database to work with. If you would like to share your data with us, please contact: Mariam.alqasab@postgrad.manchester.ac.uk