STRING: Protein networks from data and text mining

Lars Juhl Jensen

9.6 million proteins

functional associations

guilt by association

genomic context

gene fusion

Korbel et al., Nature Biotechnology, 2004

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004
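
The phylogenetic-profile idea can be illustrated with a small sketch: genes that are present and absent in the same set of genomes get similar profiles, which hints at a functional association. The Jaccard index used here is an illustrative choice and not necessarily the measure used in STRING; the gene and genome names are hypothetical.

```python
def jaccard(profile_a: set, profile_b: set) -> float:
    """Similarity of two presence profiles (sets of genomes containing the gene)."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


# Hypothetical toy data: which genomes each gene occurs in.
profiles = {
    "geneA": {"genome1", "genome2", "genome4"},
    "geneB": {"genome1", "genome2", "genome4"},
    "geneC": {"genome3"},
}

same = jaccard(profiles["geneA"], profiles["geneB"])   # identical profiles
differ = jaccard(profiles["geneA"], profiles["geneC"])  # no shared genomes
```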

experimental data

gene coexpression

physical interactions

Jensen & Bork, Science, 2008

curated knowledge

protein complexes

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

many databases

different formats

different identifiers

variable quality

not comparable

hard work

parsers

mapping files

quality scores

von Mering et al., Nucleic Acids Research, 2005

score calibration

von Mering et al., Nucleic Acids Research, 2005

implicit weighting by quality

common scale
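
One way to picture score calibration is to compare raw scores against a benchmark and ask what fraction of pairs in each score range the benchmark supports. The sketch below uses simple binning as a stand-in for whatever curve fitting is actually done; the function names, bin edges, and toy data are assumptions made for illustration.

```python
from bisect import bisect_right


def calibrate(raw_scores, is_true, bin_edges):
    """Per-bin fraction of benchmark-supported pairs (None for empty bins)."""
    hits = [0] * (len(bin_edges) + 1)
    totals = [0] * (len(bin_edges) + 1)
    for score, supported in zip(raw_scores, is_true):
        b = bisect_right(bin_edges, score)  # index of the bin this score falls in
        totals[b] += 1
        hits[b] += int(supported)
    return [h / n if n else None for h, n in zip(hits, totals)]


def to_probability(raw_score, bin_edges, bin_precision):
    """Map a raw score onto the common scale via its bin's precision."""
    return bin_precision[bisect_right(bin_edges, raw_score)]


# Hypothetical toy data: raw scores from one source and benchmark labels.
raw = [0.1, 0.2, 0.6, 0.7, 0.9, 0.95]
labels = [False, False, True, False, True, True]
edges = [0.5, 0.8]

precision = calibrate(raw, labels, edges)
calibrated = to_probability(0.65, edges, precision)
```

Because every source is mapped to the same kind of number (an estimated probability of being right), low-quality sources end up with low scores automatically, which is the implicit weighting by quality.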

missing most of the data

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

comprehensive lexicon

cyclin dependent kinase 1

CDC2

orthographic variation

spaces and hyphens

cyclin dependent kinase 1

cyclin-dependent kinase 1

prefixes and suffixes

CDC2

hCdc2

“black list”

SDS
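
The dictionary-based tagging steps above can be sketched as follows: each lexicon name is turned into a pattern that tolerates spaces and hyphens, allows a one-letter species prefix, and is skipped if it is on the black list. This is a minimal sketch, not the tagger used for STRING; the lexicon, the identifiers, and the `[hm]` prefix rule are assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon: identifier -> known names.
LEXICON = {
    "CDK1": ["cyclin dependent kinase 1", "CDC2"],
    "SDS": ["SDS"],
}
# Names that are in the lexicon but usually mean something else in text.
BLACKLIST = {"sds"}


def name_pattern(name):
    """Regex for a name, flexible on spaces/hyphens and a species prefix."""
    tokens = re.findall(r"[A-Za-z]+|\d+", name)  # split letters from digits
    body = r"[\s-]?".join(re.escape(t) for t in tokens)
    # Optional one-letter prefix such as "h" (human) or "m" (mouse): an assumption.
    return re.compile(r"\b[hm]?" + body + r"\b", re.IGNORECASE)


def tag(text):
    """Return (start, matched text, identifier) for every lexicon hit."""
    hits = []
    for identifier, names in LEXICON.items():
        for name in names:
            if name.lower() in BLACKLIST:
                continue  # skip black-listed names entirely
            for match in name_pattern(name).finditer(text):
                hits.append((match.start(), match.group(0), identifier))
    return sorted(hits)


text = "hCdc2, also called cyclin-dependent kinase 1, was run on an SDS gel."
found = tag(text)
```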

co-mentioning

counting

within documents

within paragraphs

within sentences
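
Co-mention counting at the three levels can be sketched as a weighted count: a pair of proteins gets credit for appearing in the same document, more if they share a paragraph, and more again if they share a sentence. The weights below are hypothetical placeholders, and the input is assumed to be already tagged (each sentence is a set of protein identifiers).

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights for illustration only.
W_DOCUMENT, W_PARAGRAPH, W_SENTENCE = 1.0, 2.0, 4.0


def pairs(entities):
    """All unordered pairs of entities, as sorted tuples."""
    return set(combinations(sorted(entities), 2))


def comention_scores(documents):
    """Weighted co-mention count per protein pair.

    documents: list of documents; a document is a list of paragraphs;
    a paragraph is a list of sentences; a sentence is a set of identifiers.
    """
    scores = Counter()
    for document in documents:
        doc_entities, par_pairs, sent_pairs = set(), set(), set()
        for paragraph in document:
            par_entities = set()
            for sentence in paragraph:
                sent_pairs |= pairs(sentence)
                par_entities |= sentence
            par_pairs |= pairs(par_entities)
            doc_entities |= par_entities
        for pair in pairs(doc_entities):
            scores[pair] += (
                W_DOCUMENT
                + W_PARAGRAPH * (pair in par_pairs)
                + W_SENTENCE * (pair in sent_pairs)
            )
    return scores


# One toy document: paragraph 1 has two sentences, paragraph 2 has one.
docs = [[[{"A", "B"}, {"C"}], [{"D"}]]]
scores = comention_scores(docs)
```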

quality scores

score calibration

integration
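
Once all evidence types sit on a common probabilistic scale, they can be integrated into one score per protein pair. A minimal sketch, assuming the evidence channels are independent and leaving out any correction for the prior probability of a random association:

```python
from math import prod


def combine(channel_scores):
    """Probability that at least one channel's evidence is correct."""
    return 1.0 - prod(1.0 - s for s in channel_scores)


# Hypothetical calibrated scores for one protein pair from two channels.
combined = combine([0.5, 0.5])
```

Under this scheme two moderately confident, independent lines of evidence yield a higher combined score than either alone.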

visualization

string-db.org

Szklarczyk et al., Nucleic Acids Research, 2015

web resource

download files

REST API

Bioconductor package

Cytoscape App
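
For the REST API, a request is an ordinary URL. The sketch below only builds such a URL for a network query; the endpoint layout and the `identifiers`, `species`, and `required_score` parameter names reflect the public API as far as I know and should be checked against the current documentation at string-db.org. The helper name `string_network_url` is hypothetical.

```python
from urllib.parse import urlencode


def string_network_url(identifiers, species, output_format="tsv", required_score=None):
    """Build a STRING network query URL (layout assumed, see lead-in)."""
    params = {
        # Multiple identifiers are separated by a carriage return (%0D).
        "identifiers": "\r".join(identifiers),
        "species": species,  # NCBI taxonomy identifier, e.g. 9606 for human
    }
    if required_score is not None:
        params["required_score"] = required_score
    return f"https://string-db.org/api/{output_format}/network?{urlencode(params)}"


url = string_network_url(["CDK1", "CCNB1"], 9606)
# To fetch the result, the URL could be passed to urllib.request.urlopen(url).
```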

protein query

disease query

Acknowledgments: Damian Szklarczyk, John "Scooter" Morris, Helen Cook, Michael Kuhn, Stefan Wyder, Milan Simonovic, Alberto Santos, Nadezhda Doncheva, Alexander Roth, Peer Bork, Christian von Mering