
A pipeline based on multivariate correspondence analysis with supplementary variables for cancer genomics
Christine Steinhoff
Max Planck Institute for Molecular Genetics, Berlin, Germany

Information sources

Data sources: literature/database; in silico; profiling/characterizing; experimental
Biological levels: DNA/genome; RNA; protein; phenotype
Technology examples:
- Literature/database: EST libraries; physical parameters of DNA, RNA, proteins, etc.; DNA sequence; data mining; literature mining; ...
- In silico: methylation prediction; TFBS prediction; functional annotations (repetitive elements, functional categories, ...); splicing; epigenetics
- Profiling: SNP arrays; arrayCGH; sequencing; expression arrays; ...
- Interaction: ChIP-chip; protein interaction; mass spectrometry (MASS) of complexes; ...
- Phenomics: imaging; RNAi techniques; MASS; medical observations

PROBLEMS

Clinical covariates are discrete categories:

grade   stage   died
2       1       yes
4       3       no
2       2       yes

Cat (m, c)
After appropriate normalization, some data types are approximately lognormal and symmetric; others are not symmetric (skewed).
Scale and distribution differ!

Procedure: Data input -> Discretization -> Filtering -> Indicator coding -> Multiple correspondence analysis

Step 1: Discretization
Data types: expression, arrayCGH, patient covariates
Patient covariates are categorical, e.g. staging, grading, smoking, mutation, ...

Step 1: Discretization
- Expression: probability of expression; fold-change criterion
- arrayCGH: segmentation and discretization of arrayCGH data (package: DNAcopy)
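A minimal sketch of Step 1 for the two continuous data types, assuming simple thresholds. The slides name a fold-change criterion for expression and DNAcopy for arrayCGH segmentation but give no cutoffs, so `LOG2_FC_CUTOFF`, `CGH_CUTOFF` and both helper functions are hypothetical. The arrayCGH helper assumes segment means have already been computed (for example by DNAcopy in R); neither the segmentation itself nor the probability-of-expression approach is reproduced here.

```python
import math

# Hypothetical cutoffs: the slides do not state the thresholds actually used.
LOG2_FC_CUTOFF = 1.0  # |log2 fold change| >= 1, i.e. at least two-fold
CGH_CUTOFF = 0.2      # |log2 segment mean| >= 0.2 counts as a copy-number change


def discretize_expression(sample, reference, cutoff=LOG2_FC_CUTOFF):
    """Return 'down', 'normal' or 'up' for one gene from its fold change vs. a reference."""
    log2_fc = math.log2(sample / reference)
    if log2_fc >= cutoff:
        return "up"
    if log2_fc <= -cutoff:
        return "down"
    return "normal"


def discretize_cgh(segment_mean, cutoff=CGH_CUTOFF):
    """Return 'loss', 'normal' or 'amplified' from a log2 segment mean
    (e.g. one produced beforehand by a segmentation tool such as DNAcopy)."""
    if segment_mean >= cutoff:
        return "amplified"
    if segment_mean <= -cutoff:
        return "loss"
    return "normal"


print(discretize_expression(400.0, 100.0))  # up
print(discretize_cgh(-0.35))                # loss
```

Whatever thresholds are used, the point of this step is that both data types end up on the same three-level categorical scale, which sidesteps the differing scales and distributions.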

Step 1: Discretization
Expression, arrayCGH, patient covariates
Typically n ≈ 23,000 genes -> reduce the number

Step 2: Filtering (optional)
Possibilities:
- Neglect all genes with no change in any patient
- Choose the genes with the highest variance across patients
- Select genes with high correlation between arrayCGH and expression
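The three filtering options can be sketched as follows, assuming per-gene profiles stored as dictionaries keyed by gene name. The function names, the data layout and the use of Pearson correlation for the arrayCGH/expression criterion are assumptions; the slides specify neither the correlation measure nor any cutoffs.

```python
import statistics


def changed_genes(calls):
    """Keep genes whose discretized call differs from 'normal' in at least one patient.
    calls: dict gene -> list of per-patient categories."""
    return [g for g, states in calls.items() if any(s != "normal" for s in states)]


def top_variance_genes(values, k):
    """Keep the k genes with the highest variance across patients.
    values: dict gene -> list of per-patient continuous measurements."""
    ranked = sorted(values, key=lambda g: statistics.pvariance(values[g]), reverse=True)
    return ranked[:k]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant profile."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def correlated_genes(cgh, expr, min_r):
    """Keep genes whose arrayCGH and expression profiles correlate with r >= min_r."""
    return [g for g in cgh if g in expr and pearson(cgh[g], expr[g]) >= min_r]
```

The options can be combined, e.g. first dropping unchanged genes and then ranking the rest by variance, to bring n ≈ 23,000 down to a size that keeps the indicator matrix manageable.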


Step 3: Indicator matrix (binary coding)

Original matrix with categories (gene 1):
pat 1   down
pat 2   up
pat 3   normal
pat 4   normal

Indicator matrix with binary coding:
        down   normal   up
pat 1   1      0        0
pat 2   0      0        1
pat 3   0      1        0
pat 4   0      1        0
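A sketch of indicator coding that reproduces the gene 1 example above. Each categorical variable (gene or covariate) becomes one 0/1 column per level, so every patient row sums to the number of variables. `indicator_matrix` and its dictionary-based input are hypothetical names chosen for illustration.

```python
def indicator_matrix(table, levels):
    """Binary (indicator) coding of categorical variables.

    table:  dict variable -> list of per-patient categories (same patient order)
    levels: dict variable -> ordered list of the possible categories
    Returns (columns, rows): column labels as (variable, level) pairs and one
    0/1 row per patient.
    """
    columns = [(var, lev) for var in table for lev in levels[var]]
    n_patients = len(next(iter(table.values())))
    rows = []
    for i in range(n_patients):
        for var in table:
            if table[var][i] not in levels[var]:
                raise ValueError(f"unknown category {table[var][i]!r} for {var}")
        rows.append([1 if table[var][i] == lev else 0 for var, lev in columns])
    return columns, rows


cols, Z = indicator_matrix(
    {"gene 1": ["down", "up", "normal", "normal"]},
    {"gene 1": ["down", "normal", "up"]},
)
for label, row in zip(["pat 1", "pat 2", "pat 3", "pat 4"], Z):
    print(label, row)
```

Labelling columns as (variable, level) pairs keeps, for example, grade "2" and stage "2" in separate columns.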

From: Multiple Correspondence Analysis and Related Methods
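MCA can be computed as a simple correspondence analysis of the indicator matrix. The sketch below follows that textbook formulation (standardized residuals, then an SVD). It assumes every category column occurs at least once and omits the inertia adjustments discussed in the MCA literature. The function name `mca` is hypothetical, and this is not necessarily the exact variant used in the talk.

```python
import numpy as np


def mca(Z, n_dims=2):
    """MCA as simple correspondence analysis of a 0/1 indicator matrix.

    Z: patients x category-columns indicator matrix (every column used at least once).
    Returns row principal coordinates F, column principal coordinates G
    (both truncated to n_dims) and all principal inertias.
    """
    Z = np.asarray(Z, dtype=float)
    P = Z / Z.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row (patient) masses
    c = P.sum(axis=0)                        # column (category) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = U[:, :n_dims] * sv[:n_dims] / np.sqrt(r)[:, None]
    G = Vt.T[:, :n_dims] * sv[:n_dims] / np.sqrt(c)[:, None]
    return F, G, sv ** 2


# Two variables with three levels each, six patients (toy data)
Z = [
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0, 0],
]
F, G, inertia = mca(Z)
# Total inertia should equal (J - Q) / Q = (6 - 2) / 2 = 2, up to floating-point rounding
print(inertia.sum())
```

Patients (rows of F) and category levels (rows of G) can then be drawn on the same first two axes.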


EXAMPLE: PUBLISHED DATA


Covariate states' display
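The title refers to supplementary variables, and the covariate-state display suggests that covariates appear as passive points on the map. Assuming that, a supplementary category can be placed at the centroid of the standard coordinates of the patients who have it, which is the CA transition formula applied to a column that did not take part in the analysis. `supplementary_columns` is a hypothetical helper that takes the row principal coordinates and principal inertias of an already computed MCA; the toy numbers below are made up.

```python
import numpy as np


def supplementary_columns(F, inertias, H):
    """Project supplementary (passive) categories onto an existing MCA map.

    F:        row principal coordinates of the active analysis (patients x dims)
    inertias: principal inertias (squared singular values); the first dims are used
    H:        0/1 indicator matrix of supplementary categories (patients x categories);
              every category is assumed to occur in at least one patient
    Returns the principal coordinates of each supplementary category.
    """
    F = np.asarray(F, dtype=float)
    inertias = np.asarray(inertias, dtype=float)[: F.shape[1]]
    A = F / np.sqrt(inertias)            # row standard coordinates
    H = np.asarray(H, dtype=float)
    counts = H.sum(axis=0)               # patients per supplementary category
    return (H.T @ A) / counts[:, None]   # centroid of the standard coordinates


# Toy coordinates for three patients (made-up numbers, not data from the talk)
F = [[1.0, 0.5], [-1.0, 0.5], [0.0, -1.0]]
inertias = [0.25, 0.04]
H = [[1, 0], [1, 1], [0, 1]]  # hypothetical columns, e.g. "died = yes", "grade = 4"
print(supplementary_columns(F, inertias, H))
```

Because supplementary categories do not influence the axes, covariates such as grade or survival can be inspected against a map built from the molecular data alone.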


Explore ERBB2 and MYC
ERBB2 amplified in arrayCGH; ERBB2 overexpression; ERBB2 normal in arrayCGH

ERBB2 underexpression; ERBB2 loss in arrayCGH


MYC amplification; MYC overexpression


MYC underexpression; MYC normal in arrayCGH
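The ERBB2 and MYC slides relate each gene's arrayCGH state (amplified, normal, loss) to its expression state (over-, normal, under-expression). Outside the MCA map, the same relation can be checked with a plain cross-tabulation of the two discretized calls per patient. This is an illustrative complement rather than part of the presented pipeline; the state labels and the example calls are hypothetical.

```python
from collections import Counter

CGH_STATES = ("loss", "normal", "amplified")
EXPR_STATES = ("down", "normal", "up")


def crosstab(cgh_calls, expr_calls):
    """Count patients in each (arrayCGH state, expression state) pair for one gene.
    Rows follow CGH_STATES, columns follow EXPR_STATES."""
    counts = Counter(zip(cgh_calls, expr_calls))
    return [[counts[(c, e)] for e in EXPR_STATES] for c in CGH_STATES]


# Made-up calls for one gene in five patients
cgh = ["amplified", "amplified", "normal", "loss", "normal"]
expr = ["up", "up", "normal", "down", "up"]
for state, row in zip(CGH_STATES, crosstab(cgh, expr)):
    print(f"{state:>9}", row)
```

Counts concentrated on the diagonal (loss/down, normal/normal, amplified/up) indicate that copy-number changes and expression changes go together for that gene.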


Enrichment of GO categories
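The slide does not state how GO enrichment was assessed. A common choice, assumed here, is a one-sided hypergeometric test. Given N genes in the universe, K of them annotated to a GO category, and a selected set of n genes (for instance, genes associated with one region of the map) containing k annotated genes, the p-value is P(X >= k). `enrichment_p` is a hypothetical helper. In practice, p-values across many categories would also need a multiple-testing correction, which is not shown.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the universe, K: genes annotated to the GO category,
    n: genes in the selected set, k: annotated genes observed in the set.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# 5 genes selected from 10, 5 of which carry the annotation; all 5 selected are annotated
print(enrichment_p(5, 5, 5, 10))  # 1/252, about 0.00397
```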


Thank you for your attention!

ACKNOWLEDGEMENT
Martin Vingron, Max Planck Institute for Molecular Genetics
Matteo Pardo, Sensor Lab, CNR-INFM