Gibbs biclustering of microarray data
Yves Moreau, CBS Microarray Course
From genome projects to transcriptome projects
- Microarray cost per expression measurement
- Budgets and expertise
- Publicly available microarray data
  - Need for exchange standards & repositories
- Big consortia set up big microarray projects
  - Genome projects → “transcriptome” projects (= compendia)
- Change in microarray projects (sequence analysis)
  - Analyze public data first to generate a hypothesis
  - Design and perform your own microarray experiment
6/5/2021 CBS Microarray Course
Why biclustering?
- Data becomes more heterogeneous
- Gene clustering
  - Group genes that behave similarly over all conditions
- Gene biclustering
  - Group genes that behave similarly over a subset of conditions
  - “Feature selection”
  - More suitable for a heterogeneous compendium
Bicluster
- Discretized microarray data set
  [Figure: genes × conditions matrix with levels High, Medium, Low]
- Discretizing microarray data
  - Microarray data is continuous
  - Discretize by equal frequency
  [Figure: distribution of expression values for a given gene]
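Equal-frequency discretization can be sketched as follows. This is a minimal illustration, not the course's implementation: the function name, the choice of three bins (Low, Medium, High), the per-gene application, and the tie-breaking by position are all assumptions, and the expression values are made up.

```python
def discretize_equal_frequency(values, n_bins=3):
    """Assign each value a bin index 0..n_bins-1 so that bins hold
    (roughly) the same number of values."""
    # Rank the values; ties keep their original order (an assumption).
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        # Integer arithmetic splits the ranks into n_bins equal-sized groups.
        labels[i] = rank * n_bins // len(values)
    return labels


LEVELS = ["Low", "Medium", "High"]
expression = [0.1, 2.3, -1.2, 0.8, 1.5, -0.4]  # made-up values for one gene
bins = discretize_equal_frequency(expression)
print([LEVELS[b] for b in bins])
# → ['Medium', 'High', 'Low', 'Medium', 'High', 'Low']
```

Each level receives two of the six values, regardless of how the raw values are spread.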
Bicluster
Likelihood
[Figure: data matrix divided into Background and Pattern, with a likelihood scale from 0 to 1]
Likelihood
[Figure: likelihood scale from 0 to 1; matrix cells annotated with probabilities 0.9 and 0.05]
Likelihood: get the right genes
[Figure: likelihood scale from 0 to 1; matrix cells annotated with probabilities 0.9 and 0.05]
Likelihood: get the right conditions
[Figure: likelihood scale from 0 to 1; matrix cells annotated with probabilities 0.9 and 0.05]
Likelihood: get the right frequency pattern
[Figure: likelihood scale from 0 to 1; matrix cells annotated with probabilities 0.6 and 0.2]
Optimizing the bicluster
- Find the right bicluster
  - Genes
  - Conditions
  - Pattern
- For a given choice of genes and conditions, the “best” pattern is given by the frequencies found in the extracted pattern
  - No more need to optimize over the pattern
- Maximum likelihood: find genes and conditions that maximize the likelihood [formula not preserved]
- Gibbs sampling: find genes and conditions that optimize [formula not preserved]
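The point that the pattern need not be optimized separately can be illustrated with a small sketch. It assumes, as one reading of the slides, that each selected condition has its own frequency vector over the discretized levels and that all remaining cells share a single background frequency vector estimated from the cells outside the bicluster. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def bicluster_log_likelihood(data, genes, conditions):
    """Log-likelihood of a genes x conditions selection, with the pattern
    and background frequencies set to their observed (ML) values."""
    genes, conditions = set(genes), set(conditions)
    loglik = 0.0
    # Pattern: one frequency vector per selected condition, taken directly
    # from the levels observed in the selected genes.
    for c in conditions:
        counts = Counter(data[g][c] for g in genes)
        total = sum(counts.values())
        loglik += sum(n * math.log(n / total) for n in counts.values())
    # Background (assumption): a single frequency vector for every cell
    # that lies outside the bicluster.
    counts = Counter(
        data[g][c]
        for g in range(len(data))
        for c in range(len(data[0]))
        if not (g in genes and c in conditions)
    )
    total = sum(counts.values())
    loglik += sum(n * math.log(n / total) for n in counts.values())
    return loglik


# Toy data with levels 0 = Low, 1 = Medium, 2 = High (made up).
data = [
    [2, 0, 1, 0],
    [2, 0, 0, 2],
    [1, 2, 2, 1],
    [0, 1, 1, 0],
]
consistent = bicluster_log_likelihood(data, genes=[0, 1], conditions=[0, 1])
inconsistent = bicluster_log_likelihood(data, genes=[2, 3], conditions=[0, 1])
print(consistent > inconsistent)  # → True
```

Genes 0 and 1 agree perfectly on conditions 0 and 1, so their pattern term contributes a log-likelihood of zero, while genes 2 and 3 disagree on both conditions and are penalized.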
Gibbs sampling
[Figure: current configuration → next gene configuration]
Gibbs sampling (continued)
[Figure: updated gene configuration → next complete configuration; iterate many times]
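The gene-by-gene update can be sketched as below. This is a simplified illustration under stated assumptions, not the method from the slides: it resamples only the gene labels for a fixed set of conditions (the condition update would be analogous), uses plug-in frequencies with pseudocounts, and draws the background from the unselected genes in the same conditions. All names, parameters, and data are hypothetical. The returned values are inclusion frequencies averaged over the retained iterations.

```python
import math
import random


def gibbs_gene_labels(data, conditions, n_levels=3, n_iter=200, burn_in=50,
                      prior_in=0.5, pseudo=1.0, seed=0):
    """Resample each gene's in/out label given all other labels, and return
    the fraction of retained iterations in which each gene was selected."""
    rng = random.Random(seed)
    n_genes = len(data)
    labels = [rng.random() < prior_in for _ in range(n_genes)]
    inclusion = [0] * n_genes
    for it in range(n_iter):
        for i in range(n_genes):
            others_in = [g for g in range(n_genes) if g != i and labels[g]]
            others_out = [g for g in range(n_genes) if g != i and not labels[g]]
            # Background frequencies (assumption): one vector, estimated from
            # the other unselected genes over the chosen conditions.
            bg = [pseudo] * n_levels
            for g in others_out:
                for c in conditions:
                    bg[data[g][c]] += 1
            bg_total = sum(bg)
            log_in = math.log(prior_in)
            log_out = math.log(1.0 - prior_in)
            for c in conditions:
                # Pattern frequencies for this condition from the other
                # currently selected genes.
                pat = [pseudo] * n_levels
                for g in others_in:
                    pat[data[g][c]] += 1
                level = data[i][c]
                log_in += math.log(pat[level] / sum(pat))
                log_out += math.log(bg[level] / bg_total)
            diff = log_out - log_in
            # Conditional probability that gene i belongs to the bicluster.
            p_in = 0.0 if diff > 700 else 1.0 / (1.0 + math.exp(diff))
            labels[i] = rng.random() < p_in
        if it >= burn_in:
            for i in range(n_genes):
                inclusion[i] += labels[i]
    return [count / (n_iter - burn_in) for count in inclusion]


# Made-up discretized data: rows are genes, columns are conditions.
toy = [
    [2, 0, 2, 1],
    [2, 0, 2, 0],
    [2, 0, 2, 2],
    [0, 1, 0, 1],
    [1, 2, 1, 0],
    [0, 2, 2, 2],
]
print(gibbs_gene_labels(toy, conditions=[0, 1, 2]))
```

A gene could then be assigned to the bicluster when its averaged inclusion frequency exceeds a chosen threshold; the threshold is a free design choice in this sketch.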
Gibbs biclustering
Simulated data
Remarks
- Gibbs biclustering allows noisy patterns
- Optimized configuration is obtained by averaging successive iterated configurations
- Biclustering is oriented
  - Find a subset of samples for which a subset of genes is consistently expressed across those samples
  - Find a subset of genes that are consistently expressed across a subset of samples
- Searching for multiple patterns
  - For gene biclustering, remove the data of the genes from the current bicluster
  - Search for a new pattern
  - Stop if only the empty pattern is repeatedly found
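The search for multiple patterns can be sketched as a loop around any single-bicluster search. In this hedged sketch, `find_bicluster` is a hypothetical callable returning `(gene_indices, condition_indices)`, and `max_empty` and `max_patterns` are assumed stopping parameters. The `toy_finder` below is only a deterministic stand-in for a real sampler.

```python
from collections import Counter


def find_multiple_biclusters(data, find_bicluster, max_empty=3, max_patterns=10):
    """Repeatedly search for a bicluster, remove its genes, and search again.
    Stops once the empty pattern has been found max_empty times in a row."""
    remaining = list(range(len(data)))  # original indices of genes still in play
    found = []
    empty_runs = 0
    while remaining and empty_runs < max_empty and len(found) < max_patterns:
        sub = [data[g] for g in remaining]
        genes, conditions = find_bicluster(sub)
        if not genes or not conditions:
            empty_runs += 1
            continue
        empty_runs = 0
        original = [remaining[g] for g in genes]  # map back to original indices
        found.append((original, list(conditions)))
        chosen = set(original)
        remaining = [g for g in remaining if g not in chosen]
    return found


def toy_finder(sub):
    """Stand-in search: select the genes sharing the most common row,
    provided that row occurs at least twice."""
    row, count = Counter(map(tuple, sub)).most_common(1)[0]
    if count < 2:
        return [], []
    genes = [i for i, r in enumerate(sub) if tuple(r) == row]
    return genes, list(range(len(row)))


data = [[1, 1], [0, 2], [1, 1], [2, 2], [2, 2], [1, 1], [0, 1]]
print(find_multiple_biclusters(data, toy_finder))
# → [([0, 2, 5], [0, 1]), ([3, 4], [0, 1])]
```

With a stochastic search such as Gibbs sampling, an empty result may occur by chance, which is why the loop retries before stopping.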
Multiple biclusters
Leukemia fingerprints
Mixed-Lineage Leukemia
- Armstrong et al., Nature Genetics, 2002
- Mixed-Lineage Leukemia (MLL) is a subtype of ALL
  - Caused by chromosomal rearrangement in the MLL gene
  - Poorer prognosis than ALL
- Microarray analysis shows that MLL is distinct from ALL
- FLT3 tyrosine kinase distinguishes most strongly between MLL, ALL, and AML
  - Candidate drug target
PCA
[Figure: PCA plot; label: Features]
Biclustering leukemia data
- Bicluster patients
  - Find patients for which a subset of genes has a consistent expression profile across this group of patients
- Discovery set
  - 21 ALL, 17 MLL, 25 AML
- Validation set
  - 3 ALL, 3 MLL, 3 AML
Discovering ALL
- Bicluster 1: 18 out of 21 ALL patients
Discovering MLL
- Bicluster 2: 14 out of 17 MLL patients
Discovering AML
- Bicluster 3: 19 out of 25 AML patients
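The counts above translate into per-class recovered fractions as in this small sketch. It covers only the fraction of each class captured by its bicluster; the slides do not state whether a bicluster also contains patients from other classes, so no purity figure is computed. The variable names are illustrative.

```python
# (patients in the bicluster, patients of that class in the discovery set)
recovered = {"ALL": (18, 21), "MLL": (14, 17), "AML": (19, 25)}

fractions = {name: hit / total for name, (hit, total) in recovered.items()}
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
# → ALL: 85.7%
# → MLL: 82.4%
# → AML: 76.0%
```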
Rescoring ALL
Rescoring MLL
Rescoring AML