Microarray Data Set
  • The microarray data set we are dealing with is represented as a 2D numerical array: each row is a gene, each column is a sample (or condition), and each entry is an expression level.
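As a minimal sketch of this representation, the snippet below parses a small tab-separated table into a gene-by-sample NumPy array. The file layout, the helper name `load_matrix`, and the toy values are all hypothetical; the slides do not specify an input format.

```python
import numpy as np

# Hypothetical tab-separated layout (not specified in the slides):
# the first row holds sample IDs, each later row is a gene ID
# followed by one expression value per sample.
RAW = """gene\ts1\ts2\ts3
g1\t2.1\t1.9\t0.3
g2\t0.5\t0.7\t2.2
"""


def load_matrix(text):
    """Parse the layout above into (gene_ids, sample_ids, 2D array)."""
    rows = [line.split("\t") for line in text.strip().splitlines()]
    sample_ids = rows[0][1:]
    gene_ids = [row[0] for row in rows[1:]]
    values = np.array([[float(v) for v in row[1:]] for row in rows[1:]])
    return gene_ids, sample_ids, values


genes, samples, X = load_matrix(RAW)
print(X.shape)     # (2, 3): 2 genes (rows) x 3 samples (columns)
print(X[1, 2])     # 2.2: expression of g2 in sample s3
```

Keeping the gene and sample IDs in separate lists lets the numeric array stay a plain float matrix, which is what the later analysis steps operate on.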

Characteristics of Microarray Data
  • High dimensionality of the gene space, low dimensionality of the sample space.
    - Thousands to tens of thousands of genes, but only tens to hundreds of samples.
  • Correlation among features (genes).
    - Genes collaborate to function.
    - Gene correlation characterizes how the system works.
  • A plethora of domain knowledge.
    - A large body of knowledge has accumulated about the genes in question.
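The first two characteristics can be illustrated on synthetic data. This sketch assumes a toy model (not from the slides) in which five genes share one latent signal and all other genes are independent noise; it computes gene-by-gene correlations with `np.corrcoef` and shows that, with few samples, the gene covariance matrix is rank-deficient.

```python
import numpy as np

rng = np.random.default_rng(42)
n_genes, n_samples = 200, 12      # toy sizes: many genes, few samples

# Hypothetical model: genes 0-4 share one latent signal (they "collaborate"),
# every other gene is independent noise.
latent = rng.normal(size=n_samples)
X = rng.normal(scale=0.3, size=(n_genes, n_samples))
X[:5] += latent                   # broadcast the same signal onto rows 0-4

# np.corrcoef treats each row as a variable, giving gene-by-gene correlations.
C = np.corrcoef(X)
print(C.shape)                    # (200, 200)
print(round(C[0, 1], 2))          # co-regulated pair: strongly positive
print(round(C[0, 100], 2))        # no shared signal: any value is chance

# With n samples the gene covariance matrix has rank at most n - 1,
# so it is singular whenever genes outnumber samples.
rank = np.linalg.matrix_rank(np.cov(X))
print(rank)                       # at most n_samples - 1 = 11
```

The rank bound is one concrete reason why methods that need to invert a full gene covariance matrix break down on this kind of data.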

Microarray Data Analysis
  • Analysis from two angles:
    - sample as object, gene as attribute
    - gene as object, sample/condition as attribute
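On a gene-by-sample matrix, the two angles differ only in which axis supplies the objects. The sketch below, with a hypothetical helper `pairwise_euclidean` and made-up values, computes sample-to-sample distances from the transpose and gene-to-gene distances from the matrix itself; Euclidean distance is just one possible choice.

```python
import numpy as np


def pairwise_euclidean(objects):
    """Euclidean distances between the rows of `objects` (one row per object)."""
    diff = objects[:, None, :] - objects[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy gene x sample matrix: 4 genes, 3 samples (hypothetical values).
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.5],
              [5.0, 0.0, 1.0],
              [4.0, 0.5, 1.0]])

# Angle 1: samples are objects, genes are attributes -> use the transpose.
sample_dist = pairwise_euclidean(X.T)   # shape (3, 3)

# Angle 2: genes are objects, samples/conditions are attributes.
gene_dist = pairwise_euclidean(X)       # shape (4, 4)
print(sample_dist.shape, gene_dist.shape)
```

Any clustering or classification routine that works on rows can then serve either angle unchanged.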

Supervised Analysis
  • Select training samples (hold out…)
  • Sort genes (t-test, ranking…)
  • Select informative genes (top 50 to 200)
  • Cluster based on informative genes

              Class 1        Class 2
  g1          1 1 … 1        0 0 … 0
  g2          1 1 … 1        0 0 … 0
  . . .
  g4131       0 0 … 0        1 1 … 1
  g4132       0 0 … 0        1 1 … 1
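The steps above can be sketched as follows, under several assumptions: the slides only say "t-test", so Welch's (unequal-variance) t statistic is our choice; the clustering step uses a plain 2-means with a simple initialization; and the function names and synthetic data are hypothetical. For brevity the toy run scores all samples, whereas the slides compute scores on training samples only.

```python
import numpy as np


def t_scores(X, labels):
    """Welch (unequal-variance) t statistic for every gene (row) of X."""
    a, b = X[:, labels == 1], X[:, labels == 0]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se


def select_informative(X, labels, k):
    """Indices of the k genes with the largest |t|, best first."""
    return np.argsort(-np.abs(t_scores(X, labels)))[:k]


def two_means(Z, n_iter=20):
    """Plain 2-means clustering of the samples (columns) of Z."""
    pts = Z.T
    centers = pts[[0, -1]].astype(float)  # init: first and last sample (assumption)
    for _ in range(n_iter):
        dist = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        for c in range(2):
            if np.any(assign == c):
                centers[c] = pts[assign == c].mean(axis=0)
    return assign


# Toy data: 100 genes x 16 samples; genes 0-4 are up-regulated in class 1.
rng = np.random.default_rng(7)
labels = np.array([1] * 8 + [0] * 8)
X = rng.normal(scale=0.5, size=(100, 16))
X[:5, labels == 1] += 5.0

top = select_informative(X, labels, k=5)   # the slides suggest top 50 to 200
clusters = two_means(X[top])
print(sorted(top.tolist()))
print(clusters)
```

Cluster IDs from 2-means are arbitrary, so they should be compared with the class labels up to a swap.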

Phenotype Structure Mining
  [Figure: expression matrix of samples 1-10 (columns) by genes 1-7 (rows); genes 1-4 are labeled "Informative Genes", genes 5-7 "Noninformative Genes".]
  • An informative gene is a gene whose expression manifests the phenotype distinction among samples.
  • Phenotype structure: sample partition + informative genes.
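A phenotype structure can be held as a pair (sample partition, informative genes). In this sketch the class `PhenotypeStructure`, the helpers, and especially the "informative" test are hypothetical: a gene counts as informative if its value ranges in the two sample groups do not overlap. That is one simple stand-in for "manifests the phenotype distinction", not the criterion used in the slides. The toy matrix mirrors the figure's 7 genes x 10 samples, with an assumed 5/5 split of samples.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PhenotypeStructure:
    partition: list          # disjoint groups of sample indices
    informative_genes: list  # gene (row) indices that manifest the partition


def separates(values, group_a, group_b):
    """Hypothetical criterion: the gene's value ranges in the two groups do not overlap."""
    a, b = values[group_a], values[group_b]
    return bool(a.min() > b.max() or b.min() > a.max())


def phenotype_structure(X, group_a, group_b):
    """Pair a two-group sample partition with the genes that separate it."""
    genes = [g for g in range(X.shape[0]) if separates(X[g], group_a, group_b)]
    return PhenotypeStructure([list(group_a), list(group_b)], genes)


# Toy 7 genes x 10 samples: rows 0-3 separate the groups, rows 4-6 do not.
X = np.array([
    [3, 3, 3, 3, 3, 0, 0, 0, 0, 0],
    [3, 2, 3, 2, 3, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 3, 3, 2, 3, 3],
    [2, 2, 2, 2, 2, 1, 1, 1, 1, 1],
    [0, 3, 0, 3, 0, 3, 0, 3, 0, 3],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
], dtype=float)

s = phenotype_structure(X, [0, 1, 2, 3, 4], [5, 6, 7, 8, 9])
print(s.informative_genes)   # [0, 1, 2, 3] (0-based rows, i.e. genes 1-4)
```

Note that the partition and the gene set depend on each other: changing the sample partition changes which genes pass the test.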

Existing Feature Selection and Extraction Algorithms
  • The characteristics of microarray data sets make feature selection a critical process.
    - Too many features, too few samples.
  • Existing feature selection/extraction algorithms include:
    - Single-gene discriminative scores, such as the t-test score, S2N (signal-to-noise ratio), etc.
    - Redundancy-removal-based FSS (feature subset selection) algorithms.
    - General feature selection algorithms (the Relief family, floating selection, etc.).
    - General feature extraction algorithms: PCA, SVD, FLD (Fisher's linear discriminant), etc.
  • We have not yet seen feature extraction algorithms designed specifically for microarray data.
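Two of the listed techniques are short enough to sketch. The S2N score below uses the common form (μ1 − μ0) / (σ1 + σ0); using the sample standard deviation (`ddof=1`) is our assumption. PCA is computed through the SVD of the centred sample-by-gene matrix. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def s2n_scores(X, labels):
    """Signal-to-noise per gene: (mu_1 - mu_0) / (sigma_1 + sigma_0)."""
    a, b = X[:, labels == 1], X[:, labels == 0]
    # ddof=1 (sample standard deviation) is our choice here.
    return (a.mean(axis=1) - b.mean(axis=1)) / (a.std(axis=1, ddof=1) + b.std(axis=1, ddof=1))


def pca_via_svd(X, n_components):
    """Project the samples of a gene x sample matrix onto top principal components."""
    samples = X.T - X.T.mean(axis=0)                 # samples as rows, each gene centred
    U, S, Vt = np.linalg.svd(samples, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # equals samples @ Vt[:k].T
    explained = S ** 2 / np.sum(S ** 2)              # variance ratio per component
    return scores, explained[:n_components]


rng = np.random.default_rng(3)
labels = np.array([1] * 10 + [0] * 10)
X = rng.normal(size=(200, 20))     # toy: 200 genes x 20 samples
X[0, labels == 1] += 5.0           # gene 0 is discriminative

s2n = s2n_scores(X, labels)
scores, ratio = pca_via_svd(X, n_components=2)
print(int(np.argmax(np.abs(s2n))))  # index of the top-ranked gene
print(scores.shape)                 # (20, 2): 20 samples in 2 dimensions
```

S2N scores each gene in isolation, while PCA builds new features from all genes at once and ignores the class labels; this is the single-gene versus general-extraction distinction drawn in the list above.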