Scientific Data Mining: Emerging Developments and Challenges
F. Seillier-Moiseiwitsch
Bioinformatics Research Center
Department of Mathematics and Statistics
University of Maryland - Baltimore County
Bioinformatics: A View from the Trenches
Some Needed Developments: Simultaneous data mining of databases
• Different types of information in separate databases
    GenBank, PDB, HIV-Web, PubMed, …
    Data selection
    Generic solution
Some Needed Developments: Simultaneous data mining of databases
• Same information in different databases
    Meta-analysis, e.g. gene expression data
    Pre-processing
    Different technologies
    Sources of variability
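One standard way to combine the same measurement across several databases is fixed-effect, inverse-variance meta-analysis. The sketch below is illustrative only: the slides do not name a method, and the function name `fixed_effect_meta` and its inputs (one effect estimate and one variance per source, for a single gene) are assumptions. It also assumes the estimates were already pre-processed onto a common scale.

```python
import math


def fixed_effect_meta(effects, variances):
    """Pool per-source effect estimates with inverse-variance weights.

    effects   -- one estimate per database (e.g. a log fold change for one gene)
    variances -- the sampling variance of each estimate (must be > 0)
    Returns (pooled_estimate, standard_error).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need one variance per effect, and at least one study")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Precise studies (small variance) receive large weights.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return pooled, standard_error


if __name__ == "__main__":
    # Hypothetical log fold changes for one gene from three sources.
    estimate, se = fixed_effect_meta([0.8, 1.1, 0.9], [0.04, 0.09, 0.05])
    print(f"pooled = {estimate:.3f}, SE = {se:.3f}")
```

A fixed-effect model treats all sources as measuring one common effect. When technologies differ enough that between-source variability matters, a random-effects model would be the more appropriate choice.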
Some Needed Developments: Data mining of heterogeneous databases
• Many different types of information in the same database
    e.g. patient records: diagnostics, lab results, DNA, microarray, 2D gel images
    Data compression
    Features
Some Needed Developments: New Algorithms
• Molecular evolution
    Phylogenetic reconstruction
    Large number of sequences
    Statistical evolutionary models
    MCMC, EM algorithm
    Parallel processors
    Emerging models
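As a small illustration of what a statistical evolutionary model provides, the sketch below computes the Jukes-Cantor (JC69) distance between two aligned DNA sequences. JC69 is chosen here only because it is the simplest such model; the slides do not specify one, and the function name `jc69_distance` is hypothetical. Pairwise distances of this kind are a common input to distance-based tree building.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    Returns the expected number of substitutions per site, or math.inf
    when the sequences are too divergent for the correction to apply.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    # Keep only sites where both sequences carry an unambiguous base.
    sites = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not sites:
        raise ValueError("no comparable sites")

    # p is the observed proportion of differing sites.
    p = sum(a != b for a, b in sites) / len(sites)
    if p >= 0.75:
        return math.inf  # saturation: the logarithm is undefined

    # JC69 correction for multiple substitutions at the same site.
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(jc69_distance("ACGTACGT", "ACGTACGA"))
```

With n sequences this requires n(n-1)/2 independent comparisons, which is one reason large sequence sets lend themselves to parallel processors.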
Some Needed Developments: New Algorithms
• Proteomics
    Images of 2D gels
    Clean-up, alignment
    Group composite image
    Biological vs. experimental variability
    Easily updated
Some Needed Developments: New Algorithms
• Functional genomics
    Microarray data
    Background estimation (subjectivity)
    Automation of analytical protocols
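One way to reduce subjectivity in background estimation is to replace manual judgment with a fixed, documented rule. The sketch below shows a simple rule of that kind: subtract the median of the local background pixels from the median of the spot pixels. It is an assumption made for illustration, not a protocol taken from the slides, and the function name `background_corrected` and the `floor` parameter are hypothetical.

```python
from statistics import median


def background_corrected(foreground_pixels, background_pixels, floor=1.0):
    """Background-corrected intensity for one microarray spot.

    foreground_pixels -- pixel intensities inside the spot
    background_pixels -- pixel intensities in the surrounding local region
    floor             -- smallest value returned, so that a later log
                         transform never sees zero or a negative number
    """
    if not foreground_pixels or not background_pixels:
        raise ValueError("both pixel lists must be non-empty")

    # Medians are robust to a few bright artifacts such as dust specks.
    signal = median(foreground_pixels)
    background = median(background_pixels)

    return max(signal - background, floor)


if __name__ == "__main__":
    spot = [100, 120, 110]
    surround = [10, 12, 11, 300]  # 300 mimics a dust speck
    print(background_corrected(spot, surround))
```

Because the rule is deterministic, every spot on every array is treated the same way, which is the point of automating the analytical protocol.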
Some Challenges
• Public domain software
• Easy implementation on any computing platform
• Incorporation of state-of-the-art statistical techniques
    Clustering, classification
    Longitudinal models
    Spatio-temporal models