DISTRIBUTED DATA MINING ON ASTRONOMY CATALOGS
Cross Matching: Alignment of Astronomy Catalogs

[Figure: Catalog P (tuple IDs P1 to P3, join attribute X, attribute A) and Catalog Q (tuple IDs Q1 to Q4, join attribute X, attribute B) are joined on X to form the matched catalog with columns X, A and B.]

- Astronomy sky surveys (SDSS, 2MASS) observe galaxies, quasars, stars and serendipity objects.
- Raw data from the telescope is pre-processed.
- Each object has hundreds of attributes.
- National Virtual Observatory: develop an information technology infrastructure that enables easy access to distributed astronomy catalogs.

1. Data matrices: site A holds A (n × p) and site B holds B (n × q); row i at both sites describes the same cross-matched object.
2. p + q = m, the total number of attributes.
3. Each site normalizes its own data, without any communication.
4. A central coordination site S sends A and B a common random-number-generator seed.
5. A and B each generate the same l × n random matrix R, whose elements are i.i.d. draws from any distribution with mean 0 and variance 1.
6. A sends RA (l × p) and B sends RB (l × q) to S.
7. S computes $D = (RA)^T (RB) / l$.
8. $E[D] = E[A^T (R^T R) B] / l = A^T E[R^T R] B / l = A^T B$, because $E[R^T R] = l I_n$. By the Johnson-Lindenstrauss lemma, $D \approx A^T B$ with high probability when l is sufficiently large.

[Figure: the Fundamental Plane; labels include Velocity Dispersion, Surface Brightness and Mass / Luminosity / Radius.]

1. Objective: find correlations in high-dimensional spaces.
2. Domain knowledge: for the class of elliptical galaxies, observe the parameters surface brightness, log(velocity dispersion) and log(radius).
3. A 2-D plane, called the Fundamental Plane, exists in the observed space of these parameters.

Attributes used:
- 2MASS: Mean Surface Brightness (Kmsb)
- SDSS: Red Shift (rs), Angular Effective Radius (Iaer), Velocity Dispersion (vd)

Assumptions:
1. The cross-matched table is built offline.
2. Indices are computed and sent to the sites.

The Virtual Table has the columns Kmsb, Velocity Dispersion and (Angular Eff. Radius × Red Shift); the last product serves as a proxy for physical radius, since distance is roughly proportional to redshift for nearby galaxies.

Work done by Haimonti Dutta, Chris Giannella, Kirk Borne, Ran Wolff and Hillol Kargupta.
NSF Grants: IIS-0329143, IIS-0093353, IIS-0203958 and NASA Grant NAS2-37143.
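The eight-step random projection protocol described in this poster can be simulated in a single process as below. This is a minimal sketch, not the authors' implementation: the function names (`normalize`, `project_at_site`, `estimate_cross_product`) are hypothetical, R is drawn from a standard normal distribution (one valid choice of a mean-0, variance-1 distribution), and the data are synthetic. Only RA and RB, never the raw rows, leave the sites.

```python
import numpy as np


def normalize(X):
    """Step 3: z-score each column locally (hypothetical helper)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def project_at_site(X, seed, l):
    """Steps 4-6: rebuild the shared l x n matrix R from the seed and return R @ X."""
    rng = np.random.default_rng(seed)          # same seed -> same R at sites A and B
    R = rng.standard_normal((l, X.shape[0]))   # i.i.d. entries, mean 0, variance 1
    return R @ X                               # only this l x p (or l x q) block is sent to S


def estimate_cross_product(RA, RB, l):
    """Step 7 at coordinator S: D = (RA)^T (RB) / l, an unbiased estimate of A^T B."""
    return RA.T @ RB / l


# Synthetic example: n objects, p attributes at site A, q at site B.
# Column 0 at both sites is driven by the same hidden signal, so it is correlated.
data_rng = np.random.default_rng(7)
n, p, q, l = 1000, 3, 2, 400
signal = data_rng.standard_normal((n, 1))
A = normalize(np.hstack([signal + 0.1 * data_rng.standard_normal((n, 1)),
                         data_rng.standard_normal((n, p - 1))]))
B = normalize(np.hstack([signal + 0.1 * data_rng.standard_normal((n, 1)),
                         data_rng.standard_normal((n, q - 1))]))

seed = 12345                                   # broadcast by S in step 4
D = estimate_cross_product(project_at_site(A, seed, l),
                           project_at_site(B, seed, l), l)
exact = A.T @ B                                # what a centralized site would compute
print(np.round(D / n, 2))
print(np.round(exact / n, 2))
```

With normalized columns, D / n estimates the p × q matrix of cross-site correlations. For Gaussian R the standard deviation of each entry of D / n is about sqrt((1 + ρ²) / l), where ρ is the true correlation, so l trades communication (l × p numbers instead of n × p) against accuracy.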
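A 2-D plane in a 3-D parameter space shows up as a direction of almost zero variance. One standard way to find it, shown here as an illustrative choice rather than the method stated in the poster, is principal component analysis (PCA): the eigenvector of the covariance matrix with the smallest eigenvalue is the plane's normal. The function name `fit_plane_pca` is hypothetical, and the data are synthetic with arbitrary coefficients, not SDSS or 2MASS measurements. In the distributed setting, the cross-site entries of the covariance matrix would be the ones estimated by random projection.

```python
import numpy as np


def fit_plane_pca(X):
    """Fit a plane to the rows of X (n x 3) by PCA (hypothetical helper).

    Returns the unit normal (eigenvector with the smallest eigenvalue),
    a point on the plane (the mean), and all eigenvalues in ascending order.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)              # 3 x 3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    normal = eigvecs[:, 0]                     # direction of least variance
    return normal, mean, eigvals


# Synthetic, illustrative data: columns are surface brightness,
# log(velocity dispersion) and log(radius); log(radius) follows an
# arbitrary linear relation plus small scatter, so the points lie near a plane.
rng = np.random.default_rng(1)
n = 2000
surface_brightness = rng.normal(0.0, 1.0, n)
log_sigma = rng.normal(0.0, 1.0, n)
log_radius = 1.0 * log_sigma + 0.5 * surface_brightness + rng.normal(0.0, 0.05, n)
X = np.column_stack([surface_brightness, log_sigma, log_radius])

normal, mean, eigvals = fit_plane_pca(X)
print("plane normal:", np.round(normal, 3))
print("eigenvalues:", np.round(eigvals, 4))
```

Two large eigenvalues and one near zero indicate that the points are essentially confined to a plane; the normal's sign is arbitrary.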