Mining for Queries Florin Rusu Mentors Vijayshankar Raman
- Slides: 22
Mining for Queries Florin Rusu Mentors: Vijayshankar Raman, Lin Qiao, Peter Haas Manager: Guy Lohman
Motivation l Hardware trends Multi-core processors l Memory capacity l l Query execution improvement [RSQ 08] Parallel, in-memory databases l Compression, scans l l Goal l Use the query power 2
Exploratory Analysis l Hypothesis-driven Online aggregation [HHW 97] l Data cube exploration l l Discovery-driven l Materialized data cube [SAM 98] l Goals l Automatic exploratory analysis 3
Cell Phone Call Data Warehouse Table Calls Latitude Longitude Call Time Call Duration Call Type Call Status 38 N 123 W 9: 20 10: 03 International OK 38 N 122 W 20: 48 20: 20 Long Drop 38 N 120 W 16: 22 0: 48 Local OK … … … 38 N 70 W 12: 29 0: 32 Cell Drop 4
Interesting Query (1) What are the time intervals for which the duration of long distance calls is significantly high? 5
Interesting Query (2) What are the areas with large fractions of dropped calls? Non-axes aligned hyper-rectangles 6
Query Pattern Example SELECT SUM(Call Duration) FROM Calls WHERE (16: 00 < Call Time < 22: 00) AND (10 < 0. 65*Latitude-0. 35*Longitude < 100) GROUP BY Call Type General pattern SELECT AGG(G) FROM T WHERE L 1 < P 1 < U 1 AND … AND Ln < Pn < Un GROUP BY G 7
Problem Given a data warehouse find the most interesting regions in the attribute space (P 1, …, Pn) according to a function F and that have sufficient support l Searching problem over hyper-rectangular regions in the attribute space l argmax. P 1, …, Pn F(P 1, …, Pn, G) COUNT(P 1, …, Pn) > S l Function F l Value one group / Average value other groups 8
Exhaustive Search l Try l all axes-aligned hyper-rectangles 10 attributes with 10 values in the domain give 1020 hyper-rectangles l Alternatives Run independent queries l Pre-computed prefix-sum array [HAMS 97] l l Top-down l search Iceberg pruning on support [BR 99] 9
Incremental Approach l Solve for each group separately and combine the results l F is assumed locally smooth l Bottom-Up search l l Find local maxima points for F Extend region around local maxima Verify function F Verify support 10
Axes-Aligned Find local maxima points for F 11
Axes-Aligned For each local maxima 12
Axes-Aligned Extend along axes 13
Axes-Aligned Maximal region 14
Principal Component Analysis l Find a new base with vectors corresponding to variance along axes in the original data l Steps Mean-center data along each dimension l Compute covariance matrix l Find eigen-vectors and eigen-values l 15
Non-Axes-Aligned Find local maxima points for F 16
Non-Axes-Aligned For each local maxima 17
Non-Axes-Aligned Extend along eigen-vectors computed by PCA 18
Non-Axes-Aligned Maximal region 19
Query Parameters l Aggregate l fraction AGG(g) / ΣAGG(G) l Support l COUNT(g) l Density l COUNT DISTINCT(g) / TOTAL POINTS 20
Implementation l Use Blink [RSQ 08] as query engine l Use queries extensively Function evaluation l Region expansion l Overlap multiple queries l 21
Questions Comments Suggestions 22
- Vijayshankar raman
- Inga rusu
- Madalina stan
- Amalia rusu
- Inga rusu
- Rusu weebly
- Inga rusu
- F. belingshauzenas ir m. lazarevas
- Sociālais mentors
- Toxic mentoring
- Prayer for mentors and mentees
- Rust mentors
- Shadow archetypes
- Jessie vandeweyer
- Mabel
- Strip mining vs open pit mining
- Multimedia data mining
- Chapter 13 mineral resources and mining worksheet answers
- Mining complex types of data in data mining
- Difference between strip mining and open pit mining
- Web text mining
- Basic ir
- For any queries