Interactive Query Processing Vijayshankar Raman Computer Science Division
- Slides: 74
Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley
Motivation: Nature of querying § querying = extracting information from data sets § different techniques in different settings § intrinsically slow § significant user-system interaction § info. seekers work iteratively, gradually refining requests based on feedback [O’day and Jeffries ‘ 93, Koenemann and Belkin ‘ 96] 2/28/2021 Interactive Query Processing 2
Problems with traditional solutions § mismatch between system functionality and iterative mode of HCI § “black box” functionality • batch processing • frustrating delays in iterative process query process exact answer Query: find average grade of students in each college 2/28/2021 Interactive Query Processing 3
Interactive processing query process exact answer § HCI requirements • users must get continual feedback on results of processing • allow users to control processing based on prior feedback 2/28/2021 Interactive Query Processing 4
results result estimates Interactive processing § HCI requirements • users must get continual feedback on results of processing • allow users to control processing based on prior feedback § performance goals • not to minimize time to give complete results • give continually improving partial results • adapt to dynamically specified performance goals 2/28/2021 Interactive Query Processing 5
Background: Traditional Database Query Processing declarative query select R. a, S. b, T. c from R, S, T where <conditions> query optimizer R § data access: scan-based ( R T query executor S query plan ) or index-based ( T ) § selection ( ) -- filter tuples based on condition § join ( ) -- apply filter on cross-product of the inputs • can use index if available (“index-join”) • else dynamically build hash-tables on inputs (“hash-join”) § query optimizer • chooses plan: operator implementations and ordering • cost model based on pre-computed summary statistics 2/28/2021 Interactive Query Processing 6
My Research § interactive query processing - assume: dataflow thru pipelined operators T 1. support for dynamic user control in traditional query proc. architectures S R 2. adaptively giving partial results in response to user control S R T 3. still more aggressive adaptation - state modules § interactive data cleaning 2/28/2021 T (Potter’s Wheel) Interactive Query Processing 7
Talk Outline § motivation and context èsupport for dynamic user control in traditional query proc. architectures § architecture for adaptively generating partial results § policy for generating partial results • user interface for displaying partial results • impact on routing § wrapup 2/28/2021 Interactive Query Processing 8
Design goals in supporting user control § make minimal change to system architecture § must be independent of particular query processing algorithms § no delay in processing 2/28/2021 Interactive Query Processing 9
Online Reordering (Raman et al. ’ 99, ’ 00) § users perceive data being processed over time • prioritize processing for “interesting” tuples • interest based on user-specified preferences § reorder dataflow so that interesting tuples go first § encapsulate reordering as pipelined dataflow operator T T R 2/28/2021 S S R Interactive Query Processing 10
Context: an application of reordering § online aggregation [Hellerstein/Haas/Wang ‘ 97, Haas/Hellerstein ‘ 99 ] • for SQL aggregate queries, give gradually improving estimates • with confidence intervals SELECT AVG(gpa) FROM students GROUP BY • allow userscollege to speed up estimate refinement for groups of interest • prioritize for processing at a per-group granularity 2/28/2021 Interactive Query Processing 11
Online Aggregation Screenshot SELECT AVG(gpa) FROM students GROUP BY college 2/28/2021 Interactive Query Processing 12
Framework for Online Reordering acddbadb. . . c b network xfer. f(t) a d c ab produce process reorder consume user interest § want no delay in processing Þ in general, reordering can only be best-effort § typically process/consume slower than produce • exploit throughput difference to reorder § two aspects • mechanism for best-effort reordering • reordering policy 2/28/2021 Interactive Query Processing 13
Juggle mechanism for reordering process/consume get. Next buffer prefetch produce enrich spool side disk § two threads -- prefetch from input -- spool/enrich from auxiliary side disk § juggle data between buffer and side disk • keep buffer full of “interesting” items • get. Next chooses best item currently on buffer § get. Next, enrich/spool decisions -- based on reordering policy § side disk management • hash index, populated in a way that postpones random I/O 2/28/2021 Interactive Query Processing 14
Reordering policies GOAL: “good” permutation of items t 1…tn to t 1…t n QOF time § quality of feedback for a prefix t 1 t 2…t k QOF(UP(t 1), UP(t 2), … UP(t k )), UP = user preference • determined by application § goodness of reordering: d. QOF/dt § implication for juggle mechanism • process gets item from buffer that increases QOF the most • juggle tries to maintain buffer with such items 2/28/2021 Interactive Query Processing 15
QOF in Online Aggregation § avg weighted confidence interval § preference acts as weight on confidence interval • QOF = UPi / ni , ni = number of tuples processed from group i Þ process pulls items from group with max UPi / ni ni Þ desired ratio of group i tuples on buffer = UPi • juggle tries to maintain this by enrich/spool 2/28/2021 Interactive Query Processing 2/3 / UPj 2/3 16
Other QOF functions § rate of processing (for a group) preference • QOF = (ni - n. UPi)2 (variance from ideal proportions) Þ process pulls items from group with max (n. UPi - ni ) Þ desired ratio of group i tuples in buffer = UPi 2/28/2021 Interactive Query Processing 17
Results: Reordering in Online Aggregation § implemented in Informix UDO server § experiments with modified TPC-D queries § questions: • how much throughput difference is needed for reordering • can we reorder handle skewed data consume SELECT AVG(o_totalprice), o_orderpriority FROM order WHERE exists ( juggle SELECT * FROM lineitem WHERE l_orderkey = o_orderkey) scan GROUP BY o_orderpriority • index-only join • 5 orderpriorities, zipf distribution 2/28/2021 Interactive Query Processing index § one stress test: skew, very small proc. cost process 18
# tuples processed Performance results time § without reordering 2/28/2021 Interactive Query Processing 19
# tuples processed Performance results time § 3 times faster for interesting groups § overhead: 2% completion time, 1 extra disk 2/28/2021 Interactive Query Processing 20
confidence interval Performance results E C A time 2/28/2021 Interactive Query Processing 21
Overall Findings § higher processing costs • index/hash join, subquery, … • reordering easy § very low processing costs • juggle constrained by density of interesting tuples • outlier groups hard to speed up • better to use index stride [Hellerstein/Haas/Wang ‘ 97] § needs pre-computed index § reordering becomes easier over time § question to answer: • where to place juggle? 2/28/2021 Interactive Query Processing 22
Outline § motivation and context § support for dynamic user control in traditional query proc. architectures èarchitecture for adaptively generating partial results § policy for generating partial results • user interface for displaying partial results • impact on routing § wrapup 2/28/2021 Interactive Query Processing 23
Incremental results Result Space § traditional arch. also generate continual result tuples • arises from continual dataflow thru pipelining operators • much work on pipelining joins [Wilschut/Apers’ 91, Haas/Hellerstein’ 99, Ives et al. ’ 99, Urhan/Franklin’ 00] § this is too rigid • especially in distributed envirorments 2/28/2021 Interactive Query Processing 24
Context: Query processing in Telegraph § Telegraph: adaptive dataflow system to query diverse, distributed sources § much data available as services over Internet • currently only accessible by browse/search/forms § want to combine this data through queries § examples: • campaign finance information (Election 2000) Federal Election Commision Yahoo Home Prices Census APBnews Crime Ratings Maps IMDB • restaurant information Switch. Board Map. Quest 2/28/2021 Fodors S. F. Chronicle Health inspection reports Interactive Query Processing 25
Partial results § complete result tuples too rigid • source latencies high, diverse • dynamic source variations, delays • query does not capture user desires § non-expert and even expert users: query too broad 2/28/2021 Interactive Query Processing 26
Partial results § complete tuples too rigid • source latencies high, diverse • dynamic source variations, delays • query does not capture user desires § non-expert and even expert users: query too broad § want to process queries flexibly • give partial result tuples asap • adapt dynamically to user preferences and source variations 2/28/2021 Interactive Query Processing 27
Correctness of partial results § some columns essential for UI • e. g. group-by columns / sort-by columns § for maximum flexibility -- outer-join semantics • good idea for Web sources § or -- strict join semantics • no partial results without ensuring match exists • key constraints helpful § aggregates: • update early, and compensate later • statistical guarantees for aggregates difficult § fanouts unknown (can be 0!) § key constraints helpful 2/28/2021 Interactive Query Processing 28
Dynamic query plans T R T S R Eddy S S T § Eddy [Avnur and Hellerstein 2000] • router for directing data thru modules • minimize completion time § adaptively choosing join order for arbitrary tuple + all partial tuples generable 2/28/2021 Interactive Query Processing 29
Partial Results in Dynamic Plans. . . modules Eddy R § S P . . . inputs my focus: continual partial results • dynamically adapt dataflow to suit user preferences § Eddy must decide a) what tuple to route next b) where to route it • based on user preferences and module properties § need routing policy and a reordering mechanism • eddy memory buffer and module queues bounded 2/28/2021 Interactive Query Processing 30
Prioritizing tuples. . . modules copy R S P. . . inputs § enhance Eddy with Juggle • reorder exploiting slowness of modules + synergy -- juggle location problem solved! • juggle everywhere, to the extent of throughput difference 2/28/2021 Interactive Query Processing 31
Routing and reordering policy GOAL: QOF time § GOAL: at any time, route to max. d. QOF/dt = benefit of sending tuple to module / cost § cost: estimate data rates to/from module § benefit: dependent on application and UI • how partial results impact the UI • user preferences 2/28/2021 Interactive Query Processing 32
Outline § motivation and context § online reordering for user prioritization of partial results § architecture for adaptively generating partial results èpolicy for generating partial results • Telegraph UI: displaying results and inferring preferences • experimental results § wrapup 2/28/2021 Interactive Query Processing 33
Telegraph UI § screenshot 2/28/2021 Interactive Query Processing 34
Getting user preferences § infer from navigation • row/column scrolling, group drill down and rollup § prioritize visible rows/columns • “query evolution” § subset “one-size-fits-all” queries § future work: query expansion § explicit • up/down buttons on columns • at the cell level -- need for some expensive sources § map to QOF metric on partial results 2/28/2021 Interactive Query Processing 35
QOF: Benefit of a partial result § depends on user preferences § benefit of updating a cell in output • row priority x column weight x cell resolution • incremental cell resolution: how much does one extra update add to the cell’s value § scalars -- 1 § aggregations -- change in confidence interval § informing user about execution progress • convey cell resolution on UI • future work 2/28/2021 Interactive Query Processing 36
QOF: Benefit of Routing a Tuple § route tuple according to expected benefit and cost t M T § benefit of sending a tuple t to a module M and forming set T = cells c T benefit of updating c 2/28/2021 Interactive Query Processing 37
Throughput difference Bush Contributors Income (index) AFB Crime Ratings (index) Number of tuples read 40000 Bush Contributors 30000 20000 10000 0 Income Crime Ratings 200 400 600 time (s) 2/28/2021 Interactive Query Processing 38
Benefit of giving partial results Bush Contributors Income (index) AFB Crime Ratings (index) 40000 ts l u es 30000 r l tia 20000 10000 Bush Contributors m u n r e b of #partial results r a p #complete results Income Crime Ratings 0 200 400 600 time (s) 2/28/2021 Interactive Query Processing 39
Effect of Delays Bush Contributors Income (index) AFB Crime Ratings (index) with delay in AFB Crime Ratings ts sul e r l a i part f o 10000 number 1000 100 lts u s e r delay ete l p m co f o r be num 10 0 200 400 600 time(s) 2/28/2021 Interactive Query Processing 40
Prioritizing particular values 2/28/2021 scroll 600 scroll # Aggr. Updates § SELECT AVG(Income) FROM Bush Donors, Census GROUP BY Bush. Donors. State § scrolling: {AZ, AR, CA, CO, CT}, {LA, KY, MA, MD, MI}, {TX, UT, VA, VI, VT} 400 200 0 100 200 300 time (s) Interactive Query Processing 400 41
Prioritizing particular values 2/28/2021 scroll 600 scroll # Aggr. Updates § SELECT AVG(Income) FROM Bush Donors, Census GROUP BY Bush. Donors. State § scrolling: {AZ, AR, CA, CO, CT}, {LA, KY, MA, MD, MI}, {TX, UT, VA, VI, VT} 400 200 0 100 200 300 time (s) Interactive Query Processing 400 42
Prioritizing particular values 2/28/2021 scroll 600 scroll # Aggr. Updates § SELECT AVG(Income) FROM Bush Donors, Census GROUP BY Bush. Donors. State § scrolling: {AZ, AR, CA, CO, CT}, {LA, KY, MA, MD, MI}, {TX, UT, VA, VI, VT} 400 200 0 100 200 300 time (s) Interactive Query Processing 400 43
# Bush Contributors Distribution of contributions 5000 4000 3000 2000 1000 State 2/28/2021 Interactive Query Processing 44
Outline § motivation and context § online reordering for user prioritization of partial results § architecture for adaptively generating partial results § policy for generating partial results èmore aggressive adaptation: state modules § wrapup 2/28/2021 Interactive Query Processing 45
Granularity of Query Operators § relational operators: logical abstractions • encapsulate multiple physical effects • inflexible in handling unexpected changes hash jn P R ind. jn Eddy S R S P § cannot gracefully adapt • • access method/data source selection join algorithm selection resource allocation: e. g. memory delays [Query Scrambling, XJoin] § want to encapsulate at level of physical operators 2/28/2021 Interactive Query Processing 46
State Modules Elevator Pitch hash jn Eddy P R ind. jn R P P § isolate state in State Modules + work sharing + routing flexibility + query execution = routing + adapt access methods, join algorithms gracefully + directly measure & adapt resource consumption ? ? 2/28/2021 Interactive Query Processing 47
Outline § motivation and context § online reordering for user prioritization of partial results § architecture for generating more aggressive partial results § policy for generating partial results § more aggressive adaptation: state modules èwrapup 2/28/2021 Interactive Query Processing 48
Related Work § information retrieval • ranked retrieval, relevance feedback • search strategies, Berry Picking § incremental query processing • pipelining hash joins • top N/fast first queries (Haas/Hellerstein`99, Ives et al. `99, Urhan/Franklin`00) (Carey/Kossman`97, Antoshenkov/Ziauddin`96) § adaptivity • parametric query plans (Graefe/Cole`94) • mid-query reoptimization (Urhan/Franklin/Amsaleg`98, • competition Kabra/De. Witt`98, Ives et al. `99) (Antoshenkov/Ziauddin`96) § miscellaneous • precomputed summaries (OLAP, materialized views, AQUA) • parachute queries (Bonnet/Tomasic‘ 98) • APPROXIMATE 2/28/2021 (Vrbsky/Liu`93) Interactive Query Processing 49
Summary § applications, query processors need tighter coupling § online reordering • effective way of supporting dynamic user control § partial results as desired • embed reordering within dynamically controlled dataflow § hard to map user-interaction needs into concrete algorithm performance goals • wanted: benchmarks based on user/application traces 2/28/2021 Interactive Query Processing 50
Future Work § session-granularity query processing • query evolution • lazy evaluation: user/client navigates through partial results § helps thin clients • closer dbms-application interaction § evaluation tools • benchmarks driven by traces/user studies § for more information: http: //telegraph. cs. berkeley. edu/ http: //control. cs. berkeley. edu/ 2/28/2021 Interactive Query Processing 51
Prioritizing particular columns (1) § how do we prioritize external sources? § increase number of threads used to probe it • resource scheduling rather than data scheduling § if eddy sends na tuples to A, nb to B, per second na <= threadsa/latency(A) nb <= threadsb/latency(B) threadsa + threadsb <= Number of Threads Per Query GOAL: maximize na 2/28/2021 QOF(A) + nb QOF(B) Interactive Query Processing 52
Prioritizing particular columns (2) Number of tuples read § with equal number of threads (3 each) Income 3000 2000 1000 Crime Ratings 0 100 200 300 time (s) 2/28/2021 Interactive Query Processing 53
Prioritizing particular columns (2) Number of tuples read § choosing number of threads according to priorities (5 and 1) Income 3000 2000 1000 Crime Ratings 0 100 200 300 time (s) 2/28/2021 Interactive Query Processing 54
Prioritizing particular columns (3) § more generally, optimize with multiple resource constraints • netizenship • $$’s • future work 2/28/2021 Interactive Query Processing 55
Telegraph UI § partial results displayed on screen as they arrive § “multi-resolution spreadsheet” • values clustered into groups • can roll-up or drill-down to see different granularities • different columns visible at different drilldown levels § client hash table mapping groups to values § navigation • vertical/horizontal scrolling • rollup/drilldown 2/28/2021 Interactive Query Processing 56
HCI Motivation: Why Interactivity? § Berry picking (Bates ‘ 90, ‘ 93) § user studies • decision support (O’day and Jeffries ‘ 93) • relevance feedback (Koenemann and Belkin ‘ 96) 2/28/2021 Interactive Query Processing 57
Disk data layout during juggling § performance goal: • favor early results: optimize Phase 1, at expense of Phase 2 • spool : sequential I/O, enrich: random I/O + reordering in Phase 2 much easier than in Phase 1 § enrich needs approx. index on side-disk • have hash-index on tuples, according to user interest • done at “group” granularity 2/28/2021 Interactive Query Processing 58
Other applications of reordering § can add reorder to any dataflow § will later discuss application for query systems that give aggressive partial results § also useful in batch query processing • • sorting often used in query plans for performance gains can be replaced by best-effort reordering little performance hit, but plan is now pipelining will not discuss further in this talk § spreadsheets 2/28/2021 Interactive Query Processing 59
Telegraph-Client API client sql query Telegraph partial results preferences § preferences: • col priorities • {<predicate, row priority, row-col priorities>… } § e. g. {<1, 1. 0, none>, <2, 1. 7, none>, <3, 1. 3, map=high>, <8/33014, 1. 0, none>, <8/60610, 1. 0, none>} 2/28/2021 Interactive Query Processing 60
Granularity of Reordering/Routing § reorder and route tuples at granularity of a group § group = <base relations, predicate> § groups created and deleted dynamically, as user navigates in UI § group predicate may be a subset of applicationspecified row predicate • final row depends on values this tuple joins with 2/28/2021 Interactive Query Processing 61
Interactive query processing with nonpipelining operators § basic techniques independent of pipelined operators. • reordering --- can be used in general query plans, although effectiveness may be hindered by blocking operators • State Modules applicable in general -- helps with adaptivity § much work on making plan operators pipelining -ripple join, xjoin, tuqwila • reordering itself can be used to avoid blocking sorts in some cases 2/28/2021 Interactive Query Processing 62
Query processing using Ste. Ms (1) § Ste. M operations -- works like index on a table • build(tuple): add tuple to Ste. M • probe(tuple): find all matches among build values § bounce back tuple if all matches not found and probe tuple not cached elsewhere § performing joins by routing through Ste. Ms • index joins § regular synchronous index joins easy -- have no state § over distributed sources, lookups cached by building into Ste. M separating cache allows Eddy to distinguish access cost R bl d S R R . ob pr S Eddy pending probe results Interactive Query Processing e be § Ste. M acts as rendezvous buffer for 2/28/2021 Ste. MS o pr § helps for high-throughput Internet sources R • asynchronous index joins [GW’ 00] Ste. MR d. l b • S 63
Query processing using Ste. Ms (2) § performing joins by routing through Ste. Ms (contd) bl S R Eddy R d. pr S R ob e be Ste. MS o pr . Ste. MR d bl • doubly pipelined hash join • use two Ste. Ms, one on each source S § can also do extensions (e. g. Tuqwila [IFF+99]), and even non-pipelined joins, using appropriate Ste. Ms § thus can simulate any static query plan by appropriate routing 2/28/2021 Interactive Query Processing 64
Query processing using Ste. Ms (3) § how to do this in general? • don’t want Eddy to be a super-join • want it to route as per user desires and source properties, independent of any join algorithm § routing constraint: • each tuple must have a non-zero probability of being sent to every appropriate module • can avoid repeated looping by marking tuples § Theorem: any routing that satisfies above constraint will produce all query results and terminate § could get duplicates • prune at application, or • enforce atomicity in tuple routing 2/28/2021 Interactive Query Processing 65
Benefits of Ste. Ms § adaptivity • join algorithm and access path selection are determined by eddy routing • hence Eddy can dynamically choose these based on source properties • in fact, Eddy can do hybrid join algorithms § adapt to dynamic delays, etc. § work sharing -- for query evolution, competition § better information to the Eddy • speeds, memory consumption 2/28/2021 Interactive Query Processing 66
Lazy Evaluation § User navigates thru partial results • can see only a few at a time § exploit to delay processing • do on demand on the region user currently navigating over § things to delay • expensive source lookups (e. g. map) • joins (e. g. …) • any other formatting etc. 2/28/2021 Interactive Query Processing 67
Giving probabilistic results § view partial result as somethng that throws light on outer-join space • positive result: probabilistic • negative result: deterministic § benefit of positive result: • weighted sum of cells it is likely to show; penalty for false result • repudiation of earlier false results § benefit of negative result: • direct benefit unlikely • repudiation of earlier false results § complications • many tuples may contribute to single output tuple -aggregation 2/28/2021 Interactive Query Processing 68
Probabilistic partial results § partial result => may not apply all filters § still want to show result probabilistically • filters ill-specified • data on Internet often inconsistent • show all partial results: “full disjunction” (Galindo-Legaria ‘ 94) 2/28/2021 Interactive Query Processing Outer Join Space Output Space § positive and negative results 69
What does user see? § user navigates thru partial results § with complete row partial results: • can either explore these results in detail (sort/scroll along columns as in spreadsheet) • compute approx. aggregates on them in online fashion § errors shrinking as more results come in § with probabilistic, incomplete results? • easy if primary key in partial result • what to show otherwise? 2/28/2021 Interactive Query Processing 70
Quantifying benefit of partial results § partial result sheds light on outer-join space • positive result: probabilistic • negative result: deterministic § benefit of positive result: • weighted sum of cells it is likely to show • penalty for false result ? • repudiation of earlier false results ? § benefit of negative result: • direct benefit unlikely • repudiation of earlier false results ? 2/28/2021 Interactive Query Processing 71
Why probabilistic results are good § data on Internet is often inconsistent • hence even exact processing cannot give perfect answers § people are used to sloppy answers § can allow negations in expert interfaces only 2/28/2021 Interactive Query Processing 72
Data growth vs. Computer Speedup § Moore’s Law -- # of transistors/chip doubles every 18 months (1965) § data growth Source: J. Porter, Disk/Trend, Inc. (http: //www. disktrend. com/pdf/portrpkg. pdf) 2/28/2021 Interactive Query Processing 73
Disk Appetite, contd. § Greg Papadopoulos, CTO Sun: • Disk sales doubling every 9 months § similar results from Winter VLDB survey § time to process all your data doubles every 18 months! 2/28/2021 Interactive Query Processing 74
- Vijayshankar raman
- Batch processing vs interactive processing
- Dns recursive iterative
- Query tree and query graph
- Query tree and query graph
- Steps in query processing
- Query optimization steps
- Steps in query processing
- Measures of query cost in dbms
- Objectives of query processing
- Steps of query processing
- Steps in query processing
- Algorithms for query processing and optimization
- Distributed query processing in dbms
- Characteristics of query processor
- Sketch techniques for approximate query processing
- What is the role of eddy in adaptive query processing
- Distributed query processing
- Distributed query processing
- Distributed query processing
- Which algorithm
- Sql server intelligent query processing
- Top down bottom up listening activities
- Advanced weather interactive processing system
- What is your favourite subject?
- Interactive science notebooks
- Interactive science notebook rubric
- Interactive science notebook design
- Cs 418 interactive computer graphics
- Glcreatebuffer
- What are the interactive input methods
- Cs 418 interactive computer graphics
- Enable nui in process gpu
- Interactive computer graphics examples
- Cs 418 interactive computer graphics
- Cs 418 interactive computer graphics
- Interactive computer graphics examples
- Water the elixir of life essay wikipedia
- Awgncb
- Aina za kipimio cha ramani
- Armida sodo
- Applications of raman effect in daily life
- Fizik raman
- Why are raman and ir complementary
- Difference between ir and raman spectroscopy
- Portable raman analyzer for hazmat and narcotics
- Low na
- Espectroscopia raman instrumentacion
- Difference between internet and www
- Raman vs rayleigh scattering
- Raman spectroscopy selection rules
- Props wikipedia
- Raman scattering definition
- Edge emitting laser
- Sanjay raman mit
- Inline raman spectroscopy
- Why are raman and ir complementary
- Raman 1
- Kameswari chebrolu
- Use of raman spectroscopy
- Raman font style
- Acharyamarch
- Dispersion raman
- Raman spectroscopy disadvantages
- Raman saçılması
- Non rigid rotator
- Raman
- Narayanan raman
- What is the imaginary elixir of life
- Jayashree raman
- Edfa gain spectrum
- Amplificador raman
- Short division vs long division
- Long division vs synthetic division
- Division key vocabulary