Interactive Query Processing Vijayshankar Raman Computer Science Division

Motivation: Nature of querying § querying = extracting information from data sets § different

Problems with traditional solutions § mismatch between system functionality and iterative mode of HCI

Interactive processing query process exact answer § HCI requirements • users must get continual

results result estimates Interactive processing § HCI requirements • users must get continual feedback

Background: Traditional Database Query Processing declarative query select R. a, S. b, T. c

My Research § interactive query processing - assume: dataflow thru pipelined operators T 1.

Talk Outline § motivation and context èsupport for dynamic user control in traditional query

Design goals in supporting user control § make minimal change to system architecture §

Online Reordering (Raman et al. ’ 99, ’ 00) § users perceive data being

Context: an application of reordering § online aggregation [Hellerstein/Haas/Wang ‘ 97, Haas/Hellerstein ‘ 99

Online Aggregation Screenshot SELECT AVG(gpa) FROM students GROUP BY college 2/28/2021 Interactive Query Processing

Framework for Online Reordering acddbadb. . . c b network xfer. f(t) a d

Juggle mechanism for reordering process/consume get. Next buffer prefetch produce enrich spool side disk

Reordering policies GOAL: “good” permutation of items t 1…tn to t 1…t n QOF

QOF in Online Aggregation § avg weighted confidence interval § preference acts as weight

Other QOF functions § rate of processing (for a group) preference • QOF =

Results: Reordering in Online Aggregation § implemented in Informix UDO server § experiments with

# tuples processed Performance results time § without reordering 2/28/2021 Interactive Query Processing 19

# tuples processed Performance results time § 3 times faster for interesting groups §

confidence interval Performance results E C A time 2/28/2021 Interactive Query Processing 21

Overall Findings § higher processing costs • index/hash join, subquery, … • reordering easy

Outline § motivation and context § support for dynamic user control in traditional query

Incremental results Result Space § traditional arch. also generate continual result tuples • arises

Context: Query processing in Telegraph § Telegraph: adaptive dataflow system to query diverse, distributed

Partial results § complete result tuples too rigid • source latencies high, diverse •

Partial results § complete tuples too rigid • source latencies high, diverse • dynamic

Correctness of partial results § some columns essential for UI • e. g. group-by

Dynamic query plans T R T S R Eddy S S T § Eddy

Partial Results in Dynamic Plans. . . modules Eddy R § S P .

Prioritizing tuples. . . modules copy R S P. . . inputs § enhance

Routing and reordering policy GOAL: QOF time § GOAL: at any time, route to

Outline § motivation and context § online reordering for user prioritization of partial results

Telegraph UI § screenshot 2/28/2021 Interactive Query Processing 34

Getting user preferences § infer from navigation • row/column scrolling, group drill down and

QOF: Benefit of a partial result § depends on user preferences § benefit of

QOF: Benefit of Routing a Tuple § route tuple according to expected benefit and

Throughput difference Bush Contributors Income (index) AFB Crime Ratings (index) Number of tuples read

Benefit of giving partial results Bush Contributors Income (index) AFB Crime Ratings (index) 40000

Effect of Delays Bush Contributors Income (index) AFB Crime Ratings (index) with delay in

Prioritizing particular values 2/28/2021 scroll 600 scroll # Aggr. Updates § SELECT AVG(Income) FROM

# Bush Contributors Distribution of contributions 5000 4000 3000 2000 1000 State 2/28/2021 Interactive

Granularity of Query Operators § relational operators: logical abstractions • encapsulate multiple physical effects

State Modules Elevator Pitch hash jn Eddy P R ind. jn R P P

Related Work § information retrieval • ranked retrieval, relevance feedback • search strategies, Berry

Summary § applications, query processors need tighter coupling § online reordering • effective way

Future Work § session-granularity query processing • query evolution • lazy evaluation: user/client navigates

Prioritizing particular columns (1) § how do we prioritize external sources? § increase number

Prioritizing particular columns (2) Number of tuples read § with equal number of threads

Prioritizing particular columns (2) Number of tuples read § choosing number of threads according

Prioritizing particular columns (3) § more generally, optimize with multiple resource constraints • netizenship

Telegraph UI § partial results displayed on screen as they arrive § “multi-resolution spreadsheet”

HCI Motivation: Why Interactivity? § Berry picking (Bates ‘ 90, ‘ 93) § user

Disk data layout during juggling § performance goal: • favor early results: optimize Phase

Other applications of reordering § can add reorder to any dataflow § will later

Telegraph-Client API client sql query Telegraph partial results preferences § preferences: • col priorities

Granularity of Reordering/Routing § reorder and route tuples at granularity of a group §

Interactive query processing with nonpipelining operators § basic techniques independent of pipelined operators. •

Query processing using Ste. Ms (1) § Ste. M operations -- works like index

Query processing using Ste. Ms (2) § performing joins by routing through Ste. Ms

Query processing using Ste. Ms (3) § how to do this in general? •

Benefits of Ste. Ms § adaptivity • join algorithm and access path selection are

Lazy Evaluation § User navigates thru partial results • can see only a few

Giving probabilistic results § view partial result as somethng that throws light on outer-join

Probabilistic partial results § partial result => may not apply all filters § still

What does user see? § user navigates thru partial results § with complete row

Quantifying benefit of partial results § partial result sheds light on outer-join space •

Why probabilistic results are good § data on Internet is often inconsistent • hence

Data growth vs. Computer Speedup § Moore’s Law -- # of transistors/chip doubles every

Disk Appetite, contd. § Greg Papadopoulos, CTO Sun: • Disk sales doubling every 9

Slides: 74

Download presentation

Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

Motivation: Nature of querying § querying = extracting information from data sets § different techniques in different settings § intrinsically slow § significant user-system interaction § info. seekers work iteratively, gradually refining requests based on feedback [O’day and Jeffries ‘ 93, Koenemann and Belkin ‘ 96] 2/28/2021 Interactive Query Processing 2

Problems with traditional solutions § mismatch between system functionality and iterative mode of HCI § “black box” functionality • batch processing • frustrating delays in iterative process query process exact answer Query: find average grade of students in each college 2/28/2021 Interactive Query Processing 3

Interactive processing query process exact answer § HCI requirements • users must get continual feedback on results of processing • allow users to control processing based on prior feedback 2/28/2021 Interactive Query Processing 4

results result estimates Interactive processing § HCI requirements • users must get continual feedback on results of processing • allow users to control processing based on prior feedback § performance goals • not to minimize time to give complete results • give continually improving partial results • adapt to dynamically specified performance goals 2/28/2021 Interactive Query Processing 5

Background: Traditional Database Query Processing declarative query select R. a, S. b, T. c from R, S, T where <conditions> query optimizer R § data access: scan-based ( R T query executor S query plan ) or index-based ( T ) § selection ( ) -- filter tuples based on condition § join ( ) -- apply filter on cross-product of the inputs • can use index if available (“index-join”) • else dynamically build hash-tables on inputs (“hash-join”) § query optimizer • chooses plan: operator implementations and ordering • cost model based on pre-computed summary statistics 2/28/2021 Interactive Query Processing 6

My Research § interactive query processing - assume: dataflow thru pipelined operators T 1. support for dynamic user control in traditional query proc. architectures S R 2. adaptively giving partial results in response to user control S R T 3. still more aggressive adaptation - state modules § interactive data cleaning 2/28/2021 T (Potter’s Wheel) Interactive Query Processing 7

Talk Outline § motivation and context èsupport for dynamic user control in traditional query proc. architectures § architecture for adaptively generating partial results § policy for generating partial results • user interface for displaying partial results • impact on routing § wrapup 2/28/2021 Interactive Query Processing 8

Design goals in supporting user control § make minimal change to system architecture § must be independent of particular query processing algorithms § no delay in processing 2/28/2021 Interactive Query Processing 9

Online Reordering (Raman et al. ’ 99, ’ 00) § users perceive data being processed over time • prioritize processing for “interesting” tuples • interest based on user-specified preferences § reorder dataflow so that interesting tuples go first § encapsulate reordering as pipelined dataflow operator T T R 2/28/2021 S S R Interactive Query Processing 10

Context: an application of reordering § online aggregation [Hellerstein/Haas/Wang ‘ 97, Haas/Hellerstein ‘ 99 ] • for SQL aggregate queries, give gradually improving estimates • with confidence intervals SELECT AVG(gpa) FROM students GROUP BY • allow userscollege to speed up estimate refinement for groups of interest • prioritize for processing at a per-group granularity 2/28/2021 Interactive Query Processing 11

Online Aggregation Screenshot SELECT AVG(gpa) FROM students GROUP BY college 2/28/2021 Interactive Query Processing 12

Framework for Online Reordering acddbadb. . . c b network xfer. f(t) a d c ab produce process reorder consume user interest § want no delay in processing Þ in general, reordering can only be best-effort § typically process/consume slower than produce • exploit throughput difference to reorder § two aspects • mechanism for best-effort reordering • reordering policy 2/28/2021 Interactive Query Processing 13

Juggle mechanism for reordering process/consume get. Next buffer prefetch produce enrich spool side disk § two threads -- prefetch from input -- spool/enrich from auxiliary side disk § juggle data between buffer and side disk • keep buffer full of “interesting” items • get. Next chooses best item currently on buffer § get. Next, enrich/spool decisions -- based on reordering policy § side disk management • hash index, populated in a way that postpones random I/O 2/28/2021 Interactive Query Processing 14

Reordering policies GOAL: “good” permutation of items t 1…tn to t 1…t n QOF time § quality of feedback for a prefix t 1 t 2…t k QOF(UP(t 1), UP(t 2), … UP(t k )), UP = user preference • determined by application § goodness of reordering: d. QOF/dt § implication for juggle mechanism • process gets item from buffer that increases QOF the most • juggle tries to maintain buffer with such items 2/28/2021 Interactive Query Processing 15

QOF in Online Aggregation § avg weighted confidence interval § preference acts as weight on confidence interval • QOF = UPi / ni , ni = number of tuples processed from group i Þ process pulls items from group with max UPi / ni ni Þ desired ratio of group i tuples on buffer = UPi • juggle tries to maintain this by enrich/spool 2/28/2021 Interactive Query Processing 2/3 / UPj 2/3 16

Other QOF functions § rate of processing (for a group) preference • QOF = (ni - n. UPi)2 (variance from ideal proportions) Þ process pulls items from group with max (n. UPi - ni ) Þ desired ratio of group i tuples in buffer = UPi 2/28/2021 Interactive Query Processing 17

Results: Reordering in Online Aggregation § implemented in Informix UDO server § experiments with modified TPC-D queries § questions: • how much throughput difference is needed for reordering • can we reorder handle skewed data consume SELECT AVG(o_totalprice), o_orderpriority FROM order WHERE exists ( juggle SELECT * FROM lineitem WHERE l_orderkey = o_orderkey) scan GROUP BY o_orderpriority • index-only join • 5 orderpriorities, zipf distribution 2/28/2021 Interactive Query Processing index § one stress test: skew, very small proc. cost process 18

# tuples processed Performance results time § without reordering 2/28/2021 Interactive Query Processing 19

# tuples processed Performance results time § 3 times faster for interesting groups § overhead: 2% completion time, 1 extra disk 2/28/2021 Interactive Query Processing 20

confidence interval Performance results E C A time 2/28/2021 Interactive Query Processing 21

Overall Findings § higher processing costs • index/hash join, subquery, … • reordering easy § very low processing costs • juggle constrained by density of interesting tuples • outlier groups hard to speed up • better to use index stride [Hellerstein/Haas/Wang ‘ 97] § needs pre-computed index § reordering becomes easier over time § question to answer: • where to place juggle? 2/28/2021 Interactive Query Processing 22

Outline § motivation and context § support for dynamic user control in traditional query proc. architectures èarchitecture for adaptively generating partial results § policy for generating partial results • user interface for displaying partial results • impact on routing § wrapup 2/28/2021 Interactive Query Processing 23

Incremental results Result Space § traditional arch. also generate continual result tuples • arises from continual dataflow thru pipelining operators • much work on pipelining joins [Wilschut/Apers’ 91, Haas/Hellerstein’ 99, Ives et al. ’ 99, Urhan/Franklin’ 00] § this is too rigid • especially in distributed envirorments 2/28/2021 Interactive Query Processing 24

Context: Query processing in Telegraph § Telegraph: adaptive dataflow system to query diverse, distributed sources § much data available as services over Internet • currently only accessible by browse/search/forms § want to combine this data through queries § examples: • campaign finance information (Election 2000) Federal Election Commision Yahoo Home Prices Census APBnews Crime Ratings Maps IMDB • restaurant information Switch. Board Map. Quest 2/28/2021 Fodors S. F. Chronicle Health inspection reports Interactive Query Processing 25

Partial results § complete result tuples too rigid • source latencies high, diverse • dynamic source variations, delays • query does not capture user desires § non-expert and even expert users: query too broad 2/28/2021 Interactive Query Processing 26

Partial results § complete tuples too rigid • source latencies high, diverse • dynamic source variations, delays • query does not capture user desires § non-expert and even expert users: query too broad § want to process queries flexibly • give partial result tuples asap • adapt dynamically to user preferences and source variations 2/28/2021 Interactive Query Processing 27

Correctness of partial results § some columns essential for UI • e. g. group-by columns / sort-by columns § for maximum flexibility -- outer-join semantics • good idea for Web sources § or -- strict join semantics • no partial results without ensuring match exists • key constraints helpful § aggregates: • update early, and compensate later • statistical guarantees for aggregates difficult § fanouts unknown (can be 0!) § key constraints helpful 2/28/2021 Interactive Query Processing 28

Dynamic query plans T R T S R Eddy S S T § Eddy [Avnur and Hellerstein 2000] • router for directing data thru modules • minimize completion time § adaptively choosing join order for arbitrary tuple + all partial tuples generable 2/28/2021 Interactive Query Processing 29

Partial Results in Dynamic Plans. . . modules Eddy R § S P . . . inputs my focus: continual partial results • dynamically adapt dataflow to suit user preferences § Eddy must decide a) what tuple to route next b) where to route it • based on user preferences and module properties § need routing policy and a reordering mechanism • eddy memory buffer and module queues bounded 2/28/2021 Interactive Query Processing 30

Prioritizing tuples. . . modules copy R S P. . . inputs § enhance Eddy with Juggle • reorder exploiting slowness of modules + synergy -- juggle location problem solved! • juggle everywhere, to the extent of throughput difference 2/28/2021 Interactive Query Processing 31

Routing and reordering policy GOAL: QOF time § GOAL: at any time, route to max. d. QOF/dt = benefit of sending tuple to module / cost § cost: estimate data rates to/from module § benefit: dependent on application and UI • how partial results impact the UI • user preferences 2/28/2021 Interactive Query Processing 32

Outline § motivation and context § online reordering for user prioritization of partial results § architecture for adaptively generating partial results èpolicy for generating partial results • Telegraph UI: displaying results and inferring preferences • experimental results § wrapup 2/28/2021 Interactive Query Processing 33

Telegraph UI § screenshot 2/28/2021 Interactive Query Processing 34

Getting user preferences § infer from navigation • row/column scrolling, group drill down and rollup § prioritize visible rows/columns • “query evolution” § subset “one-size-fits-all” queries § future work: query expansion § explicit • up/down buttons on columns • at the cell level -- need for some expensive sources § map to QOF metric on partial results 2/28/2021 Interactive Query Processing 35

QOF: Benefit of a partial result § depends on user preferences § benefit of updating a cell in output • row priority x column weight x cell resolution • incremental cell resolution: how much does one extra update add to the cell’s value § scalars -- 1 § aggregations -- change in confidence interval § informing user about execution progress • convey cell resolution on UI • future work 2/28/2021 Interactive Query Processing 36

QOF: Benefit of Routing a Tuple § route tuple according to expected benefit and cost t M T § benefit of sending a tuple t to a module M and forming set T = cells c T benefit of updating c 2/28/2021 Interactive Query Processing 37

Throughput difference Bush Contributors Income (index) AFB Crime Ratings (index) Number of tuples read 40000 Bush Contributors 30000 20000 10000 0 Income Crime Ratings 200 400 600 time (s) 2/28/2021 Interactive Query Processing 38

Benefit of giving partial results Bush Contributors Income (index) AFB Crime Ratings (index) 40000 ts l u es 30000 r l tia 20000 10000 Bush Contributors m u n r e b of #partial results r a p #complete results Income Crime Ratings 0 200 400 600 time (s) 2/28/2021 Interactive Query Processing 39

Effect of Delays Bush Contributors Income (index) AFB Crime Ratings (index) with delay in AFB Crime Ratings ts sul e r l a i part f o 10000 number 1000 100 lts u s e r delay ete l p m co f o r be num 10 0 200 400 600 time(s) 2/28/2021 Interactive Query Processing 40

Prioritizing particular values 2/28/2021 scroll 600 scroll # Aggr. Updates § SELECT AVG(Income) FROM Bush Donors, Census GROUP BY Bush. Donors. State § scrolling: {AZ, AR, CA, CO, CT}, {LA, KY, MA, MD, MI}, {TX, UT, VA, VI, VT} 400 200 0 100 200 300 time (s) Interactive Query Processing 400 41

# Bush Contributors Distribution of contributions 5000 4000 3000 2000 1000 State 2/28/2021 Interactive Query Processing 44

Outline § motivation and context § online reordering for user prioritization of partial results § architecture for adaptively generating partial results § policy for generating partial results èmore aggressive adaptation: state modules § wrapup 2/28/2021 Interactive Query Processing 45

Granularity of Query Operators § relational operators: logical abstractions • encapsulate multiple physical effects • inflexible in handling unexpected changes hash jn P R ind. jn Eddy S R S P § cannot gracefully adapt • • access method/data source selection join algorithm selection resource allocation: e. g. memory delays [Query Scrambling, XJoin] § want to encapsulate at level of physical operators 2/28/2021 Interactive Query Processing 46

State Modules Elevator Pitch hash jn Eddy P R ind. jn R P P § isolate state in State Modules + work sharing + routing flexibility + query execution = routing + adapt access methods, join algorithms gracefully + directly measure & adapt resource consumption ? ? 2/28/2021 Interactive Query Processing 47

Outline § motivation and context § online reordering for user prioritization of partial results § architecture for generating more aggressive partial results § policy for generating partial results § more aggressive adaptation: state modules èwrapup 2/28/2021 Interactive Query Processing 48

Related Work § information retrieval • ranked retrieval, relevance feedback • search strategies, Berry Picking § incremental query processing • pipelining hash joins • top N/fast first queries (Haas/Hellerstein`99, Ives et al. `99, Urhan/Franklin`00) (Carey/Kossman`97, Antoshenkov/Ziauddin`96) § adaptivity • parametric query plans (Graefe/Cole`94) • mid-query reoptimization (Urhan/Franklin/Amsaleg`98, • competition Kabra/De. Witt`98, Ives et al. `99) (Antoshenkov/Ziauddin`96) § miscellaneous • precomputed summaries (OLAP, materialized views, AQUA) • parachute queries (Bonnet/Tomasic‘ 98) • APPROXIMATE 2/28/2021 (Vrbsky/Liu`93) Interactive Query Processing 49

Summary § applications, query processors need tighter coupling § online reordering • effective way of supporting dynamic user control § partial results as desired • embed reordering within dynamically controlled dataflow § hard to map user-interaction needs into concrete algorithm performance goals • wanted: benchmarks based on user/application traces 2/28/2021 Interactive Query Processing 50

Future Work § session-granularity query processing • query evolution • lazy evaluation: user/client navigates through partial results § helps thin clients • closer dbms-application interaction § evaluation tools • benchmarks driven by traces/user studies § for more information: http: //telegraph. cs. berkeley. edu/ http: //control. cs. berkeley. edu/ 2/28/2021 Interactive Query Processing 51

Prioritizing particular columns (1) § how do we prioritize external sources? § increase number of threads used to probe it • resource scheduling rather than data scheduling § if eddy sends na tuples to A, nb to B, per second na <= threadsa/latency(A) nb <= threadsb/latency(B) threadsa + threadsb <= Number of Threads Per Query GOAL: maximize na 2/28/2021 QOF(A) + nb QOF(B) Interactive Query Processing 52

Prioritizing particular columns (2) Number of tuples read § with equal number of threads (3 each) Income 3000 2000 1000 Crime Ratings 0 100 200 300 time (s) 2/28/2021 Interactive Query Processing 53

Prioritizing particular columns (2) Number of tuples read § choosing number of threads according to priorities (5 and 1) Income 3000 2000 1000 Crime Ratings 0 100 200 300 time (s) 2/28/2021 Interactive Query Processing 54

Prioritizing particular columns (3) § more generally, optimize with multiple resource constraints • netizenship • $$’s • future work 2/28/2021 Interactive Query Processing 55

Telegraph UI § partial results displayed on screen as they arrive § “multi-resolution spreadsheet” • values clustered into groups • can roll-up or drill-down to see different granularities • different columns visible at different drilldown levels § client hash table mapping groups to values § navigation • vertical/horizontal scrolling • rollup/drilldown 2/28/2021 Interactive Query Processing 56

HCI Motivation: Why Interactivity? § Berry picking (Bates ‘ 90, ‘ 93) § user studies • decision support (O’day and Jeffries ‘ 93) • relevance feedback (Koenemann and Belkin ‘ 96) 2/28/2021 Interactive Query Processing 57

Disk data layout during juggling § performance goal: • favor early results: optimize Phase 1, at expense of Phase 2 • spool : sequential I/O, enrich: random I/O + reordering in Phase 2 much easier than in Phase 1 § enrich needs approx. index on side-disk • have hash-index on tuples, according to user interest • done at “group” granularity 2/28/2021 Interactive Query Processing 58

Other applications of reordering § can add reorder to any dataflow § will later discuss application for query systems that give aggressive partial results § also useful in batch query processing • • sorting often used in query plans for performance gains can be replaced by best-effort reordering little performance hit, but plan is now pipelining will not discuss further in this talk § spreadsheets 2/28/2021 Interactive Query Processing 59

Telegraph-Client API client sql query Telegraph partial results preferences § preferences: • col priorities • {<predicate, row priority, row-col priorities>… } § e. g. {<1, 1. 0, none>, <2, 1. 7, none>, <3, 1. 3, map=high>, <8/33014, 1. 0, none>, <8/60610, 1. 0, none>} 2/28/2021 Interactive Query Processing 60

Granularity of Reordering/Routing § reorder and route tuples at granularity of a group § group = <base relations, predicate> § groups created and deleted dynamically, as user navigates in UI § group predicate may be a subset of applicationspecified row predicate • final row depends on values this tuple joins with 2/28/2021 Interactive Query Processing 61

Interactive query processing with nonpipelining operators § basic techniques independent of pipelined operators. • reordering --- can be used in general query plans, although effectiveness may be hindered by blocking operators • State Modules applicable in general -- helps with adaptivity § much work on making plan operators pipelining -ripple join, xjoin, tuqwila • reordering itself can be used to avoid blocking sorts in some cases 2/28/2021 Interactive Query Processing 62

Query processing using Ste. Ms (1) § Ste. M operations -- works like index on a table • build(tuple): add tuple to Ste. M • probe(tuple): find all matches among build values § bounce back tuple if all matches not found and probe tuple not cached elsewhere § performing joins by routing through Ste. Ms • index joins § regular synchronous index joins easy -- have no state § over distributed sources, lookups cached by building into Ste. M separating cache allows Eddy to distinguish access cost R bl d S R R . ob pr S Eddy pending probe results Interactive Query Processing e be § Ste. M acts as rendezvous buffer for 2/28/2021 Ste. MS o pr § helps for high-throughput Internet sources R • asynchronous index joins [GW’ 00] Ste. MR d. l b • S 63

Query processing using Ste. Ms (2) § performing joins by routing through Ste. Ms (contd) bl S R Eddy R d. pr S R ob e be Ste. MS o pr . Ste. MR d bl • doubly pipelined hash join • use two Ste. Ms, one on each source S § can also do extensions (e. g. Tuqwila [IFF+99]), and even non-pipelined joins, using appropriate Ste. Ms § thus can simulate any static query plan by appropriate routing 2/28/2021 Interactive Query Processing 64

Query processing using Ste. Ms (3) § how to do this in general? • don’t want Eddy to be a super-join • want it to route as per user desires and source properties, independent of any join algorithm § routing constraint: • each tuple must have a non-zero probability of being sent to every appropriate module • can avoid repeated looping by marking tuples § Theorem: any routing that satisfies above constraint will produce all query results and terminate § could get duplicates • prune at application, or • enforce atomicity in tuple routing 2/28/2021 Interactive Query Processing 65

Benefits of Ste. Ms § adaptivity • join algorithm and access path selection are determined by eddy routing • hence Eddy can dynamically choose these based on source properties • in fact, Eddy can do hybrid join algorithms § adapt to dynamic delays, etc. § work sharing -- for query evolution, competition § better information to the Eddy • speeds, memory consumption 2/28/2021 Interactive Query Processing 66

Lazy Evaluation § User navigates thru partial results • can see only a few at a time § exploit to delay processing • do on demand on the region user currently navigating over § things to delay • expensive source lookups (e. g. map) • joins (e. g. …) • any other formatting etc. 2/28/2021 Interactive Query Processing 67

Giving probabilistic results § view partial result as somethng that throws light on outer-join space • positive result: probabilistic • negative result: deterministic § benefit of positive result: • weighted sum of cells it is likely to show; penalty for false result • repudiation of earlier false results § benefit of negative result: • direct benefit unlikely • repudiation of earlier false results § complications • many tuples may contribute to single output tuple -aggregation 2/28/2021 Interactive Query Processing 68

Probabilistic partial results § partial result => may not apply all filters § still want to show result probabilistically • filters ill-specified • data on Internet often inconsistent • show all partial results: “full disjunction” (Galindo-Legaria ‘ 94) 2/28/2021 Interactive Query Processing Outer Join Space Output Space § positive and negative results 69

What does user see? § user navigates thru partial results § with complete row partial results: • can either explore these results in detail (sort/scroll along columns as in spreadsheet) • compute approx. aggregates on them in online fashion § errors shrinking as more results come in § with probabilistic, incomplete results? • easy if primary key in partial result • what to show otherwise? 2/28/2021 Interactive Query Processing 70

Quantifying benefit of partial results § partial result sheds light on outer-join space • positive result: probabilistic • negative result: deterministic § benefit of positive result: • weighted sum of cells it is likely to show • penalty for false result ? • repudiation of earlier false results ? § benefit of negative result: • direct benefit unlikely • repudiation of earlier false results ? 2/28/2021 Interactive Query Processing 71

Why probabilistic results are good § data on Internet is often inconsistent • hence even exact processing cannot give perfect answers § people are used to sloppy answers § can allow negations in expert interfaces only 2/28/2021 Interactive Query Processing 72

Data growth vs. Computer Speedup § Moore’s Law -- # of transistors/chip doubles every 18 months (1965) § data growth Source: J. Porter, Disk/Trend, Inc. (http: //www. disktrend. com/pdf/portrpkg. pdf) 2/28/2021 Interactive Query Processing 73

Disk Appetite, contd. § Greg Papadopoulos, CTO Sun: • Disk sales doubling every 9 months § similar results from Winter VLDB survey § time to process all your data doubles every 18 months! 2/28/2021 Interactive Query Processing 74