Exploiting Asynchronous IO using the Asynchronous Iterator Model

  • Slides: 33
Download presentation
Exploiting Asynchronous IO using the Asynchronous Iterator Model Suresh Iyengar * S. Sudarshan Santosh

Exploiting Asynchronous IO using the Asynchronous Iterator Model Suresh Iyengar * S. Sudarshan Santosh Kumar # Raja Agrawal & IIT Bombay Current affiliations: * Microsoft Hyderabad, # Guruji. com, & SAP

Agenda § AIO Background § Exploiting AIO in query processing • Asynchronous Iterator model

Agenda § AIO Background § Exploiting AIO in query processing • Asynchronous Iterator model § Asynchronous Index Nested Loops Join § Asynchronous versions of other operators § Performance results § Related Work § Conclusion COMAD 2008 , IIT Bombay

IO Processing : Traditional way Application Kernel System call Read () Context switch Application

IO Processing : Traditional way Application Kernel System call Read () Context switch Application Blocked ! data Initiate IO Read response § CPU is idle most of the time waiting for an IO completion COMAD 2008 , IIT Bombay

IO Processing : Async. way Application Kernel System call Initiate IO AIO Read ()

IO Processing : Async. way Application Kernel System call Initiate IO AIO Read () Notify Do other Read response data work !! COMAD 2008 , IIT Bombay

IO Processing : Async. Way § Asynchronous approach • Overlap of CPU and IO

IO Processing : Async. Way § Asynchronous approach • Overlap of CPU and IO processing • Application can generate multiple IO requests Allows IO subsystem to reorder access to data on disk • Important in RAID environments COMAD 2008 , IIT Bombay

Asynchronous IO Interface Linux 2. 6 kernel ( File descriptor, offset, buffer, num. Bytes,

Asynchronous IO Interface Linux 2. 6 kernel ( File descriptor, offset, buffer, num. Bytes, … ) aio_read ( aio structure) Request an AIO read operation aio_error ( aio structure ) Check the status of an AIO request lio_listio ( array of aio structures ) Initiate a list of AIO operations • We use list AIO in our implementation • Can initiate multiple IO read operations in one system call COMAD 2008 , IIT Bombay

Handling AIO completion § Signal-based handler • A signal is generated on IO completion

Handling AIO completion § Signal-based handler • A signal is generated on IO completion § Callback using interrupts Call completion handler • An interrupt is generated on IO completion § Concurrent access to completion handler and shared data structures in both of above methods § Polling • Store IO requests in pending queue and poll periodically for completion • Our experiments show polling beats signal/interrupt based approach COMAD 2008 , IIT Bombay

Demand-Driven Iterator • Open() • Next() NL J scan • Close() scan Blocking call

Demand-Driven Iterator • Open() • Next() NL J scan • Close() scan Blocking call ! Table A Table B § Bottom level nodes perform operations such as sequential scans or index scans. § Upper level nodes are join nodes or other operator nodes such as sort or aggregate. COMAD 2008 , IIT Bombay

Agenda § AIO Background § Exploiting AIO in query processing • Asynchronous Iterator model

Agenda § AIO Background § Exploiting AIO in query processing • Asynchronous Iterator model § Asynchronous Index Nested Loop (INL) Joins § Asynchronous versions of other operators § Performance results § Related Work § Conclusion COMAD 2008 , IIT Bombay

Asynchronous Iterator • Open() • Next() NL J scan • Close() scan • I

Asynchronous Iterator • Open() • Next() NL J scan • Close() scan • I don’t have the tuple available in the memory !! Table A Table B • Issue AIO read operation • Return “LATER” Non- Blocking call ! COMAD 2008 , IIT Bombay

Asynchronous Iterator Model (AIM) § Allow a node to return a status “LATER” to

Asynchronous Iterator Model (AIM) § Allow a node to return a status “LATER” to the parent • Instead of blocking for IO completion. § The parent operator could • Perform other work, such as fetching data from another input • Simply return a LATER status to its parent node • Or just loop, reinvoking the child operator till it returns a tuple E. g. root of the execution plan tree § Exact action depends on operator • Asynchronous versions of different operators • Focus on Asynchronous Indexed Nested Loops join COMAD 2008 , IIT Bombay

Asynchronous INL Joins § Original state of Indexed Nested Loops (INL) node • Left

Asynchronous INL Joins § Original state of Indexed Nested Loops (INL) node • Left and right subplans and qualifier lists § Augmented state for async INL node • An array of outer tuples each having a queue of matching inner TIDs AIO may have been issued for some already, others later • A workqueue for outer slots which already have AIO issued for their matching inner TIDS • An IO queue recording all pending AIO requests made by the node Used to poll for completion of AIO requests COMAD 2008 , IIT Bombay

Asynchronous INL Join (contd. ) § We divide the async INL join operations into

Asynchronous INL Join (contd. ) § We divide the async INL join operations into two stages • Stage 1: Fetch outer tuples and issues AIO requests • Stage 2: Check for AIO completion, process AIO results and return join results. § Stages are interleaved • Stage 1 may be in progress for some tuples, and Stage 2 for others COMAD 2008 , IIT Bombay

Asynchronous INL Join (contd. ) Stage 1 Fetch outer tuples For each outer tuple

Asynchronous INL Join (contd. ) Stage 1 Fetch outer tuples For each outer tuple Find the matching inner TIDs for each outer tuple Put the outer tuple in workqueue Issue LIST AIO for matching inner TIDS of all outer tuples in workqueue (subject to BATCH_SIZE) COMAD 2008 , IIT Bombay

Asynchronous INL Join (contd. ) § Rules • Batch size BATCH_SIZE: max number of

Asynchronous INL Join (contd. ) § Rules • Batch size BATCH_SIZE: max number of outstanding AIO requests Why? OS limits, efficiency issues We set the MAX_BATCH_SIZE per node to 200 in our experiments Scale BATCH_SIZE in powers of 2 till MAX_BATCH_SIZE so that async INL can output tuples quickly at the onset • Case where outer tuple matches a large number of inner tuples is handled appropriately • Keeping the AIO queue filled We issue further AIO requests (fetching outer tuples as required) if 10 % of earlier AIO requests have completed COMAD 2008 , IIT Bombay

Asynchronous INL Join (contd. ) For each outer tuple in workqueue Stage 2 Check

Asynchronous INL Join (contd. ) For each outer tuple in workqueue Stage 2 Check if any matching inner TIDs are present in memory Present ? No Yes • Remove that inner TID from outer tuple’s TID array • Perform join and add to result • if join result found break from loop Update workqueue Next page. . COMAD 2008 , IIT Bombay

Asynchronous INL Join (contd. ) Prev page. . Any join results? Yes Return result

Asynchronous INL Join (contd. ) Prev page. . Any join results? Yes Return result to parent node Back to start of Stage 2 Yes No Is no outstanding outer tuples & reached end of outer tuple Yes Poll for AIO completion Is tuple found or parent node cannot handle LATER No tup. Stat = END_OF_RESULT result = NULL No tupstat = LATER result = NULL Return result and tup. Stat to parent node COMAD 2008 , IIT Bombay

Async. versions of other operators § Async Sequential scan • Check if next tuple

Async. versions of other operators § Async Sequential scan • Check if next tuple is in the in-memory buffer • If its present, return the tuple • Else initiate an async read. Set tup. Stat = LATER and return § Out of order sequential scan • Start returning the tuples of a particular relation which are already there in the memory even if out of order • Concurrently, issue AIO for other tuples COMAD 2008 , IIT Bombay

Async. versions of other operators R E T LA I can start the sorting

Async. versions of other operators R E T LA I can start the sorting of other input ! Merge Join sort Seq scan T 1 T 2 LAT ER sort Initiate AIO read COMAD 2008 , IIT Bombay

Performance Results § Experiments with TPC-H database with scale factors of 1 and 10

Performance Results § Experiments with TPC-H database with scale factors of 1 and 10 in three different setups • Core 2 duo P 4 with: 1 GB RAM and TPC-H - 1 GB database (single disk) 1 GB RAM and TPC-H – 10 GB database (single disk) 3. 2 GB RAM and TPC-H – 10 GB database (4 disks / RAID 10) § We use Postgre. SQL 8. 1. 3 as the code base § Compare it with our modified version of the same code base, incorporating asynchronous iterator model • with async INL and async seq. scan COMAD 2008 , IIT Bombay

Performance Results: 1 GB RAM Query 1 a: select l_orderkey, l_quantity from orders, lineitem

Performance Results: 1 GB RAM Query 1 a: select l_orderkey, l_quantity from orders, lineitem where o_orderkey=l_orderkey and l_orderkey%100=2 and l_linestatus=’F’ TPCH 1 GB TPCH 10 GB COMAD 2008 , IIT Bombay

Performance Results: 1 GB RAM Query 2 a: select l_orderkey, l_quantity from orders, lineitem,

Performance Results: 1 GB RAM Query 2 a: select l_orderkey, l_quantity from orders, lineitem, customer where o_orderkey=l_orderkey and o_custkey=c_custkey and l_orderkey%100=2 and l_linestatus=’F’ TPCH 1 GB TPCH 10 GB COMAD 2008 , IIT Bombay

Performance Results : 1 GB RAM Query 2 a : Join of orders, lineitem

Performance Results : 1 GB RAM Query 2 a : Join of orders, lineitem and customer with filter (TPCH 1 GB ) Startup effect COMAD 2008 , IIT Bombay

Performance Results: 1 GB RAM Query 2 b: select l_orderkey, l_quantity from myorders, lineitem,

Performance Results: 1 GB RAM Query 2 b: select l_orderkey, l_quantity from myorders, lineitem, customer where o_orderkey=l_orderkey and o_custkey=c_custkey -- No tight selection TPCH 1 GB TPCH 10 GB 1 GB RAM COMAD 2008 , IIT Bombay

Performance Results: 3. 2 GB + RAID Query 2 a 1 a : Join

Performance Results: 3. 2 GB + RAID Query 2 a 1 a : Join of orders, orders and lineitem andwith customer filter with filter TPC-H 10 GB / 3. 2 GB RAM / 4 disks RAID 10 COMAD 2008 , IIT Bombay

Performance Results: 3. 2 GB + RAID Query 2 b 1 b : Join

Performance Results: 3. 2 GB + RAID Query 2 b 1 b : Join of myorders, lineitem and customer TPC-H 10 GB / 3. 2 GB RAM / 4 disks RAID 10 COMAD 2008 , IIT Bombay

Performance Results TPC-H Q 12: select l_shipmode, sum(. . . ) from orders, lineitem

Performance Results TPC-H Q 12: select l_shipmode, sum(. . . ) from orders, lineitem where o_orderkey = l_orderkey and <several selection> group by l_shipmode order by l_shipmode Original INL Async INL Gain TPCH 1 GB RAM 64. 7 sec 48 sec 25 % TPCH 10 GB 1 GB RAM 687 sec 431 sec 37 % TPCD 10 GB RAID 10 4 disks, 3. 2 GB RAM 164 sec 147 sec 10 % COMAD 2008 , IIT Bombay

Related Work § Graefe’s generalized spool iterator (Graefe [ BTW 03 ]) • Pre-fetches

Related Work § Graefe’s generalized spool iterator (Graefe [ BTW 03 ]) • Pre-fetches multiple outer tuples INL • Issue AIO for matching inner TIDS Spool operator • Can be replenished when empty or when one tuple is joined scan Index lookup COMAD 2008 , IIT Bombay

Related Work § AIO used in database products • Microsoft SQL Server, IBM DB

Related Work § AIO used in database products • Microsoft SQL Server, IBM DB 2, Oracle • No public documentation on how these systems use AIO § Asynchronous iteration for evaluating web queries (R. Goldman and J. Widom [ SIGMOD 2000 ] ) • They report results only on web queries COMAD 2008 , IIT Bombay

Conclusion § Proposed the Asynchronous Iterator Model (AIM) § Presented asynchronous versions of INL

Conclusion § Proposed the Asynchronous Iterator Model (AIM) § Presented asynchronous versions of INL and some operators § Showed gains of over 50 % in some cases § AIM can be useful in web-service access and in data integration systems like IBM Data. Joiner § Future work • Implementing async versions for index lookup, sub plan, sort and merge operator • Performing async IO in the presence of ordering constraints COMAD 2008 , IIT Bombay

Thank You Questions ? COMAD 2008 , IIT Bombay

Thank You Questions ? COMAD 2008 , IIT Bombay

Plans § Query 1 a : • Seq scan on lineitem, probe on orders

Plans § Query 1 a : • Seq scan on lineitem, probe on orders • Merge Join -> Index Scan on orders -> Sort lineitem -> Seq Scan on lineitem § Query 2 a: • Nested Loop -> Seq Scan on lineitem -> Index Scan on orders -> Index Scan on customer COMAD 2008 , IIT Bombay

Plans § Query 2 a Merge Join -> Sort orders -> Merge Join ->

Plans § Query 2 a Merge Join -> Sort orders -> Merge Join -> Index Scan on orders -> Sort on lineitem -> Seq Scan on lineitem -> Index Scan on customer § Query 2 b : Nested Loop -> Seq Scan on lineitem -> Index Scan on myorders -> Index Scan on customer COMAD 2008 , IIT Bombay