A Comprehensive Model for Arbitrary Result Extraction
Neal Sample, Gio Wiederhold (Stanford University); Dorothea Beringer (Hewlett-Packard)
Shift in Programming Tasks
[Figure: Coding vs. Integration/Composition over a time axis marked 1970, 1990, 2010]
SAC 2002
Sample Composition Tasks
- Logistics
    - Reservation and distribution systems, "find the best transportation route from A to B"
- Genomics
    - Framework for composing various processing tools and repositories
- Modeling
    - Weather prediction, complex chemical systems, basin modeling
- Composition of processes (vs. components, data)
CLAM Composition Language
- Purely compositional
    - no primitives for arithmetic
    - no primitives for I/O, etc.
- Splitting up CALL-statement
    - parallelism by asynchrony in sequential program
    - novel possibilities for optimizations
    - reduction in complexity of invocation statements
- Higher-level language
    - assembly, then HLLs, then compositional paradigm
- Intent: Enable domain experts
CLAM Primitives
Pre-invocation:
    SETUP: set up the connection to a service
    SETPARAM, GETPARAM: set and get parameters in a service
    ESTIMATE: for optimization
Invocation and result gathering:
    INVOKE: begin execution
    EXAMINE: test progress of an invoked method
    EXTRACT: extract results from an invoked method
Termination:
    TERMINATE: terminate a method invocation/connection to a service
Data Dependencies & Scheduling
    // begin program
    A = service1();
    B = service2();
    C = service3(A, B);
    D = service4(C);
    E = service5(C);
    // end of program
[Figure: dependency graph from START through service1 and service2 into service3, then on to service4 and service5, ending at END]
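The schedule implied by the program above can be derived from its data dependencies alone. The following is a minimal Python sketch, not CLAM itself: the `deps` mapping is read off the five assignments on the slide, and the helper name `schedule_waves` is hypothetical. It groups services into waves, where every service in a wave has all of its inputs available and could run in parallel.

```python
from graphlib import TopologicalSorter

# Data dependencies read off the slide's program: each service maps to the
# services whose results it consumes.
deps = {
    "service1": set(),
    "service2": set(),
    "service3": {"service1", "service2"},  # C = service3(A, B)
    "service4": {"service3"},              # D = service4(C)
    "service5": {"service3"},              # E = service5(C)
}


def schedule_waves(dependencies):
    """Group services into waves; services in one wave may run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all inputs are available
        waves.append(ready)
        sorter.done(*ready)                 # unblock their consumers
    return waves


waves = schedule_waves(deps)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

Because the composition only states which results feed which services, the runtime is free to start service1 and service2 together and service4 and service5 together.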
Runtime: data extraction is hard
- Data extraction with native modules worked
- No language-level specifications in CLAM
    - E.g., polling, threading, exception handling…
- Multiple middleware for transport
    - difficult mapping
    - CORBA-RMI, RMI-COM, COM-CPAM, etc.
- Crisis of legacy services
    - To generalize or restrict?
- Refine the strategy…
Strategy: hide it & depend on it
- Have to respect service capabilities
    - Legacy ambivalence
    - Or suffer the LCD… (more in a bit)
- Simple and flexible programming
    - Simple bridging for middleware
    - Increase audience for services
- Better scheduling
    - Data extraction is a runtime issue; it is not central to the composition task
- Simplified integration
    - Declarative language, data dependencies
Where are we?
- Declarative language for composition
    - Data is used for synchronization
    - No primitives to support synchronization
- Apparent "mismatch" in data extraction methods & capabilities among various actors
    - What does the data look like?
    - How can data be extracted?
Data View: Services
[Figure: a service whose RESULTS consist of Result A, Result B, and Result C]
Extraction Techniques
- Asynchrony
    - Explicitly controlled: spin-locks, polling, interrupt handling, etc.
    - Can use with any DAG schedule
- Partial extraction
    - web browsing: HTML text as a schema
    - SQL cursors (thanks to the reviewer)
- Progressive extraction (exceptional)
    - Adaptive mesh refinements, JPEG interleaving
Current Focus
Pre-invocation:
    SETUP: set up the connection to a service
    SETPARAM, GETPARAM: set and get parameters in a service
    ESTIMATE: for optimization
Invocation and result gathering:
    INVOKE: begin execution
    EXAMINE: test progress of an invoked method
    EXTRACT: extract results from an invoked method
Termination:
    TERMINATE: terminate a method invocation/connection to a service
EXAMINE Primitive in CLAM
- Returns "status" and "progress"
- Status: 2 bits of state
    - status = {DONE, NOT_DONE, PARTIAL, ERROR}
- Progress: open descriptor
    - Indicates progress in an application-specific way
    - Could be variance, mean, amplitude, etc.
    - Default assumption: integer 0-100 = % done
- Resolution of EXAMINE
    - Can apply per service (black box)
    - Can apply per result (white box)
- Not complete for many legacy systems: only "status", no "progress"
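The two-part return value of EXAMINE can be modeled as a small record. This is a sketch under stated assumptions rather than a CLAM API: the names `Status`, `ExamineResult`, and `describe` are hypothetical, the numeric codes are arbitrary, and a missing progress value stands in for a legacy service that reports status only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Four values, so two bits of state suffice. The numeric codes are assumed.
    NOT_DONE = 0
    PARTIAL = 1
    DONE = 2
    ERROR = 3


@dataclass(frozen=True)
class ExamineResult:
    status: Status
    # Default interpretation from the slide: integer 0-100 = percent done.
    # None models a legacy service that reports status but no progress.
    progress: Optional[int] = None


def describe(result):
    """Render an EXAMINE result, tolerating a missing progress value."""
    if result.progress is None:
        return result.status.name
    return f"{result.status.name}, {result.progress}% done"


print(describe(ExamineResult(Status.PARTIAL, 40)))
print(describe(ExamineResult(Status.NOT_DONE)))
```

Keeping progress optional lets the same client code handle both full and status-only services.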
EXAMINE
[Figure: three snapshots of a service with results A, B, and C, each showing the values returned by Service.EXAMINE(), Service.EXAMINE(A), Service.EXAMINE(B), and Service.EXAMINE(C)]
Mid-execution snapshot:
    Service.EXAMINE()  returns {PARTIAL, 40}
    Service.EXAMINE(A) returns {DONE, 100}
    Service.EXAMINE(B) returns {NOT_DONE, 0}
    Service.EXAMINE(C) returns {PARTIAL, 20}
The other two snapshots show a {NOT_DONE, 0} state and a {DONE, 100} state.
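The difference between per-service (black box) and per-result (white box) EXAMINE can be illustrated with a toy function. This is a hedged sketch: the function `examine` and its dictionary argument are hypothetical, and the roll-up rule for the service-level answer (mean progress, with PARTIAL for mixed states) is an assumption that happens to agree with the mid-execution values on the slide; the source does not state how a service derives it.

```python
def examine(results, name=None):
    """results maps a result name to a (status, progress) pair.

    With a name, answer for that one result (white box).
    Without a name, answer for the service as a whole (black box),
    using an ASSUMED roll-up rule: mean progress, PARTIAL if states are mixed.
    """
    if name is not None:
        return results[name]

    statuses = {status for status, _ in results.values()}
    mean_progress = sum(p for _, p in results.values()) // len(results)
    if "ERROR" in statuses:
        return ("ERROR", mean_progress)
    if statuses == {"DONE"}:
        return ("DONE", 100)
    if statuses == {"NOT_DONE"}:
        return ("NOT_DONE", 0)
    return ("PARTIAL", mean_progress)


# Per-result values taken from the mid-execution snapshot on the slide.
snapshot = {
    "A": ("DONE", 100),
    "B": ("NOT_DONE", 0),
    "C": ("PARTIAL", 20),
}

print(examine(snapshot))       # service-level view
print(examine(snapshot, "C"))  # result-level view
```

A client that only sees the black-box answer knows the service is partly finished; the white-box answer additionally reveals that result A is already complete.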
EXTRACT Primitive
- Extracts data from a service
- Per service (black box)
    - (var) = Service.EXTRACT();
- Per result (white box)
    - (varA = A, varC = C) = Service.EXTRACT();
- Allows partial data extraction
    - saves volume: abandon uninteresting elements
    - saves time: termination of useless invocation
- Allows progressive data extraction with 2-value EXAMINE (status+progress)
    - Steering, time saving
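Partial extraction, as in the per-result statement above that takes only A and C, can be sketched with a toy service. Everything here is illustrative and assumed: `ToyService`, its step counter, and the helper `extract_wanted` are hypothetical stand-ins, not CLAM primitives or any middleware API. The client polls per result, extracts results as they finish, and terminates the invocation once it has what it needs.

```python
class ToyService:
    """Hypothetical stand-in for an invoked remote method; not a CLAM API."""

    def __init__(self):
        self._step = 0
        # Assumed timing: each result becomes available at a different step.
        self._ready_at = {"A": 1, "B": 5, "C": 2}
        self.terminated = False

    def advance(self):
        """Simulate the service making progress between polls."""
        self._step += 1

    def examine(self, name):
        """Per-result (white box) status only."""
        return "DONE" if self._step >= self._ready_at[name] else "NOT_DONE"

    def extract(self, *names):
        """Return only the named results that are already done."""
        return {n: f"value-of-{n}" for n in names if self.examine(n) == "DONE"}

    def terminate(self):
        self.terminated = True


def extract_wanted(service, wanted, max_polls=10):
    """Poll until every wanted result is extracted, then stop the service."""
    got = {}
    for _ in range(max_polls):
        pending = [n for n in wanted if n not in got]
        if not pending:
            break
        got.update(service.extract(*pending))
        service.advance()
    service.terminate()  # saves time: B is never needed, so stop early
    return got


service = ToyService()
got = extract_wanted(service, ["A", "C"])
print(got)
```

Result B is never transferred, and the invocation ends before B would have finished, which illustrates the volume and time savings listed on the slide.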
Examine-Extract Relationship
EXAMINE \ EXTRACT             | EXTRACT per service                    | EXTRACT per result
per service, status only      | asynchronous procedure call, Java RMI  | limited partial extraction: (binary) thumbnails, partitioned ?
per service, status+progress  | progressive extract (full result set)  | semantic partial extraction
per result, status only       | extraction (full result set)           | browsing, SQL cursor (no progressive)
per result, status+progress   | extraction (full result set)           | progressive and partial extraction: CLAM
Conclusions
- Data extraction hiding is bueno!
    - User is not responsible for data management
    - Synchronizing extractions not in the language: simplicity
    - Enables effective service scheduling
- Simplified integration
- Blueprint for proactive design pattern for future services