Umbrella 0 5 Project Report 1 Heterogeneous Data
Umbrella 0. 5 Project Report #1 Heterogeneous Data Source Access Using Standard SQL Queries Eric Matson – MSE Kansas State University 10/3/2020 1
Agenda • • • Overview Methods & Tools Requirements Project Management and Control GUI Walkthrough/Prototyping Summary 10/3/2020 2
Overview • Purpose • Goals • Description – – Architecture Key Elements Process Test Lab • Scope • Future Directions • Literature Search Results 10/3/2020 3
Purpose • Query heterogeneous data sources using standard SQL statements • Achieve database integration and access at the query level • Allow the data to be “black box” whereas users don’t need to have understanding to source, architecture, or technical details 10/3/2020 4
Goals • • Complete the Requirements of CIS 895 Successfully Navigate Software Engineering Process Prove Umbrella Concept Prove Ability to Process Umbrella Queries – Split the query into to process heterogeneous sources – Manage the connection processing with the data sources – Return and create new relation(s) • Create easy to use Graphical User Interface (GUI) 10/3/2020 5
Description • Enter queries using standard SQL statements • Query heterogeneous data sources – Text Files • Formatted • Delimited – Relational Databases • Postgre. SQL • My. SQL • Informix – Knowledge Base Formats • Prolog Rules • CLIPS Deftemplate facts – Non-relational Databases – ODBC Data Sources • Unify Data together as single relation 10/3/2020 6
Description – Architecture Parser API Umbrella GUI Splitter Router Unifier Profile Engine 10/3/2020 Relational Database D a t a C o n n e c t o r s Object Database TCP/IP Text File KB File Data Server Data Sources 7
Description – Key Elements • GUI: User Interface • Umbrella Engine Agent – – – Parser: Splitter: Router: Unifier: Profile: Checks the query for correctness and validity Splits query to appropriate data sources Manages routing and connection to data source Unites data upon return Database to manage data source information • Data Connector: Interface to data sources • Data Server: – Text: – KB: 10/3/2020 “Wraps” data and serves the Umbrella Engine “Wraps” fact and serves the Umbrella Engine 8
Description - Process SELECT * FROM Job, Name, Addr WHERE Job. SSN = Name. SSN AND Job. SSN = Addr. SSN Job Parse Split Route SELECT * FROM Job DC SELECT * FROM Name SSN First Middle Last Addr SSN Street State ZIP DC DC SELECT * FROM Addr 10/3/2020 SSN Title Salary Unify SSN Title Salary First Middle Last Street State ZIP 9
Description - Test Lab Client Network Server Postgre. SQL Linksys 10 Mb Hub Windows 98 192. 168. 1. 2 Red Hat 7. 1 My. SQL 192. 168. 1. 1 Text 10/3/2020 10
Scope • Functionality – – – Umbrella 0. 5 Query Engine Graphical User Interface (GUI) Data Servers for Text Data Sources Data Connectors SQL Select Statement in Limited Format • Data Sources – – 10/3/2020 Postgre. SQL: ANSI Compliant ORDBMS My. SQL: non-ANSI Compliant RDBMS Formatted Text Delimited Text 11
Future Directions • Version 0. 6 – SQL release – – • Implement full grammar support for SQL Enhance dataset join algorithm Enhance GUI Add ODBC, Berkeley DB, Interbase and ADABAS data connectors Version 0. 7 – KD/AI release – Add Lisp, Prolog, CLIPS data connectors – Enhance Query Efficiency Algorithms • Version 0. 8 – Integration release – Add API layer for access by C/C++, Java, VB – Add tools for advanced searches and Data Mining – Add Security Module • Version 0. 9 – Data Connector release – Add Oracle, Informix, UDB, Sybase data connectors – Add Object Database data connectors – Add XML data connector • Version 1. 0 – Mainframe release – Add IMS, DB 2, VSAM data connectors 10/3/2020 12
Literature Search • Looked at US/Canadian Research Units – Companies – Universities • Similar Research/Product Development – WWW. Jibe. com - Only JDBC/ODBC sources – Queens University(Canada)/IBM – Older pre-Internet “Explosion” – Stanford Mariposa Project • Summary – Some Similar Projects – Most are Tangential in Scope and Function 10/3/2020 13
Methods & Tools (M&T) • • • Development Process Object Modeling Formalism/Constraints Development Tools Infrastructure 10/3/2020 14
M&T – Development Process COMET (Concurrent Object Modeling and architectural design m. EThod – Hassan Gomaa) Planning COMET Process Requirements Modeling Analysis Modeling Design Modeling Throwaway Prototyping Phase 1 10/3/2020 Delivery Non-COMET Process Phase 2 Incremental Software Construction Incremental Software Integration Incremental Prototyping Systems Testing Phase 3 15
M&T – Object Modeling • Unified Modeling Language (UML) used – UML is standard OO modeling language – Well understood • UML Tool – Poseidon – Argo. UML – Both are very similar, Poseidon is an extension of Argo. UML 10/3/2020 16
M&T – Formalism/Constraints • Object Constraint Language (OCL) – Easier to understand – Less mathematical – Less developed • USE – Graphical tool to support OCL 10/3/2020 17
M&T – Development Tools • Sun Java 1. 3 JDK/JRE – Newest release of development kit – Write Once, Run Anywhere (Portable) • Forte (Netbeans) Integrated Development Environment – Community Edition (CE) – Easy to use – Free! 10/3/2020 18
M&T - Infrastructure Server OS: Red Hat Linux 7. 1 Object/Relational Database Server: Postgre. SQL Relational Database Server: My. SQL 10/3/2020 19
Requirements • • Object Diagram Class Descriptions SQL Grammar SQL Example 10/3/2020 20
Requirements – Object Diagram Data. Conn Parser Engine Rel. Data. Conn Text. Data. Conn Query Sub. Query Interagator Splitter JDBCConn Form. Data. Conn Del. Data. Conn My. SQLConn Postgres. Conn Profile Router JDBCProfile Text. Profile Form. Text. Prof Text. Desc Unifier Del. Text. Prof Result. Set Data. Server 10/3/2020 21
Requirements – Class Descriptions • • • Engine: Parser: Splitter: Router: Unifier: Query: Sub. Query: Result. Set: Text. Desc: Interrogator: Data. Server: 10/3/2020 Control agent of Umbrella architecture Checks query for grammar/valid data objects Splits queries into valid sub-queries Routes queries to data source/manages connection Unifies sub-query result sets into single relation SQL statement Created by Splitter for specific data source Data returned from sub-query of data source Defines characteristics of text data source Queries a new data source for information on field definitions Server that supports access to text data sources 22
Requirements – Class Descriptions • • Profile: Defines characteristics of a data source – JDBCProfile – Formatted. Text. Profile – Delimited. Text. Profile Data. Connector: Defines specifics to query a data source – Relational. Data. Connector • JDBCData. Connector – My. SQLData. Connector – Postgre. SQLData. Connector – Text. Data. Connector • Formatted. Text. Data. Connector • Delimited. Text. Data. Connector 10/3/2020 23
Requirements – SQL Grammar • <statement> • • • • <select> <fieldname> <tablename> <conditionals> <relation> <operand> <rightside> <vartype> <string> <integer> <real> <alpha> <digit> 10/3/2020 : : = SELECT <select> FROM <table>. | SELECT <select> FROM <table> WHERE <condition>. : : = <field> | <select>, <field> | * : : = <fieldname> | <tablename>. <fieldname> : : = <alpha> {<alpha>}{<digit>} : : = <tablename> | <tablename> {, <tablename>} : : = <alpha> {<alpha>}{<digit>} : : = <relation> {AND <relation>}{OR <relation>} : : = <field> <operand> <rightside> : : = < | > | = : : = <field> | <vartype> : : = <string> | <integer> | <real> : : = “” | “<alpha> {<alpha>}{<digit>} “ : : = <digit> {<digit>} : : = <integer> : : = a|b|c…x|y|z|A|B|C…X|Y|Z : : = 0|1|2|3|4|5|6|7|8|9 24
Requirements – SQL Example • SELECT * FROM address; • SELECT ssn, age FROM employee; • SELECT lastname, firstname FROM employee WHERE age > 40; • SELECT lastname, firstname, salary FROM employee, hr WHERE employee. ssn = hr. ssn AND employee. age > 25; 10/3/2020 25
Project Planning and Control • Project Planning – Work Breakdown Structure (WBS) – Gantt Chart – Description of Estimation Techniques • Project Control – Effort/Budget – Project Log and Time Tracking – Change Management (Feedback Loops) 10/3/2020 26
Project Planning - WBS 1 Umbrella Goals: 1. 1 Planning 1. 1. 1 Define Purpose 1. 2 Requirements Modeling 1. 1. 2 Define Scope 1. 3 Throwaway Prototyping • Tasks: . 25 < hours < 8 Outcome: 1. 1. 3 Define Purpose • Stay on schedule 1. 1. 4 Define Goals • Prevent “creep” 1. 4 Analysis Modeling 1. 5 Design Modeling 1. 6 Incr. SW Construction … 1. 7 Incr. SW Integration 1. 8 Incr. Prototyping 1. 9 Systems Testing 1. 10 Delivery 10/3/2020 27
Project Planning – Gantt Chart September October November December January Planning Req’s Modeling Throwaway Prototyping Analysis Mod. Design Mod. Incr. SW Const Incr. SW Integ. Incr. Prototyping Systems Testing Delivery Review #1 10/3/2020 Review #2 Review #3 28
Project Planning – Estimation Techniques • Estimate Tasks on Defined WBS – All tasks broken into “chunks” of work between. 25 and 8 hours • Keep project on schedule • WBS Estimation Used – Small Project Team – Well Understood Capabilities of Project Member(s) • Change Budget Allocated at 20% of Initial Estimate – Works with Iterative Schemes – Change is Planned and Accepted 10/3/2020 29
Project Control – Effort/Budget • Effort – Effort Data – Predicted Hours 565 Man/Mo (168 hr/mo) 3. 36 – Current 183. 25 1. 09 • Budget – Time ($50/hr) $28, 250 – Current $9, 162. 50 10/3/2020 30
Project Control – Time Log/Issues • Time Log – Track time spent each day on project tasks • WBS Number • Hours – Viewed on Engineering Notebook Site • Issues – As design issues are realized • Issues Documented • Solutions Arrived at and Documented • Folded in to Change Management Process – If Impact and In Scope 10/3/2020 31
Project Control – Change Mgmt. • Development Process (COMET) Oriented for Change – Feedback changes will be tracked • • • 10/3/2020 Change Control Number Task Name Added Hours to Project Schedule Description of Change WBS Number of Artifacts Altered Dependencies 32
Graphical User Interface File Connection Query Help Connections --- 192. 168. 1. 2 ---- /opt/data ----formatted. txt ---- Postgre. SQL Server ----Employee ----Salary --- 192. 168. 1. 1 ---- My. SQL Server ----Items ----Invoice ----Customers ---- /var/lib/data ----delimited. txt Queries --- Employees and Customers --- Employee ---- ssn, lastname --- Customers ---- ssn Define Query Name: Employees and Customers Query: SELECT ssn, lastname FRO Results 111 -22 -3333 Elvis Presley 222 -33 -4444 Richard Nixon 333 -44 -5555 Mikhail Gorbechev Status: Subquery 1: OK Subquery 2: Failed 10/3/2020 33
Summary • • • Umbrella accesses heterogeneous data sources Defined set of tools and methodologies High Level Requirements Defined Project Management Well Defined Easy to Use GUI Next Steps – – 10/3/2020 Questions Feedback Signoff Proceed to Phase #2 34
- Slides: 34