Lecture No 7 Database Structured and Unstructured Database
Lecture No. 7 Database - Structured and Unstructured Database. Lecture 7 / 1
Objectives of this Lecture • This lecture will be focussed mostly on database as a means of storing and retrieving data. • The term ‘database’ is always associated with the term ‘database management system’ (DBMS) which indicates that not only is data stored but it is controlled and managed by software. • You will most likely be using Microsoft Access for your assignment. This DBMS is aimed at the lower end of data volumes and processing. • You will also use Oracle in your laboratory work. This is a medium to large client-server DBMS Database. Lecture 7 / 2
Database. Lecture 7 / 3
Some Aspects of Data A major benefit of Computing is Storage and Retrieval of Data We need to have answers to these questions WHAT are we going to produce as ‘output’ ? WHAT data is to be stored ? WHAT is the level of detail (Name, Address, Height …? ) HOW long is to be retained - and is it ‘volatile’? WHAT form is it to be stored - on line, off line ? WHO is going to access it ? Database. Lecture 7 / 4
Some Aspects of Data HOW often will it be accessed ? WHAT security of access is to be applied ? WHAT are the starting volumes - what is the growth or decay factor ? WHAT response time is expected ? HOW accurate is the data content ? HOW current is it ? (e. g. on line sales retailing) Database. Lecture 7 / 5
Information / Data A General Definition: DATA - raw (unprocessed or partly processed) facts which represent the state of entities (things) which have occurred INFORMATION - data which has been processed into a form USEFUL TO THE USER What is Information to one user may be Data to another user. Database. Lecture 7 / 6
Audit Trail General Definition: ‘The presence of data processing media and procedures which allow any and / or all transaction(s) to be traced through ALL STAGES of processing’ This infers that the following devices / techniques are in place: 1. A logging device which ‘traps’ all transactions 2. Some way of tagging each transaction so that it can be identified 3. Some way of retrieving the required transaction(s) 4. Some way of archiving - what is the required period ? 5. Control procedures and processes to ensure integrity Database. Lecture 7 / 7
Data Base A Database is a shared collection of Inter-Related data designed to meet the needs of multiple types of users and applications. This implies that multiple user VIEWS can be defined Data stored is independent of the programs which use it Data is structured to provide a basis for future applications DATABASE = Stored Collection of Related Data May be physically distributed Database. Lecture 7 / 8
Database Management System A DBMS is SOFTWARE which provides access to the database in an integrated and controlled manner A DBMS must contain : 1. Data Definition and Structure capabilities 2. Data Manipulation capabilities Database. Lecture 7 / 9
Data Definition and Manipulation Data Definition Language (DDL) used to describe data at the database level Schema level - complete database description Sub-Schema level - user views (restricted) Data Manipulation Language (DML) Provides for Create Update Delete Modify Report Insert Retrieve (extract) Drop Calculation capabilities Database. Lecture 7 / 10
Three Level Architecture 1. External schema - User Views 2. Conceptual schema - Total database description 3. Internal schema - Physical database Database. Lecture 7 / 11
The Many Faces of Databases can be: 1. Transaction Intensive 2. Decision Support 3. Mixed-Load 4. Small databases 5. Very Large Database (VLDB) 6. Non Traditional - ATM’s Checkouts Browsing for trends Combination of both Few thousand records Many millions or trillions of records (Banks) - Weather bureau, flight plans Computer Aided Design data Database. Lecture 7 / 12
The Many Faces of Database • They can be: Data Warehouses Data Marts • How is a database size measured ? There a number of ‘measurements’ Raw data size Total database size Total usable disk space size (which includes media protection such as mirroring) Database. Lecture 7 / 13
The Many Faces of Database Hardware Database Raw Data Total Disk HP 9000 Oracle 100 GB 643 GB Digital 8400 Oracle 100 GB 361 GB IBM SP 2 DB 2/6000 100 GB 377 GB NCR 5100 Teradata 100 GB 880 GB NCR 5100 Teradata 1, 000 GB 3, 280 GB Database. Lecture 7 / 14
DBMS Requirements Querying Capabilities Data Displays (Presentation) Data entry Data Validation Data Deletion Committing Procedures (of changes) AND Data Integrity, Security, Consistency and Concurrency Capabilities Database. Lecture 7 / 15
Important Database Features • • • Data Integrity Data Independence Referential Integrity - Relational Database Model Concurrency Control - Multiple Users Consistency - multi users - distributed database - replicated database - partitioned database - mobile database • Recovery from failure (Transaction and Media) • Security Database. Lecture 7 / 16
File Processing (non database) Purchasing Program Billing Program Customers Accounts Receivable Accounts Payable Vendors Invoice Buyers Inventory Vendors Sales Order Processing Payroll Customers Inventory Employee Database. Lecture 7 / 17
File Processing With Database Technology Orders Department Program A Program B Program C Order Filling system Customers Inventory Master Accounting Department Program A Program B Invoicing system Back Orders Inventory Pricing Customer Master Database. Lecture 7 / 18
Inter. Related File Outline Salesperson Buyers Sales Stats Inventory Purchase Order Customer Accounts Receivable Vendor Accounts Payable General Ledger Database. Lecture 7 / 19
Conceptual Data Model customer order invoice product raw material work order vendor Database. Lecture 7 / 20
User Views customer order product invoice vendor raw material Database. Lecture 7 / 21
Database Models - Hierarchical owner/parent owner child /parent member child/parent Database. Lecture 7 / 22
Data Base Models - Network ‘set’ of data owner member ‘set’ of data Note: Only linked sets can be accessed Database. Lecture 7 / 23
Data Base Models - Relational table A table B table C table D table E Any table(s) can be joined to any other table(s), provided there is a means of effecting the join Primary key / Foreign key concept. Data redundancy No fixed linkages Database. Lecture 7 / 24
2 Relations EMPNUM 3 7 11 15 18 NAME JONES SMITH ADAMS NGUYEN PHAN Date of Birth 16 -05 -1956 23 -09 -1965 11 -08 -1972 23 -10 -1964 16 -11 -1976 DEPTNUM 605 432 201 314 201 Relation (Table) Name : EMP Relation Schema: EMP(empnum, name, date of birth, deptnum) DEPTNUM 201 314 432 605 DEPTNAME Production Finance Information Systems Administration Relation (Table) Name : DEPT Relation Schema: DEPT(deptnum, deptname) Database. Lecture 7 / 25
Definition of a Relational Database • A relational database is a collection of relations or two-dimensional tables. Database Table Name: EMPNO 7839 ENAME KING JOB EXECUTIVE 7698 7782 7566 BLAKE CLARK JONES MANAGER DEPTNO 10 30 10 20 Table Name: DEPTNO DNAME 10 ACCOUNTING 20 30 40 RESEARCH SALES OPERATIONS LOC CLAYTON CAULFIELD PENINSULA GIPPSLAND Database. Lecture 7 / 26
Data Models Model of system in client’s mind Entity model of client’s model Table model of entity model Server Tables on disk Database. Lecture 7 / 27
Communicating with a RDBMS Using SQL statement is entered SQL> SELECT loc FROM dept; Statement is sent to database Data is displayed LOC ------CLAYTON CAULFIELD PENINSULA GIPPSLAND Database. Lecture 7 / 28
Relational Database Management System Server User tables Data dictionary Database. Lecture 7 / 29
Relational Data. Base Data is represented in ROW and COLUMN form (matrix) (attribute) Collections of related data ---> TABLES (relations) 1 or more tables ----> DATA BASE ATTRIBUTES are generally static ROWS are DYNAMIC and Time-Varying The number of Attributes = DEGREE of a table The number of Rows = CARDINALITY of a table Database. Lecture 7 / 30
Some RDB Considerations • • Data is held in tables No order of data in tables - row or attribute Concept of Foreign Key - Primary Key relationship Data Typing - including nulls Query Access - insert, update, delete, retrieval Indexing on candidate (and Primary) keys Integrity Constraints Attribute value ranges Referential Integrity Entity Integrity User Defined Integrity • Set retention constraints Database. Lecture 7 / 31
Some RDB Considerations • Domain constraints • User defined ‘Rules’ e. g. no booking of rooms for lectures after 4. 00 pm Fridays (also known as Business Rules) • Recovery procedures. Used to restore a database after a failure • No explicit linkages between tables • Linking or embedding database operations in a procedural language (Cobol, C. . ) • Databases may be distributed across similar or different DBMS’s • Security features Database. Lecture 7 / 32
Data Description Language Used to describe data at the Database level Terms Used Structure and Attributes Schema : Complete description of the database using DDL Sub. Schema : Describes data in the database as it is ‘known’ to individual programs(processes) or users A segment or part of logical data record(s) required is commonly known as a VIEW Database. Lecture 7 / 34
Data Manipulation Language (commands and syntax) used to cause transfers of data from the Database and the Operating Environment and vice versa Variety of Models - Access, DB 2, d. BASEV, My. SQL, SQLServer Visual. Data. Base, Data. Boss, Ingres, Oracle, Informix. . Windows versions provide Icons and Menu options which are translated by the DBMS software to Database manipulation commands Typical commands: get, put, replace, seek, update, delete, insert, drop, find, modify Database. Lecture 7 / 35
Application Systems users(ATM’s) C programs users Cobol D B M S Access Database. Lecture 7 / 36
DBMS Components users Database Management System Retrieval Update Program Language Interface Application Programs Database Utility Programs Database. Lecture 7 / 37
A Practical Development Scenario In 198 n , a Company decided to develop and introduce a Payroll system using database technology. It looked this : Payroll System Payroll Database. Lecture 7 / 38
A Practical Development Scenario The Company grew in numbers and expertise and decided to introduce a Personnel System. The ‘new’ design was this Payroll System Payroll Data Personnel System Personnel Database. Lecture 7 / 39
A Practical Development Scenario In the next few years, these components were added Payroll System Payroll Data Job History Employee Tracing Data Personnel System Personnel Data Labor Distribution Labor Analysis Database. Lecture 7 / 40
Advantages of Data. Base • Reduced Data Redundancy • Data Integrity • Data Independence • Data Security • Data Consistency • Easier use of Data via DBMS Tools (Query languages, 4 GL's) Database. Lecture 7 / 41
Disadvantages of Database • Complexity • Expense • Vulnerability • Size of - disk storage, processor memory • Training Costs • Compatibility • Technology Lock In Database. Lecture 7 / 42
Database. Lecture 7 / 43
Data Types • Used to augment an attribute description and to provide a means of Integrity ’Normal’ data types are : Character (or text) Numeric - Integer, Decimal, Money, Float (in Access Long and Short Integer, Decimal with options of a number of ‘decimal’ places) Date - Standard date format - Access has various forms Logical - Yes/No True/False Exists/Does Not Exist Database. Lecture 7 / 44
Attribute Size In many cases this is set by the Data Definition facilities e. g. . Date, Short Integer, Long Integer, Logical, Number Others are set by the Designer: Number of Characters, Size of a ‘Decimal Number’ Access allows for a Default value Duplicates/ No Duplicates allowed Primary Key nomination Indexing ‘Required’ Status of an Attribute Database. Lecture 7 / 45
Integrity Examples Primary Key - Must have a value - not null - Must not be duplicated in the same table Attribute values must exist: (or not) student record: student identity number - must exist, not duplicated, must be a ‘valid’ number student name - must exist, may be duplicated student course code - must exist, must be a valid course subject code - if enrolled, must exist, code must be a valid code subject result - dependent on time. May be null. If it exists must be a valid grade and mark Database. Lecture 7 / 46
Integrity Examples Questions: 1. Do the same ‘constraints’ exist in Excel, Word ? 2. Should they ? 3. What are alternatives for embedding the same or similar controls 4. Are spreadsheets less ‘reliable’ than databases ? Database. Lecture 7 / 47
Queries A Query acts of the base table or tables of a database and returns a subset of this data. A Query normally returns a ‘restricted set’ of attributes (and their current values’ - this is the ‘User View’ of the database A Query normally has some criteria e. g. salaries > $50, 000 outstandings > 30 days date (of some event =, > or < some designated date) Criteria can be linked : event A OR event B, event A AND event B Database. Lecture 7 / 48
Reports A more formal output of data from base tables and in many cases produces high volume. The design is formatted ; Page numbering, Headings, Footers, Page breaks, Page or Item totals and sub-totals Reports can be ‘criteria based’ and include calculations (derived data) Report content can be imported from another source, or exported to another target. Reports can contain Exception Full Graphic Hypertext information and may be hard copy, electronic copy, Database. Lecture 7 / 49
A Puzzle How can these dots be joined by 4 straight lines without lifting a pencil (or pen) from the surface ? Database. Lecture 7 / 50
A Puzzle 1 2 3 No, that’s 5 lines 4 5 Database. Lecture 7 / 51
A Puzzle 1 5 4 2 6 That’s worse - it’s 6 lines 3 Database. Lecture 7 / 52
A Puzzle 1 How about this ? Database. Lecture 7 / 53
A Sunburnt Country This could be a commentary of the Summer weather I love a sunburnt country A land of sweeping plains, Of ragged mountain ranges, Of droughts and flooding rains I love her far horizons, I love her jewel-sea, Her beauty and her terror The wide brown land for me ! Database. Lecture 7 / 54
Database. Lecture 7 / 55
Non Structured Data • Up to date, we have studied some of the methods of file organisation associated with efficient data retrieval as embodied in the relational data base model. • However, although industry has invested large amounts of money in relational database applications, the greater percentage of data access requirements are made of nonstructured data sources (a classic one being the World Wide Web). • This lecture will uncover some of the techniques which locate and release data of this non-structured nature. Database. Lecture 7 / 56
Information Retrieval Non - Structured i. e. Occurrence of 'Attribute Values' neither regular nor regulated Processing Requirements - Mainly to reveal/release textual information - Information searches on 'key terms' - May be further processed by computer systems (e. g. Spreadsheets, Mathematical Models, Simulations) Database. Lecture 7 / 57
Information Retrieval User Expectations - All 'relevant' references will be found and released (Compare to structured DBMS closure theory) Note: Approximately 70 - 80% of 'Management Information' is from Non-structured Databases Database. Lecture 7 / 58
Information Retrieval Consider the following : When I do count the clock that tells the time, And see the brave day sunk in hideous night, When I behold the violet past prime, And sable curls all silvered o'er with white: When lofty trees I see barren of leaves, Which erst from heat did canopy the herd And summer's green all girded up in sheaves Database. Lecture 7 / 59
Information Retrieval Borne on the bier with white and bristly beard: Then of thy beauty do I question make That thou amongst the wastes of time must go, Since sweets and beauties do themselves forsake, And die as fast as they see others grow, And nothing against time's scythe can make defence Save breed to brave him when he takes thee hence Shakespeare, Sonnet No. 12 Database. Lecture 7 / 60
Information Retrieval Or , consider this: Now is the winter of our discontent Made glorious summer by this sun of York And all the clouds that loured upon our house In the deep bosom of the ocean buried Now are our brows bound with victorious wreaths; Our bruised arms hung up for monuments Our stern alarums changed to merry greetings Our dreadful marches to delightful measures Shakespeare: Soliloquy ? ? ? Database. Lecture 7 / 61
Information Retrieval Finally, another extract: The peasants who survived the plague found themselves in many cases afflicted by fresh burdens, for with fewer people to work the land, overlords demanded a standstill in wages and a return to feudal duties in full. But, with the shortage of labour, workers naturally expected to be valued more highly and to be given better pay and more freedom. Europe 14/15 th Century A history of the world - Rjunstead ( perhaps Enterprise Bargaining ? ) Database. Lecture 7 / 62
Information Retrieval Definition: An Information Retrieval System stores items of Information which need to be: Processed Searched Retrieved Analysed, Condensed, Explained and Sent to User Populations • Some Concerns: Data Base Management Decision Support (as with other Information Systems) Database. Lecture 7 / 63
Information Retrieval Requirements: • Must have efficient storage organisations • Rapid search procedures • Effective dissemination and user interaction Database. Lecture 7 / 64
Information Retrieval Systems are used to handle data in various forms such as • Bibliograpic • Textual • Audio • Video Database. Lecture 7 / 65
Information Retrieval • Allied Topics Information Theory Probability Theory Computational Semantics Programming Theory Algebra Database. Lecture 7 / 66
Information Retrieval Measurement of Usefulness: 1. Currency 2. Completeness Some queries: 1. Excess Information 2. Obsolete (? ) Information Database. Lecture 7 / 67
Information Retrieval Resource Management Growth rates Is it Important ? Up to 1800, the rate of publication doubling every 50 years 1800 to 1996 > 100, 000 scientific publications 1996 > 400, 000 scientific publications 2000 > 680, 000 scientific publications Impact of World Wide Web plus the translation factor Database. Lecture 7 / 68
Information Retrieval Some Information Organisation Problems Volume expansion is not evenly distributed for all topics Location of related items Relationship methods (trace, link, chain) Database. Lecture 7 / 69
Information Retrieval Key Functions of Information Retrieval Systems Input Requests Process Indexing Language Output Information Items (mapping) Organised for Search Formulation Request Representation Indexing Processes Information Representation Database. Lecture 7 / 70
Information Retrieval Item No Author Title Topic 1 Ash 2 Brown 3 Jones 4 Reynolds 5 Smith Aspects of Computerised Information Retrieval A Survey A History of users of of Computer Information Systems Retrieval The State of the Art of Retrieval Systems Users of New Retrieval Systems Computer Information Retrieval Systems Information Retrieval Users Information Retrieval Systems Retrieval System Users Computer Systems Database. Lecture 7 / 75
Information Retrieval Inverted File Organisation Sample Related Information Items Topic Computer Information 1 1 Retrieval Systems Users 1 1 3 2 4 4 3 2 5 5 5 Inverted Index Additions and Deletions require Index modifications Database. Lecture 7 / 76
Information Retrieval Inverted Files in a Commercially Available I. R. S. Boolean logic : associative terms queries Operators: AND OR NOT Functions used: Set Intersection Set Union Set Difference Database. Lecture 7 / 77
Information Retrieval Database. Lecture 7 / 78
Information Retrieval Item No Author Title Topic 1 Ash 2 Brown 3 Jones 4 Reynolds 5 Smith Aspects of Computerised Information Retrieval A Survey A History of users of of Computer Information Systems Retrieval The State of the Art of Retrieval Systems Users of New Retrieval Systems Computer Information Retrieval Systems Information Retrieval Users Information Retrieval Systems Retrieval System Users Computer Systems Database. Lecture 7 / 79
Information AND Retrieval Based on the data given: 1. Use Inverted Index to Locate the reference Numbers for the term “Information” Set 1 2. Ditto for the term “Retrieval” 3. Intersect of Set 1 and Set 2 Set 3 4. Use set 3 to retrieve documents identified in this set. Database. Lecture 7 / 80
Information OR Retrieval Create Set 1 as before on the occurrences of the term “Information” Create Set 2 as before on the occurrences of the term “Retrieval” Combine Set 1 and Set 2 Set 3 This is known as Set Union Database. Lecture 7 / 81
Information NOT Retrieval Perform the formation of Set 1 and Set 2 based on the occurrence of the term “Information” and also the term “Retrieval” Remove any reference from Set 1 which is included in Set 2 to produce a final set, Set 3 This is known as Set Difference And this is more Information. It’s informing me it’s time to stop. Database. Lecture 7 / 82
Additional Features Document Reference No. Paragraph No. Sentence No. Word No Document Hierarchy Retrieval (345 1 2 5) Could also be a Word Count from the beginning of text (distance indicator) Retrieval (345 13) interpreted as Document 345, 13 words from Beginning of Text (B. O. T) Database. Lecture 7 / 85
Some Commercial Inverted File Systems 1. Dialog 2. Stairs Uses Select and Combine Utilities to create and maintain database On. Line retrieval system AQUARIUS - includes a dictionary 3. Bibliographic Retrieval Services System (BRS) Based on Stairs 4. Medlars system (National Library of Medicine) Uses Index File Posting File Data File 5. Orbit system 6. Lexis system Database. Lecture 7 / 86
B+ Tree Example Document Clustering Network Catalog Hardware Morpheu Review Synonym Number Publication Apparatus Biomedicine Statistics Encyclopedia Stem Structure Grammar File Search for Query = ‘File’ Database. Lecture 7 / 88
String Matching - Boyer and Moore Based on 1. Analysis of query pattern 2. Auxiliary table 3. Character mapping 1 2 3 4 5 6 7 8 9 10 11 LECTUREROOM L occurs in position 1 E occurs in positions 2 and 7 C occurs in position 3 T occurs in position 4 U occurs in position 5 R occurs in positions 6 and 8 O occurs in positions 9 and 10 M occurs in position 11 Matching starts <Right> character and progressively shifts left on match Database. Lecture 7 / 89
Information Management Data retrieval - an application package TITAN: Museums, Births Deaths and Marriages Registries Art Galleries, Market Research, News Services, Hansard, Reference Libraries Platforms: Unix, AIX, Sun. O/S, PCs to Mainframes Method - Signature Files - A bit string for each record in the data file. Uses Indexed terms Database. Lecture 7 / 90
Information Management 2 stages necessary: 1. Create descriptor for each Indexed term 2. Superimpose each term descriptor (exclusive OR-ing) Professional ---> 100100. . (32 bits) term Computing ---> 010100. . descriptors Magazine ---> 000101. . Record descriptor 110101 (inclusive OR) Record descriptors ---> record descriptor file + Pointers Database. Lecture 7 / 91
- Slides: 83