UNIT III DATABASE MANAGEMENT SYSTEMS DBMS HDBMS NDBMS
UNIT III DATABASE MANAGEMENT SYSTEMS ØDBMS ØHDBMS, NDBMS, RDBMS, OODBMS ØQUERY PROCESSING ØSQL ØCONCURRENCY MANAGEMENT ØDATA WAREHOUSING ØDATA MART 1
Database: What • Database – is collection of related data and its metadata organized in a structured format – for optimized information management • Database Management System (DBMS) – is a software that enables easy creation, access, and modification of databases – for efficient and effective database management • Database System – is an integrated system of hardware, software, people, procedures, and data – that define and regulate the collection, storage, management, and use of data within a database environment 2
Database Management System - manages interaction between end users and database Database Systems: Design, Implementation, & Management: Rob & Coronel 3
Meta data- data about data 4
Database System Environment § Hardware § Software - OS - DBMS - Applications § People § Procedures § Database Systems: Design, Implementation, & Management: Rob & Coronel 5
Uses of DBMS – examples • Retail: The retail industry and companies like Best Buy and even online businesses like ebay, rely heavily on database management systems to store data related to their sales, to track purchases and client information updates as well as to analyze automatically the growth charts. 6
Financial institutions • All financial institutions rely heavily on Database management systems to be able to record customer transactions, customer information, credit/debit information. The ability to manage all that data in such an accurate way to handle millions of customers and to be able to access your balance in a branch in Montreal when you actually opened your Bank account in Vancouver, for example, is thanks to the existence of an elaborate DBMS system in place. 7
Uses of DBMS ( contd) • Railway Reservation System • Database is required to keep record of ticket booking, train’s departure and arrival status. Also if trains get late then people get to know it through database update. 8
• Library Management System • There are thousands of books in the library so it is very difficult to keep record of all the books in a copy or register. So DBMS used to maintain all the information relate to book issue dates, name of the book, author and availability of the book. 9
• Social Media Sites • We all are on social media websites to share our views and connect with our friends. Daily millions of users signed up for these social media accounts like facebook, twitter, pinterest and Google plus. But how all the information of users are stored and how we become able to connect to other people, yes this all because DBMS. 10
• Human Resource Management • Big firms have many workers working under them. Human resource management department keeps records of each employee’s salary, tax and work through DBMS. • Manufacturing companies make products and sales them on the daily basis. To keep records of all the details about the products like quantity, bills, purchase, supply chain management, DBMS is used. 11
Database: Why • Purpose of Database – Optimizes data management – Transforms data into information • Importance of Database Design – Defines the database’s expected use • different approach needed for different types of databases – Avoid data redundancy & ensure data integrity • data is accurate and verifiable – Poorly designed database generates errors • leads to bad decisions • can lead to failure of organization • Functions of DBMS/Database System – Stores data and related data entry forms, report definitions, etc. – Hides the complexities of relational database model from the user • facilitates the construction/definition of data elements and their relationships • enables data transformation and presentation – Enforces data integrity – Implements data security management • access, privacy, backup & restoration 12
Database: How • Planning & Analysis – Assess • Goal of the organization • Database environment – existing hardware, software, raw data, data processing procedures – Identify • Database needs – what database can do to further the goal of the organization • User needs and characteristics – who the users are, what they want to do, how they envision doing it • Database system requirements • Design – what the database system should do to satisfy the database and user needs – From conceptual design to a detailed system specification • Implementation – Create the database • Maintenance – Troubleshoot, update, streamline the database 13
Business Rules • What – Brief, precise, and unambiguous descriptions of operations in an organization • based on policies, procedures, or principles within a specific organization • help to create and enforce actions within that organization’s environment • apply to any organization that stores and uses data to generate information • Why – Enhance understanding & facilitate communication • Standardize company’s view of data • Constitute a communications tool between users and designers • Allow designer to understand business process as well as the nature, role, and scope of data – Promote creation of an accurate data model • How (sources) – Interviews • • Company managers Policy makers Department managers End users – Written documentation • Procedures, Standards, Operations manuals – Observation • Business operations 14
Database: User-centered • Perspective – The user is always right. If there is a problem with the use of the system, the system is the problem, not the user. • Compliance – The user has the right to a system that performs exactly as promised. • Instruction – The user has the right to easy-to-use instructions (user guides, online or contextual help, error messages) for understanding and utilizing a system to achieve desired goals and recover efficiently and gracefully from problem situations. • Usability – The user should be the master of software and hardware technology, not vice-versa. Products should be natural and intuitive to use. 15
Database: Data Models • Importance – Abstraction of complex real-world data structures in relative simple (graphical) representations – Facilitate interaction among the designer, the applications programmer, and the end user • Basic Building Blocks – Entity • thing about which data are to be collected and stored – Attribute • a characteristic of an entity – Relationship • describes an association among entities – Constraint • restrictions placed on the data 16
Evolution of Data Models • Timeline 1960 s 1970 s 1980 s 1990 s 2000+ File-based Hierarchical Object-oriented Network Relational Web-based Entity-Relationship 17
Database: Historical Roots • Manual File System – to keep track of data – used tagged file folders in a filing cabinet – organized according to expected use • e. g. file per customer – easy to create, but hard to • locate data • aggregate/summarize data • Computerized File System – to accommodate the data growth and information need – manual file system structures were duplicated in the computer – Data Processing (DP) specialists wrote customized programs to • write, delete, update data (i. e. management) • extract and present data in various formats (i. e. report) 18
File System: Example Database Systems: Design, Implementation, & Management: Rob & Coronel 19
File System: Weakness • Weakness – “Islands of data” in scattered file systems. • Problems – Duplication • same data may be stored in multiple files – Inconsistency • same data may be stored by different names in different format – Rigidity • requires customized programming to implement any changes • cannot do ad-hoc queries • Implications – Waste of space – Data inaccuracies – High overhead of data manipulation and maintenance 20
File System: Problem Case CUSTOMER file AGENT file A_Name (15 char) A_Name (20 char) Carol Johnson Carol T. Johnson SALES file AGENT (20 char) Carol J. Smith - inconsistent field name, field size - inconsistent data values - data duplication 21
Database System vs. File System 22 Database Systems: Design, Implementation, & Management: Rob & Coronel
Hierarchical Database • Background – Developed to manage large amount of data for complex manufacturing projects – e. g. , Information Management System (IMS) • IBM-Rockwell joint venture • clustered related data together • hierarchically associated data clusters using pointers • Hierarchical Database Model – Assumes data relationships are hierarchical • One-to-Many (1: M) relationships – Each parent can have many children – Each child has only one parent – Logically represented by an upside down tree 23
Hierarchical Database: Example 24 Database Systems: Design, Implementation, & Management: Rob & Coronel
Hierarchical Database: Pros & Cons • Advantages – Conceptual simplicity • groups of data could be related to each other • related data could be viewed together – Centralization of data • reduced redundancy and promoted consistency • Disadvantages – Limited representation of data relationships • did not allow Many-to-Many (M: N) relations – Complex implementation • required in-depth knowledge of physical data storage – Structural Dependence • data access requires physical storage path – Lack of Standards • limited portability 25
Network Database • Objectives – Represent more complex data relationships – Improve database performance – Impose a database standard • Network Database Model – Similar to Hierarchical Model • Records linked by pointers – Composed of sets • Each set consists of owner (parent) and member (child) – Many-to-Many relationships representation • Each owner can have multiple members (1: M) • A member may have several owners 26
Network Database: Example 27 Database Systems: Design, Implementation, & Management: Rob & Coronel
Network Database: Pros & Cons • Advantages – More data relationship types – More efficient and flexible data access • “network” vs. “tree” path traversal – Conformance to standards • enhanced database administration and portability • Disadvantages – System complexity • require familiarity with the internal structure for data access – Lack of structural independence • small structural changes require significant program changes 28
Relational Database • Problems with legacy database systems – Required excessive effort to maintain • Data manipulation (programs) too dependent on physical file structure ( machine view) – Hard to manipulate by end-users • No capacity for ad-hoc query (must rely on DB programmers). Evolution in Data Organization – E. F. Codd’s Relational Model proposal • Separated the notion of physical representation (machine-view) from logical representation (human-view) • Considered ingenious but computationally impractical in 1970 – Relational Database Model • Dominant database model of today • Eliminated pointers and used tables to represent data • Tables – flexible logical structure for data representation – a series of row/column intersections – related by sharing common entity characteristic(s) 29
Relational Database: Example n Provides a logical “human-level” view of the data and associations among groups of data (i. e. , tables) 30
Relational Database: Pros & Cons • Advantages – Structural independence • Separation of database design and physical data storage/access • Easier database design, implementation, management, and use – Ad hoc query capability with Structured Query Language (SQL) • SQL translates user queries to codes • Disadvantages – Substantial hardware and system software overhead • more complex system – Poor design and implementation is made easy • ease-of-use allows careless use of RDBMS 31
Entity Relationship Model • Peter Chen’s Landmark Paper in 1976 – “The Relationship Model: Toward a Unified View of Data” – Graphical representation of entities and their relationships • Entity Relationship (ER) Model – Based on Entity, Attributes & Relationships • Entity is a thing about which data are to be collected and stored – e. g. EMPLOYEE • Attributes are characteristics of the entity – e. g. SSN, last name, first name • Relationships describe an associations between entities – i. e. 1: M, M: N, 1: 1 – Complements the relational data model concepts • Helps to visualize structure and content of data groups – entity is mapped to a relational table • Tool for conceptual data modeling (higher level representation) – Represented in an Entity Relationship Diagram (ERD) • Formalizes a way to describe relationships between groups of data 32
E-R Diagram: Chen Model • Entity – represented by a rectangle with its name in capital letters. • Relationships – represented by an active or passive verb inside the diamond that connects the related entities. • Connectivities – i. e. , types of relationship – written next to each entity box. Database Systems: Design, Implementation, & Management: Rob & Coronel 33
E-R Diagram: Crow’s Foot Model • Entity – represented by a rectangle with its name in capital letters. • Relationships – represented by an active or passive verb that connects the related entities. • Connectivities – indicated by symbols next to entities. • 2 vertical lines for 1 • “crow’s foot” for M Database Systems: Design, Implementation, & Management: Rob & Coronel 34
E-R Model: Pros & Cons • Advantages – Exceptional conceptual simplicity • easily viewed and understood representation of database • facilitates database design and management – Integration with the relational database model • enables better database design via conceptual modeling • Disadvantages – Incomplete model on its own • Limited representational power – cannot model data constraints not tied to entity relationships » e. g. attribute constraints – cannot represent relationships between attributes within entities • No data manipulation language (e. g. SQL) – Loss of information content • Hard to include attributes in ERD 35
Object-Oriented Database • Semantic Data Model (SDM) – Modeled both data and their relationships in a single structure (object) • • Developed by Hammer & Mc. Leod in 1981 Object-oriented concepts became popular in 1990 s – Modularity facilitated program reuse and construction of complex structures – Ability to handle complex data types (e. g. multimedia data) • Object-Oriented Database Model (OODBM) – Maintains the advantages of the ER model but adds more features – Object = entity + relationships (between & within entity) • • consists of attributes & methods – attributes describe properties of an object – methods are all relevant operations that can be performed on an object self-contained abstraction of real-world entity – Class = collection of similar objects with shared attributes and methods • • e. g. EMPLOYEE class = (employ 1 object, employ 2 object, …) organized in a class hierarchy – e. g. PERSON > EMPLOYEE, CUSTOMER – Incorporates the notion of inheritance • attributes and methods of a class are inherited by its descendent classes 36
OO Database Model vs. E-R Model OODBM: - can accommodate relationships within a object - objects to be used as building blocks for autonomous structures Database Systems: Design, Implementation, & Management: Rob & Coronel 37
Object-Oriented Database: Pros & Cons • Advantages – Semantic representation of data • fuller and more meaningful description of data via object – Modularity, reusability, inheritance – Ability to handle • complex data • sophisticated information requirements • Disadvantages – Lack of standards • no standard data access method – Complex navigational data access • class hierarchy traversal – Steep learning curve • difficult to design and implement properly – More system-oriented than user-centered – High system overhead • slow transactions 38
Web Database • Internet is emerging as a prime business tool – Shift away from models (e. g. relational vs. O-O) – Emphasis on interfacing with the Internet • Characteristics of “Internet age” databases – – Flexible, efficient, and secure Internet access Support for complex data types & relationships Seamless interfaces with multiple data sources and structures Ease of use for end-user, database architect, and database administrator • Simplicity of conceptual database model • Many database design, implementation, and application development tools • Powerful DBMS GUI 39
Lab: Access Automations • MS Access Automations – can save effort & time – may not suit your needs – Templates & Wizards • Group Project – Project Team formation – Project Description 40
: Query Processing Query processing includes translation of high-level queries into low-level expressions that can be used at the physical level of the file system, query optimization and actual execution of the query to get the result. Query processing: A 3 -step process that transforms a high -level query into an equivalent and more efficient lowerlevel query • • Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions 41
42
Basic Steps in Query Processing 1. Parsing and translation 2. Optimization 3. Evaluation 43
Basic Steps in Query Processing (Cont. ) • Parsing and translation – translate the query into its internal form. This is then translated into relational algebra. – Parser checks syntax, schema elements verifies relations 44
Syntax 45
Schema Elements 46
Optimizer: • Best plan to evaluate the relational algebra • Which algorithm is less cost and more efficient 47
Statistics About Data: 48
• Evaluation – Query evaluation uses data and evaluates the best way to meet the requirements 49
Measures of Query Cost • Cost is generally measured as total elapsed time for answering query – Many factors contribute to time cost • disk accesses, CPU, or even network communication • Typically disk access is the predominant cost, and is also relatively easy to estimate. 50
SQL 51
What is SQL? • • Structured Query Language Communicate with databases Used to created and edit databases. Also used to create queries, forms, and reports
Introduction to SQL • SQL functions fit into two broad categories: – Data definition language • SQL includes commands to: – Create database objects, such as tables, indexes, and views – Define access rights to those database objects – Data manipulation language • Includes commands to insert, update, delete, and retrieve data within database tables 53
Introduction to SQL (continued) • SQL is relatively easy to learn • Basic command set has vocabulary of less than 100 words • Nonprocedural language • American National Standards Institute (ANSI) prescribes a standard SQL • Several SQL dialects exist 54
Table name Tables in SQL Attribute names Product PName Price Category Manufacturer Gizmo $19. 99 Gadgets Gizmo. Works Powergizmo $29. 99 Gadgets Gizmo. Works Single. Touch $149. 99 Photography Canon Multi. Touch $203. 99 Household Hitachi Tuples or rows
Tables Explained • The schema of a table is the table name and its attributes: Product(PName, Price, Category, Manfacturer) • A key is an attribute whose values are unique; we underline a key Product(PName, Price, Category, Manfacturer)
Data Types in SQL • Atomic types: – Characters: CHAR(20), VARCHAR(50) – Numbers: INT, BIGINT, SMALLINT, FLOAT – Others: MONEY, DATETIME, … • Every attribute must have an atomic type – Hence tables are flat
Tables Explained • A tuple = a record – Restriction: all attributes are of atomic type • A table = a set of tuples – Like a list… – …but it is unorderd: no first(), no next(), no last().
SQL Query Basic form: SELECT <attributes> FROM <one or more relations> WHERE <conditions>
Simple SQL Query Product PName Price Category Manufacturer Gizmo $19. 99 Gadgets Gizmo. Works Powergizmo $29. 99 Gadgets Gizmo. Works Single. Touch $149. 99 Photography Canon Multi. Touch $203. 99 Household Hitachi PName Price Category Manufacturer Gizmo $19. 99 Gadgets Gizmo. Works Powergizmo $29. 99 Gadgets Gizmo. Works SELECT * FROM Product WHERE category=‘Gadgets’ “selection”
Simple SQL Query Product PName Price Category Manufacturer Gizmo $19. 99 Gadgets Gizmo. Works Powergizmo $29. 99 Gadgets Gizmo. Works Single. Touch $149. 99 Photography Canon Multi. Touch $203. 99 Household Hitachi SELECT PName, Price, Manufacturer FROM Product WHERE Price > 100 “selection” and “projection” PName Price Manufacturer Single. Touch $149. 99 Canon Multi. Touch $203. 99 Hitachi
Notation Input Schema Product(PName, Price, Category, Manfacturer) SELECT PName, Price, Manufacturer FROM Product WHERE Price > 100 Answer(PName, Price, Manfacturer) Output Schema
Details • Case insensitive: – Same: SELECT Select select – Same: Product product – Different: ‘Seattle’ ‘seattle’ • Constants: – ‘abc’ - yes – “abc” - no
Like Used to make complex searching easy. If you are trying to find all people’s names which begin with E for example: ex) select firstname from employee where firstname LIKE 'E%';
Eliminating Duplicates SELECT DISTINCT category FROM Product Category Gadgets Photography Household Compare to: Category SELECT category FROM Product Gadgets Photography Household
Ordering the Results SELECT pname, price, manufacturer FROM Product WHERE category=‘gizmo’ AND price > 50 ORDER BY price, pname Ties are broken by the second attribute on the ORDER BY list, etc. Ordering is ascending, unless you specify the DESC keyword.
SELECT DISTINCT category FROM Product ORDER BY category SELECT Category FROM Product ORDER BY PName SELECT DISTINCT category FROM Product ORDER BY PName Price Category Manufacturer Gizmo $19. 99 Gadgets Gizmo. Works Powergizmo $29. 99 Gadgets Gizmo. Works Single. Touch $149. 99 Photography Canon Multi. Touch $203. 99 Household Hitachi ? ? ?
Keys and Foreign Keys Company Key CName Stock. Price Country Gizmo. Works 25 USA Canon 65 Japan Hitachi 15 Japan Product PName Price Category Manufacturer Gizmo $19. 99 Gadgets Gizmo. Works Powergizmo $29. 99 Gadgets Gizmo. Works Single. Touch $149. 99 Photography Canon Multi. Touch $203. 99 Household Hitachi Foreign key
Joins Product (pname, price, category, manufacturer) Company (cname, stock. Price, country) Find all products under $200 manufactured in Japan; return their names and prices. SELECT PName, Price FROM Product, Company WHERE Manufacturer=CName AND Country=‘Japan’ AND Price <= 200 Join between Product and Company
Joins Product Company PName Price Category Manufacturer Cname Stock. Price Country Gizmo $19. 99 Gadgets Gizmo. Works 25 USA Powergizmo $29. 99 Gadgets Gizmo. Works Canon 65 Japan Single. Touch $149. 99 Photography Canon Hitachi 15 Japan Multi. Touch $203. 99 Household Hitachi SELECT PName, Price FROM Product, Company WHERE Manufacturer=CName AND Country=‘Japan’ AND Price <= 200 PName Price Single. Touch $149. 99
Correlated Queries Movie (title, year, director, length) Find movies whose title appears more than once. correlation SELECT DISTINCT title FROM Movie AS x WHERE year <> ANY (SELECT year FROM Movie WHERE title = x. title); Note (1) scope of variables (2) this can still be expressed as single SFW
Simple Aggregations Purchase Product Date Price Quantity Bagel 10/21 1 20 Banana 10/3 0. 5 10 Banana 10/10 1 10 Bagel 10/25 1. 50 20 SELECT Sum(price * quantity) FROM Purchase WHERE product = ‘bagel’ 50 (= 20+30)
Grouping and Aggregation Purchase(product, date, price, quantity) Find total sales after 10/1/2005 per product. SELECT product, Sum(price*quantity) AS Total. Sales FROM Purchase WHERE date > ‘ 10/1/2005’ GROUP BY product Let’s see what this means…
Grouping and Aggregation 1. Compute the FROM and WHERE clauses. 2. Group by the attributes in the GROUPBY 3. Compute the SELECT clause: grouped attributes and aggregates.
1&2. FROM-WHERE-GROUPBY Product Date Price Quantity Bagel 10/21 1 20 Bagel 10/25 1. 50 20 Banana 10/3 0. 5 10 Banana 10/10 1 10
3. SELECT Product Date Price Quantity Bagel 10/21 1 20 Bagel 10/25 1. 50 20 Banana 10/3 0. 5 10 Banana 10/10 1 10 SELECT product, Sum(price*quantity) AS Total. Sales FROM Purchase WHERE date > ‘ 10/1/2005’ GROUP BY product Product Total. Sales Bagel 50 Banana 15
NULLS in SQL • Whenever we don’t have a value, we can put a NULL • Can mean many things: – – Value does not exists Value exists but is unknown Value not applicable Etc. • The schema specifies for each attribute if can be null (nullable attribute) or not
Null Values • If x= NULL then x=“Joe” is UNKNOWN • In SQL there are three boolean values: FALSE = 0 UNKNOWN = 0. 5 TRUE = 1
Null Values Unexpected behavior: SELECT * FROM Person WHERE age < 25 OR age >= 25 Some Persons are not included !
Null Values Can test for NULL explicitly: – x IS NULL – x IS NOT NULL SELECT * FROM Person WHERE age < 25 OR age >= 25 OR age IS NULL Now it includes all Persons
Product Purchase Name Category Prod. Name Store Gizmo gadget Gizmo Wiz Camera Photo Camera Ritz One. Click Photo Camera Wiz Name Store Gizmo Wiz Camera Ritz Camera Wiz One. Click NULL
Modifying the Database Three kinds of modifications • Insertions • Deletions • Updates Sometimes they are all called “updates”
Insertions Example: Insert a new purchase to the database: INSERT INTO Purchase(buyer, seller, product, store) VALUES (‘Joe’, ‘Fred’, ‘wakeup-clock-espresso-machine’, ‘The Sharper Image’) Purchase Buyer Seller Product Store Joe Fred wakeup-clock- The Sharper espresso. Image machine
Insertions INSERT INTO PRODUCT(name) SELECT DISTINCT Purchase. product FROM Purchase WHERE Purchase. date > “ 25/10/17” The query replaces the VALUES keyword. Here we insert many tuples into PRODUCT
Product Purchase Name Category Prod. Nam e Date Gizmo gadget Gizmo 10/9/17 One. Click Photo Camera 28/11/17 Camera 05/05/16 Camera INSERT INTO PRODUCT(name) SELECT DISTINCT Purchase. product FROM Purchase WHERE Purchase. date > “ 25/10/17” The query replaces the VALUES keyword. Here we insert many tuples into PRODUCT 85
Insertion: an Example Product(name, list. Price, category) Purchase(prod. Name, buyer. Name, price) Suppose database got corrupted and we need to fix it: Purchase Product name list. Price category gizmo 100 gadgets prod. Name buyer. Name price camera John 200 gizmo Smith 80 camera Smith 225 Task: insert in Product all prod. Names from Purchase
Insertion: an Example INSERT INTO Product(name) SELECT DISTINCT prod. Name FROM Purchase WHERE prod. Name NOT IN (SELECT name FROM Product) name list. Price category gizmo 100 Gadgets camera - -
Insertion: an Example INSERT INTO Product(name, list. Price) SELECT DISTINCT prod. Name, price FROM Purchase WHERE prod. Name NOT IN (SELECT name FROM Product) name list. Price category gizmo 100 Gadgets camera 200 - camera 225 -
Deletions Example: DELETE FROM PURCHASE WHERE seller = ‘Joe’ AND product = ‘Brooklyn Bridge’ Factoid about SQL: there is no way to delete only a single occurrence of a tuple that appears twice in a relation.
Updates Example: UPDATE PRODUCT SET price = price/2 WHERE Product. name IN (SELECT product FROM Purchase WHERE Date =‘Oct, 25, 1999’);
Introduction to SQL (continued) 91
Introduction to SQL (continued) 92
Introduction to SQL (continued) 93
Table Basics A Table is an object Database data is stored in Tables Each table has a unique name Columns have various attributes, such as column name and data type • Rows contain records or data for the columns • •
Weather Sample Table City State High Low Phoenix Arizona 105 90 Tuscon Arizona 101 92 Flagstaff Arizona 88 69 San Diego California 77 60 Albuquerque New Mexico 80 60
Conditions Used In Where Clause = > < >= <= <> equals greater than less than greater than or equal to less than or equal to not equal to
Creating Tables The statement to use is create table Here is the syntax: create table “tablename” (“columnname”, “datatype”, “columnname 2”, “datatype”, “columnname 3”, “datatype”);
Creating Tables cont’d Here is a real example: create table employee (first varchar(15), last varchar(20), age number(3), address varchar(30), city varchar(20), state varchar(20));
Creating Tables - Steps 1. Use the command create table 2. Follow it with the correct table name 3. Put a parenthesis and type in the first column name 4. Follow it with the variable type (we will list them in a minute) then a comma 5. Continue the previous two steps until you have all your columns accounted for 6. Then put a parenthesis to close the columnname section and add a ; after it
Creating Tables - Rules • Table and column names must start with a letter • They can not exceed 30 characters • They can not be key words such as create, insert, select, etc.
Creating Tables – Variables • If you took algebra then y=2 x might be familiar. y and x are unknown information, which is a variable. • Now a string is a bunch of letters and numbers • A number is a bunch of numbers • A data type determines what a variable can hold, i. e. strings or numbers
Creating Tables – Data Types • varchar(size) - all column entries must be less than or equal to whatever size is, if size is 10, then the string must be between 1 -10 characters
Creating Tables – Data Types cont’d • number(size) - a number value that can not exceed, size columns, for example if you have size = 10, then you can only have 10 different digit places, like 1, 000, 000 • date - date value • number(size, d) - This works the same as the regular number except d represents # of columns after the decimal.
Creating Tables - Constraints A constraint is a rule. Some examples constraints are: • unique - no two entries will be the same • not null - no entry can be blank • **primary key - unique identification of each row** • primary keys will be very important to you as your knowledge of databases progresses
Inserting Information into Tables Here is a practical example: insert into employees (first, last, age, address, city, state) values ( 'Luke', 'Duke', 45, '2130 Boars Nest', 'Hazard Co', 'Georgia');
Inserting Information into Tables Steps **All strings should be enclosed by single quotes: 'string'** 1. Use the keyword "insert into" followed by the tablename 2. Then on the next line, in parenthesis, list all the columns you are inserting values for. 3. Then on the line after, type values, then in parenthesis, put the values in the same order as the columns they belong to
Updating Records cont’d Here are some practical examples: ex) update phone_book set area_code = 623 where prefix = 979; This changes the area code all numbers beginning with 979 to 623
Updating Records cont’d update phone_book set last_name = 'Smith', prefix=555, sufix=9292 where last_name = 'Jones'; This changes everyone whose last name is Jones to Smith and their number to 555 -9292
Deleting Records Examples ex) delete from employee; deletes all records from that table ex) delete from employee where lastname='May'; deletes all records for people whose last name is May ex) delete from employee where firstname='Mike' or firstname='Eric'; deletes all records for anyone whose first name is Mike or Eric
Deleting Tables Use the drop command drop table "tablename"; drop table employees; Bye Table, Hello Corporate Espoinage =)
Concurrency Control 111
What is Concurrency Control Process of managing simultaneous execution of transactions in a shared database, to ensure the serializability of transactions, is known as concurrency control. Concurrency control is a database management systems (DBMS) concept that is used to address conflicts with the simultaneous accessing or altering of data that can occur with a multi-user system. Provides database systems the ability to handle many users accessing data simultaneously 112
113
114
115
Transaction States and Operations ( contd) 116
Interleaving – single CPU 117
Simultaneous Concurrency- multiple CPUS 118
False Concurrency 119
Read and Write Operations 120
Read item 121
Write Item 122
Concurrency Problems 123
124
125
T 1 = X = 5; 10 NOT COMMITED T 2 = X=10 ( incorrect) 126
The uncommitted dependency problem Transaction A time Transaction B t 1 RETRIEVE p UPDATE p t 2 t 3 ROLLBACK
128
The inconsistent analysis problem
130
131
132
133
To be restarted later as it violates serializability insufficient funds division by zero, programming errors control c by user, back button, etc contd
135
136
137
138
139
140
141
142
What is Lock? A lock is a variable associated with a data item that describes the status of the item with respect to possible operations that can be applied to it. Generally, there is one lock for each data item in the database. Locks are used as a means of synchronizing the access by concurrent transactions to the database item. 143
144
Binary Locks At most one transaction can hold the lock on a particular item. Thus no two transactions can access the' same item concurrently. Disadvantages of Binary Locks As discussed earlier, binary locking scheme is too restrictive for database items, because at most one transaction can hold a lock on a given item. So, binary locking system cannot be used for practical purpose. 145
146
Share/Exclusive (for Read/Write) Locks We should allow several transactions to access the same item A if they all access A' for reading purposes only. However, if a transaction is to write an item A, it must have exclusive access to A. For this purpose, a different type of lock called a multiple-mode lock is used. In this scheme there are shared/exclusive or read/write locks are used. 147
Shared/Exclusive Locks Locking operations There are three locking operations : read_lock(A) - lock-S(A) [S shared lock] write_lock(A) - lock-X(A) [X exclusive lock] unlock(A) -unlock(A) A read-locked item is also called share-locked item because other transactions are allowed to read the item, whereas a write-locked item is caused exclusive-locked, because a single transaction exclusively holds the lock on the item. 148
149
150
151
152
Deadlock 153
154
155
156
157
158
DATA WAREHOUSING AND DATA MART 159
0. Introduction 160
A producer wants to know…. Which are our lowest/highest margin customers ? Who are my customers and what products are they buying? What is the most effective distribution channel? What product prom-otions have the biggest impact on revenue? Which customers are most likely to go to the competition ? What impact will new products/services have on revenue and margins? 161
Data, Data everywhere yet. . . z I can’t find the data I need y data is scattered over the network y many versions, subtle differences z I can’t get the data I need y need an expert to get the data z I can’t understand the data I found y available data poorly documented z I can’t use the data I found y results are unexpected y data needs to be transformed 162 from one form to other
What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand use in a business context. [Barry Devlin] 163
What is Data Warehousing? Information Data A process of transforming data into information and making it available to users in a timely enough manner to make a difference [Forrester Research, April 1996] 164
DEFINITION DATAWAREHOUSE z. A Data Warehouse (DW) is defined as “a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision-making process” 165
Evolution of Data Warehousing z 60’s: Batch reports y hard to find analyze information y inflexible and expensive, reprogram every new request z 70’s: Terminal-based DSS and EIS (executive information systems) y still inflexible, not integrated with desktop tools z 80’s: Desktop data access and analysis tools y query tools, spreadsheets, GUIs y easier to use, but only access operational databases z 90’s: Data warehousing with integrated OLAP engines and tools 166
Very Large Data Bases z Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes z Petabytes -- 10^15 bytes: Geographic Information Systems z Exabytes -- 10^18 bytes: National Medical Records z Zettabytes -- 10^21 bytes: Weather images z Yottabytes -- 10^24 bytes: Intelligence Agency Videos 167
Data Warehousing -It is a process z Technique for assembling and managing data from various sources for the purpose of answering business questions. Thus making decisions that were not previously possible z A decision support database maintained separately from the organization’s operational database 168
Data Warehouse z A data warehouse is a ysubject-oriented yintegrated ytime-varying ynon-volatile collection of data that is used primarily in organizational decision making. -- Bill Inmon, Building the Data Warehouse 1996 169
Data Warehouse Architecture Relational Databases Optimized Loader ERP Systems Extraction Cleansing Data Warehouse Engine Purchased Data Legacy Data Analyze Query Metadata Repository 170
Data Warehouse for Decision Support & OLAP z Putting Information technology to help the knowledge worker make faster and better decisions y. Which of my customers are most likely to go to the competition? y. What product promotions have the biggest impact on revenue? y. How did the share price of software companies correlate with profits over last 10 years? 171
Data Mining works with Warehouse Data z Data Warehousing provides the Enterprise with a memory z Data Mining provides the Enterprise with intelligence 172
We want to know. . . z Given a database of 100, 000 names, which persons are the least likely to default on their credit cards? z Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer? z If I raise the price of my product by Rs. 2, what is the effect on my ROI? z If I offer only 2, 500 airline miles as an incentive to purchase rather than 5, 000, how many lost responses will result? z If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues? z Which of my customers are likely to be the most loyal? Data Mining helps extract such information 173
Application-Orientation vs. Subject -Orientation Application-Orientation Subject-Orientation Operational Database Loans Credit Card Data Warehouse Customer Vendor Trust Savings Product Activity 174
OLTP( Online Transaction Processing) vs Data Warehouse z OLTP y. Application Oriented y. Used to run business y. Detailed data y. Current up to date y. Isolated Data y. Repetitive access y. Clerical User z Warehouse (DSS) y. Subject Oriented y. Used to analyze business y. Summarized and refined y. Snapshot data y. Integrated Data y. Ad-hoc access y. Knowledge User (Manager) 175
OLTP vs Data Warehouse z OLTP y Performance Sensitive y Few Records accessed at a time (tens) y Read/Update Access y No data redundancy y Database Size 100 MB -100 GB z Data Warehouse y Performance relaxed y Large volumes accessed at a time(millions) y Mostly Read (Batch Update) y Redundancy present y Database Size 100 GB - few terabytes or more 176
To summarize. . . z OLTP Systems are used to “run” a business z The Data Warehouse helps to “optimize” the business 177
Data Warehouse Architecture Relational Databases Data Mining tools ERP Systems Extraction & Loading Data Warehouse Engine Purchased Data Legacy Data Analyze Query Metadata Repository 178
Components of the Warehouse z. Data Extraction and Loading z. The Warehouse z. Analyze and Query -- OLAP Tools( Online Analytical Processing) z. Metadata z. Data Mining tools 179
Loading the Warehouse Cleaning the data before it is loaded 180
Data Quality - The Reality z. A data warehouse is not simply extracting operational data and entering into a data warehouse z. Warehouse data comes from different questionable sources 181
Data Quality - The Reality z Legacy systems no longer documented z Outside sources with questionable quality procedures z Production systems with no built in integrity checks and no integration y. Operational systems are usually designed to solve a specific business problem and are rarely developed to a corporate plan x“And get it done quickly, we do not have time to worry about corporate standards. . . ” 182
Data Integration Across Sources Savings Same data different name Loans Different data Same name Trust Data found here nowhere else Credit card Different keys same data 183
Data Transformation Example encoding appl A - m, f B - 1, 0 C - x, y D - male, female unit appl A - pipeline - cm B - pipeline - in C - pipeline - feet D - pipeline - yds field Data Warehouse appl A - balance B - bal C - currbal D - balcurr 184
Data Integrity Problems z Same person, different spellings y. Agarwal, Agrawal, Aggarwal etc. . . z Multiple ways to denote company name y. Persistent Systems, PSPL, Persistent Pvt. LTD. z Use of different names ymumbai, bombay z Different account numbers generated by different applications for the same customer z Required fields left blank z Invalid product codes collected at point of sale ymanual entry leads to mistakes 185 y“in case of a problem use 9999999”
Loads z. After extracting, scrubbing, cleaning, validating etc. need to load the data into the warehouse z. Issues y huge volumes of data to be loaded y small time window available when warehouse can be taken off line (usually nights) y when to build index and summary tables y allow system administrators to monitor, cancel, resume, change load rates y Recover gracefully -- restart after failure from where you were and without loss of data integrity 186
Load Techniques z. Use SQL to append or insert new data yrecord at a time interface ywill lead to random disk I/O’s z. Use batch load utility 187
Load Taxonomy z. Incremental versus Full loads z. Online versus Offline loads 188
Refresh z. Propagate updates on source data to the warehouse z. Issues: ywhen to refresh yhow to refresh -- refresh techniques 189
When to Refresh? z periodically (e. g. , every night, every week) or after significant events z on every update: not warranted unless warehouse data require current data (up to the minute stock quotes) z refresh policy set by administrator based on user needs and traffic z possibly different policies for different sources 190
Data Warehouse vs. Data Marts What comes first 191
From the Data Warehouse to Data Marts Information Less Individually Structured History Normalized Detailed Departmentally Structured Organizationally Structured Data Warehouse More Data 192
Data Warehouse and Data Marts OLAP Data Mart Lightly summarized Departmentally structured Organizationally structured Atomic Detailed Data Warehouse Data 193
Characteristics of the Departmental Data Mart z OLAP z Small z Flexible z Customized by Department z Source is departmentally structured data warehouse 194
True Warehouse Data Sources Data Warehouse Data Marts 195
Data Warehouse Pitfalls z time extracting, cleaning, and loading data z Data warehousing project scope will increase z problems with systems feeding the data warehouse z need to store data not now captured z validate data z Lack of detail by TPS z End users not trained 196
Data Warehouse Pitfalls z High demand for IS written reports z conflicting business rules for reports z Data homogenizing z Security issues z High maintenance z Focus on customer value maximization and not internal reports 197
198
https: //www. cs. uct. ac. za/mit_notes/ database/htmls/chp 13. html#use-ofthe-locking-scheme 199
- Slides: 199