DB Chapter 1 Sayed Milad Waizi Information Vs

DB – Chapter 1 Sayed Milad Waizi

Information Vs Data can be described as unprocessed facts and figures. Plain collected data as raw facts cannot help in decision-making. However, data is the raw material that is organized, structured, and interpreted to create useful information systems.

Information Vs Data Information is a data that has been processed into a form that is meaningful to recipient Data Processing information

What is database A database is a collection of large amount of data that is organized so that it can easily be accessed, managed, and updated.

Basic Elements of Database Data Information

Information Cycle Input (Data) Data Captured Results Process Output (Information) Database Action Information Cycle Decision

How to manage data Traditional File-based approach for managing the data : Traditional file based system is basically a file based system, in which we manually or through computer handle the database such as updating, insertion , deletion adding new files to database etc.

File-based Systems Advantages: Ø No need of external storage Ø No need of highly technical person to handle the database. Ø Processing speed is high as compare to dbms.

Traditional File-based Systems Limitations: Ø Separation and isolation of data Ø Data redundancy and inconsistency Ø Concurrent access anomalies Ø Integrity problem Ø Security problems Ø Incompatible File Formats

Database Systems Databases are used to store data in an efficient and organized manner. A database allows quick and easy management of data. At any point of time, data can be retrieved from the database, added, and searched based on some criteria in these databases.

Database Systems Advantages: The amount of redundancy in the stored data can be reduced No more inconsistencies in data The stored data can be shared Standards can be set and followed Data Integrity can be maintained Security of data can be implemented

Database Management System(DBMS) The DBMS is the software that interacts with user application program and database.

Database Management System(DBMS) Disadvantages: Complexity Cost of DBMS Cost of conversion Performance Higher impact of a failure

Components of DBMS Environment Hardware Software Data Procedure People

Components of DBMS Environment Hardware Can range from a PC to a network of computers. Software DBMS, operating system, network software (if necessary) and also the application programs. Data Used by the organization and a description of this data called the schema

Components of DBMS Environment Procedures Instructions and rules that should be applied to the design and use of the database and DBMS. People Includes database designers, DBAs, application programmers, and end-users.

RDBMS A type of database Management System (DBMS) that sores data in the form of related tables. The term "relational database" was invented by E. F. Codd at IBM in 1970.

Normalization In relational database design, the process of organizing data in such a way so that we can minimize the redundancy. Normalization usually involves dividing a database into two or more tables without loss of data and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

Major problems with Data Redundancy Insertion Anomalies Deletion Anomalies Modification Anomalies

Data Redundancy • Problems associated with data redundancy are illustrated by comparing the Staff and Branch relations with the Staff. Branch relation. Staff. Branch

Insertion Anomalies § § To insert the details of a new member of staff located at branch B 1 into the Tbl_Staff_Branch Table shown above, we must enter the correct details of branch numner B 1 so that the branch details are consistent with the values for branch B 1 in other rows. To insert the details of a new branch that currently has no members of staff into the Tbl_Staff_Branch table, it is necessory to enter nulls for the staff details which is not allowed as staff. ID is the primary key.

Deletion Anomalies If we delete a row from the Tbl_Staff_Branch table that represents the last member of staff located at that branch, (for e. g. row with Branch numbers B", B 3 or B 4) the detals about that branch are also lost from the Database.

Modification Anomalies Ø Should we need to change the address of a perticular branch in the Tbl_Staff_Branch table, we must update the rows of all staff located at that branch. If this modification is not carried out on all the relevent rows, the database will become inconsistent

Normalization E. F. Codd, who was the IBM researcher credited with the creation and evolution of the relational database, set forth a set of rules that define how data should be organized in a relational database. Initially, he proposed three sequential forms to classify data in a database: First normal form (1 NF), Second normal form (2 NF), and Third normal form (3 NF). After these initial normal forms were developed, research indicated that they could result in update anomalies, so three additional forms were developed to deal with these issues: fourth normal form (4 NF), fifth normal form (5 NF), and the Boyce-Codd normal form (BCNF).

Normalization Process UNF Remove repeating groups 1 NF Remove partial dependencies 2 NF Remove transitive dependencies 3 NF Remove remaining dependencies BCNF

Unnormalized Form A table would be in unnormalized form if it contains some multivalued attributes or repeating groups. Example: Emp_Code Name Province Email Mobile Bu-001 Ahmad Kabul A@yahoo. com 0777132211, 0799 123432 Bu-002 Khalid Parwan K@yahoo. com 0778123457 Bu-003 Jawid Balkh J@yahoo. com 0700421432

First Normal Form A table is considered to be in 1 NF if all the fields contain only scalar values (as opposed to list of values). Emp_Code Name Province Email Mobile Bu-001 Ahmad Kabul A@yahoo. com 0777132211 Bu-002 Khalid Parwan K@yahoo. com 0778123457 Bu-003 Jawid Balkh J@yahoo. com 0700421432

Second Normal Form Before we learn about the second normal form, we need to understand the following: Prime attribute − An attribute, which is a part of the prime-key, is known as a prime attribute. Non-prime attribute − An attribute, which is not a part of the prime-key, is said to be a non-prime attribute.

Second Normal Form If we follow second normal form, then every non-prime attribute should be fully functionally dependent on prime key attribute.

Second Normal Form We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule, non-key attributes, i. e. Stu_Name and Proj_Name must be dependent upon both and not on any of the prime key attribute individually. But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called partial dependency, which is not allowed in Second Normal Form.

Second Normal Form We broke the relation in two as depicted in the above picture. So there exists no partial dependency.

Second Normal Form A table or database is in 2 NF if: The database is in first normal form No partial dependency means no attribute should be dependent on any portion of the primary key.

Third Normal Form 3 NF is based on the concept of transitive dependency. Transitive Dependency is a condition where A, B and C are attributes of a relation such that if A B and B C, then C is transitively dependent on A through B. (Provided that A is not functionally dependent on B or C).

Third Normal Form EMP_DEPT Ename SSN Bdate Address Dnumber Dname DMGRSSN Transitive FD Get Rid of Transitive FD Dnumber Ename SSN Bdate Address Dnumber EMPLOYEE Dname DMGRSSN DEPARTMENT

Third Normal Form A table or database is in 3 NF if: It is in 2 NF and It contains no transitive dependencies

BCNF Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form and Sometimes is BCNF is also referred as 3. 5 Normal Form. A table is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. This is an improved form of third normal form.

BCNF If the record for student 456 is deleted we lose not only information on student 456 but also the fact that DARWIN advises in BIOLOGY

BCNF we cannot record the fact that WATSON can advise on COMPUTING until we have a student majoring in COMPUTING to whom we can assign WATSON as an advisor In BCNF we have two tables:

BCNF

Fourth Normal Form A table is in 4 th normal form if: it is in Boyce-Codd normal form it does not contain more than one multivalued dependency

Fourth Normal Form Example: Suppose that employees can be assigned to multiple projects. Also suppose that employees can have multiple job skills as shown in database. Try to normalize the" database.

Fourth Normal Form This database shows that Project_No and Skill are independent multi-valued facts about Emp_No that is it contains "a multi-valued dependency.

Fourth Normal Form This form of the table is obviously full of anomalies. There is a high degree of redundancy that will lead to update problems. Since the database contains MVDs, so it should be decomposed with the help of rule of fourth normal form. Here, the database contain the following MVDs: Emp_No --> Project_No Emp_No--> Skill

Fourth Normal Form Here, Project_No and Skill are independent to each other, so it should be decomposed in to following database according to forth normal form.

Fifth Normal Form Fifth normal form is satisfied when all tables are broken into as many tables as possible in order to avoid redundancy. Once it is in fifth normal form it cannot be broken into smaller relations without changing the facts or the meaning.

The END ? ? ?
- Slides: 46