Teaching slides Chapter 8 Chapter 8 Database design

Teaching slides Chapter 8

Chapter 8 Database design & construction Contents • • • Introduction Databases and software engineering methodologies Database types Database languages Database entities – Schema, Tables, Primary keys, Foreign keys, Indexes, Sequences, Stored procedures, Triggers • Database design – Entity-relationship diagram, Normalization, Relationship among tables, Integrity rules, Master and transaction data • Database management systems • Object relational databases

Chapter 8 Database design & construction Introduction Databases are integral part of any software product. Most software products need to create and keep permanent data. This permanent data is created and kept in databases. Databases are extremely important for many types of software products. The power of search engines like Google come from their sophisticated and huge databases. Most commercial and enterprise software products (ERPs, SAS, Accounting etc. ) depend on their databases for all business transactions and reports. Even social media software products like Facebook, Twitter etc. have databases as their backbone. There are 2 parts of any database: the database engine and the stored data. The database engine itself is a software product. Many software companies build database engines. Oracle database engine is created by Oracle Corporation, MS SQL Server is created by Microsoft Corporation etc. Software companies who build software products use one of the database engines to create and store data which is used by their software products.

Chapter 8 Database design & construction No. SQL databases There are many types of databases. Most databases fall under the category of either No. SQL databases or relational databases. No. SQL databases are essentially a file containing data. The data in these No. SQL databases is kept in a structured format. For example we can store data in an XML file. The advantage of No. SQL databases is that they can be stored alongside other kinds of files. For example an XML file can be kept on a web server alongside the HTML files. Reading and writing from these XML files is fast as compared to a relational database as they reside near the files which need data from these files. For webbased software products, use of XML files thus can be used to write client side scripts and perform user validations. The disadvantages of No. SQL databases is that it is difficult to enforce integrity rules to keep atomic and non-repeating data. When a data file becomes too large then reading or writing in data files becomes very slow.

Chapter 8 Database design & construction Relational databases are the most often type of databases used with software products. Relational databases are powerful and can grow to contain extremely large size of data. The main idea behind relational databases is that data can be kept in different tables. Later a relation can be set among all these tables so that it is possible to create data and keep this data in different tables. Whenever required to view or modify data, a search through all the connected tables is possible. The ability to separate data in tables and yet be able to connect all this data through a relation between tables is something which makes relational databases powerful.

Chapter 8 Database design & construction Structured Query Language (SQL) Relational databases use Structured Query Language (SQL) to define database entities as well as do all kinds of data manipulation. The part of language which deals with database entities creation and management is known as Data Definition Language (DDL). The part of language which deals with data creation and management is known as Data Manipulation Language (DML). Both DDL and DML use SQL. Database entities include database schemas, database tables, sequences, indexes, primary keys, foreign keys, stored procedures, triggers etc. We will discuss about relational databases, entities and how they work throughout this chapter.

Chapter 8 Database design & construction Database entities: database schema Schema is the top level database entity. Generally you define one schema for all database entities for a software product. For example you can have a schema “ABCbank. DB” for a software product for ABC Bank. If you have one more software product then you can create another database schema for this product. All the database entities for this software product you can create inside this second schema. This way, all database entities belonging to this software will be separate from the database entities you created for ABC Bank under “ABCbank. DB” schema. Using DDL a schema can be created or dropped. Create schema <schema_name>

Chapter 8 Database design & construction Table definition for Account_transaction

Chapter 8 Database design & construction Database table Account_transaction with data

Chapter 8 Database design & construction A database table is an entity which holds data. This data is kept in columns. Each table has many rows of data. Each row of data is known as a record. The above table depicts a database table with many rows or records in many columns. There could be many tables in a database. These tables can be joined so that related data can be created in all the tables which are joined with each other. When a search is performed to view or modify data then it is possible to find out the exact data looked for and based on that piece of data, view or modification in existing data can be performed. Using DDL a table can be created, altered or dropped. CREATE TABLE <Table_Name> (<Column_Name 1> datatype 1, <Column_Name 2> datatype 2, . . . ) PRIMARY KEY (<Column_Name 1>) Using DML insert, update, delete or view operations can be done in a table. Insert into table_name (column_name 1, column_name 2, , , ) values (datatype value 1, datatype value 2. . ); Delete from table_name where <condition>; Update table_name set column_name = “some value” where <condition>; Select from table_name where <condition>;

Chapter 8 Database design & construction Primary key To join 2 tables, a key is created on a column in one of the tables. Another key is created on a column in the other table. Using these 2 keys, the tables are joined. The first key is known as primary key and the second key is known as foreign key. The primary key ensures that whatever data being created in the column on which the primary key is set is unique and there is no NULL values in this column. A primary key can also be set on a combination of more than one column. To ensure that a good primary key is made for a table, some considerations include: • The data in the column with primary key should not be changeable. • The data in the column with primary key should be simple. • Fact-less data is better for columns with primary key. Factual data has problems of repeating or change. Auto increment numbers are best way to generate and populate data for columns with primary key. In most databases, sequences are used to create auto incrementing numbers.

Chapter 8 Database design & construction Foreign key is placed on a column in a table so that the data created in this column is dependent on data which is created in the column with primary key in the other table which is joined with this table. If a piece of data in a column with foreign key is not present in the corresponding column with primary key in the other table then the relationship between these 2 tables will no longer exist. So all data belonging to the column with foreign key must already exist in the column with primary key in the other table.

Chapter 8 Database design & construction A bad join between Account and Account_transaction tables

Chapter 8 Database design & construction We know the power of relational databases come from keeping data separate in different tables and yet all this data is related to each other through table joins. Good table joins are extremely important as this will facilitate building a large database and still all these pieces of data will be atomic, pure and will be easily available when required to view or do any modification. Reading or writing these pieces of data will also be fast as the database engine will have to work less in doing this work. It is because the primary key will be consisting of a simple number and doing search, insert and other database operations will be easy and fast.

Chapter 8 Database design & construction Bad table joins on the other hand will lead to repeated, impure and slow performing data. One example of bad table join is provided in the above figure. The table join is made between Account and Account_transaction tables using columns Account_number and Account_name. Account number for any bank account is normally unique and is never NULL. But it can be changed in cases where the number of bank customers have increased and the previous numbers used for creating new account numbers no longer fit in the number of digits used in creating an account number. In this case account number of all existing bank account holders may need to use larger number of digits and thus need to be changed. The column Account_number has also be included in the primary key. This field will contain many duplicate entries as customer names can not be unique. Even though with combination of Account_number a primary key data will be unique, still the fact that the data in this column will be strings. It is a bad idea to make searches and inserts of strings as these operations for strings data types will always be slow. There is also a violation of table join rules as all entries for Account_number and Account_name are being duplicated in both the tables.

Chapter 8 Database design & construction A good table join between Account and Account_transaction

Chapter 8 Database design & construction Good table joins are the key to build reliable databases with fast operations. In the above figure there is a table join between Account and Account_transaction tables. This table join is a good one because of many reasons. • The primary key selected for the join is the Account_ID column of the Account table. This field is auto generated. So data in this column will be simple and follow a fixed pattern. • This field is fact-less. There is no possibility of any changes in data. • This will be a numeric field. Searches and inserts in this column will be faster. • This is a single column. Searches and inserts in this column will be faster. • No duplication of data in the Account_transaction table. Only the Account_ID column will have data which is same as that in the Account_ID column in the Account table. This is because Account_ID column in the Account_transaction table is the foreign key. All other columns in this table have data which is not there in the Account table.

Chapter 8 Database design & construction Indexes are used to do speed search for data in databases. Indexes for databases are same as indexes for books. You can search for a keyword in the book by finding the page number from its index. When an index is put on a column in a table then a keyword list is generated which keeps information about data in other columns. This makes searches inside databases extremely fast. Indexes take their own memory space. Whenever some new data is created, deleted or modified in a table which has an index created then this index will need to be rebuilt. If a table has too many transactions then for each transaction, the index will be rebuilt. This rebuilding of index will consume much of memory of the computer and transactions may become very slow. So indexes should not be created on tables which have many transactions going on. Tables which are used for reporting are the best places where indexes are most useful. Reporting will become fast if indexes are put on these tables. These tables are also not transaction oriented and so indexes do not consume computer memory much.

Chapter 8 Database design & construction Sequences & Stored procedures Sequences in databases are used to generate auto increment numbers to be used to populate columns with unique data. Since numbers generated by sequences are fact-less, unique and do not contain NULL values; they are used to populate columns of tables with primary keys. Stored procedures are pieces of software programs which can be saved in the database. When these programs are run they perform database operations and perform required computation. Stored procedures run very fast as they reside near to the database entities.

Chapter 8 Database design & construction CREATE PROCEDURE Account_insert @Account_ID INT, @Account_number VARCHAR(6), @Account_name VARCHAR(15), AS BEGIN INSERT INTO Account ( Account_ID, Account_number, Account_name, ) VALUES ( @Account_ID, @Account_number, @Account_name, ) END In this stored procedure example a stored procedure name Account_insert is created with 3 variables. In the database table Account, an insert operation is done using the values stored in these variables.

Chapter 8 Database design & construction Triggers are very similar to stored procedures as they are software programs which are saved in the database. Triggers are always associated with an event. When this event occurs then the trigger gets activated and is executed. For example, if a delete operation is performed on a table and a trigger is defined for this operation the trigger will fire and the programming code written in the trigger is executed. Triggers are not used much. Triggers are difficult to control. Triggers can cause corruption in data as controlling them is difficult.

Chapter 8 Database design & construction Entity relationship diagrams

Chapter 8 Database design & construction Entity relationship diagrams (ER diagrams) are used to design databases. To design a database, the first thing to find out is the entities which will either generate or consume data. Once entities are identified then you need to find out attributes of those entities. Next you need to find out the relationship among those entities. These 3 things are depicted in any ER diagram. If you need to create a database for a book store then the entities can be books, stores and authors. Each book can have attributes like author name, title, price, subject etc. Similarly other entities will have their own attributes. The relationship among entities can be thought out by finding what kind of data they will generate or consume and from whom? For example a book can be written by one or many authors. At the same time, an author can write many books. This means that books and authors have a many to many relationship. We can have just one store but this store can have many books. Here we can see that a one to many relationship exists.

Chapter 8 Database design & construction Database table which is not normalized

Chapter 8 Database design & construction Normalization A relational database should contain atomic and unique data so that it becomes useful to a software product. ER diagrams help in finding out entities, their attributes and relationship among these entities. Using ER diagrams we can design database tables and their relationships. The next level of design aspect for databases comes when we think about quality of data which will be created and saved inside the database. Normalization is the process of finding out if the data which will be created will be atomic and unique. Any table in a database can be in any of the 3 normal forms: first normal form, second normal form or third normal form. This fact depends on the kind of data present in the table. • If created data is stored in a database table where there is no list of data in any of the columns in the table. All data in the table is thus atomic and this table is in first normalization form. • If created data is in first normal form and all data in most of the columns are dependent on the data in the primary key column the table is in second normal form. • If created data is in second normal form and if no data in any column is dependent on data in any other column; apart from the primary or secondary key column in any table then this table is in third normal form. In the above figure a database table with data is depicted which is not normalized. You can observe that in Debit/Credit column, there are many pieces of data in many rows (records).

Chapter 8 Database design & construction One to many and many to many table joins

Chapter 8 Database design & construction Two figures are presented above. The top figure depicts one to many relationship between a store and books it is selling. The store has attributes like store_ID, address, phone, email etc. The book has attributes like title, ISBN, book_ID, store_ID etc. The 2 tables are joined on the store_ID fields. The store table has store_ID as primary key and the book table has store_ID as foreign key. The relationship between books and authors is many to many. To create many to many relationship between 2 tables, an intermediate table is created. One to many relationship is created between one table and this intermediate table. Another one to many relationship is created between this intermediate table with the other table. For this purpose an intermediate table “book details” have been created. Now we can have a many to many relationship between books and author table by having one to many relationship between books and “book details” table and one to many relationship between author and “book details” table.

Chapter 8 Database design & construction Recursive relationship or self table join

Chapter 8 Database design & construction Sometimes a recursive relationship happens with an entity. In this case a subset of the entity has a relationship with the entity itself. For example a manager is an employee of a company. A manager has many employees under him/her. So a relationship happens between an employee and a manager where the manager him/herself is also an employee. When a table join is depicted for recursive relationship then it is also known as self join. In the above 2 figures, a table self join is shown as well as the data after table join.

Chapter 8 Database design & construction One to one relationship between 2 tables

Chapter 8 Database design & construction Most common relationship between 2 tables is one to many. Many to many relationship between 2 tables is also not uncommon. Sometimes we need a one to one relationship between 2 tables. For example if we have defined roles for people in a company and these roles are used to create role specific software product features. Some roles are sensitive and people in these roles perform some sensitive work. In this case these sensitive roles and data related to these roles should be kept separate and secure. In this case 2 tables should be created: one containing common roles and the other containing sensitive roles. A one to one relationship should be established between these 2 tables.

Chapter 8 Database design & construction We have seen how to create a join among database tables using primary and foreign keys. These keys are governed by integrity rules. The primary key is governed by entity integrity rule and the foreign key is governed by referential integrity rules. The entity integrity rule states that a primary key column should not contain any NULL values. If it does then a relationship with a foreign key column can not be maintained. The row (record) in which a primary key column contains a NULL value can not be uniquely identified. The referential integrity rule states that data in the foreign key column must match with data in the primary key column. If any data in the foreign key column does not match with data in the primary key column the relationship between these 2 tables is not valid.

Chapter 8 Database design & construction Master and transaction data One important aspect about database design is the master and transaction data. Master data is about entities and their attributes. In contrast transaction data is about transactions which happen on a daily basis. Once master data is defined then it seldom changes. Quantity of data used for populating entities and their attributes is small and it may increase only when new entities or their attributes need to be incorporated in existing tables. Transaction data on the other hand keeps increasing on daily basis. Volume of data thus always tend to be very large. For example a software product for payroll will have transaction data about salaries and other payments given to employees over time. Each week or month (when salaries are given) there will be large amount of data added in the database about these given salaries and other payments to employees. When designing database tables, aspects about master and transaction data must be taken care of. Otherwise redundant data may enter into database. For example if master data creeps into transaction data then all this master data will be duplicate entries as they will also be there in the master tables as well. All master data should always be contained inside master tables. Master data should be linked to transaction data through table joins.

Chapter 8 Database design & construction Database Management Systems (DBMS) are specialized software products which rely heavily in databases. Some examples of DBMS systems include payroll management systems, Electronic Data Processing (EDP) systems, financial account systems, manufacturing & warehouse accounting systems etc. These systems are characterized by using user interfaces which are meant to do some transaction and save the data created during these transactions. A lot of transaction data is created by these systems. There also reporting user interfaces which are used by managers to get various reports. Since pure data is important for these systems, databases need to be very secure.

Chapter 8 Database design & construction A class structure and a corresponding table structure

Chapter 8 Database design & construction Object relational databases are a combination of object oriented software design and relational databases. We need to create a design for the software product. We also need to create a design for the database. Data flow in the system happens to and from user interface to business logic layer to the database. The design for business logic is generally not very different from the design for a database. After all the same data passes through the business logic and to the database. Off course some processing happens in the business logic. But it is possible to design the database from the design of the business logic. Thus instead of creating 2 designs (one each for the business logic and the database), it is possible to create just one design. The other design can be derived automatically from the first one. Object relational databases work on this principle. Some companies like Oracle Corporation, Microsoft Corporation etc. have been doing research work to build object relational databases.