MT 858 Advanced Database Systems 1 Agenda Introduction
MT 858 Advanced Database Systems 1
Agenda Introduction TMA M 876 Block 1 2
Tutor Name : William C. M. Pang Mobile : 91455951 Email : n t 432908@study. ouhk. edu. hk Time for Telephone Tutoring n Mon & Tue 20: 00 - 22: 00 3
Course Materials M 876 Relational Database System n Block 1 Information Systems n Block 2 Relational Theory n Block 3 Using SQL n Block 4 Development of Database Systems M 877 Advanced Database Technology 4
Course Schedule 9 Tutorials Surgeries (Biweekly) n start from 30 April at ILC T 15 4 TMAs n TMA 01 (cut-off date : 5 July 2003) Examination n on Jan/Feb 5
Examination Jan/Feb (3 hrs. ) Specimen examination paper will be provided before the examination. TMA (50%) + Final Exam (50%) Average of the best 3 TMAs and final exam. must be BOTH passed (i. e. at least 40 marks). 6
TMA Attach the TMA form Extension of deadline n Tutor (7 days) n Course Coordinator (8 -21 days) n Dean (>21 days) Electronic submission n Consult your tutor on the arrangement 7
Resources on Internet MT 858 On-line Learning Environment n http: //webct. ouhk. edu. hk OUHK Electronic Library n http: //www. lib. ouhk. edu. hk Tutor’s Website n http: //openu. tripod. com/mt 858/index. html Database n Database bookmark w http: //www. vb-bookmark. com/vb. Database. html 8
Background 9
Databases and Information Systems A database is a collection of data stored in a computer system. A relational database is a database in which you may imagine the data to be stored as tables. An information system is a computer system whose primary purposes are to: n n Support the information requirements of its users. Represent and maintain the semantic integrity of its data. A database is a component of an information system. 10
Information and Data Information is any kind of knowledge that is exchangeable amongst people, about things, facts, concepts, etc. , in some context. The context is referred as ‘Domain of Discourse’ = ‘what we are talking about’. Data is a representation of information. In the other way round, information is interpreted data. The relationship between information and data n n n Information can have more than one data representation. Data does not have any meaning in isolation. Computer process data, not information. 11
Information System Concepts 12
Users A user of an information system as a person whose information requirements are being supported. Multi-user information system. For sharing data among users, all users concurrently access to a single copy of the database in a multiuser information system. A database may be described as a shared collection of data stored in a computer. 13
Model of User Interaction with a Database user view user process 1 user process 2 user process 3 user process 4 user process 5 user view database 14
Scalability In terms of the data volume and no. of current users. Two examples of the impact of size n n Avoid way of working which still require a user to browse through lots of data trying to find the proverbial needle in a hay stack. Keep the disk accesses to be minimal (through the use of indexes or good organization of data in files). 15
Management Implicit user requirements n n n Availability Reliability - involve database recovery Integrity - maintain the semantics of the data in its database according to its specification. Security Concurrency - allow concurrent access or update of data. 16
Information System Architectures 17
Information System Architectures File-based approach Database approach 18
File-based Approach Base on file that contains many records, where each record is a collection of data values, each in a field. Tailor-made programs are developed to write records to files and read records from files. Records are organized in files in various ways (i. e. file organization). File organization (e. g. serial, direct or indexed) determines how records may be accessed. Any program using a file has to be consistent with the file organization and record structure of the file. In other words, file characteristics determine how a program is written. 19
Problems With File-based Approach Sharing files among a group of application programs (application system). Data duplication between different application systems ( data inconsistency). Additional requirements n n New application which shares existing files. Modification of existing files – change of record structure or file organization. w All programs which make use of the amended files have to be modified. Every has to include all the facilities, e. g. access control, semantic constraints, recovery, transactions and concurrency, for its own management. n n Semantic constraints are embedded in programs. Duplications of similar facilities in many programs. 20
Database Approach An information system that uses a DBMS is referred as a database system. A schema contains a specification of the properties of all the data in the database. Data definitions are not embedded in programs, which overcome one of the problems for file-based approached. 21
Model of a Database System user view user process 1 user process 2 user process 3 user process 4 user process 5 user view Database Management System (DBMS) database definition 22
DBMS Functions Data definition n Creation and modification of a schema. Constraints definition and enforcement n Constraints are embedded in a schema Access control Data manipulation n Allow retrieving, adding, deleting and changing data. Restructuring and reorganization n Restructuring and reorganization are required in conjunction with schema modifications. Transaction support Concurrency support n Allow simultaneous access and update of data. Recovery 23
Interacting with a DBMS requests from application processes are expressed in Data Manipulation Language (DML). Application process is written as a series of statements using a conventional programming language combined with other statements written using DML for the DBMS. On executing an application process, the effect of an embedded DML statement is a request to a DBMS for the specified data manipulation. 24
Model showing use of SQL statements for interaction with a DBMS SQL statements user view application process direct entry database tool user view forms database tool form library (including SQL statement SQL Interface Database Management System (DBMS) database schema 25
Data definition A schema contains the definitions of data in a database, constraints on the data and users’ access rights for that data. The properties of a table given by the data definition statement are stored as part of a schema. DBMS uses a schema in processing data in the database defined by the schema. A relational DBMS stores a schema in tables called schema tables or system tables. When referring to the definition of a database, it is necessary to distinguish two representations: n n SQL DDL statements, which are the initial specification originating either from direct user input or a database tool. Schema, which results from processing SQL DDL statements 26 and is stored for use by a DBMS.
Schema Architecture Avoid data dependence within user processes. Three schema architecture n n n Logical schema - defines the logical properties of data in a database, being concerned with a representation of data, and associated constraints. Storage schema – defines how a database is stored in files and accessed. A storage schema is concerned with a physical representation of data on a storage device, so it is also known as a physical schema. External schema – defines data for a user process, more specifically, defines data from a database, which the user process retrieves or updates by interaction with a DBMS. As a simple distinction, a logical schema determines what data is available in a database, while a storage schema determines how data is processed in files. 27
Three-schema Architecture external schema 1 external schema 2 external schema 3 external schema 4 logical schema to external schema mappings logical schema to storage schema mapping storage schema 28
Data Independence How the three-schema architecture provides data independence? Data independence is concerned with the impact of change on user processes. Logical data independence exists when a change to a logical schema has no impact on a user process. n A change to a logical schema does not affect the data defined by an external schema if either it does not include the changed data or the change can be taken into account by a change to the mapping. However, it is not possible to guarantee logical data independence in all situations, because in case deleting a complete table (and all its data), which must have impact on some user processes. Physical data independence exists when a change to a storage schema has no impact on user process because the change can be taken into account by a change to the mapping for the storage schema. 29
Database Development 30
Model of database development establishing requirements data analysis conceptual data model database design logical schema implementation schema and database 31
Activities in Database Development Establishing requirements n Involves consultation with, and agreement among, users as to what persistent data they want, expressed as a statement of data requirements. Data analysis n Starts with a statement of data requirements and produces a conceptual data model, which is a formal representation of what data a database should contain, expressed in terms that are independent of how it may be realized. Database design n Starts with a conceptual data model and produces a specification of a logical schema. Implementation n Involves the construction of a database according to a given specification of a logical schema, which requires specification of an appropriate storage schema. 32
Entity-Relationship Diagram Entity-relationship diagram (E-R diagram) – expresses data requirements in terms of entity types, attributes of entity types and relationships. 33
Entity Type An entity represents a thing that has meaning in a given context and about which there is a need to record data. An entity type defines the properties common to a collection of entities. In an E-R diagram, an entity type is depicted by means of a round-cornered box with the name of the entity type written inside the box. Staff 34
Attribute An attribute is a component of an entity type that represents a single property of entities of that type. No two attributes of a given entity type have the same name, but the same name can be used for an attribute in more than one entity type. To avoid confusion, we qualify the attribute name with the name of the entity type. The value of an attribute for a given entity type is a single value. The permissible set of values that an attribute can take is known as the value set of the attribute. 35
Identifier An identifier is a attribute or a combination of attributes with the property of uniqueness such that no two entities of the same type have the same value for that attribute(s). There are three roles for an identifier: as an assertion of existence, as a reference and as a constraint. In an E-R diagram, identifier is underlined. 36
Relationships A relationship is an association between entities that has meaning in a given context and which needs to be recorded. Type of relationship is the property common to a collection of relationships. Staff Counsels Student Degree of a relationship n 1: 1, 1: N or M: N A relationship represents information about an association which has meaning in both directions. Participation condition n optional (depicted by a circle ‘ ’) mandatory (depicted by a blob ‘ • ’) a participation condition of mandatory is a constraint, but not for participation condition of optional. 37
Constraints Many aspects of the constraints in data requirements can be represented by an E-R diagram and the entity types, by the choice of identifiers and specifying the properties of relationships. However, NOT all requirements can be recorded in this way. These data requirements are written out as descriptive entry in the Constraints part of a conceptual data model. These constraints stated in the conceptual data model is intended to be included in the logical schema. 38
Assumptions Record any choices that made when representing a given statement of data requirements as a conceptual data model. Enable the details of the E-R diagram and the entity types to be understood. 39
- Slides: 39