Chapter 27 Object-Oriented DBMSs – Concepts and Design
Pearson Education © 2009
Chapter 27 - Objectives
• Advanced database applications.
• Unsuitability of RDBMSs for advanced database applications.
• Object-oriented concepts.
• Problems of storing objects in a relational database.
• The next generation of database systems.
• Basics of object-oriented database analysis and design.
• Framework for an OODM.
• Basics of the FDM.
• Basics of persistent programming languages.
• Main points of the OODBMS Manifesto.
• Main strategies for developing an OODBMS.
• Single-level v. two-level storage models.
• Pointer swizzling.
• How an OODBMS accesses records.
• Persistent schemes.
• Advantages and disadvantages of orthogonal persistence.
• Issues underlying OODBMSs.
• Advantages and disadvantages of OODBMSs.
Advanced Database Applications
• Computer-Aided Design/Manufacturing (CAD/CAM)
• Computer-Aided Software Engineering (CASE)
• Network Management Systems
• Office Information Systems (OIS) and Multimedia Systems
• Digital Publishing
• Geographic Information Systems (GIS)
• Interactive and Dynamic Web Sites
• Other applications with complex and interrelated objects and procedural data.
Computer-Aided Design (CAD)
• Stores data relating to mechanical and electrical design, for example, buildings, airplanes, and integrated circuit chips.
• Designs of this type have some common characteristics:
  – Data has many types, each with a small number of instances.
  – Designs may be very large.
  – Design is not static but evolves through time.
  – Updates are far-reaching.
  – Involves version control and configuration management.
  – Cooperative engineering.
Advanced Database Applications
• Computer-Aided Manufacturing (CAM)
  – Stores similar data to CAD, plus data about discrete production.
• Computer-Aided Software Engineering (CASE)
  – Stores data about the stages of the software development lifecycle.
Network Management Systems
• Coordinate delivery of communication services across a computer network.
• Perform such tasks as network path management, problem management, and network planning.
• Handle complex data and require real-time performance and continuous operation.
• To route connections, diagnose problems, and balance loadings, these systems have to be able to traverse the complex graph formed by the network in real time.
Office Information Systems (OIS) and Multimedia Systems
• Store data relating to computer control of information in a business, including electronic mail, documents, invoices, and so on.
• Modern systems now handle free-form text, photographs, diagrams, and audio and video sequences.
• Documents may have a specific structure, perhaps described using a mark-up language such as SGML, HTML, or XML.
Digital Publishing
• It is becoming possible to store books, journals, papers, and articles electronically and deliver them over high-speed networks to consumers.
• As with OIS, digital publishing is being extended to handle multimedia documents consisting of text, audio, image, and video data and animation.
• The amount of information available to be put online is of the order of petabytes (10^15 bytes), which would make these the largest databases a DBMS has ever had to manage.
Geographic Information Systems (GIS)
• A GIS database stores spatial and temporal information, such as that used in land management and underwater exploration.
• Much of the data is derived from surveys and satellite photographs, and tends to be very large.
• Searches may involve identifying features based, for example, on shape, color, or texture, using advanced pattern-recognition techniques.
Interactive and Dynamic Web Sites
• Consider an online catalog for selling clothes. The Web site maintains a set of preferences for previous visitors and allows a visitor to:
  – obtain a 3D rendering of any item based on color, size, fabric, etc.;
  – modify the rendering to account for movement, illumination, backdrop, occasion, etc.;
  – select accessories to go with the outfit, from items presented in a sidebar.
• The site needs to handle multimedia content and to modify the display interactively based on user preferences and user selections, with the added complexity of providing 3D rendering.
Weaknesses of RDBMSs
• Poor Representation of “Real World” Entities
  – Normalization leads to relations that do not correspond to entities in the “real world”.
• Semantic Overloading
  – The relational model has only one construct for representing data and data relationships: the relation.
  – The relational model is therefore semantically overloaded.
• Poor Support for Integrity and Enterprise Constraints
• Homogeneous Data Structure
  – The relational model assumes both horizontal homogeneity (every tuple has the same attributes) and vertical homogeneity (every value in a column comes from the same domain).
  – Many RDBMSs now allow Binary Large Objects (BLOBs).
• Limited Operations
  – RDBMSs have only a fixed set of operations, which cannot be extended.
• Difficulty Handling Recursive Queries
  – It is extremely difficult to produce recursive queries.
  – The extension proposed to relational algebra to handle this type of query is the unary transitive (recursive) closure operation.
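The transitive closure operation can be sketched as a fixpoint computation: keep joining the relation with itself until no new pairs appear. The Java below is a minimal illustration only, assuming a hypothetical `Pair` record that stands in for a (staffNo, supervisorStaffNo) tuple and made-up staff numbers; it is not how any particular RDBMS implements recursion.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tuple of a binary relation, e.g. (staffNo, supervisorStaffNo). */
record Pair(String from, String to) {}

final class TransitiveClosure {
    /** Joins the relation with itself repeatedly until a fixpoint is reached. */
    static Set<Pair> closure(Set<Pair> relation) {
        Set<Pair> result = new HashSet<>(relation);
        boolean changed = true;
        while (changed) {
            Set<Pair> derived = new HashSet<>();
            for (Pair p : result) {
                for (Pair q : result) {
                    // (a, b) and (b, c) together imply (a, c)
                    if (p.to().equals(q.from())) {
                        derived.add(new Pair(p.from(), q.to()));
                    }
                }
            }
            // addAll returns true only if at least one new pair was added
            changed = result.addAll(derived);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data: S3 is supervised by S2, who is supervised by S1
        Set<Pair> supervisedBy = Set.of(new Pair("S3", "S2"), new Pair("S2", "S1"));
        for (Pair p : closure(supervisedBy)) {
            System.out.println(p.from() + " is (directly or indirectly) supervised by " + p.to());
        }
    }
}
```

The loop terminates because the set of possible pairs over a finite set of staff numbers is finite, and each iteration either adds a pair or stops.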
[Figure: Example - Recursive Query]
Weaknesses of RDBMSs
• Impedance Mismatch
  – Most DMLs lack computational completeness.
  – To overcome this, SQL can be embedded in a high-level 3GL.
  – This produces an impedance mismatch, because two different programming paradigms are mixed: declarative SQL, which handles sets of rows, and a procedural 3GL, which handles one row at a time.
  – It has been estimated that as much as 30% of programming effort and code space is expended on converting between the two.
Weaknesses of RDBMSs
• Other Problems with RDBMSs
  – Transactions are generally short-lived, and concurrency control protocols are not suited to long-lived transactions.
  – Schema changes are difficult.
  – RDBMSs are poor at navigational access.
Object-Oriented Concepts
• Abstraction, encapsulation, information hiding.
• Objects and attributes.
• Object identity.
• Methods and messages.
• Classes, subclasses, superclasses, and inheritance.
• Overloading.
• Polymorphism and dynamic binding.
Abstraction
Process of identifying the essential aspects of an entity and ignoring its unimportant properties.
• Concentrate on what an object is and what it does, before deciding how to implement it.
Encapsulation and Information Hiding
Encapsulation
  – Object contains both its data structure and the set of operations used to manipulate it.
Information Hiding
  – Separate the external aspects of an object from its internal details, which are hidden from the outside.
• Allows the internal details of an object to be changed without affecting applications that use it, provided the external details remain the same.
• Provides data independence.
Object
Uniquely identifiable entity that contains both the attributes that describe the state of a real-world object and the actions associated with it.
  – The definition is very similar to that of an entity; however, an object encapsulates both state and behavior, whereas an entity models only state.
Attributes
Contain the current state of an object.
• Attributes can be classified as simple or complex.
• A simple attribute can be a primitive type such as integer, string, etc., which takes on literal values.
• A complex attribute can contain collections and/or references.
• A reference attribute represents a relationship.
• An object that contains one or more complex attributes is called a complex object.
Object Identity
An object identifier (OID) is assigned to an object when it is created. The OID is:
  – System-generated.
  – Unique to that object.
  – Invariant.
  – Independent of the values of its attributes (that is, its state).
  – Invisible to the user (ideally).
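The OID properties listed above can be approximated in Java, as a sketch only: a hypothetical `Oid` type is generated from a system-wide counter (unique within one run, not across storage volumes or machines as a real OODBMS would require), held in a `final` field so it is invariant, and compared without reference to attribute values. Java cannot make the OID truly invisible, so the `oid()` accessor is exposed purely for the demonstration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical OID type: system-generated, unique, invariant, state-independent. */
final class Oid {
    private static final AtomicLong NEXT = new AtomicLong(1); // system-wide counter
    private final long value;

    private Oid(long value) { this.value = value; } // users cannot fabricate OIDs

    static Oid generate() { return new Oid(NEXT.getAndIncrement()); }

    @Override public boolean equals(Object o) { return o instanceof Oid other && other.value == value; }
    @Override public int hashCode() { return Long.hashCode(value); }
    @Override public String toString() { return "oid#" + value; }
}

/** Base class that assigns an OID once, at creation time. */
abstract class PersistentObject {
    private final Oid oid = Oid.generate(); // final: invariant for the object's lifetime
    final Oid oid() { return oid; }          // exposed only so the demo can inspect it
}

class Staff extends PersistentObject {
    String staffNo; // a key attribute is part of the state and can change
    double salary;

    Staff(String staffNo, double salary) {
        this.staffNo = staffNo;
        this.salary = salary;
    }
}

final class OidDemo {
    public static void main(String[] args) {
        Staff a = new Staff("SG14", 18000.0);
        Staff b = new Staff("SG14", 18000.0);        // same state, different object
        System.out.println(a.oid().equals(b.oid())); // false
        Oid before = a.oid();
        a.staffNo = "SG99";                          // change the "key" value
        System.out.println(a.oid().equals(before));  // true: identity is unaffected
    }
}
```

Contrast this with a primary key: two tuples with the same key value are the same tuple, whereas here two objects with identical state remain distinct.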
Object Identity - Implementation
• In an RDBMS, object identity is value-based: the primary key is used to provide uniqueness.
• Primary keys do not provide the type of object identity required in OO systems:
  – a key is unique only within a relation, not across the entire system;
  – a key is generally chosen from the attributes of a relation, making it dependent on object state.
• Programming languages use variable names and pointers/virtual memory addresses, which also compromise object identity.
• In C/C++, an OID is a physical address in the process memory space, which is too small: scalability requires that OIDs be valid across storage volumes, possibly across different computers.
• Further, when an object is deleted its memory is reused, which may cause problems: a stale pointer can end up referring to a different object.
Advantages of OIDs
• They are efficient.
• They are fast.
• They cannot be modified by the user.
• They are independent of content.
Methods and Messages
Method
  – Defines the behavior of an object, as a set of encapsulated functions.
Message
  – Request from one object to another asking the second object to execute one of its methods.
[Figure: Object Showing Attributes and Methods]
[Figure: Example of a Method]
Class
Blueprint for defining a set of similar objects.
• Objects in a class are called instances.
• A class is also an object, with its own class attributes and class methods.
[Figure: Class Instances Share Attributes and Methods]
Subclasses, Superclasses, and Inheritance
Inheritance allows one class of objects to be defined as a special case of a more general class.
• Special cases are subclasses and more general cases are superclasses.
• The process of forming a superclass is generalization; forming a subclass is specialization.
• A subclass inherits all properties of its superclass and can define its own unique properties.
• A subclass can redefine inherited methods.
• All instances of a subclass are also instances of its superclass.
• The principle of substitutability states that an instance of a subclass can be used wherever a method/construct expects an instance of the superclass.
• The relationship between subclass and superclass is known as the A KIND OF (AKO) relationship.
• Four types of inheritance: single, multiple, repeated, and selective.
[Figure: Single Inheritance]
[Figure: Multiple Inheritance]
[Figure: Repeated Inheritance]
Overriding, Overloading, and Polymorphism
Overriding
  – Process of redefining a property within a subclass.
Overloading
  – Allows the name of a method to be reused within a class or across classes.
Polymorphism
  – Means ‘many forms’. Three types: operation, inclusion, and parametric.
Example of Overriding
• We might define a method in the Staff class to increment salary based on commission:
    method void giveCommission(float branchProfit) {
        salary = salary + 0.02 * branchProfit;
    }
• We may wish to perform a different calculation for commission in the Manager subclass:
    method void giveCommission(float branchProfit) {
        salary = salary + 0.05 * branchProfit;
    }
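A runnable Java rendering of the two `giveCommission` methods above, as a sketch: `double` replaces `float` to reduce rounding surprises, and the constructors and the `getSalary()` accessor are assumptions added so the example compiles and can be checked.

```java
class Staff {
    protected double salary;

    Staff(double salary) { this.salary = salary; }

    /** Staff commission: 2% of the branch profit. */
    void giveCommission(double branchProfit) {
        salary = salary + 0.02 * branchProfit;
    }

    double getSalary() { return salary; }
}

class Manager extends Staff {
    Manager(double salary) { super(salary); }

    /** Overrides the inherited method: managers receive 5% of the branch profit. */
    @Override
    void giveCommission(double branchProfit) {
        salary = salary + 0.05 * branchProfit;
    }
}

final class OverridingDemo {
    public static void main(String[] args) {
        Staff assistant = new Staff(12000.0);
        Staff manager = new Manager(30000.0); // declared as Staff, created as Manager
        assistant.giveCommission(100000.0);
        manager.giveCommission(100000.0);     // Manager's version runs
        System.out.println(assistant.getSalary() + " " + manager.getSalary());
    }
}
```

The `@Override` annotation is optional but makes the compiler reject the method if its signature no longer matches the one in `Staff`.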
[Figure: Overloading Print Method]
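Because the overloading figure is not reproduced, the following is a hypothetical sketch rather than its content: `Staff` overloads `print` within the class (two parameter lists), and the unrelated `PropertyForRent` reuses the same name across classes. The methods return text instead of writing to the console so the result can be checked; the attribute choices and sample values are made up.

```java
/** Overloading within a class: two print methods distinguished by their parameters. */
class Staff {
    final String staffNo;
    final String lName;
    final double salary;

    Staff(String staffNo, String lName, double salary) {
        this.staffNo = staffNo;
        this.lName = lName;
        this.salary = salary;
    }

    String print() { return staffNo + " " + lName; }

    String print(boolean withSalary) {
        return withSalary ? print() + " " + salary : print();
    }
}

/** Overloading across classes: an unrelated class reuses the name print. */
class PropertyForRent {
    final String propertyNo;
    final String city;

    PropertyForRent(String propertyNo, String city) {
        this.propertyNo = propertyNo;
        this.city = city;
    }

    String print() { return propertyNo + " (" + city + ")"; }
}

final class OverloadingDemo {
    public static void main(String[] args) {
        Staff s = new Staff("SG14", "Ford", 18000.0);
        System.out.println(s.print());     // SG14 Ford
        System.out.println(s.print(true)); // SG14 Ford 18000.0
        System.out.println(new PropertyForRent("PG16", "Glasgow").print()); // PG16 (Glasgow)
    }
}
```

Unlike overriding, the choice between the two `Staff.print` methods is made at compile time from the argument list.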
Dynamic Binding
Runtime process of selecting the appropriate method based on an object’s type.
• With a list consisting of an arbitrary number of objects from the Staff hierarchy, we can write:
    list[i].print()
• and the runtime system will determine which print() method to invoke depending on the object’s (sub)type.
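The `list[i].print()` idea can be sketched in Java with a small, hypothetical Staff hierarchy in which each class overrides `print()`. The static type of every element is `Staff`, but the JVM selects the method from each object's run-time class; returning strings instead of printing is an assumption made so the result can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

class Staff {
    final String staffNo;
    Staff(String staffNo) { this.staffNo = staffNo; }
    String print() { return "Staff " + staffNo; }
}

class Manager extends Staff {
    Manager(String staffNo) { super(staffNo); }
    @Override String print() { return "Manager " + staffNo; }
}

class Secretary extends Staff {
    Secretary(String staffNo) { super(staffNo); }
    @Override String print() { return "Secretary " + staffNo; }
}

final class DynamicBindingDemo {
    /** Calls print() on each element; the method is chosen at run time from each object's class. */
    static List<String> printAll(List<? extends Staff> list) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < list.size(); i++) {
            out.add(list.get(i).print()); // list[i].print() in the slide's notation
        }
        return out;
    }

    public static void main(String[] args) {
        List<Staff> list = List.of(new Staff("SG5"), new Manager("SG14"), new Secretary("SG37"));
        printAll(list).forEach(System.out::println);
    }
}
```

Adding a new subclass with its own `print()` requires no change to `printAll`, which is the practical benefit of dynamic binding.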
Complex Objects
An object that consists of subobjects but is viewed as a single object.
• Objects participate in an A-PART-OF (APO) relationship.
• A contained object can be encapsulated within the complex object and accessed by the complex object’s methods,
• or have its own independent existence, in which case only its OID is stored in the complex object.
Storing Objects in Relational Databases
• One approach to achieving persistence with an OOPL is to use an RDBMS as the underlying storage engine.
• This requires mapping class instances (i.e. objects) to one or more tuples distributed over one or more relations.
• To handle a class hierarchy, there are two basic tasks to perform:
  (1) design relations to represent the class hierarchy;
  (2) design how objects will be accessed.
[Figure: Storing Objects in Relational Databases]
Mapping Classes to Relations
There are a number of strategies for mapping classes to relations, although each results in a loss of semantic information.
(1) Map each class or subclass to a relation:
    Staff (staffNo, fName, lName, position, sex, DOB, salary)
    Manager (staffNo, bonus, mgrStartDate)
    SalesPersonnel (staffNo, salesArea, carAllowance)
    Secretary (staffNo, typingSpeed)
(2) Map each subclass to a relation:
    Manager (staffNo, fName, lName, position, sex, DOB, salary, bonus, mgrStartDate)
    SalesPersonnel (staffNo, fName, lName, position, sex, DOB, salary, salesArea, carAllowance)
    Secretary (staffNo, fName, lName, position, sex, DOB, salary, typingSpeed)
(3) Map the hierarchy to a single relation:
    Staff (staffNo, fName, lName, position, sex, DOB, salary, bonus, mgrStartDate, salesArea, carAllowance, typingSpeed, typeFlag)
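Strategy (3) can be sketched in Java with a row modeled as a `Map` from column name to value. The column list is abridged to `staffNo`, `salary`, `bonus`, `typingSpeed`, and `typeFlag`, and the flag values (`"Manager"`, `"Secretary"`, `"Staff"`) are assumptions. The sketch makes the cost of this strategy visible: inapplicable columns hold NULL, and only `typeFlag` records which subclass a row came from.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class Staff {
    final String staffNo;
    final double salary;
    Staff(String staffNo, double salary) { this.staffNo = staffNo; this.salary = salary; }
}

class Manager extends Staff {
    final double bonus;
    Manager(String staffNo, double salary, double bonus) { super(staffNo, salary); this.bonus = bonus; }
}

class Secretary extends Staff {
    final int typingSpeed;
    Secretary(String staffNo, double salary, int typingSpeed) { super(staffNo, salary); this.typingSpeed = typingSpeed; }
}

/** Strategy (3): the whole hierarchy is stored in one Staff relation plus a typeFlag column. */
final class SingleRelationMapper {
    /** Flattens an object into one tuple; columns that do not apply to its class are NULL. */
    static Map<String, Object> toRow(Staff s) {
        Map<String, Object> row = new LinkedHashMap<>(); // LinkedHashMap allows null values
        row.put("staffNo", s.staffNo);
        row.put("salary", s.salary);
        row.put("bonus", null);
        row.put("typingSpeed", null);
        if (s instanceof Manager m) {
            row.put("bonus", m.bonus);
            row.put("typeFlag", "Manager");
        } else if (s instanceof Secretary sec) {
            row.put("typingSpeed", sec.typingSpeed);
            row.put("typeFlag", "Secretary");
        } else {
            row.put("typeFlag", "Staff");
        }
        return row;
    }

    /** Rebuilds the object; the typeFlag is the only record of which subclass it was. */
    static Staff fromRow(Map<String, Object> row) {
        String staffNo = (String) row.get("staffNo");
        double salary = (Double) row.get("salary");
        return switch ((String) row.get("typeFlag")) {
            case "Manager" -> new Manager(staffNo, salary, (Double) row.get("bonus"));
            case "Secretary" -> new Secretary(staffNo, salary, (Integer) row.get("typingSpeed"));
            default -> new Staff(staffNo, salary);
        };
    }

    public static void main(String[] args) {
        Map<String, Object> row = toRow(new Manager("SG14", 18000.0, 2000.0));
        System.out.println(row); // {staffNo=SG14, salary=18000.0, bonus=2000.0, typingSpeed=null, typeFlag=Manager}
        System.out.println(fromRow(row).getClass().getSimpleName()); // Manager
    }
}
```

Strategies (1) and (2) would instead split or duplicate the columns across relations, trading NULLs for joins or for redundant inherited columns.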
Next Generation Database Systems
First Generation DBMS: Network and Hierarchical
  – Required complex programs for even simple queries.
  – Minimal data independence.
  – No widely accepted theoretical foundation.
Second Generation DBMS: Relational DBMS
  – Helped overcome these problems.
Third Generation DBMS: OODBMS and ORDBMS.
[Figure: History of Data Models]
Object-Oriented Database Design
Relationships
• Relationships are represented using reference attributes, typically implemented using OIDs.
• Consider how to represent the following binary relationships according to their cardinality:
  – 1:1
  – 1:*
  – *:*.
1:1 Relationship Between Objects A and B
• Add a reference attribute to A and, to maintain referential integrity, a reference attribute to B.
1:* Relationship Between Objects A and B
• Add a reference attribute to B (referring to A) and, to A, an attribute containing a set of references to B.
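Using the document's Staff Manages PropertyForRent relationship as the 1:* example (Staff as A, PropertyForRent as B), the following sketches one way to keep both directions consistent in Java. The method names `setManagedBy`, `link`, and `unlink` are hypothetical; only the single-valued side exposes a setter, so every change updates the reference in B and the set in A together.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** The "1" side (A): holds a set of references to the properties it manages. */
class Staff {
    final String staffNo;
    private final Set<PropertyForRent> manages = new HashSet<>();

    Staff(String staffNo) { this.staffNo = staffNo; }

    Set<PropertyForRent> getManages() { return Collections.unmodifiableSet(manages); }

    // Called only by PropertyForRent.setManagedBy, so both sides change together.
    void link(PropertyForRent p) { manages.add(p); }
    void unlink(PropertyForRent p) { manages.remove(p); }
}

/** The "*" side (B): holds a single reference back to its manager. */
class PropertyForRent {
    final String propertyNo;
    private Staff managedBy;

    PropertyForRent(String propertyNo) { this.propertyNo = propertyNo; }

    Staff getManagedBy() { return managedBy; }

    /** Updates this reference and the inverse sets on the old and new Staff objects. */
    void setManagedBy(Staff newManager) {
        if (managedBy == newManager) return;
        if (managedBy != null) managedBy.unlink(this);
        managedBy = newManager;
        if (newManager != null) newManager.link(this);
    }
}

final class OneToManyDemo {
    public static void main(String[] args) {
        Staff sg14 = new Staff("SG14");
        Staff sg37 = new Staff("SG37");
        PropertyForRent pg16 = new PropertyForRent("PG16");
        pg16.setManagedBy(sg14);
        pg16.setManagedBy(sg37);                              // reassign
        System.out.println(sg14.getManages().contains(pg16)); // false
        System.out.println(sg37.getManages().contains(pg16)); // true
    }
}
```

An OODBMS that supports inverse attributes would perform this bookkeeping automatically; here it is written by hand.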
*:* Relationship Between Objects A and B
• Add an attribute containing a set of references to each object.
• For relational database design, we would decompose the *:* relationship into two 1:* relationships linked by an intermediate entity. This model can also be represented in an ODBMS.
[Figure: *:* Relationships]
[Figure: Alternative Design for *:* Relationships]
Referential Integrity
Several techniques to handle referential integrity:
• Do not allow the user to explicitly delete objects.
  – The system is responsible for “garbage collection”.
• Allow the user to delete objects when they are no longer required.
  – The system may detect invalid references automatically and set the reference to NULL or disallow the deletion.
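A minimal sketch of the second technique, in which the system detects references that a deletion would invalidate and either sets them to NULL or refuses the deletion. `ObjectStore`, `Obj`, and `DeletePolicy` are hypothetical, each object has a single reference attribute, and the full scan for referrers is used only for brevity; a real system might instead follow inverse attributes or an index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** The two system responses named above when a referenced object is deleted. */
enum DeletePolicy { SET_NULL, DISALLOW }

/** Hypothetical persistent object with a single reference attribute. */
final class Obj {
    final long oid;
    Obj ref;
    Obj(long oid) { this.oid = oid; }
}

final class ObjectStore {
    private final Map<Long, Obj> objects = new HashMap<>();
    private final DeletePolicy policy;

    ObjectStore(DeletePolicy policy) { this.policy = policy; }

    Obj create(long oid) {
        Obj o = new Obj(oid);
        objects.put(oid, o);
        return o;
    }

    boolean contains(long oid) { return objects.containsKey(oid); }

    /** Deletes an object, handling references that would become invalid; returns false if refused. */
    boolean delete(long oid) {
        Obj target = objects.get(oid);
        if (target == null) return false;
        List<Obj> referrers = new ArrayList<>();
        for (Obj o : objects.values()) {
            if (o != target && o.ref == target) referrers.add(o);
        }
        if (!referrers.isEmpty() && policy == DeletePolicy.DISALLOW) return false;
        for (Obj o : referrers) o.ref = null; // SET_NULL: no dangling references remain
        objects.remove(oid);
        return true;
    }
}

final class ReferentialIntegrityDemo {
    public static void main(String[] args) {
        ObjectStore store = new ObjectStore(DeletePolicy.DISALLOW);
        Obj a = store.create(1);
        Obj b = store.create(2);
        a.ref = b;
        System.out.println(store.delete(2)); // false: a still refers to b
    }
}
```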
Referential Integrity
• Allow the user to modify and delete objects and relationships when they are no longer required.
  – The system automatically maintains the integrity of objects.
  – Inverse attributes can be used to maintain referential integrity.
Behavioral Design
• The EER approach must be supported with a technique that identifies the behavior of each class.
• Involves identifying:
  – public methods: visible to all users;
  – private methods: internal to the class.
• Three types of methods:
  – constructors and destructors;
  – access;
  – transform.
Behavioral Design - Methods
• Constructor - creates a new instance of a class.
• Destructor - deletes a class instance that is no longer required.
• Access - returns the value of one or more attributes (Get).
• Transform - changes the state of a class instance (Put).
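The four kinds of method can be illustrated on a single class. This is a sketch only: PropertyForRent comes from the document's examples, but the `extent` set, the positive-rent rule, and `destroy()` are assumptions (Java has no destructors, so deletion is modeled as an explicit method). Attributes are private and the methods public, following the public/private split in behavioral design.

```java
import java.util.HashSet;
import java.util.Set;

class PropertyForRent {
    // Private class attribute: the extent of live instances (a hypothetical design choice).
    private static final Set<PropertyForRent> extent = new HashSet<>();

    private final String propertyNo;
    private double rent;

    /** Constructor: creates a new instance and records it in the extent. */
    public PropertyForRent(String propertyNo, double rent) {
        this.propertyNo = propertyNo;
        this.rent = rent;
        extent.add(this);
    }

    /** Destructor: Java has none, so deletion is an explicit method here. */
    public void destroy() { extent.remove(this); }

    /** Access (Get) methods: return attribute values without changing state. */
    public String getPropertyNo() { return propertyNo; }
    public double getRent() { return rent; }

    /** Transform (Put) method: changes state, guarded by an illustrative rule. */
    public void setRent(double newRent) {
        if (newRent <= 0) throw new IllegalArgumentException("rent must be positive");
        rent = newRent;
    }

    /** Class method reporting the current size of the extent. */
    public static int extentSize() { return extent.size(); }
}

final class MethodKindsDemo {
    public static void main(String[] args) {
        PropertyForRent p = new PropertyForRent("PG16", 450.0);
        p.setRent(500.0);
        System.out.println(p.getPropertyNo() + " " + p.getRent() + " extent=" + PropertyForRent.extentSize());
        p.destroy();
    }
}
```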
Identifying Methods
• There are several methodologies for identifying methods; they typically combine the following approaches:
  – Identify classes and determine the methods that may usefully be provided for each class.
  – Decompose the application in a top-down fashion and determine the methods required to provide the required functionality.
UML
• Represents the unification and evolution of several OOAD methods, particularly:
  – Booch method,
  – Object Modeling Technique (OMT),
  – Object-Oriented Software Engineering (OOSE).
• Adopted as a standard by the OMG and accepted by the software community as the primary notation for modeling objects and components.
• Defined as “a standard language for specifying, constructing, visualizing, and documenting the artifacts of a software system”.
• The UML does not prescribe any particular methodology; instead, it is flexible and customizable to fit any approach and can be used in conjunction with a wide range of software lifecycles and development processes.
UML – Design Goals
• Provide a ready-to-use, expressive visual modeling language so users can develop and exchange meaningful models.
• Provide extensibility and specialization mechanisms to extend core concepts.
• Be independent of particular programming languages and development processes.
• Provide a formal basis for understanding the modeling language.
• Encourage growth of the object-oriented tools market.
• Support higher-level development concepts such as collaborations, frameworks, patterns, and components.
• Integrate best practices.
UML - Diagrams
• Structural:
  – class diagrams
  – object diagrams
  – component diagrams
  – deployment diagrams.
• Behavioral:
  – use case diagrams
  – sequence diagrams
  – collaboration diagrams
  – statechart diagrams
  – activity diagrams.
UML – Object Diagrams
• Model instances of classes and are used to describe the system at a particular point in time.
• Can be used to validate a class diagram with “real world” data and to record test cases.
UML – Component Diagrams
• Describe the organization of, and dependencies among, physical software components, such as source code, run-time (binary) code, and executables.
UML – Deployment Diagrams
• Depict the configuration of the run-time system, showing hardware nodes, the components that run on these nodes, and the connections between nodes.
UML – Use Case Diagrams
• Model the functionality provided by the system (use cases), the users who interact with the system (actors), and the associations between users and functionality.
• Used in the requirements collection and analysis phase to represent the high-level requirements of the system.
• More specifically, a use case specifies a sequence of actions, including variants, that the system can perform and that yields an observable result of value to a particular actor.
[Figures: UML – Use Case Diagrams]
UML – Sequence Diagrams
• Model interactions between objects over time, capturing the behavior of an individual use case.
• Show the objects and the messages that are passed between these objects in the use case.
[Figure: UML – Sequence Diagrams]
UML – Collaboration Diagrams
• Show interactions between objects as a series of sequenced messages.
• A cross between an object diagram and a sequence diagram.
• Unlike a sequence diagram, which has a column/row format, a collaboration diagram uses a free-form arrangement, which makes it easier to see all interactions involving a particular object.
[Figure: UML – Collaboration Diagrams]
UML – Statechart Diagrams
• Show how objects can change state in response to external events.
• Usually model the transitions of a specific object.
UML – Activity Diagrams
• Model the flow of control from one activity to another.
• Typically represent the invocation of an operation, a step in a business process, or an entire business process.
• Consist of activity states and transitions between them.
[Figure: UML – Activity Diagrams]
UML – Usage in Database Design Methodology
• Produce use case diagrams from the requirements specification, or while producing it, to depict the main functions required of the system. These can be augmented with use case descriptions.
• Produce a first-cut class diagram (ER model).
• Produce a sequence diagram for each use case or group of related use cases.
• It may be useful to add a control class to the class diagram to represent the interface between the actors and the system.
• Update the class diagram to show the required methods in each class.
• Create a state diagram for each class to show how the class changes state in response to messages. Messages are identified from the sequence diagrams.
• Revise earlier diagrams based on the new knowledge gained during this process.
Object-Oriented Data Model
There is no one agreed object data model. One definition:
Object-Oriented Data Model (OODM)
  – Data model that captures the semantics of objects supported in object-oriented programming.
Object-Oriented Database (OODB)
  – Persistent and sharable collection of objects defined by an OODM.
Object-Oriented DBMS (OODBMS)
  – Manager of an OODB.
Object-Oriented Data Model
• Zdonik and Maier present a threshold model that an OODBMS must, at a minimum, satisfy:
  – It must provide database functionality.
  – It must support object identity.
  – It must provide encapsulation.
  – It must support objects with complex state.
Object-Oriented Data Model
• Khoshafian and Abnous define an OODBMS as:
  – OO = ADTs + Inheritance + Object identity
  – OODBMS = OO + Database capabilities.
• Parsaye et al. give:
  (1) High-level query language with query optimization.
  (2) Support for persistence and atomic transactions: concurrency and recovery control.
  (3) Support for complex object storage, indexes, and access methods.
  – OODBMS = OO system + (1), (2), and (3).
Commercial OODBMSs
• GemStone from Gemstone Systems Inc.,
• Objectivity/DB from Objectivity Inc.,
• ObjectStore from Progress Software Corp.,
• Ontos from Ontos Inc.,
• FastObjects from Poet Software Corp.,
• Jasmine from Computer Associates/Fujitsu,
• Versant from Versant Corp.
[Figure: Origins of the Object-Oriented Data Model]
Functional Data Model (FDM)
• Interesting because it shares certain ideas with the object approach, including object identity, inheritance, overloading, and navigational access.
• In the FDM, any data retrieval task can be viewed as the process of evaluating and returning the result of a function with zero, one, or more arguments.
• The resulting data model is conceptually simple but very expressive.
• In the FDM, the main modeling primitives are entities and functional relationships.
FDM - Entities
• Decomposed into (abstract) entity types and printable entity types.
• Entity types correspond to classes of ‘real world’ objects and are declared as functions with 0 arguments that return type ENTITY.
• For example:
    Staff() → ENTITY
    PropertyForRent() → ENTITY.
FDM – Printable Entity Types and Attributes
• Printable entity types are analogous to the base types in a programming language.
• They include: INTEGER, CHARACTER, STRING, REAL, and DATE.
• An attribute is a functional relationship, taking the entity type as an argument and returning a printable entity type.
• For example:
    staffNo(Staff) → STRING
    sex(Staff) → CHAR
    salary(Staff) → REAL
FDM – Composite Attributes
    Name() → ENTITY
    Name(Staff) → NAME
    fName(Name) → STRING
    lName(Name) → STRING
FDM – Relationships
• Functions with arguments also model relationships between entity types.
• Thus, the FDM makes no distinction between attributes and relationships.
• Each relationship may have an inverse relationship defined.
• For example (↠ denotes a multi-valued function):
    Manages(Staff) ↠ PropertyForRent
    ManagedBy(PropertyForRent) → Staff INVERSE OF Manages
FDM – Relationships
• Can also model *:* relationships:
    Views(Client) ↠ PropertyForRent
    ViewedBy(PropertyForRent) ↠ Client INVERSE OF Views
• and attributes on relationships:
    viewDate(Client, PropertyForRent) → DATE
FDM – Inheritance and Path Expressions
• Inheritance is supported through entity types.
• The principle of substitutability is also supported.
    Staff() → ENTITY
    Supervisor() → ENTITY
    IS-A-STAFF(Supervisor) → Staff
• Derived functions can be defined from the composition of multiple functions (note the overloading):
    fName(Staff) → fName(Name(Staff))
    fName(Supervisor) → fName(IS-A-STAFF(Supervisor))
• Composition is a path expression (cf. dot notation):
    Supervisor.IS-A-STAFF.Name.fName
[Figure: FDM – Declaration of FDM Schema]
[Figure: FDM – Diagrammatic Representation of Schema]
FDM – Functional Query Languages
• Path expressions are also used within a functional query.
• For example:
    RETRIEVE lName(Name(ViewedBy(Manages(Staff))))
    WHERE staffNo(Staff) = ‘SG14’
• or in dot notation:
    RETRIEVE Staff.Manages.ViewedBy.Name.lName
    WHERE Staff.staffNo = ‘SG14’
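Evaluating this functional query can be sketched as successive function applications, where each multi-valued step maps a set of arguments to the union of their results. The records, the `step` helper, and the staff, property, and client sample data are hypothetical; the functions `Manages` and `ViewedBy` are modeled as Java `Map`s from an entity to a set of entities.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

record Name(String fName, String lName) {}
record Client(String clientNo, Name name) {}
record PropertyForRent(String propertyNo) {}
record Staff(String staffNo) {}

final class FunctionalQuery {
    /** One step of a path expression: apply a multi-valued function to each element and union the results. */
    static <A, B> Set<B> step(Set<A> in, Function<A, Set<B>> f) {
        Set<B> out = new LinkedHashSet<>();
        for (A a : in) out.addAll(f.apply(a));
        return out;
    }

    /** RETRIEVE lName(Name(ViewedBy(Manages(Staff)))) WHERE staffNo(Staff) = staffNo */
    static Set<String> lastNamesOfViewers(Set<Staff> allStaff, String staffNo,
                                          Map<Staff, Set<PropertyForRent>> manages,
                                          Map<PropertyForRent, Set<Client>> viewedBy) {
        Set<Staff> selected = new LinkedHashSet<>();
        for (Staff s : allStaff) {
            if (s.staffNo().equals(staffNo)) selected.add(s); // WHERE clause
        }
        Set<PropertyForRent> props = step(selected, staff -> manages.getOrDefault(staff, Set.of()));
        Set<Client> clients = step(props, prop -> viewedBy.getOrDefault(prop, Set.of()));
        Set<String> result = new TreeSet<>();
        for (Client c : clients) result.add(c.name().lName()); // single-valued Name, then lName
        return result;
    }

    /** Runs the query over small hypothetical sample data. */
    static Set<String> demo() {
        Staff sg14 = new Staff("SG14"), sg37 = new Staff("SG37");
        PropertyForRent pg16 = new PropertyForRent("PG16");
        PropertyForRent pg36 = new PropertyForRent("PG36");
        PropertyForRent pg21 = new PropertyForRent("PG21");
        Map<Staff, Set<PropertyForRent>> manages = Map.of(sg14, Set.of(pg16, pg36), sg37, Set.of(pg21));
        Map<PropertyForRent, Set<Client>> viewedBy = Map.of(
                pg16, Set.of(new Client("CR56", new Name("Aline", "Stewart"))),
                pg36, Set.of(new Client("CR62", new Name("Mary", "Tregear"))),
                pg21, Set.of(new Client("CR76", new Name("John", "Kay"))));
        return lastNamesOfViewers(Set.of(sg14, sg37), "SG14", manages, viewedBy);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [Stewart, Tregear]
    }
}
```

Reading the path left to right (Staff, then Manages, then ViewedBy, then Name, then lName) mirrors the dot-notation form of the query.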
FDM – Advantages
• Support for some object-oriented concepts.
• Support for referential integrity.
• Irreducibility.
• Easy extensibility.
• Suitability for schema integration.
• Declarative query language.
Persistent Programming Languages (PPLs)
Language that provides its users with the ability to (transparently) preserve data across successive executions of a program, and even allows such data to be used by many different programs.
• In contrast, a database programming language (e.g. SQL) differs by its incorporation of features beyond persistence, such as transaction management, concurrency control, and recovery.
Persistent Programming Languages (PPLs)
• PPLs eliminate the impedance mismatch by extending the programming language with database capabilities.
  – In a PPL, the language’s type system provides the data model, which contains rich structuring mechanisms.
• In some PPLs, procedures are ‘first class’ objects and are treated like any other object in the language.
  – Procedures are assignable, may be the result of expressions, other procedures, or blocks, and may be elements of constructor types.
  – Procedures can be used to implement ADTs.
Persistent Programming Languages (PPLs)
• A PPL also maintains the same data representation in memory as in the persistent store.
  – This overcomes the difficulty and overhead of mapping between the two representations.
• The addition of (transparent) persistence to a PPL is an important enhancement to the IDE, and the integration of the two paradigms provides more functionality and semantics.
OODBMS Manifesto
• Complex objects must be supported.
• Object identity must be supported.
• Encapsulation must be supported.
• Types or classes must be able to inherit from their ancestors.
• Dynamic binding must be supported.
• The DML must be computationally complete.
• The set of data types must be extensible.
• Data persistence must be provided.
• The DBMS must be capable of managing very large databases.
• The DBMS must support concurrent users.
• The DBMS must be able to recover from hardware/software failures.
• The DBMS must provide a simple way of querying data.
• The manifesto proposes the following optional features:
  – Multiple inheritance, type checking and type inferencing, distribution across a network, design transactions, and versions.
• There is no direct mention of support for security, integrity, views, or even a declarative query language.
Alternative Strategies for Developing an OODBMS
• Extend an existing object-oriented programming language.
  – GemStone extended Smalltalk.
• Provide an extensible OODBMS library.
  – Approach taken by Ontos, Versant, and ObjectStore.
• Embed OODB language constructs in a conventional host language.
  – Approach taken by O2, which has extensions for C.
• Extend an existing database language with object-oriented capabilities.
  – Approach being pursued by RDBMS and OODBMS vendors.
  – Ontos and Versant provide a version of OSQL.
• Develop a novel database data model/language.
Single-Level v. Two-Level Storage Model
• Traditional programming languages lack built-in support for many database features.
• An increasing number of applications now require functionality from both database systems and programming languages.
• Such applications need to store and retrieve large amounts of shared, structured data.
• With a traditional DBMS, the programmer has to:
  – Decide when to read and update objects.
  – Write code to translate between the application’s object model and the data model of the DBMS.
  – Perform additional type-checking when an object is read back from the database, to guarantee the object will conform to its original type.
• Difficulties occur because conventional DBMSs have a two-level storage model: the application storage model in main memory, and the database storage model on disk.
• In contrast, an OODBMS gives the illusion of a single-level storage model, with a similar representation both in memory and in the database stored on disk.
  – This requires clever management of the representation of objects in memory and on disk (called “pointer swizzling”).
[Figure: Two-Level Storage Model for RDBMS]
[Figure: Single-Level Storage Model for OODBMS]
Pointer Swizzling Techniques
The action of converting object identifiers (OIDs) to main memory pointers.
• The aim is to optimize access to objects.
• We should be able to locate any referenced objects on secondary storage using their OIDs.
• Once objects have been read into the cache, we want to record that they are now in memory, to prevent them from being retrieved again.
• We could hold a lookup table that maps OIDs to memory pointers (e.g. using hashing).
• Pointer swizzling attempts to provide a more efficient strategy by storing memory pointers in place of referenced OIDs, and vice versa when the object is written back to disk.
No Swizzling
• The easiest implementation is not to do any swizzling.
• Objects are faulted into memory, and a handle containing the object’s OID is passed to the application.
• The OID is used every time the object is accessed.
• The system must maintain some type of lookup table, the Resident Object Table (ROT), so that the object’s virtual memory pointer can be located and then used to access the object.
• Inefficient if the same objects are accessed repeatedly.
• Acceptable if objects are accessed only once.
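The no-swizzling scheme can be sketched as follows. `PObject`, `Handle`, and the `LongFunction` standing in for a disk read are hypothetical names; the ROT is a plain `HashMap` keyed by OID, which is one reasonable way to realize the hashed lookup table mentioned earlier. The counters make the cost visible: every access pays for a lookup, even after the object is resident.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** In-memory copy of a persistent object (hypothetical, for illustration). */
final class PObject {
    final long oid;
    final String data;
    PObject(long oid, String data) { this.oid = oid; this.data = data; }
}

/** Resident Object Table: maps OIDs of objects already in memory to their in-memory copies. */
final class ResidentObjectTable {
    private final Map<Long, PObject> table = new HashMap<>();
    private final LongFunction<PObject> disk; // stands in for reading secondary storage
    int lookups = 0;
    int faults = 0;

    ResidentObjectTable(LongFunction<PObject> disk) { this.disk = disk; }

    /** Every access pays for a table lookup; only a miss faults the object in. */
    PObject deref(long oid) {
        lookups++;
        PObject o = table.get(oid);
        if (o == null) {
            o = disk.apply(oid);
            faults++;
            table.put(oid, o);
        }
        return o;
    }
}

/** What the application holds: an OID, never a memory pointer. */
final class Handle {
    private final long oid;
    private final ResidentObjectTable rot;

    Handle(long oid, ResidentObjectTable rot) { this.oid = oid; this.rot = rot; }

    String data() { return rot.deref(oid).data; } // the OID is used on every access
}

final class NoSwizzlingDemo {
    public static void main(String[] args) {
        ResidentObjectTable rot = new ResidentObjectTable(oid -> new PObject(oid, "object " + oid));
        Handle h = new Handle(42, rot);
        for (int i = 0; i < 3; i++) h.data();
        System.out.println(rot.lookups + " lookups, " + rot.faults + " fault"); // 3 lookups, 1 fault
    }
}
```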
[Figure: Resident Object Table (ROT)]
Object Referencing
• We need to distinguish between resident and non-resident objects.
• Most techniques are variations of edge marking or node marking.
• Edge marking marks every object pointer with a tag bit:
  – if the bit is set, the reference is a memory pointer;
  – otherwise, it still holds an OID and needs to be swizzled when the object it refers to is faulted into memory.
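A software sketch of edge marking, combined with lazy swizzling: each reference carries a tag bit, and the first dereference replaces the use of the OID with a memory pointer. For readability the OID and the pointer live in separate fields; a compact implementation might overlay them in one word and use a spare bit as the tag. `Node`, `Ref`, and `ObjectManager` are hypothetical names, and a `Map` simulates the disk.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory object. */
final class Node {
    final long oid;
    final String label;
    Node(long oid, String label) { this.oid = oid; this.label = label; }
}

/** Object manager: faults objects in from a simulated disk, at most once each. */
final class ObjectManager {
    private final Map<Long, Node> resident = new HashMap<>();
    private final Map<Long, String> disk;
    int faults = 0;

    ObjectManager(Map<Long, String> disk) { this.disk = disk; }

    Node fault(long oid) {
        return resident.computeIfAbsent(oid, id -> {
            faults++;
            return new Node(id, disk.get(id));
        });
    }
}

/** An edge-marked reference: the tag bit says whether it holds an OID or a memory pointer. */
final class Ref {
    private boolean swizzled; // the tag bit
    private final long oid;   // meaningful while the tag bit is clear
    private Node pointer;     // meaningful once the tag bit is set

    Ref(long oid) { this.oid = oid; }

    boolean isSwizzled() { return swizzled; }

    Node deref(ObjectManager om) {
        if (!swizzled) {           // software residency check on each dereference
            pointer = om.fault(oid);
            swizzled = true;       // swizzle: later dereferences skip the lookup
        }
        return pointer;
    }
}

final class EdgeMarkingDemo {
    public static void main(String[] args) {
        ObjectManager om = new ObjectManager(Map.of(1L, "first", 2L, "second"));
        Ref r = new Ref(2);
        System.out.println(r.isSwizzled());    // false
        System.out.println(r.deref(om).label); // second
        System.out.println(r.isSwizzled());    // true
    }
}
```

The `if (!swizzled)` test is exactly the per-access residency check that the hardware-based schemes try to avoid.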
Object Referencing
• Node marking requires that all object references are immediately converted to virtual memory pointers when the object is faulted into memory.
• The first approach is a software-based technique, but the second can be implemented using either software- or hardware-based techniques.
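Edge marking can be sketched as below, assuming Python object references stand in for virtual memory pointers and a boolean field `swizzled` stands in for the tag bit; `Ref`, `PObject`, `Cache`, and `follow` are hypothetical names. Each reference starts out holding an OID and is swizzled the first time it is followed, after which it is used directly.

```python
# Hypothetical disk store: OID -> (name, OIDs of referenced objects).
DISK = {1: ("assembly", [2, 3]), 2: ("wheel", []), 3: ("axle", [2])}

class Ref:
    """An object reference carrying a tag bit, as in edge marking."""
    def __init__(self, oid):
        self.swizzled = False   # tag bit: False -> target is still an OID
        self.target = oid       # OID until swizzled, then a direct pointer

class PObject:
    def __init__(self, oid, name, refs):
        self.oid = oid
        self.name = name
        self.refs = [Ref(r) for r in refs]   # references start unswizzled

class Cache:
    def __init__(self, disk):
        self.disk = disk
        self.resident = {}   # OID -> PObject, consulted only when swizzling
        self.faults = 0

    def fault_in(self, oid):
        if oid not in self.resident:
            self.faults += 1
            name, refs = self.disk[oid]
            self.resident[oid] = PObject(oid, name, refs)
        return self.resident[oid]

    def follow(self, ref):
        # Software residency check on every dereference: test the tag bit.
        if not ref.swizzled:
            ref.target = self.fault_in(ref.target)   # replace OID with pointer
            ref.swizzled = True
        return ref.target

cache = Cache(DISK)
root = cache.fault_in(1)
wheel = cache.follow(root.refs[0])
again = cache.follow(root.refs[0])   # tag bit set: pointer used directly
print(wheel.name, wheel is again, cache.faults)  # wheel True 2
```

The test of `ref.swizzled` on every dereference is the software residency check; node marking would instead swizzle all of an object's references inside `fault_in`.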
Hardware-Based Schemes
• Use virtual memory access protection violations to detect accesses to non-resident objects.
• Use standard virtual memory hardware to trigger the transfer of persistent data from disk to memory.
• Once a page has been faulted in, objects are accessed via normal virtual memory pointers and no further object residency checking is required.
• Avoids the overhead of residency checks incurred by software approaches.
Pointer Swizzling - Other Issues
• Three other issues affect swizzling techniques:
  – Copy versus In-Place Swizzling.
  – Eager versus Lazy Swizzling.
  – Direct versus Indirect Swizzling.
Copy versus In-Place Swizzling
• When faulting objects in, data can either be copied into the application's local object cache or accessed in place within the object manager's database cache.
• Copy swizzling may be more efficient as, in the worst case, only modified objects have to be swizzled back to their OIDs.
• In-place swizzling may have to unswizzle an entire page of objects if one object on the page is modified.
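A minimal sketch of copy swizzling, under the assumption that a dictionary `PAGE` models one page in the object manager's database cache and that `CopyCache`, `LocalObj`, and their methods are hypothetical names: objects are copied out of the page, their references are swizzled into pointers to other local copies as they are followed, and at commit only the modified copies are unswizzled back to OIDs and written to the page.

```python
# Hypothetical page in the database cache: OID -> stored state.
# In the page, references are always OIDs.
PAGE = {10: {"qty": 5, "next": 11}, 11: {"qty": 7, "next": None}, 12: {"qty": 1, "next": 10}}

class LocalObj:
    def __init__(self, oid, state):
        self.oid = oid
        self.qty = state["qty"]
        self.next = state["next"]   # an OID until swizzled, then a LocalObj

class CopyCache:
    """Copy swizzling: objects are copied out of the page into a local cache."""
    def __init__(self, page):
        self.page = page
        self.local = {}     # OID -> LocalObj copy
        self.dirty = set()  # OIDs of copies the application modified

    def get(self, oid):
        if oid not in self.local:
            self.local[oid] = LocalObj(oid, self.page[oid])
        return self.local[oid]

    def follow_next(self, obj):
        if isinstance(obj.next, int):      # still an OID: swizzle it
            obj.next = self.get(obj.next)
        return obj.next

    def set_qty(self, obj, qty):
        obj.qty = qty
        self.dirty.add(obj.oid)

    def commit(self):
        # Only modified copies are unswizzled and written back to the page.
        written = sorted(self.dirty)
        for oid in written:
            obj = self.local[oid]
            nxt = obj.next.oid if isinstance(obj.next, LocalObj) else obj.next
            self.page[oid] = {"qty": obj.qty, "next": nxt}
        self.dirty = set()
        return written

cache = CopyCache(PAGE)
a = cache.get(12)
b = cache.follow_next(a)      # swizzles 12.next -> copy of 10
cache.follow_next(b)          # swizzles 10.next -> copy of 11
cache.set_qty(b, 6)
print(cache.commit(), PAGE[10])  # [10] {'qty': 6, 'next': 11}
```

Object 12 was copied and its reference swizzled, but because it was not modified nothing has to be unswizzled for it; an in-place scheme would have swizzled pointers inside the page itself, which must be restored before the page is written.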
Eager versus Lazy Swizzling
• Moss defines eager swizzling as swizzling all OIDs for persistent objects on all data pages used by the application before any object can be accessed.
• A more relaxed definition restricts swizzling to all persistent OIDs within the object the application wishes to access.
• Lazy swizzling only swizzles pointers as they are accessed or discovered.
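The three definitions differ in how much swizzling work is done up front. The sketch below only counts swizzle operations rather than performing them; the object graph, the page layout, the access path, and the function names are all hypothetical.

```python
# Hypothetical stored object graph: OID -> OIDs it references.
DISK = {1: [2, 3], 2: [4], 3: [4, 5], 4: [], 5: [1]}

def eager_count(pages):
    # Moss's eager swizzling: every OID in every object on every page in use,
    # swizzled before any object is accessed.
    return sum(len(DISK[oid]) for page in pages for oid in page)

def relaxed_count(path):
    # Relaxed definition: all OIDs inside each object the application accesses.
    return sum(len(DISK[oid]) for oid in set(path))

def lazy_count(path):
    # Lazy swizzling: only references actually followed, each swizzled once.
    followed = set()
    for src, dst in zip(path, path[1:]):
        if dst not in DISK[src]:
            raise ValueError(f"{src} does not reference {dst}")
        followed.add((src, dst))
    return len(followed)

pages = [[1, 2], [3, 4, 5]]   # hypothetical layout of objects on pages
path = [1, 3, 5, 1, 3]        # objects the application actually visits
print(eager_count(pages), relaxed_count(path), lazy_count(path))  # 6 5 3
```

Eager swizzling pays for references the application never follows (such as 2 -> 4), while lazy swizzling defers the cost to the first dereference of each reference.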
Direct versus Indirect Swizzling
• Only an issue when a swizzled pointer can refer to an object that is no longer in virtual memory.
• With direct swizzling, the virtual memory pointer of the referenced object is placed directly in the swizzled pointer.
• With indirect swizzling, the virtual memory pointer is placed in an intermediate object, which acts as a placeholder for the actual object.
  – Allows objects to be uncached without requiring swizzled pointers to be unswizzled.
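Indirect swizzling can be sketched as below, assuming Python references stand in for virtual memory pointers; `Placeholder`, `IndirectCache`, and their methods are hypothetical names. With direct swizzling, uncaching the gear would leave `p` dangling; here `p` points at the placeholder, so the object is simply reloaded on the next access.

```python
# Hypothetical stored objects: OID -> name.
DISK = {7: "gear", 8: "shaft"}

class Placeholder:
    """Intermediate object; swizzled pointers refer to this, not the object."""
    def __init__(self, oid):
        self.oid = oid
        self.obj = None   # None means the actual object is not cached

class IndirectCache:
    def __init__(self, disk):
        self.disk = disk
        self.placeholders = {}   # OID -> its single Placeholder
        self.loads = 0

    def swizzle(self, oid):
        # The swizzled pointer is a pointer to the placeholder.
        if oid not in self.placeholders:
            self.placeholders[oid] = Placeholder(oid)
        return self.placeholders[oid]

    def deref(self, ph):
        if ph.obj is None:   # never loaded, or uncached since
            self.loads += 1
            ph.obj = {"oid": ph.oid, "name": self.disk[ph.oid]}
        return ph.obj

    def uncache(self, oid):
        # The object leaves memory; pointers to its placeholder stay valid,
        # so nothing that refers to it has to be unswizzled.
        self.placeholders[oid].obj = None

cache = IndirectCache(DISK)
p = cache.swizzle(7)            # e.g. stored in another object's reference field
print(cache.deref(p)["name"])   # gear
cache.uncache(7)
print(cache.deref(p)["name"], cache.loads)   # gear 2
```

The price of this flexibility is an extra indirection on every access through the placeholder.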
[Figure: Accessing an Object with an RDBMS]
[Figure: Accessing an Object with an OODBMS]
Persistent Schemes
• Consider three persistent schemes:
  – Checkpointing.
  – Serialization.
  – Explicit Paging.
• Note: persistence can also be applied to (object) code and to the program execution state.
Checkpointing
• Copy all or part of a program's address space to secondary storage.
• If the complete address space is saved, the program can restart from the checkpoint.
• In other cases, only the program's heap is saved.
• Two main drawbacks:
  – Can only be used by the program that created it.
  – May contain a large amount of data that is of no use in subsequent executions.
Serialization
• Copy the closure of a data structure to disk.
• A write of a data value may involve traversing the graph of objects reachable from the value and writing a flattened version of the structure to disk.
• Reading back the flattened data structure produces a new copy of the original data structure.
• Sometimes called serialization, pickling, or, in a distributed computing context, marshaling.
Serialization
• Two inherent problems:
  – Does not preserve object identity.
  – Not incremental, so saving small changes to a large data structure is not efficient.
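Both problems can be observed with Python's standard `pickle` module, used here only as a familiar example of serialization, with plain dictionaries standing in for application objects. Within one flattened structure `pickle` does preserve sharing, but each read produces new objects, and values serialized separately lose any common identity.

```python
import pickle

bolt = {"name": "bolt"}
assembly = {"name": "assembly", "parts": [bolt, bolt]}   # bolt is shared
frame = {"name": "frame", "parts": [bolt]}

# Writing a value flattens the closure of everything reachable from it;
# reading it back builds a new copy of the structure.
restored = pickle.loads(pickle.dumps(assembly))
print(restored is assembly, restored["parts"][0] is bolt)    # False False

# Sharing inside a single flattened structure survives the round trip...
print(restored["parts"][0] is restored["parts"][1])          # True

# ...but objects serialized separately no longer share identity.
other = pickle.loads(pickle.dumps(frame))
print(other["parts"][0] is restored["parts"][0])             # False

# Not incremental: persisting a one-field change to bolt means
# writing the whole closure of assembly (and of frame) again.
bolt["name"] = "hex bolt"
blob = pickle.dumps(assembly)
```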
Explicit Paging
• Explicitly ‘page’ objects between the application heap and the persistent store.
• Usually requires conversion of object pointers from a disk-based scheme to a memory-based scheme.
• Two common methods for creating/updating persistent objects:
  – Reachability-based.
  – Allocation-based.
Explicit Paging - Reachability-Based Persistence
• An object will persist if it is reachable from a persistent root object.
• The programmer does not need to decide at object creation time whether the object should be persistent.
• An object can become persistent by adding it to the reachability tree.
• Maps well onto languages that contain a garbage collection mechanism (e.g., Smalltalk or Java).
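Reachability-based persistence can be sketched as a traversal from the persistent roots, much like the mark phase of a garbage collector. The heap layout, object names, and `persistent_set` function are hypothetical; a real system would then write the resulting set to the persistent store.

```python
from collections import deque

# Hypothetical heap: object id -> ids of the objects it references.
HEAP = {
    "root": ["catalog"],
    "catalog": ["part1", "part2"],
    "part1": ["part2"],
    "part2": [],
    "scratch": ["part1"],   # transient object pointing into the persistent graph
}

def persistent_set(heap, roots):
    """Objects reachable from a persistent root are persistent (BFS)."""
    seen, queue = set(roots), deque(roots)
    while queue:
        for ref in heap[queue.popleft()]:
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen

print(sorted(persistent_set(HEAP, ["root"])))
# ['catalog', 'part1', 'part2', 'root']

# An object becomes persistent simply by being linked into the reachability tree.
HEAP["catalog"].append("scratch")
print("scratch" in persistent_set(HEAP, ["root"]))   # True
```

Note that `scratch` referring to a persistent object does not make it persistent; only a reference from the persistent graph to `scratch` does.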
Explicit Paging - Allocation-Based Persistence
• An object is only made persistent if it is explicitly declared as such within the application program.
• Can be achieved in several ways:
  – By class.
  – By explicit call.
Explicit Paging - Allocation-Based Persistence
• By class
  – The class is statically declared to be persistent, and all instances are made persistent when they are created.
  – The class may be a subclass of a system-supplied persistent class.
• By explicit call
  – An object may be specified as persistent when it is created or dynamically at runtime.
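Both allocation-based variants can be sketched in Python as follows. `STORE`, `Persistent`, and `persist` are hypothetical stand-ins for a system-supplied persistent store, persistent base class, and persistence call; a real system would also write the registered objects to disk, which is omitted here.

```python
import itertools

STORE = {}                   # hypothetical persistent store: OID -> object
_oids = itertools.count(1)   # hypothetical OID generator

def _register(obj):
    oid = next(_oids)
    STORE[oid] = obj
    return oid

class Persistent:
    """By class: a system-supplied base class; every instance of a
    subclass is made persistent when it is created."""
    def __init__(self):
        self.oid = _register(self)

def persist(obj):
    """By explicit call: make an existing object persistent at runtime."""
    if getattr(obj, "oid", None) is None:
        obj.oid = _register(obj)
    return obj.oid

class Part(Persistent):
    def __init__(self, name):
        super().__init__()
        self.name = name

class Note:                  # an ordinary class: instances start out transient
    def __init__(self, text):
        self.text = text

p = Part("bolt")             # persistent because of its class
n = Note("check torque")     # transient...
print(len(STORE))            # 1
persist(n)                   # ...until explicitly made persistent
print(len(STORE), p.oid, n.oid)   # 2 1 2
```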
Orthogonal Persistence
• Three fundamental principles:
  – Persistence independence.
  – Data type orthogonality.
  – Transitive persistence (originally referred to as ‘persistence identification’, but the ODMG term ‘transitive persistence’ is used here).
Persistence Independence
• The persistence of an object is independent of how the program manipulates that object.
• Conversely, a code fragment is independent of the persistence of the data it manipulates.
• It should be possible to call a function with parameters that are sometimes objects with long-term persistence and sometimes only transient.
• The programmer does not need to control the movement of data between long-term and short-term storage.
Data Type Orthogonality
• All data objects should be allowed the full range of persistence, irrespective of their type.
• No special cases where an object is not allowed to be long-lived or is not allowed to be transient.
• In some PPLs, persistence is a quality attributable to only a subset of the language's data types.
Transitive Persistence
• The choice of how to identify and provide persistent objects at the language level is independent of the choice of data types in the language.
• The technique now widely used for identification is reachability-based.
Orthogonal Persistence - Advantages
• Improved programmer productivity from simpler semantics.
• Improved maintenance.
• Consistent protection mechanisms over the whole environment.
• Support for incremental evolution.
• Automatic referential integrity.
Orthogonal Persistence - Disadvantages
• Some runtime expense in a system where every pointer reference might be addressing a persistent object.
  – The system is required to test whether the object must be loaded in from the disk-resident database.
• Although orthogonal persistence promotes transparency, a system with support for sharing among concurrent processes cannot be fully transparent.
Versions
• Allows changes to the properties of objects to be managed so that object references always point to the correct object version.
• Itasca identifies three types of versions:
  – Transient Versions.
  – Working Versions.
  – Released Versions.
[Figure: Versions and Configurations]
Schema Evolution
• Some applications require considerable flexibility in dynamically defining and modifying the database schema.
• Typical schema changes:
  (1) Changes to class definitions:
    (a) Modifying attributes.
    (b) Modifying methods.
Schema Evolution
  (2) Changes to the inheritance hierarchy:
    (a) Making a class S a superclass of a class C.
    (b) Removing S from the list of superclasses of C.
    (c) Modifying the order of the superclasses of C.
  (3) Changes to the set of classes, such as creating and deleting classes and modifying class names.
• Changes must not leave the schema inconsistent.
Schema Consistency
1. Resolution of conflicts caused by multiple inheritance and redefinition of attributes and methods in a subclass.
  1.1 Rule of precedence of subclasses over superclasses.
  1.2 Rule of precedence between superclasses of a different origin.
  1.3 Rule of precedence between superclasses of the same origin.
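Rules 1.1 and 1.2 can be sketched as an attribute-resolution function over a toy schema. The schema, the class names, and the `origin` function are hypothetical, and the form of rule 1.2 used here (the first superclass in the declared order wins) is an assumption of this sketch; rule 1.3's handling of redefined domains is not modeled.

```python
# Hypothetical schema: class -> (ordered superclasses, locally defined attributes).
SCHEMA = {
    "Object":    ([], {"oid"}),
    "Vehicle":   (["Object"], {"weight", "speed"}),
    "Boat":      (["Object"], {"speed", "draught"}),
    "Amphibian": (["Vehicle", "Boat"], {"weight"}),
}

def origin(cls, attr):
    """Return the class whose definition of attr is inherited by cls.

    Rule 1.1: a definition in the class itself wins over any superclass.
    Rule 1.2 (assumed form): among superclasses, the first one listed wins.
    """
    supers, local = SCHEMA[cls]
    if attr in local:
        return cls
    for sup in supers:
        found = origin(sup, attr)
        if found is not None:
            return found
    return None

print(origin("Amphibian", "weight"))   # Amphibian (redefined locally)
print(origin("Amphibian", "speed"))    # Vehicle (first superclass listed)
print(origin("Amphibian", "draught"))  # Boat
print(origin("Amphibian", "oid"))      # Object (same origin via both paths)
```

Because the assumed rule 1.2 depends on the declared order, schema change (2)(c), modifying the order of superclasses, can change which definition a class inherits.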
Schema Consistency
2. Propagation of modifications to subclasses.
  2.1 Rule for propagation of modifications to subclasses.
  2.2 Rule for propagation of modifications in the event of conflicts.
  2.3 Rule for modification of domains.
Schema Consistency
3. Aggregation and deletion of inheritance relationships between classes and creation and removal of classes.
  3.1 Rule for inserting superclasses.
  3.2 Rule for removing superclasses.
  3.3 Rule for inserting a class into a schema.
  3.4 Rule for removing a class from a schema.
[Figure: Schema Consistency]
Client-Server Architecture
• Three basic architectures:
  – Object Server.
  – Page Server.
  – Database Server.
Object Server
• Distributes processing between the two components.
• Typically, the client is responsible for transaction management and interfacing to the programming language.
• The server is responsible for other DBMS functions.
• Best for cooperative, object-to-object processing in an open, distributed environment.
Page and Database Server
Page Server
• Most database processing is performed by the client.
• The server is responsible for secondary storage and for providing pages at the client's request.
Database Server
• Most database processing is performed by the server.
• The client simply passes requests to the server, receives results, and passes them to the application.
• Approach taken by many RDBMSs.
[Figure: Client-Server Architecture]
Architecture - Storing and Executing Methods
• Two approaches:
  – Store methods in external files.
  – Store methods in the database.
• Benefits of the latter approach:
  – Eliminates redundant code.
  – Simplifies modifications.
Architecture - Storing and Executing Methods
  – Methods are more secure.
  – Methods can be shared concurrently.
  – Improved integrity.
• However, storing methods in the database is more difficult to implement.
[Figure: Architecture - Storing and Executing Methods]
Benchmarking - Wisconsin benchmark
• Developed to allow comparison of particular DBMS features.
• Consists of a set of single-user tests covering:
  – updates/deletes involving key and non-key attributes;
  – projections involving different degrees of duplication in the attributes, and selections with different selectivities on indexed, non-indexed, and clustered attributes;
  – joins with different selectivities;
  – aggregate functions.
Benchmarking - Wisconsin benchmark
• The original benchmark had three relations: one called Onektup with 1,000 tuples, and two others called Tenktup1/Tenktup2 with 10,000 tuples each.
• Generally useful, although it does not cater for highly skewed attribute distributions, and the join queries used are relatively simplistic.
• A consortium of manufacturers formed the Transaction Processing Performance Council (TPC) in 1988 to create a series of transaction-based test suites to measure database/TP environments.
TPC Benchmarks
• TPC-A and TPC-B for OLTP (now obsolete).
• TPC-C replaced TPC-A/B and is based on an order-entry application.
• TPC-H for ad hoc decision-support environments.
• TPC-R for business reporting within decision-support environments.
• TPC-W, a transactional Web benchmark for e-commerce.
Object Operations Version 1 (OO1) Benchmark
• Intended as a generic measure of OODBMS performance.
• Designed to reproduce operations common in advanced engineering applications, such as finding all parts connected to a random part, all parts connected to one of those parts, and so on, to a depth of seven levels.
• In about 1990, the benchmark was run on GemStone, Ontos, ObjectStore, Objectivity/DB, and Versant, and on INGRES and Sybase. Results showed an average 30-fold performance improvement for OODBMSs over RDBMSs.
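The OO1 traversal can be sketched as a recursive walk over a parts graph. The number of parts, the three connections per part, the random wiring, and the function names are assumptions of this sketch rather than a specification of the benchmark. Parts reached along several paths are visited again, so from any start part a seven-level traversal makes 1 + 3 + 9 + ... + 3^7 = 3280 visits.

```python
import random

def build_parts(n_parts, fanout, seed=0):
    """Hypothetical parts database: part id -> ids of connected parts."""
    rng = random.Random(seed)
    return {p: [rng.randrange(n_parts) for _ in range(fanout)] for p in range(n_parts)}

def traverse(parts, start, depth):
    """Visit a part, then recursively every part connected to it, down to
    `depth` levels. Repeat visits are counted, so the total depends only
    on fanout and depth."""
    visited = 1
    if depth > 0:
        for nxt in parts[start]:
            visited += traverse(parts, nxt, depth - 1)
    return visited

parts = build_parts(n_parts=20_000, fanout=3)
start = random.Random(1).randrange(len(parts))
print(traverse(parts, start, depth=7))   # 3280
```

In an OODBMS each hop can follow a stored reference directly, whereas a relational implementation would typically perform a lookup on a connection table for each hop.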
OO7 Benchmark
• A more comprehensive set of tests and a more complex database based on a parts hierarchy.
• Designed for detailed comparisons of OODBMS products.
• Simulates a CAD/CAM environment and tests system performance in the area of object-to-object navigation over cached data, disk-resident data, and both sparse and dense traversals.
• Also tests indexed and non-indexed updates of objects, repeated updates, and the creation and deletion of objects.
Advantages of OODBMSs
• Enriched Modeling Capabilities.
• Extensibility.
• Removal of Impedance Mismatch.
• More Expressive Query Language.
• Support for Schema Evolution.
• Support for Long Duration Transactions.
• Applicability to Advanced Database Applications.
• Improved Performance.
Disadvantages of OODBMSs
• Lack of Universal Data Model.
• Lack of Experience.
• Lack of Standards.
• Query Optimization Compromises Encapsulation.
• Object-Level Locking May Impact Performance.
• Complexity.
• Lack of Support for Views.
• Lack of Support for Security.