Database Systems Set Theory RELATIONS A relational database
Database Systems – Set Theory RELATIONS A relational database consists of tables, each of which is assigned a unique name. A row in a table represents a relationship among a set of values. A table is a collection of such relationships. Column Headers are commonly referred to as attributes Websites-Schema=(website, organization, first-year, category) websites relation: website organization first-year category www. zojjed. com Walking Promotions 2006 Fiction www. racewalk. com Walking Promotions 1995 Health www. greattreks. com Walking Promotions 2006 Travel www. twofeetgallery. com Walking Promotions 2004 Photograph www. walkinghealthy. com Walking Promotions 2002 Health www. cs. drexel. edu/~jsalvage Drexel University 2005 Education
Database Systems – Set Theory DOMAINS A Domain is the set of permitted values for a column/attribute. The domain can be any positive number as in the case with first year The domain can be a series of letters up to a maximum number of letters as in the case with organization. The domain can be valid web addresses, whose rules might be slightly more complicated. If D 1 D 2 D 3 D 4 denotes the the set set of of all all websites organizations first years categories Any row of websites must contain a 4 -tuple(v 1, v 2, v 3, v 4) where v 1 is a website in the domain D 1 v 2 is a organization in the domain D 2 v 3 is year in the domain D 3 V 4 is a category in the domain D 4 Therefore, account is a subset of D 1 x. D 2 x. D 3 x. D 4.
Database Systems – Set Theory DOMAINS In general a table must be a subset of D 1 x. D 2 x…x. Dn-1 x. Dn Tables vs. Relations There exists a close relationship between this language and the terminology used in databases. Instead of numbers DB’s use names. Relation -> table tuple -> row Websites table has 6 tuples.
Database Systems – Set Theory TUPLE NOTATION If t is a variable denoting the first tuple relationship, then t[website] denotes the website attribute of the tuple t. t[website] = “www. zojjed. com” t[organization]=”Walking Promotions” t[first-year] = 2006 t[category] = “Fiction” Alternatively, using numbered attributes: t[1] = “www. zojjed. com” t[2]= ”Walking Promotions” t[3] = 2006 t[4] = “Fiction” t r, indicate the tuple t is in the relation r
Database Systems – Set Theory DOMAINS It is possible for several attributes to have the same domain, i. e. several attributes can contain the same domain of possible values. Later we will introduce a customer relation. It has a customer name; if Ialso had an employee table with the field employee name, technically they both have the same domain. It depends upon how you look at it. If the domain is the set of all possible names, this is true. What about the domains website and first-year. They are incompatible. What about website and category? While they both store character strings, the website domain has a more limited domain. In a set, an attribute may contain the value Null. For now we will assume they do not.
Database Systems – Set Theory DATABASE SCHEMAS Logical design of the database defines the type definition of a variable DATABASE INSTANCE Snapshot of the database at a given time an instance of a variable A database schema in relations is defined by using a capitalized name for the relationship-schema and a lowercase name of each attribute. An instance of a relation is represented by a lowercase name. Websites-schema(website, organization, first-year, category) A relation on the Website-schema is as follows: websites(Website-schema) Side notes, very important: A relation has no order A relation cannot contain duplicate tuples
Database Systems – Set Theory Customers-Schema=(website, first-name, last-name) customers Relation website first-name last-name www. zojjed. com Derek Jeter www. zojjed. com Chase Utley www. drexel. edu/~jsalvage Jeremy Johnson www. racewalk. com Ryan Howard www. zojjed. com Ryan Howard Notice the website attribute appears in both the customers relation and Websites relation. This is not a coincidence, fields often are repeated. This allows distinct relations to be related. If we wanted to gather website details for customer websites we need information from both relations
Database Systems – Set Theory Combined information from website and customers relations website category first-name last-name www. zojjed. com Fiction Derek Jeter www. zojjed. com Fiction Chase Utley www. cs. drexel. edu/~jsalvage Education Jeremy Johnson www. racewalk. com Health Ryan Howard www. zojjed. com Fiction Ryan Howard In real databases, unique id fields would be used to identify the customer and the website so the website name would not be repeated in both tables.
Database Systems – Set Theory Instead of having two schemas, it’s possible to have one schema as follows: Website. Customers(website, organization, first-year, category, first-name, last-name) What is wrong with this?
Database Systems – Set Theory Redundant data as well as null fields. website organization first-year category first-name last-name www. zojjed. com Walking Promotions 2006 Fiction Derek Jeter www. racewalk. com Walking Promotions 1995 Health Ryan Howard www. greattreks. com Walking Promotions 2006 Travel Null www. twofeetgallery. com Walking Promotions 2004 Photographs Null www. walkinghealthy. com Walking Promotions 2002 Health Null www. zojjed. com Walking Promotions 2006 Fiction Ryan Howard www. zojjed. com Walking Promotions 2006 Fiction Chase Utley www. cs. drexel. edu/~jsalvage Drexel University 2005 Education Jeremy Johnson
Database Systems – Set Theory hit-counts-Schema= (website, date, hit-count) hit-counts relation website date hit-count www. zojjed. com 5/20/2007 5 www. racewalk. com 5/20/2007 2019 www. greattreks. com 5/20/2007 1050 www. twofeetgallery. com 5/20/2007 32 www. walkinghealthy. com 5/20/2007 159 www. zojjed. com 5/21/2007 6 www. zojjed. com 5/22/2007 5 www. cs. drexel. edu/~jsalvage 5/20/2007 376 www. racewalk. com 5/21/2007 2099 Is there anything wrong with the above relation?
Database Systems – Set Theory hit-counts-Schema= (website, date, hit-count) hit-counts relation website date hit-count www. zojjed. com 5/20/2007 5 www. racewalk. com 5/20/2007 2019 www. greattreks. com 5/20/2007 1050 www. twofeetgallery. com 5/20/2007 32 www. walkinghealthy. com 5/20/2007 159 www. zojjed. com 5/21/2007 6 www. zojjed. com 5/22/2007 5 www. cs. drexel. edu/~jsalvage 5/20/2007 376 www. racewalk. com 5/21/2007 2099 Is there anything wrong with the above relation? No, there is no reason why we cannot list a website more than once.
Database Systems – Set Theory If we did not care about the date and only cared about the hit count, could we define the hit-counts Schema as follows: hit-counts-Schema= (website, hit-count) hit-counts relation: website hit-count www. zojjed. com 5 www. racewalk. com 2019 www. greattreks. com 1050 www. twofeetgallery. com 32 www. walkinghealthy. com 159 www. zojjed. com 6 www. zojjed. com 5 www. cs. drexel. edu/~jsalvage 376 www. racewalk. com 2099
Database Systems – Set Theory If we did not care about the date and only cared about the hit count, could we define the hit-counts Schema as follows: hit-counts-Schema= (website, hit-count) hit-counts relation: website hit-count www. zojjed. com 5 www. racewalk. com 2019 www. greattreks. com 1050 www. twofeetgallery. com 32 www. walkinghealthy. com 159 www. zojjed. com 6 www. zojjed. com 5 www. cs. drexel. edu/~jsalvage 376 www. racewalk. com 2099 In real databases there would be no problem, but we said that you cannot repeat tuples in a relation. So the answer is no, we cannot use this schema because of duplicates.
Database Systems – Set Theory QUERY LANGUAGES A query language is a language in which the user requests information from the database. Can be procedural or non-procedural. We will study Relational Algebra It Is a procedural language consisting of sets of operations that take one or two relations as input and output a relation. Operations include: select project union set difference Cartesian product Rename Intersection Aggregate functions We will also study various forms of joining relations.
Database Systems – Set Theory Unary- operates on one relation Binary – operates on a pair of relations The Select Operation Unary operation Selects tuples that satisfy a given predicate - represents a select operation - sigma <select condition>(R) <selection condition> = <attribute name> <comparison op> <constant value> or <selection condition> = <attribute name> <comparison op> <attribute name> comparison operators are: =, <>, <, <=, >, >= equal, not equal, less than or equal to, greater than or equal to
Database Systems – Set Theory To select those tuples of the hit-counts relation where the website is “www. zojjed. com” we write. website = “www. zojjed. com”(hit-counts) This returns the relation: hit-counts relation website date hit-count www. zojjed. com 5/20/2007 5 www. zojjed. com 5/21/2007 6 www. racewalk. com 5/20/2007 2019 www. zojjed. com 5/22/2007 5 www. greattreks. com 5/20/2007 1050 www. twofeetgallery. com 5/20/2007 32 www. walkinghealthy. com 5/20/2007 159 www. zojjed. com 5/21/2007 6 www. zojjed. com 5/22/2007 5 www. cs. drexel. edu/~jsalvage 5/20/2007 376 www. racewalk. com 5/21/2007 2099
Database Systems – Set Theory To select those tuples of the hit-counts relation where the hit-count is greater than 1000 we write. hit-count > 1000 (hit-counts) This returns the relation: hit-counts relation website date hit-count www. racewalk. com 5/20/2007 2019 www. zojjed. com 5/20/2007 5 www. greattreks. com 5/20/2007 1050 www. racewalk. com 5/20/2007 2019 ww. racewalk. com 5/21/2007 2099 www. greattreks. com 5/20/2007 1050 www. twofeetgallery. com 5/20/2007 32 www. walkinghealthy. com 5/20/2007 159 www. zojjed. com 5/21/2007 6 www. zojjed. com 5/22/2007 5 www. cs. drexel. edu/~jsalvage 5/20/2007 376 www. racewalk. com 5/21/2007 2099
Database Systems – Set Theory Can combine predicates with and, or, and not To select those tuples of the hit-counts relation where the hit-count is greater than 5 and the website is www. zojjed. com, we write. hit-count > 5 (hit-counts) and website = “www. zojjed. com” This returns the relation: hit-count > 5 website date hit-count www. zojjed. com 5/21/2007 6 www. racewalk. com 5/20/2007 2019 www. greattreks. com 5/20/2007 1050 www. twofeetgallery. com 5/20/2007 32 website = “www. zojjed. com” website date hit-count www. walkinghealthy. com 5/20/2007 159 www. zojjed. com 5/20/2007 5 www. zojjed. com 5/21/2007 6 www. cs. drexel. edu/~jsalvage 5/20/2007 376 www. zojjed. com 5/22/2007 5 www. racewalk. com 5/21/2007 2099
Database Systems – Set Theory The Project Operation unary returns arguments in relation without all attributes duplicates are removed - represent project operation - pi <attribute list> (R) website, category(Websites) website category www. zojjed. com Fiction www. racewalk. com Health www. greattreks. com Travel www. twofeetgallery. com Photographs www. walkinghealthy. com Health www. cs. drexel. edu/~jsalvage Education
Database Systems – Set Theory Composition of Relational Operations Often we need to combine operations. Often we wish to select a set of tuples and limit the relation returned to a few attributes. What if we want to find out only the websites that have had greater than 1000 hits in a given day? First we must find out what tuples have hit counts greater than 1000. We can accomplish this with the following relational query: hit-count>1000 (hit-counts) By using the Project operation we can remove the extra attributes such as hit-count and date and only return the values in the website column. website( hit-count>1000 (hit-counts)) What is the relation that is returned?
Database Systems – Set Theory What if we want to find out only the websites that have had greater than 1000 hits in a single day? website( hit-count>1000 (hit-counts)) What is the relation that is returned? hit-counts relation website date hit-count www. racewalk. com www. zojjed. com 5/20/2007 5 www. greattreks. com www. racewalk. com 5/20/2007 2019 www. greattreks. com 5/20/2007 1050 www. twofeetgallery. com 5/20/2007 32 www. walkinghealthy. com 5/20/2007 159 www. zojjed. com 5/21/2007 6 www. zojjed. com 5/22/2007 5 www. cs. drexel. edu/~jsalvage 5/20/2007 376 www. racewalk. com 5/21/2007 2099
Database Systems – Set Theory Union Operator binary - union operator It is often useful to combine the results of queries. Again, remember that set theory removes duplicates. Relation 1 Relation 2 = Result Set
Database Systems – Set Theory What is a query that returns all websites that have customers OR a hit count greater than 1000? We need information from both the customers relation as well as the hit count relation. First we need the names of all websites that have customers website(customers) customers Relation website first-name last-name www. drexel. edu/~jsalvage www. zojjed. com Derek Jeter www. racewalk. com www. zojjed. com Chase Utley www. zojjed. com www. drexel. edu/~jsalvage Jeremy Johnson www. racewalk. com Ryan Howard www. zojjed. com Ryan Howard
Database Systems – Set Theory Then we need the names of the websites that have a hit count greater than 1000: website( hit-count>1000 (hit-counts)) website www. racewalk. com www. greattreks. com hit-counts relation website date hit-count www. zojjed. com 5/20/2007 5 www. racewalk. com 5/20/2007 2019 www. greattreks. com 5/20/2007 1050 www. twofeetgallery. com 5/20/2007 32 www. walkinghealthy. com 5/20/2007 159 www. zojjed. com 5/21/2007 6 www. zojjed. com 5/22/2007 5 www. cs. drexel. edu/~jsalvage 5/20/2007 376 www. racewalk. com 5/21/2007 2099
Database Systems – Set Theory Combine the results using a union operation website(customers) website( (hit-counts)) hit-count>1000 website www. drexel. edu/~jsalvage www. racewalk. com www. greattreks. com www. zojjed. com website( hit-count>1000 (hit-counts)) website(customers) website www. racewalk. com www. drexel. edu/~jsalvage www. greattreks. com www. racewalk. com www. zojjed. com Remember, order is not important! • Unions MUST be of similar types • They MUST have the same number of attributes • The domains of the attributes MUST be the same
Database Systems – Set Theory Intersection Operator binary ∩ - intersection operator Returns all tuples contained within both relations Relation 1 ∩ Relation 2 = Result Set
Database Systems – Set Theory What is a query that returns all websites that have customers AND a hit count greater than 1000? First we need the names of all websites that have customers website(customers) website www. drexel. edu/~jsalvage www. racewalk. com www. zojjed. com Then we need the names of the websites that have a hit count greater than 1000: website( (hit-counts)) hit-count>1000 website www. racewalk. com www. greattreks. com
Database Systems – Set Theory What is a query that returns all websites that have customers AND a hit count greater than 1000? website(customers) ∩ website( hit-count>1000 (hit-counts)) website www. racewalk. com website( hit-count>1000 (hit-counts)) website(customers) website www. racewalk. com www. drexel. edu/~jsalvage www. greattreks. com www. racewalk. com www. zojjed. com
Database Systems – Set Theory The Set Difference Operation (MINUS) binary -, denotes set difference Relation 1 - Relation 2 = Result Set Finds tuples in one set but not in another r – s, produces a set containing those tuples in r but not in s, operand order is significant.
Database Systems – Set Theory Produce a list of websites who have a hit count > 1000 and do not have a customer. We need the names of all websites that have customers website(customers) website www. drexel. edu/~jsalvage www. racewalk. com www. zojjed. com Then we need the names of the websites that have a hit count greater than 1000: website( (hit-counts)) hit-count>1000 website www. racewalk. com www. greattreks. com website( hit-count>1000 (hit-counts)) - website(customers)
Database Systems – Set Theory Produce a list of websites who have a hit count > 1000 and do not have a customer. website( hit-count>1000 (hit-counts)) - website(customers) website www. greattreks. com website( hit-count>1000 (hit-counts)) website(customers) website www. racewalk. com www. drexel. edu/~jsalvage www. greattreks. com www. racewalk. com www. zojjed. com Notice the attributes in R 1 that are not in R 2 are included in the result set, but the attributes in R 2 that are not in R 1 are not included in the result set, i. e. operand order is significant.
Database Systems – Set Theory Given the relation websites below: website organization first-year category www. zojjed. com Walking Promotions 2006 Fiction www. racewalk. com Walking Promotions 1995 Health www. greattreks. com Walking Promotions 2006 Travel www. twofeetgallery. com Walking Promotions 2004 Photographs www. walkinghealthy. com Walking Promotions 2002 Health www. cs. drexel. edu/~jsalvage Drexel University 2005 Education What is the result of the following? website(empty set) - website(websites)
Database Systems – Set Theory Given the relation websites below: website organization first-year category www. zojjed. com Walking Promotions 2006 Fiction www. racewalk. com Walking Promotions 1995 Health www. greattreks. com Walking Promotions 2006 Travel www. twofeetgallery. com Walking Promotions 2004 Photographs www. walkinghealthy. com Walking Promotions 2002 Health www. cs. drexel. edu/~jsalvage Drexel University 2005 Education What is the result of the following? website(empty set) - website(websites) The result is the empty set, because no records are contained in “R 1” and only the records in R 1 that are not in R 2 are returned from the MINUS operator.
Database Systems – Set Theory Given the relation websites below: website organization first-year category www. zojjed. com Walking Promotions 2006 Fiction www. racewalk. com Walking Promotions 1995 Health www. greattreks. com Walking Promotions 2006 Travel www. twofeetgallery. com Walking Promotions 2004 Photographs www. walkinghealthy. com Walking Promotions 2002 Health www. cs. drexel. edu/~jsalvage Drexel University 2005 Education What is the result of the following? website(websites) - website(empty set)
Database Systems – Set Theory The result of website(websites) - website(empty set) is the complete relation websites, since no records are contained in the empty set all records from the websites relation are included in the result set. website www. zojjed. com www. racewalk. com www. greattreks. com www. twofeetgallery. com www. walkinghealthy. com www. cs. drexel. edu/~jsalvage
Database Systems – Set Theory The Cartesian-Product Operation binary x – combines information in two relations Relation 1 x Relation 2 = Result Set because attributes can be repeated in different relations, a notation relation. attribute will be used. Therefore, the resulting schema of r = websites x customers is: (websites. website, websites. organization, websites. first-year, websites. category, customers. website, customers. first-name, customers. last-name) Note this does not support cases where you wish to use the same relation twice; we will address this with the rename operation shortly. What tuples exist in r if r = websites x customers? The combination of all tuples in websites with every tuple in customers. Given r 1 with n 1 tuples and r 2 with n 2 tuples then r 1 xr 2 has n 1*n 2 tuples
Database Systems – Set Theory Let’s look at a simplified example first. If relation R 1 contains the following: Then R 1 x R 2 contains the following: R 1. Value 1 R 2. Value 2 1 A 1 B 1 C 2 A 2 B Value 2 2 C A 3 A B 3 B C 3 C Value 1 1 2 3 and if relation R 2 contains the following:
Database Systems – Set Theory Similarly, websites x customers appears as follows: websites. website organization category customers. website first-name last-name Walking Promotions firstyear 2006 www. zojjed. com Fiction www. zojjed. com Derek Jeter www. zojjed. com Walking Promotions 2006 Fiction www. zojjed. com Chase Utley www. zojjed. com Walking Promotions 2006 Fiction www. drexel. edu/~jsalvage Jeremy Johnson www. zojjed. com Walking Promotions 2006 Fiction www. racewalk. com Ryan Howard www. zojjed. com Walking Promotions 2006 Fiction www. zojjed. com Ryan Howard www. racewalk. com Walking Promotions 1995 Health www. zojjed. com Derek Jeter www. racewalk. com Walking Promotions 1995 Health www. zojjed. com Chase Utley www. racewalk. com Walking Promotions 1995 Health www. drexel. edu/~jsalvage Jeremy Johnson www. racewalk. com Walking Promotions 1995 Health www. racewalk. com Ryan Howard www. racewalk. com Walking Promotions 1995 Health www. zojjed. com Ryan Howard www. greattreks. com Walking Promotions 2006 Travel www. zojjed. com Derek Jeter www. greattreks. com Walking Promotions 2006 Travel www. zojjed. com Chase Utley www. greattreks. com Walking Promotions 2006 Travel www. drexel. edu/~jsalvage Jeremy Johnson www. greattreks. com Walking Promotions 2006 Travel www. racewalk. com Ryan Howard www. greattreks. com Walking Promotions 2006 Travel www. zojjed. com Ryan Howard . .
Database Systems – Set Theory What if we want to find all the customers who bought from a website created before the year 2000? We could try the following: first-year <2000 (websites x customers) Note that we are not using a projection to reduce the number of attributes to show what is really happening. In the end, you would use a projection to show only the fields requested by the query. websites. website organization first-year category customers. website first-name last-name www. racewalk. com Walking Promotions 1995 Health www. zojjed. com Derek Jeter www. racewalk. com Walking Promotions 1995 Health www. zojjed. com Chase Utley www. racewalk. com Walking Promotions 1995 Health www. drexel. edu/~jsalvage Jeremy Johnson www. racewalk. com Walking Promotions 1995 Health www. racewalk. com Ryan Howard www. racewalk. com Walking Promotions 1995 Health www. zojjed. com Ryan Howard Oops, too many tuples!
Database Systems – Set Theory Because the Cartesian-product pairs all possible tuples, all tuples from websites are combined with all tuples from customers. While only those with the first-year < 2000 are selected, it still returns 5 tuples. Of those sets, we only want the ones where the websites relation’s website attribute equals the customers relation’s website attribute. websites. website organization first-year category customers. website first-name last-name www. racewalk. com Walking Promotions 1995 Health www. zojjed. com Derek Jeter www. racewalk. com Walking Promotions 1995 Health www. zojjed. com Chase Utley www. racewalk. com Walking Promotions 1995 Health www. drexel. edu/~jsalvage Jeremy Johnson www. racewalk. com Walking Promotions 1995 Health www. racewalk. com www. zojjed. com Ryan The only tuple we truly want is the highlighted tuple. we can write this as follows: websites. website = customers. website( first-year <2000 (websites x customers)) Howard
Database Systems – Set Theory Since, websites. website = customers. website( first-year <2000 (websites x customers)) Returns the following tuple with too many attributes, we must also use a projection to remove the excessive attributes. website organization first-year category customers. website first-name last-name www. racewalk. com Walking Promotions 1995 Health www. racewalk. com Ryan Howard Applying the projection of first-name, last name to the previous query gives us the following query: first-name, last-name( websites. website = customers. website( first-year <2000 (websites x customers)))
Database Systems – Set Theory The Assignment Operator unary allows an expression to be assigned to a variable New. Relation Old. Relation For example: 1200 loans amount > 1200(loan) or result loan-number(1200 loans) Or The Rename Operation Unary x(E) renames the expression E to x. Relational-algebra expressions do not have a name that we can refer to them by using the rename operator, is roh.
Database Systems – Set Theory Example, without using an aggregation function (not yet shown), find the largest hit count of any website for a single day. If the same max hit count exists more than once, you are only allowed to return a single tuple containing the answer. We accomplish this in two steps: First, compute a temporary relation consisting of hit counts less than the largest hit count. Second, take the set difference (minus) between the relation hit-count(hit-counts) and the temporary relation To accomplish the first step, compute all the websites hit counts compared to all the websites hit counts, in other words compute the Cartesian product of the relation hit-counts with itself. hit-counts x hit-counts However, we must rename one of the hit-counts relations so that we can identify the balance distinctly
Database Systems – Set Theory Given the projection of only the hit-count field from the relation hit-counts via hit-count(hit-counts) If we rename the result of this projection We have: d ( hit-count(hit-counts)) hit-count 5 2019 1050 32 159 6 376 2099 And thus create a cross product of the two relations as hit-count(hit-counts) x d ( hit-count(hit-counts))
Database Systems – Set Theory hit-count(hit-counts) x d ( hit-count(hit-counts)) hit-count d(hit-count) 5 5 2019 5 1050 5 32 5 159 5 6 5 376 5 2099 5 5 2019 1050 2019 32 2019 159 2019 6 2019 376 2019 2099 2019 hit-count d(hit-count) 5 1050 5 159 2019 1050 2019 159 1050 159 32 1050 32 159 1050 159 6 1050 6 159 376 1050 376 159 2099 1050 2099 159 5 32 5 6 2019 32 2019 6 1050 32 1050 6 32 32 32 6 159 32 159 6 6 32 6 6 376 32 376 6 2099 32 2099 6
Database Systems – Set Theory hit-count(hit-counts) x d ( hit-count(hit-counts)) hit-count d(hit-count) 5 376 2019 376 1050 376 32 376 159 376 6 376 376 2099 376 5 2099 2019 2099 1050 2099 32 2099 159 2099 6 2099 376 2099
Database Systems – Set Theory Now we select only those tuples that have the first attribute containing a value less than the second attribute, we do so with the following query: hitcounts. hit-count < d. hit-count ( hit-count(hit-counts) x d ( hit-count(hit-counts))) hit-count d(hit-count) 5 5 2019 5 1050 5 32 5 159 5 6 5 376 5 2099 5 5 2019 1050 2019 32 2019 159 2019 6 2019 376 2019 2099 2019 hit-count d(hit-count) 5 1050 5 159 2019 1050 2019 159 1050 159 32 1050 32 159 1050 159 6 1050 6 159 376 1050 376 159 2099 1050 2099 159 5 32 5 6 2019 32 2019 6 1050 32 1050 6 32 32 32 6 159 32 159 6 6 32 6 6 376 32 376 6 2099 32 2099 6
Database Systems – Set Theory hit-count(hit-counts) x d ( hit-count(hit-counts)) hit-count d(hit-count) 5 376 2019 376 1050 376 32 376 159 376 6 376 376 2099 376 5 2099 2019 2099 1050 2099 32 2099 159 2099 6 2099 376 2099
Database Systems – Set Theory This certainly gives us a lot of tuples, but if we then project just the hit-count from the first column and remove the duplicates, we are left with the following: hit-count 5 2019 1050 32 159 6 376 This is the set containing most hit counts, but it does not include the largest hit count.
Database Systems – Set Theory To get just the largest hit count we now simply subtract our result set from the projection of the original hit count relation as follows: hit-count(hit-counts) - hit-count( hitcounts. hit-count < d. hit-count ( hit-count(hit-counts) x d ( hit-count(hit-counts)))) hit-count 2099
Database Systems – Set Theory We need a better way to represent certain queries because the notation for joining two relations and only selecting records where the attributes match is too cumbersome. Therefore we have: The Natural Join Operation Binary Result Set = R 1 |x| R 2 The natural join operation finds the Cartesian product of two relations, but only returns tuples where the attributes whose names are the same in both relations contain the same values.
Database Systems – Set Theory Let’s look at a simplified example first. If relation R 1 contains the following: and if relation R 2 contains the following: Value 1 Value 2 Value 3 1 X X A 2 Y Z B 3 Z A C
Database Systems – Set Theory Then R 1 x R 2 contains the following: R 1. Value 1 R 1. Value 2 R 2. Value 3 1 X X A 1 X Z B 1 X A C 2 Y X A 2 Y Z B 2 Y A C 3 Z X A 3 Z Z B 3 Z A C Then R 1 |x| R 2 contains the following: R 1. Value 1 R 1. Value 2 R 2. Value 3 1 X X A 3 Z Z B
Database Systems – Set Theory Example: Find the names of all customers who have made a purchase from a health or travel website. Return the name of the customer, the website, and the category of the website. The old way: Form a Cartesian product of the websites and customers relations. Select the tuples of the same website as well as a category equal to “health” or “travel. ” Project the first-name, last-name, website, and category first-name, last-name, website, category websites. website = customers. website and (websites category = “Health” or category = “Travel” ( x customers))
Database Systems – Set Theory Another example: Find all the names of websites and the dates they have a hit count for web sites that are in the health category.
Database Systems – Set Theory Another example: Find all the web site names and the dates that have hit counts for web sites in the health category. website, date ( category = “Health” (websites |x| hit-counts))
Database Systems – Set Theory Generalized Projections Allows basic arithmetic operations within fields of a tuple Observe the Sales relation: product first-name last-name tax total-cost Zojjed! Derek Jeter 1. 00 17. 95 Zojjed! Chase Utley 1. 00 17. 95 VB. NET Coach Jeremy Johnson 0 54. 95 Race Walk Like A Champion Ryan Howard 1. 25 25. 95 Zojjed! Ryan Howard 0 16. 95 What was the price of the product sold minus the tax paid? product, first-name, last-name, (total-cost – tax) as net-pay (Sales)
Database Systems – Set Theory Aggregate Functions takes a collection of values and returns a single value as a result i. e sum {1, 1, 3, 4, 4, 11} returns the value 24. avg {1, 1, 3, 4, 4, 11} returns the value 4. count {1, 1, 3, 4, 4, 11} returns the value 6. min {1, 1, 3, 4, 4, 11} returns the value 1. max {1, 1, 3, 4, 4, 11} returns the value 11. count-distinct {1, 1, 3, 4, 4, 11} returns the value 4. Ignore the fact that we said sets can’t contain duplicate values
Database Systems – Set Theory Operations to Modify the Contents of Relations Deletion r r–E This uses set difference (minus) and variable assignment. Delete all of the sale of “Zojjed!” from the Sales relation sales - product = “Zojjed!” (sales) Delete all sales with no tax collected sales - tax = 0 (sales) Insertion r r E This uses union and variable assignment. Add a record to the sales relation sales {(“I Walk to Eat”, “Chase”, “Utley”, 0, 15. 95)} Add a record to the websites relation websites {(“www. mediatitan. net”, “Walking Promotions”, 2006, “Fiction”)}
Database Systems – Set Theory Updating Zero all tax attributes in the sales relation sales product, first-name, last-name, 0, total-cost(sales)
Database Systems – Set Theory Joins There are other forms of joins. Let’s look at the following two simple relations: employee-name city employee-name team Jeter New York City Glavine Mets Howard Philadelphia Howard Phillies Utley Philadelphia Bonds Giants Schilling Boston Schilling Choke Sox cities relation teams relation employee-name city employee-name team Howard Philadelphia Howard Phillies Schilling Boston Schilling Choke Sox Natural Join -> cities |x| teams The natural join omits records that do not match, so we do not have records for Jeter, Utley, Glavine, or Bonds.
Database Systems – Set Theory employee-name city employee-name team Jeter New York City Glavine Mets Howard Philadelphia Howard Phillies Utley Philadelphia Bonds Giants Schilling Boston Schilling Choke Sox cities relation teams relation employee-name city employee-name team Jeter New York City Null Howard Philadelphia Howard Phillies Utley Philadelphia Null Schilling Boston Schilling Choke Sox Left Outer Join -> cities LOJ teams Includes all records from the left and only those records on the right that match
Database Systems – Set Theory A subtle note for Left Outer Join. If relation R 1 contains the following: and if relation R 2 contains the following: Value 1 Value 2 Value 3 1 X X A 2 Y Z B 3 Z Z C
Database Systems – Set Theory Then R 1 LOJ R 2 contains the following: R 1. Value 1 R 1. Value 2 R 2. Value 3 1 X X A 2 Y Null 3 Z Z B 3 Z Z C
Database Systems – Set Theory employee-name city employee-name team Jeter New York City Glavine Mets Howard Philadelphia Howard Phillies Utley Philadelphia Bonds Giants Schilling Boston Schilling Choke Sox cities relation teams relation employee-name city employee-name team Null Glavine Mets Howard Philadelphia Howard Phillies Null Bonds Giants Schilling Boston Schilling Choke Sox Right Outer Join -> cities ROJ teams Includes all records from the right and only those records on the left that match
Database Systems – Set Theory employee-name city employee-name team Jeter New York City Glavine Mets Howard Philadelphia Howard Phillies Utley Philadelphia Bonds Giants Schilling Boston Schilling Choke Sox cities relation teams relation employee-name city employee-name team Null Glavine Mets Howard Philadelphia Howard Phillies Null Bonds Giants Schilling Boston Schilling Choke Sox Jeter New York City Null Utley Philadelphia Null Full Outer Join -> cities FOJ teams Includes all records from the right and left, tuples that do not match have nulls in their place.
Database Systems – Set Theory NULLS And: true and unknown = unknown false and unknown = false unknown and unknown = unknown Or true or unknown = true false or unknown = unknown Not not unknown = unknown
Database Systems – Set Theory REFERENTIAL INTEGRITY Superkey of R A unique identifier for a tuple. A set of attributes SK of R such that no two tuples in any valid relation instance r(R) will have the same value for SK. That is, for any distinct tuples t 1 and t 2 in r(R), t 1[SK] t 2[SK]. Key of R A "minimal" superkey; that is, a superkey K such that removal of any attribute from K results in a set of attributes that is not a superkey, i. e. every attribute of K is needed to maintain its superkey status. Example: The CAR relation schema CAR(State, Reg#, Serial. No, Make, Model, Year) has two keys Key 1 = {State, Reg#} Key 2 = {Serial. No} Both are superkeys. {Serial. No, Make} is a superkey but not a key. If a relation has several candidate keys, one is chosen arbitrarily to be the primary key. The primary key attributes are underlined.
Database Systems – Set Theory REFERNTIAL INTEGRITY Relational Database Schema A set S of relation schemas that belong to the same database. S is the name of the database. S = {R 1, R 2, . . . , Rn} Entity Integrity: The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R). This is because primary key values are used to identify the individual tuples. t[PK] null for any tuple t in r(R) Note, other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.
Database Systems – Set Theory Referential Integrity A constraint involving two relations (the previous constraints involve a single relation). Used to specify a relationship among tuples in two relations: the referencing relation and the referenced relation. Tuples in the referencing relation R 1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R 2. A tuple t 1 in R 1 is said to reference a tuple t 2 in R 2 if t 1[FK] = t 2[PK]. A referential integrity constraint can be displayed in a relational database schema as a directed arc from R 1. FK to R 2.
- Slides: 71