The Relational Model

(read 3.1 - 3.2)

Introduction

The relational data model is a conceptual data model in which data is represented by a collection of relations (or tables).

A relation on sets tex2html_wrap_inline15 , tex2html_wrap_inline17 , ... tex2html_wrap_inline19 is a subset of the cartesian product of tex2html_wrap_inline15 , tex2html_wrap_inline17 , ..., tex2html_wrap_inline19 .

The cartesian product of tex2html_wrap_inline15 , tex2html_wrap_inline17 , ..., tex2html_wrap_inline19 , written tex2html_wrap_inline33 , is the set of all possible tuples:

displaymath13

Example - the fly relation:

fly
pattern size color fly-stock-num
woolly bugger 10 olive 5
gold-ribbed hare's ear 14 natural 2
woolly bugger 2 black 4
Dahlberg Diver 2 natural 1

A tuple variable t is a variable that ranges over a relation (its domain is a relation). The value held by a tuple variable at any time is a tuple.

Notation: t tex2html_wrap_inline41 r denotes that t is a tuple in r (the domain of t is r). t[attribute-name] denotes the value of the named attribute in tuple t.

Example: if t tex2html_wrap_inline41 fly, then t has 3 possible values and the possible values of t[size] are 2, 10, 14

The column number may be used in place of the attribute name. Example: t[2] tex2html_wrap_inline45 t[size]

Return to Table of Contents

Relational Database Schemes

Review:

In the relational model, a relation scheme is a list of the attributes of a relation, possibly including the domains of the attributes.

Example: the relation scheme for fly is (pattern, size, color, fly-stock-num) or (pattern: string[30], size: integer, color :{black, olive, natural}, fly-stock-num: integer)

Notation:

The scheme of a relational database consists of the schemes of all the relations in the database.

Schemes for the flyshop database:

scheme name attributes
Fly-scheme (pattern, size, color, fly-stock-num)
Customer-scheme (name, address, cust-num)
Flyrod-scheme (manufacturer, length, line-weight, flyrod-stock-num)
Purchased-scheme (cust-num, flyrod-stock-num, date)
Requested-By-scheme (fly-stock-num, cust-num)
Fave-Combo-scheme (cust-num, flyrod-stock-num, fly-stock-num)

Notes:

The use of the same attribute names in different relations is a way of relating tuples of different relations.

Example query: find the addresses of all customers who have purchased a Sage 2-weight.

Solution:

  1. find the flyrod-stock-num of the Sage 2-weight by looking in the flyrod relation
  2. find the cust-num of all customers who have purchased a flyrod with that flyrod-stock-num by looking in the purchased relation
  3. find the addresses of all customers in the customer relation using the cust-num's from the previous step

Question: should we use one relation with scheme (name, address, cust-num, manufacturer, length, line-weight, flyrod-stock-num, date, pattern, size, color, fly-stock-num) rather than the current 6 relations?

Problems with using one big relation:

  1. null values. If a customer hasn't purchased any flyrods, there are no sensible values for the manufacturer, length, ... attributes in that customer's tuple. A special value called the null value could be used for any attributes with no sensible values, but null values will cause problems later.
  2. duplication of information. A customer's name and address are duplicated for each flyrod purchased and for each fly requested. The manufacturer, line-weight and length of a flyrod are duplicated each time the same make of flyrod is purchased.

    Problems caused by duplication:

Note that our database scheme avoids many of these problems:

Return to Table of Contents

Keys

The concepts of superkey, candidate key and primary key from the E-R model apply directly to the relational model.

A superkey is a subset of the attributes of a relation scheme that uniquely distinguish any tuple in a relation on that scheme.

Example - for relations on Flyrod-scheme, the superkeys include:

A candidate key is a superkey that has no superkeys as proper subsets.

Example - for relations on Flyrod-scheme, the candidate keys are:

The primary key is the (one) candidate key designated as the main way to distinguish tuples in the relation

Example: for relations on Flyrod-scheme, we'll designate {flyrod-stock-num} as the primary key.

Notation: we extend [ ] (indexing into tuple variables) to work with sets of attributes. The result will be a tuple of the values of attributes in the set.

Example: if t tex2html_wrap_inline13 customer and t = (Tim Wahls, E258, 1), then t[{name, address}] = (Tim Wahls, E258)

Formally, let R be a relation scheme and r a relation on that scheme, i.e. r(R), and K tex2html_wrap_inline15 R such that K = tex2html_wrap_inline17 . If t is a tuple variable and t tex2html_wrap_inline13 r, then:

displaymath11

Fundamental property of a superkey: if R is a relation scheme, then K tex2html_wrap_inline15 R is a superkey for R if and only if for every relation r(R),

if tex2html_wrap_inline23 r and tex2html_wrap_inline25 r and tex2html_wrap_inline27 , then tex2html_wrap_inline29 [K] tex2html_wrap_inline31 [K]

equivalently, if tex2html_wrap_inline29 [K] = tex2html_wrap_inline35 [K], then tex2html_wrap_inline29 = tex2html_wrap_inline35

A superkey is a property of a relation scheme, not a relation. A superkey must be a superkey for every relation on the scheme.

Formally, if r is a relation with the same attributes as R and K tex2html_wrap_inline15 R is a superkey for R, but tex2html_wrap_inline23 r and tex2html_wrap_inline25 r and tex2html_wrap_inline29 [K] = tex2html_wrap_inline35 [K] and tex2html_wrap_inline27 , then it is not the case that r(R), i.e. r is not a relation on scheme R.

Return to Table of Contents

Query Languages

Review: a query language is a language used to retrieve information from a database. It is usually part of a data manipulation language (DML).

"Pure" query languages for relational databases:

These languages are called pure because they are very formal, and because they include none of the non-query functionality of a DML (delete, insert, modify).

Relational algebra is a procedural query language - users write tiny algorithms in relational algebra for finding the information to be retrieved. Relational algebra is the basis for the commercial language SQL.

Tuple relational calculus and domain relational calculus are nonprocedural query languages - users describe the information they need in variants of first order predicate logic, and the system is responsible for finding that information. Tuple relational calculus is the basis for the commercial language Quel, and domain relational calculus is the basis for the commercial language QBE.

Return to Table of Contents

Relational Algebra

Relational algebra is a set of operations that take relations as arguments. All relational algebra operations except assignment return relations as their result.

There are two kinds of relational operations:

Fundamental operations (unary):

Fundamental operations (binary):

Nonfundamental operations:

Examples involving these operations use the following relations:

flyrod
manufacturer length line-weight flyrod-stock-num
Sage 7.0 2 3
G. Loomis 8.5 6 1
Orvis 9.5 9 2

fly
pattern size color fly-stock-num
woolly bugger 10 olive 5
gold-ribbed hare's ear 14 natural 2
woolly bugger 2 black 4
Dahlberg Diver 2 natural 1

customer
name address cust-num
Tim Wahls E258 1
Linda Null E258 3
Elvis Presley Graceland 2

purchased
cust-num flyrod-stock-num date
1 3 9/4/1994
3 3 9/3/1994
2 2 8/8/1992
3 2 9/2/1994
3 1 9/1/1994

requested-by
cust-num fly-stock-num
1 2
3 5
2 4
2 1

Return to Table of Contents

Fundamental Operations of Relational Algebra

The Select Operation

The select operation chooses (or selects) tuples from a relation that satisfy a given condition (called a predicate). The result is a relation that is a subset of the argument relation.

Example: selecting all flyrods over 8.0 feet long from flyrod produces:

manufacturer length line-weight flyrod-stock-num
G. Loomis 8.5 6 1
Orvis 9.5 9 2

The result of a select operation can be the entire argument relation or the empty relation.

Notation: The Greek letter sigma ( tex2html_wrap_inline23 ) is the usual symbol for select. The selection predicate is written as a subscript of tex2html_wrap_inline23 , and the argument relation is written in parenthesis.

Example - in this notation, the previous query is:

displaymath17

The selection predicate can refer to attribute names and constants of the attributes' domains. The following comparisons can be used: =, tex2html_wrap_inline29 , <, tex2html_wrap_inline33 , >, tex2html_wrap_inline37 . Predicates can be built using boolean operators tex2html_wrap_inline39 (logical and) and tex2html_wrap_inline41 (logical or).

Example:

displaymath18

produces the relation:

manufacturer length line-weight flyrod-stock-num
Orvis 9.5 9 2

The selection predicate may compare two attributes:

displaymath19

produces the empty relation:

manufacturer length line-weight flyrod-stock-num

Return to Table of Contents

The Project Operation

The project operation is used to remove attributes (and the associated columns) from a relation. For example, if we only wanted to know what lengths of flyrods are we stock from particular manufacturers, we would project the flyrod relation on the manufacturer and length attributes, producing:

manufacturer length
Sage 7.0
G. Loomis 8.5
Orvis 9.5

If multiple tuples have the same values for the attributes projected on, the result of a projection will have fewer tuples than the argument relation.

Example: if there were a 7 foot Sage 5-weight as well as the 7 foot Sage 2-weight, the result of a projection on manufacturer and length would be the same as above.

Notation: The Greek letter pi ( tex2html_wrap_inline18 ) is the usual symbol for project. The list of attributes to project on is written as a subscript of tex2html_wrap_inline18 , and the argument relation is written in parenthesis.

Example - in this notation, the previous query is:

displaymath14

To see just the manufacturers of flyrods carried by the shop that are over 8.0 feet long, compose a select and a project operation. That is, the result of a selection becomes the argument of a projection:

displaymath15

producing:

manufacturer
G. Loomis
Orvis

Return to Table of Contents

The Cartesian Product Operation

The cartesian product combines information from several relations in answering a query. The cartesian product of relations tex2html_wrap_inline49 and tex2html_wrap_inline51 is written tex2html_wrap_inline53 and pronounced tex2html_wrap_inline49 cross tex2html_wrap_inline51 .

Cartesian product for relational algebra is like cartesian product for sets, except that:

Attributes of the result of a cartesian product are named by attaching the name of the argument relation to the name of it's attributes in the result.

Example: the relation scheme of customer tex2html_wrap_inline59 purchased is:

(customer.name, customer.address, customer.cust-num, purchased.cust-num, purchased.flyrod-stock-num, purchased.date)

If an attribute appears in only one relation scheme, we drop the relation name prefix, yielding:

(name, address, customer.cust-num, purchased.cust-num, flyrod-stock-num, date)

Formally, for relations tex2html_wrap_inline61 and tex2html_wrap_inline63 , the scheme of tex2html_wrap_inline53 is produced by concatenating tex2html_wrap_inline67 and tex2html_wrap_inline69 , and then prefixing any duplicate attribute names with the name of the relation that the attribute came from.

Tuples of tex2html_wrap_inline53 are produced by every possible concatenation of a tuple in tex2html_wrap_inline49 with a tuple in tex2html_wrap_inline51 .

Example - customer tex2html_wrap_inline59 purchased is:

name address customer.cust-num purchased.cust-num flyrod-stock-num date
Tim Wahls E258 1 1 3 9/4/1994
Tim Wahls E258 1 3 3 9/3/1994
Tim Wahls E258 1 2 2 8/8/1992
Tim Wahls E258 1 3 2 9/2/1994
Tim Wahls E258 1 3 1 9/1/1994
Linda Null E258 3 1 3 9/4/1994
Linda Null E258 3 3 3 9/3/1994
Linda Null E258 3 2 2 8/8/1992
Linda Null E258 3 3 2 9/2/1994
Linda Null E258 3 3 1 9/1/1994
Elvis Presley Graceland 2 1 3 9/4/1994
Elvis Presley Graceland 2 3 3 9/3/1994
Elvis Presley Graceland 2 2 2 8/8/1992
Elvis Presley Graceland 2 3 2 9/2/1994
Elvis Presley Graceland 2 3 1 9/1/1994

The size of tex2html_wrap_inline53 is tex2html_wrap_inline81 , where tex2html_wrap_inline83 is the size of tex2html_wrap_inline49 .

Discussion: which tuples in the result of customer tex2html_wrap_inline59 purchased represent meaningful relationships?

Answer: Only those where the cust-num attributes contain the same value. Other tuples match customers with flyrods that they didn't buy.

We restrict the result to only the meaningful tuples with an appropriate selection:

displaymath43

which produces:

name address customer.cust-num purchased.cust-num flyrod-stock-num date
Tim Wahls E258 1 1 3 9/4/1994
Linda Null E258 3 3 3 9/3/1994
Linda Null E258 3 3 2 9/2/1994
Linda Null E258 3 3 1 9/1/1994
Elvis Presley Graceland 2 2 2 8/8/1992

Finally, use a project to get rid of one of the duplicate columns:

tex2html_wrap_inline89

producing:

name address customer.cust-num flyrod-stock-num date
Tim Wahls E258 1 3 9/4/1994
Linda Null E258 3 3 9/3/1994
Linda Null E258 3 2 9/2/1994
Linda Null E258 3 1 9/1/1994
Elvis Presley Graceland 2 2 8/8/1992

Example: find all dates on which Linda Null purchased a flyrod.

displaymath44

which produces:

date
9/3/1994
9/2/1994
9/1/1994

A composition of selections is equivalent to a single selection which has a selection predicate built by conjoining (``anding") all of the original selection predicates.

Example:

displaymath45

is equivalent to the previous query.

Multiple applications of cartesian product can be used to answer queries involving more than two relations.

Example - List all manufacturers of flyrods purchased by Elvis Presley:

tex2html_wrap_inline91
tex2html_wrap_inline93

which produces:

manufacturer
Orvis

Return to Table of Contents

The Rename Operation

Discussion - can we write the query: ``Find all customers who have the same address as Tim Wahls" in the subset of relational algebra discussed so far?

We need to reference the customer relation twice in the same query:

displaymath26

Discussion: what is wrong with this query?

The outermost selection predicate is ambiguous, because it is being applied to a relation with two customer.address attributes.

Solution: we need an operation to change the name of a relation. If we could change the name of one occurrence of customer, the ambiguity would be resolved.

Notation: the rename operation is symbolized by the Greek letter rho ( tex2html_wrap_inline27 ), and an application of this operation is written: tex2html_wrap_inline29 . The result is a relation with the same attributes and tuples as r, but with name x.

Example - ``Find the names of all customers who have the same address as Tim Wahls":

displaymath27

producing:

name
Tim Wahls
Linda Null

Return to Table of Contents

The Union Operation

The flyshop database can be expanded with an employee relation that records information about employees of the flyshop.

employee
name address social-security-num salary
Elvis Presley Graceland 121212121 25000
Debbie Jackson Neverland 343434343 32000
Tito Jackson P.O. Box 37 232323232 1000

Employee-scheme = (name, address, social-security-num, salary)

Suppose the flyshop wants to send a newsletter to all employees and customers. We need the names and addresses of both customers and employees in one relation.

Since relations are sets of tuples, unioning the relations seems sensible. However, the relations are on different schemes (and so have different types).

Solution: use appropriate projections.

displaymath22

which produces:

name address
Elvis Presley Graceland
Debbie Jackson Neverland
Linda Null E258
Tito Jackson P.O. Box 37
Tim Wahls E258

Note that the standard set union symbol is used.

Discussion - should Elvis appear twice?

Two relations tex2html_wrap_inline26 and tex2html_wrap_inline28 can be unioned if they are compatible. Compatibility means:

  1. Relations tex2html_wrap_inline26 and tex2html_wrap_inline28 must have the same number of attributes (the same arity).
  2. The domain of each attribute of tex2html_wrap_inline26 must be the same as the domain of the corresponding attribute of tex2html_wrap_inline28 (or at least the domain of one must be a subset of the domain of the other). To make ``correspondence'' of attributes well defined, we require order of attributes to be significant, so that the ith attribute of tex2html_wrap_inline26 corresponds to the ith attribute of tex2html_wrap_inline28 .

The order of attributes of a relation can be changed using projection because the order of attributes in the result relation is the same as the order given in the subscript of tex2html_wrap_inline46 .

Example - to reverse the order of attributes in the newsletter relation:

displaymath23

The attributes of the result of applying the union operation can be named in any convenient fashion.

Return to Table of Contents

The Set Difference Operation

To find all customers who are not also employees, use the set difference operation.

Review: the difference of two sets S1 and S2, written S1 - S2, is the set of all elements of S1 that are not also elements of S2.

As with union, the arguments of set difference must be compatible relations, and the attribute names of the result are arbitrary.

Example:

producing:

name address
Linda Null E258
Tim Wahls E258

Return to Table of Contents

Inductive Definition of Relational Algebra Expressions

An inductive definition is recursive - it defines something in terms of itself.

Example - an inductive definition of arithmetic terms:

basis: a real number is an arithmetic term
inductive definition: if tex2html_wrap_inline11 and tex2html_wrap_inline13 are arithmetic terms, then all of the following are arithmetic terms:

tex2html_wrap_inline15
tex2html_wrap_inline17
tex2html_wrap_inline19
tex2html_wrap_inline21
tex2html_wrap_inline23

By applying this definition, we can generate an infinite number of arithmetic terms. For example, 3, 4, and 5.2 are terms (by the basis) and so are:

3 + 4
(3 + 4)
(3 + 4)/5.2

An inductive definition of relational algebra expressions:

basis: the names of relations in the database and constant relations (entire tables appearing in queries) are relational algebra expressions

inductive definition: if tex2html_wrap_inline31 and tex2html_wrap_inline33 are relational algebra expressions, then all of the following are relational algebra expressions:

tex2html_wrap_inline35
tex2html_wrap_inline37
tex2html_wrap_inline39
tex2html_wrap_inline41 where P is a predicate on the attributes of tex2html_wrap_inline31
tex2html_wrap_inline47 where S is a list of attribute names of tex2html_wrap_inline31
tex2html_wrap_inline53 where x is a string containing the new name for tex2html_wrap_inline31
tex2html_wrap_inline59

Return to Table of Contents

Additional (Non-Fundamental) Operations of Relational Algebra

The following operations can be defined in terms of the fundamental operations, but often make writing queries easier.

The Set Intersection Operation

Example: to find the name and address of everyone who is both an employee and a customer of the flyshop:

displaymath22

It is easier to write:

displaymath23

Return to Table of Contents

The Natural Join Operation

The result of customer tex2html_wrap_inline40 purchased puts customers together with purchases they didn't make. Both a select and a project were needed to make the result useful.

The natural join operation combines a cartesian product operation with these selects and projects. The natural join of customer and purchased, written customer tex2html_wrap_inline42 purchased, is computed by:

  1. taking the cartesian product of customer and purchased
  2. selecting tuples where customer.cust-num = purchased.cust-num
  3. removing one of the cust-num columns

Producing:

name address cust-num flyrod-stock-num date
Tim Wahls E258 1 3 9/4/1994
Linda Null E258 3 3 9/3/1994
Linda Null E258 3 2 9/2/1994
Linda Null E258 3 1 9/1/1994
Elvis Presley Graceland 2 2 8/8/1992

In general, to compute r tex2html_wrap_inline42 s:

  1. take the cartesian product of r and s
  2. select tuples that agree on attributes with the same name
  3. remove any duplicate columns

Natural join is formally defined as follows: Let R and S be relation schemes, and r(R) and s(S). Think of R and S as sets, rather than lists of attribute names. R tex2html_wrap_inline58 S is the set of attribute names that appear in both schemes, and R tex2html_wrap_inline60 S is the scheme of r tex2html_wrap_inline42 s. The tuples in the result are:

r tex2html_wrap_inline42 tex2html_wrap_inline72

where R tex2html_wrap_inline58 S = tex2html_wrap_inline76

The selection refers to tex2html_wrap_inline78 , tex2html_wrap_inline80 and so on, but the projection refers to tex2html_wrap_inline82 , tex2html_wrap_inline84 , ... Technically, this isn't a sensible relational algebra expression, but the idea is that tex2html_wrap_inline82 could be either tex2html_wrap_inline78 or tex2html_wrap_inline80 , because the columns are identical after the selection.

If R tex2html_wrap_inline58 S = tex2html_wrap_inline94 (i.e. the relation schemes have no attributes in common), then for r(R) and s(S):

r tex2html_wrap_inline42 tex2html_wrap_inline104

Example - find all information about customers and their flyrod purchases:

tex2html_wrap_inline106 tex2html_wrap_inline42 tex2html_wrap_inline110 tex2html_wrap_inline42 tex2html_wrap_inline114

Because tex2html_wrap_inline42 is associative, we don't need parentheses. I.e:

tex2html_wrap_inline118 tex2html_wrap_inline42 tex2html_wrap_inline122 tex2html_wrap_inline42 tex2html_wrap_inline126 tex2html_wrap_inline42 tex2html_wrap_inline130 tex2html_wrap_inline42 tex2html_wrap_inline134

The result of this query is:

name address cust-num flyrod-stock-num date manufacturer length line-weight
Tim Wahls E258 1 3 9/4/1994 Sage 7.0 2
Linda Null E258 3 3 9/3/1994 Sage 7.0 2
Linda Null E258 3 2 9/2/1994 Orvis 9.5 9
Linda Null E258 3 1 9/1/1994 G. Loomis 8.5 6
Elvis Presley Graceland 2 2 8/8/1992 Orvis 9.5 9

Example - find the names of all customers who have requested that the flyshop carry some size or color of woolly bugger:

tex2html_wrap_inline136 tex2html_wrap_inline42 tex2html_wrap_inline140 tex2html_wrap_inline42 tex2html_wrap_inline144

producing:

name
Linda Null
Elvis Presley

Note that the scheme of tex2html_wrap_inline106 tex2html_wrap_inline42 tex2html_wrap_inline140 tex2html_wrap_inline42 tex2html_wrap_inline154 is (name, address, cust-num, fly-stock-num, pattern, size, color).

The query:

tex2html_wrap_inline156 tex2html_wrap_inline42 tex2html_wrap_inline140 tex2html_wrap_inline42 tex2html_wrap_inline164

is equivalent and more efficient to execute, because the only tuples from fly participating in the natural join are those whose pattern attribute is ``woolly bugger'', and so the relation resulting from the natural joins is smaller.

Return to Table of Contents

The Division Operation

Suppose we wanted to find the cust-num of all customers who have purchased at least one of every kind of flyrod that the flyshop stocks.

One approach:

  1. find the flyrod-stock-num for all flyrods that we stock:

    displaymath24

    We'll call the result flyrod-nums.

    flyrod-nums
    flyrod-stock-num
    1
    2
    3

  2. find the cust-num and flyrod-stock-num involved in each purchase:

    displaymath25

    We'll call the result has-purchased.

    has-purchased
    cust-num flyrod-stock-num
    1 3
    3 3
    3 2
    3 1
    2 2

  3. find each cust-num that appears in a tuple of has-purchased with every tuple (value) in flyrod-nums. This is exactly the definition of the division operation, written tex2html_wrap_inline30 .

Thus, this query would be written:

displaymath26

producing:

cust-num
3

Formally, the result of tex2html_wrap_inline32 for r(R) and s(S) is a relation on scheme R - S, where we require that S tex2html_wrap_inline38 R.

A tuple t is in tex2html_wrap_inline32 if (and only if) the following condition is satisfied: For every tuple tex2html_wrap_inline44 in s there is a tuple tex2html_wrap_inline48 in r such that:

tex2html_wrap_inline52
and tex2html_wrap_inline54

i.e. there is a group of tuples in r that all have the same values for attributes in R - S, and have all values (subtuples) that occur in s for attributes in S. Every tuple in this group has the same values for attributes in R - S, and so the group contributes one tuple to the result of tex2html_wrap_inline32 , because all tuples in the group are the same when restricted to attributes in R - S.

The division operator can be expressed in terms of fundamental operations, but this definition doesn't help understand what division means.

Return to Table of Contents

The Assignment Operation

For complicated relational algebra expressions, it is convenient to use temporary relations to hold the results of some subexpressions.

Notation: assignment is written tex2html_wrap_inline12

Example - to compute customer tex2html_wrap_inline14 purchased:

temp tex2html_wrap_inline12 customer tex2html_wrap_inline18 purchased
temp tex2html_wrap_inline12 tex2html_wrap_inline22
temp tex2html_wrap_inline12 tex2html_wrap_inline26

Now temp contains customer tex2html_wrap_inline14 purchased.

Because non-fundamental operations don't permit any queries to be expressed that can't be expressed using only fundamental operations, they don't add any expressive power to relational algebra. However, they do add to the expressiveness of relational algebra because they make writing queries easier.

Return to Table of Contents