Course Notes for Comp 419

SQL

(read chapter 4)

SQL (Structured Query Language) is THE standard DML for relational database products. The query language is based on relational algebra, but borrows from tuple relational calculus.

Topics:

the query language
modification of relations (insert, delete, update)
creation and deletion of relations
creation and use of views

Two major standards have been defined for SQL:

SQL-89
SQL-92

SQL-92 defines some additional operations that are not part of SQL-89.

Querying in SQL

Typical SQL queries have the form:

select
from
where P

This query has the same meaning as:

in relational algebra.

Example: Find the name and address of everyone who has purchased a flyrod with flyrod-stock-num 3.

select name, address
from customer, purchased
where customer.cust-num = purchased.cust_num and flyrod-stock-num = 3

Structure of SQL expressions:

the select clause lists the attributes that will appear in the result relation. (select is like project in relational algebra.) select * is a shorthand for selecting all attributes of relations appearing in the from clause
the from clause lists the relations whose cartesian product is needed in the query
the where clause is like select in relational algebra, although it has slightly different syntax. An omitted where clause is equivalent to where true.

The result of an SQL query is a relation.

Return to Table of Contents

SQL-89 does not include the natural join operation. Queries must be written using cartesian product (via the from clause), selection (via the where clause) and projection (via the select clause) to implement natural join. SQL-92 does include natural join (discussed later).

Example: (customer purchased)

in SQL:

select name, address, customer.cust-num, flyrod-stock-num, date
from customer, purchased
where customer.cust-num = purchased.cust-num

Attribute names that appear in more than one relation in the from clause are distinguished by prefixing them with the relation name in the from clause. Other attributes can be prefixed with the relation name, but it is not necessary. However, this prefixing is not used in naming attributes of the result relation. See the section on tuple variables for information on renaming tuples in the result.

Return to Table of Contents

Set Operations and Duplicates

Because removing duplicates is expensive, SQL does not remove duplicates by default. The result of

select flyrod-stock-num
from purchased

is:

flyrod-stock-num

1

1

2

3

3

flyrod-stock-num
1
1
2
3
3

Discussion: how would this query be represented in relational algebra? Does relational algebra remove duplicates?

To force SQL to remove duplicates, use the keyword distinct after select as follows:

select distinct flyrod-stock-num
from purchased

SQL-92 includes the set operations union ( ), intersection ( ) and except (-). SQL-89 provides partial support for these operations, and not all relational database products include them. These operations can be applied to the results of queries of the basic form.

Example: Find the names and addresses of all customers and employees of the fly shop.

(select name, address
from customer)
union (select name, address
from employee)

As in relational algebra, the argument relations of set operations must be compatible. SQL uses a stronger definition of compatibility that requires the relation schemes of the argument relations to be the same (including attribute names). The scheme of the result is then the same as the schemes of the arguments.

By default, all of the set operations (including union) remove duplicates from the result (different from the default for select). To retain duplicates, use the keyword all after the set operation.

Example - to retain duplicates in the previous query:

(select name, address
from customer)
union all (select name, address
from employee)

The syntax select all is legal, but redundant.

Return to Table of Contents

Tuple Variables

SQL has a notion of tuple variables that range over relations. Tuple variables can be used to rename relations in much the same way that the rename operation is used in relational algebra. The keyword as is used to associate a tuple variable with a relation.

Example - Find all customers who have the same address as Tim Wahls:

select T.name, T.address
from customer as T, customer as S
where S.name = "Tim Wahls" and T.address = S.address

Keyword as can also be used to rename attributes in the result of a query. For example, if we wanted to compute the cartesian product of customer and purchased and have unique attribute names in the result:

select name, address, customer.cust-num, purchased.cust-num as p-cust-num, flyrod-stock-num, date
from customer, purchased

The scheme of the resulting relation is: (name, address, cust-num, p-cust-num, flyrod-stock-num, date)

Return to Table of Contents

Predicates

SQL uses different syntax for predicates (in the where clause) than that used in selection predicates in relational algebra:

SQL predicates relational algebra predicates

P and Q P Q

P or Q P Q

not P P

x between a and b (a x) (x b)

x not between a and b ((a x) (x b))

SQL predicates	relational algebra predicates
P `and` Q	P Q
P `or` Q	P Q
`not` P	P
x `between` a `and` b	(a x) (x b)
x `not between` a `and` b	((a x) (x b))

SQL permits arithmetic operators +, -, *, / in where clauses:

select name, salary
from employee
where (salary / 2) > (2 + 50)

SQL has facilities for pattern matching on strings using keyword like and wild card characters % (any string) and _ (any character).

Example - Find the names of all customers whose address has an 8 in it:

select name
from customer
where address like "%8%"

Return to Table of Contents

Ordering Tuples

SQL allows the user to specify that tuples in the result of a query should be sorted on a particular attribute using the order by clause.

Example - List (in alphabetic order) all manufacturers of flyrods that the flyshop stocks.

select manufacturer
from flyrod
order by manufacturer

Example - List the flyrod-stock-num and length of all flyrods manufactured by Sage, ordered by flyrod length from shortest to longest.

select flyrod-stock-num, length
from flyrod
where manufacturer = "Sage"
order by length

The keywords asc and desc can be used to control whether ascending or descending order is used, respectively. The default is ascending.

Example - List the flyrod-stock-num and length of all flyrods manufactured by Sage, ordered by flyrod length from longest to shortest.

select flyrod-stock-num, length
from flyrod
where manufacturer = "Sage"
order by length desc

Multiple arguments to order by can be used to specify how to break ties.

Example - List the flyrod-stock-num and length of all flyrods manufactured by Sage, ordered by flyrod length from longest to shortest. If two flyrods have the same length, order them by flyrod-stock-num in ascending order.

select flyrod-stock-num, length
from flyrod
where manufacturer = "Sage"
order by length desc, flyrod-stock-num

Return to Table of Contents

Aggregate Functions

Aggregate functions allow SQL queries to compute composite information about an entire relation.

Aggregate Functions in SQL
function name meaning

avg average value of some attribute

min minimum value of some attribute

max maximum value of some attribute

sum sum of all value of some attribute

count count of the number of tuples

Aggregate Functions in SQL
function name	meaning
`avg`	average value of some attribute
`min`	minimum value of some attribute
`max`	maximum value of some attribute
`sum`	sum of all value of some attribute
`count`	count of the number of tuples

Example: Find the average salary of all employees.

select avg(salary)
from employee

The result is a 1 attribute, 1 tuple relation containing the average of the salary attribute. Typically, result relations of queries that use aggregate functions do not have named attributes. The keyword as can be used to give names to the attributes if needed.

It is important to retain duplicates when computing averages. For example, if multiple employees have the same salary, the salary of each should still contribute to the average.

Example: Find the maximum salary.

select max(salary)
from employee

Note that duplicates do not matter with max and min.

Example: How many customers does the flyshop have?

select count(distinct cust-num)
from customer

The use of distinct is helpful in case the customer relation contains duplicates.

Example: How many customers have purchased at least one flyrod?

select count(distinct cust-num)
from purchased

The use of distinct is essential in this case, because we want to count how many different cust-nums occur in the relation. The result of this query is again a 1 tuple, 1 attribute relation.

Discussion: What would the result of this query be if distinct were omitted?

Answer: the number of tuples in the purchased relation.

To simply count the number of tuples in a relation, use count(*).

Example: How many tuples are in the customer relation?

select count(*)
from customer

The group by clause can be used with aggregate functions to compute information about subsets of a relation.

Example: Find the number of customers living at each customer address.

select address, count(distinctcust-num)
from customer
group by address

The result of this query is:

E258 2

Graceland 1

Each subset of tuples that agree (have the same value) on all attributes listed in the group by clause forms a group, and each group contributes one tuple to the result.

Example: find the number of different flyrod models purchased by each customer.

select cust-num, count(distinct flyrod-stock-num)
from purchased
group by cust-num

The result of this query is:

1 1

2 1

3 3

The keyword distinct is necessary here because one customer may have purchased multiple (identical) flyrods.

Example: find the number of flyrods purchased by each customer.

If we assume that the purchased relation doesn't contain duplicates, the following works.

select cust-num, count(flyrod-stock-num)
from purchased
group by cust-num

If we'd prefer to have the names of customers rather than cust-nums:

select name, count(flyrod-stock-num)
from customer, purchased
where customer.cust-num = purchased.cust-num
group by cust-num

The having clause is similar to the where clause, except that conditions in a having clause apply to groups created by a group by clause, rather than individual tuples.

Example: find the number of flyrods purchased by each customer who has purchased at least 3 flyrods.

select cust-num, count(flyrod-stock-num)
from purchased
group by cust-num
having count(flyrod-stock-num) >= 3

After groups are formed by the group by clause, groups satisfying the having clause are selected, and only those groups contribute a tuple to the result. The result of this query is:

3 3

If a query has a where clause as well as group by and having clauses, the selection implied by the where clause is performed first, and then groups are formed.

Example: find the name of all customers who have purchased at least 3 flyrods, and the number of flyrods they have purchased.

select cust-num, count(flyrod-stock-num)
from customer, purchased
where customer.cust-num = purchased.cust-num
group by cust-num
having count(flyrod-stock-num) >= 3

Return to Table of Contents

Null Values

SQL uses a special value, called the null value, to mean that a particular value isn't known. For example, if the flyshop wants to stock a Winston flyrod but doesn't know the length, the flyrod relation could be changed to:

flyrod
manufacturer length line-weight flyrod-stock- num

Sage 7.0 2 3

G. Loomis 8.5 6 1

Orvis 9.5 9 2

Winston null 7 4

flyrod
manufacturer	length	line-weight	flyrod-stock- num
Sage	7.0	2	3
G. Loomis	8.5	6	1
Orvis	9.5	9	2
Winston	null	7	4

Null values complicate comparisons and computations. For example, a query like:

select manufacturer, flyrod-stock-num
from flyrod
where length > 7.0

causes a comparison of 7.0 with the null value in this relation. Such a comparison has the special value unknown in SQL-92, but this value is usually treated as false. Earlier versions of SQL simply define any comparison involving a null value to be false. In either case, the tuple:

Winston null 7 4

is not in the result of this query.

Any arithmetic expression with a null value as an argument has null as its result. For example, in the query:

select manufacturer, flyrod-stock-num
from flyrod
where length + 8.0 > 7.0

For the tuple where length is null, length + 7.0 evaluates to null, and so the result of the comparison is unknown or false. In either case, this tuple is not in the result of the query, even though every flyrod has positive length.

All aggregate functions except count(*) ignore null values. That is, the tuple with null length does not contribute to the result of:

select manufacturer, avg(length)
from flyrod
group by manufacturer

The keywords is null and is not null can be used in the where clause to explicitly test for the presence and absence, respectively, of null values.

Example - given the previous definition of the flyrod relation, the following query:

select *
from flyrod
where length is null

has result:

manufacturer length line-weight flyrod-stock- num

Winston null 7 4

manufacturer	length	line-weight	flyrod-stock- num
Winston	null	7	4

Note: we'd prefer to avoid null values because of the complications they introduce in querying.

Return to Table of Contents

Nested Queries

The in and not in connectives test the membership of a tuple in the result of an SQL query. This allows SQL queries to be nested.

Example - Find the name and address of everyone who has purchased a flyrod with flyrod-stock-num 3:

select distinct name, address
from customer
where cust-num in
(select cust-num
from purchased
where where flyrod-stock-num = 3)

Tuples can be constructed using (). That is, (v1, v2, ..., vn) is a tuple of arity n containing values v1, v2, ..., vn.

Example - the previous query can be expressed as:

select distinct name, address
from customer
where (cust-num, 3) in
(select cust-num, flyrod-stock-num
from purchased)

Nested queries can be used with a comparison operator and keywords some or all to compare an attribute against all values appearing in some other relation.

Example - find all flyrods having maximum length:

select *
from flyrod
where length >= all
(select length
from flyrod)

This query returns all tuples whose length attribute is >= the value of the length attribute of all tuples in flyrod.

Example - find all flyrods that have length greater than the minimum length of a flyrod.

select *
from flyrod
where length > some
(select length
from flyrod)

This query returns all tuples whose length attribute is > the value of the length attribute of at least one tuple in flyrod.

The keywords some and all can be used with any of the following comparison operators: < <=, > >=, =, and <> (inequality).

The keywords exists and not exists are used to test whether or not the result of a nested query is an empty relation (a relation containing no tuples).

Example - Find all flyrod models that no one has purchased.

select *
from flyrod
where not exists
(select *
from purchased
where flyrod.flyrod-stock-num = purchased.flyrod-stock-num)

The keywords unique and not unique can be used to test whether or not the result of a subquery contains duplicate tuples.

Example - Find the names of all customers who have purchased more than 1 of the same make and model of flyrod (i.e. who have purchased 2 or more identical flyrods).

select name
from customer
where not unique
(select flyrod-stock-num
from purchased
where purchased.cust-num = customer.cust-num)

If a customer has purchased multiple identical flyrods, there will be duplicate values of flyrod-stock-num in the result of the subquery.

SQL-92 permits subqueries to be used in the from clause of another query. The relation resulting from this subquery is referred to as a derived relation. Such a relation must be renamed (using the keyword as).

Example - find the maximum number of flyrods purchased by a customer and the cust-num of that customer.

select cust-num, max(num-flyrods)
from (select cust-num, count(flyrod-stock-num)
from purchased
group by cust-num) as result(cust-num, num-flyrods)

Example - find the maximum number of flyrods purchased by a customer and the name of that customer.

select name, max(num-flyrods)
from (select cust-num, count(flyrod-stock-num)
from purchased
group by cust-num) as result(cust-num, num-flyrods), customer
where result.cust-num = customer.cust-num

Return to Table of Contents

Expressive Power of SQL

The expressive power of a query language refers to the set of queries that can be formulated in the language. Informally, it characterizes what questions can be asked.

All of the fundamental operations of relational algebra (select, project, cartesian product, rename, union and set difference) can be expressed in SQL. Hence, SQL has at least as much expressive power as relational algebra.

The aggregate functions of SQL can not be expressed in relational algebra (at least under the standard definition of relational algebra). Hence, SQL has strictly more expressive power than relational algebra. However, SQL still can not express all possible queries, unless it is embedded in some programming language.

Queries that can not be expressed in SQL include:

powerset - find all subsets of a relation
transitive closure - find all relationships implied by those present in the database

Return to Table of Contents