SQL (Structured Query Language) is THE standard DML for relational database products. The query language is based on relational algebra, but borrows from tuple relational calculus.
Topics:
Two major standards have been defined for SQL:
SQL-92 defines some additional operations that are not part of SQL-89.
Typical SQL queries have the form:
select
from
where
P
This query has the same meaning as:
in relational algebra.
Example: Find the name and address of everyone who has purchased a flyrod with flyrod-stock-num 3.
select
name, address
from
customer, purchased
where
customer.cust-num = purchased.cust_num and
flyrod-stock-num = 3
Structure of SQL expressions:
select
clause lists the attributes that will appear
in the result relation. (select
is like project in relational
algebra.) select *
is a shorthand for selecting all attributes
of relations appearing in the from
clause
from
clause lists the relations whose cartesian
product is needed in the query
where
clause is like select in relational algebra,
although it has slightly different syntax. An omitted where
clause is equivalent to where
true.
SQL-89 does not include the natural join operation. Queries must be written
using cartesian product (via the from
clause), selection
(via the where
clause) and projection (via the
select
clause) to implement natural join.
SQL-92 does include natural join (discussed later).
Example: (customer purchased)
in SQL:
select
name, address, customer.cust-num, flyrod-stock-num, date
from
customer, purchased
where
customer.cust-num = purchased.cust-num
Attribute names that appear in more than one relation in the from clause
are distinguished by prefixing them with the relation name in the
from
clause. Other attributes
can be prefixed with the relation name, but it is not necessary.
However, this prefixing is not used in naming attributes of the result
relation. See the section on tuple variables
for information on renaming tuples in the result.
Because removing duplicates is expensive, SQL does not remove duplicates by default. The result of
select
flyrod-stock-num
from
purchased
is:
flyrod-stock-num |
---|
1 |
1 |
2 |
3 |
3 |
Discussion: how would this query be represented in relational algebra? Does relational algebra remove duplicates?
To force SQL to remove duplicates, use the keyword distinct
after select
as follows:
select distinct
flyrod-stock-num
from
purchased
SQL-92 includes the set operations union
( ),
intersection
( ) and except
(-). SQL-89 provides partial
support for these operations, and not all relational database products
include them. These operations can be applied to the results
of queries of the basic form.
Example: Find the names and addresses of all customers and employees of the fly shop.
(select
name, address
from
customer)
union
(select
name, address
from
employee)
As in relational algebra, the argument relations of set operations must be compatible. SQL uses a stronger definition of compatibility that requires the relation schemes of the argument relations to be the same (including attribute names). The scheme of the result is then the same as the schemes of the arguments.
By default, all of the set operations (including union
) remove
duplicates from the result (different from the
default for select
). To retain duplicates, use
the keyword all
after the set operation.
Example - to retain duplicates in the previous query:
(select
name, address
from
customer)
union all
(select
name, address
from
employee)
The syntax select all
is legal, but redundant.
SQL has a notion of tuple variables that range over relations. Tuple
variables can be used to rename relations in much the same way that
the rename operation is used in relational algebra. The keyword
as
is used to associate a tuple variable with a relation.
Example - Find all customers who have the same address as Tim Wahls:
select
T.name, T.address
from
customer as
T, customer as
S
where
S.name = "Tim Wahls" and
T.address = S.address
Keyword as
can also be used to rename attributes in the
result of a query. For example, if we wanted to compute the cartesian
product of customer and purchased and have unique attribute names in
the result:
select
name, address, customer.cust-num, purchased.cust-num
as
p-cust-num, flyrod-stock-num, date
from
customer, purchased
The scheme of the resulting relation is: (name, address, cust-num, p-cust-num, flyrod-stock-num, date)
SQL uses different syntax for predicates (in the where
clause)
than that used in selection predicates in relational algebra:
SQL predicates | relational algebra predicates |
---|---|
P and Q |
P Q |
P or Q |
P Q |
not P |
P |
x between a and b |
(a x) (x b) |
x not between a and b |
((a x) (x b)) |
SQL permits arithmetic operators +, -, *, / in where
clauses:
select
name, salary
from
employee
where
(salary / 2) > (2 + 50)
SQL has facilities for pattern matching on strings using keyword
like
and wild card characters % (any string) and _
(any character).
Example - Find the names of all customers whose address has an 8 in it:
select
name
from
customer
where
address like
"%8%"
SQL allows the user to specify that tuples in the result of a query
should be sorted on a particular attribute using the order by
clause.
Example - List (in alphabetic order) all manufacturers of flyrods that the flyshop stocks.
select
manufacturer
from
flyrod
order by
manufacturer
Example - List the flyrod-stock-num and length of all flyrods manufactured by Sage, ordered by flyrod length from shortest to longest.
select
flyrod-stock-num, length
from
flyrod
where
manufacturer = "Sage"
order by
length
The keywords asc
and desc
can be used to control
whether ascending or descending order is used, respectively. The default
is ascending.
Example - List the flyrod-stock-num and length of all flyrods manufactured by Sage, ordered by flyrod length from longest to shortest.
select
flyrod-stock-num, length
from
flyrod
where
manufacturer = "Sage"
order by
length desc
Multiple arguments to order by
can be used to specify how
to break ties.
Example - List the flyrod-stock-num and length of all flyrods manufactured by Sage, ordered by flyrod length from longest to shortest. If two flyrods have the same length, order them by flyrod-stock-num in ascending order.
select
flyrod-stock-num, length
from
flyrod
where
manufacturer = "Sage"
order by
length desc
, flyrod-stock-num
Aggregate functions allow SQL queries to compute composite information about an entire relation.
function name | meaning |
---|---|
avg | average value of some attribute |
min | minimum value of some attribute |
max | maximum value of some attribute |
sum | sum of all value of some attribute |
count | count of the number of tuples |
Example: Find the average salary of all employees.
select avg(
salary)
from
employee
The result is a 1 attribute, 1 tuple relation containing the average of the
salary attribute. Typically, result relations of queries that use
aggregate functions do not have named attributes. The keyword as
can be used to give names to the attributes if needed.
It is important to retain duplicates when computing averages. For example, if multiple employees have the same salary, the salary of each should still contribute to the average.
Example: Find the maximum salary.
select max(
salary)
from
employee
Note that duplicates do not matter with max
and
min
.
Example: How many customers does the flyshop have?
select count(distinct
cust-num)
from
customer
The use of distinct
is helpful in case the customer relation
contains duplicates.
Example: How many customers have purchased at least one flyrod?
select count(distinct
cust-num)
from
purchased
The use of distinct
is essential in this case, because
we want to count how many different cust-nums occur in the relation.
The result of this query is again a 1 tuple, 1 attribute relation.
Discussion: What would the result of this query be if distinct
were omitted?
Answer: the number of tuples in the purchased relation.
To simply count the number of tuples in a relation, use count(*)
.
Example: How many tuples are in the customer relation?
select count(*)
from
customer
The group by
clause can be used with aggregate functions
to compute information about subsets of a relation.
Example: Find the number of customers living at each customer address.
select
address,
count(distinct
cust-num)
from
customer
group by
address
The result of this query is:
E258 | 2 |
Graceland | 1 |
Each subset of tuples that agree (have the same value) on all attributes
listed in the group by
clause
forms a group, and each group contributes
one tuple to the result.
Example: find the number of different flyrod models purchased by each customer.
select
cust-num,
count(distinct
flyrod-stock-num)
from
purchased
group by
cust-num
The result of this query is:
1 | 1 |
2 | 1 |
3 | 3 |
The keyword distinct
is necessary here because one customer
may have purchased multiple (identical) flyrods.
Example: find the number of flyrods purchased by each customer.
If we assume that the purchased relation doesn't contain duplicates, the following works.
select
cust-num,
count(
flyrod-stock-num)
from
purchased
group by
cust-num
If we'd prefer to have the names of customers rather than cust-nums:
select
name,
count(
flyrod-stock-num)
from
customer, purchased
where
customer.cust-num = purchased.cust-num
group by
cust-num
The having
clause is similar to the where
clause,
except that conditions in a having
clause apply to groups
created by a group by
clause, rather than individual tuples.
Example: find the number of flyrods purchased by each customer who has purchased at least 3 flyrods.
select
cust-num,
count(
flyrod-stock-num)
from
purchased
group by
cust-num
having count(
flyrod-stock-num)
>= 3
After groups are formed by the group by
clause, groups
satisfying the having
clause are selected, and only those
groups contribute a tuple to the result. The result of this query is:
3 | 3 |
If a query has a where
clause as well as group by
and having
clauses, the selection implied by the
where
clause is performed first, and then groups are formed.
Example: find the name of all customers who have purchased at least 3 flyrods, and the number of flyrods they have purchased.
select
cust-num,
count(
flyrod-stock-num)
from
customer, purchased
where
customer.cust-num = purchased.cust-num
group by
cust-num
having count(
flyrod-stock-num)
>= 3
SQL uses a special value, called the null value, to mean that a particular value isn't known. For example, if the flyshop wants to stock a Winston flyrod but doesn't know the length, the flyrod relation could be changed to:
manufacturer | length | line-weight | flyrod-stock- num |
---|---|---|---|
Sage | 7.0 | 2 | 3 |
G. Loomis | 8.5 | 6 | 1 |
Orvis | 9.5 | 9 | 2 |
Winston | null | 7 | 4 |
Null values complicate comparisons and computations. For example, a query like:
select
manufacturer, flyrod-stock-num
from
flyrod
where
length > 7.0
causes a comparison of 7.0 with the null value in this relation. Such a
comparison has the special value unknown
in SQL-92, but this
value is usually treated as false. Earlier versions of SQL simply define
any comparison involving a null value to be false. In either case, the tuple:
Winston | null | 7 | 4 |
is not in the result of this query.
Any arithmetic expression with a null value as an argument has null as its result. For example, in the query:
select
manufacturer, flyrod-stock-num
from
flyrod
where
length + 8.0 > 7.0
For the tuple where length is null, length + 7.0 evaluates to null, and so
the result of the comparison is unknown
or false. In either
case, this tuple is not in the result of the query, even though every flyrod
has positive length.
All aggregate functions except count(*)
ignore null values. That
is, the tuple with null length does not contribute to the result of:
select
manufacturer, avg(
length)
from
flyrod
group by
manufacturer
The keywords is null
and is not null
can be used
in the where
clause to explicitly test for the presence and
absence, respectively, of null values.
Example - given the previous definition of the flyrod relation, the following query:
select *
from
flyrod
where
length is null
has result:
manufacturer | length | line-weight | flyrod-stock- num |
---|---|---|---|
Winston | null | 7 | 4 |
Note: we'd prefer to avoid null values because of the complications they introduce in querying.
The in
and not in
connectives test the membership
of a tuple in the result of an SQL query. This allows SQL queries to be
nested.
Example - Find the name and address of everyone who has purchased a flyrod with flyrod-stock-num 3:
select distinct
name, address
from
customer
where
cust-num in
(select
cust-num
from
purchased
where
where flyrod-stock-num = 3)
Tuples can be constructed using (). That is, (v1, v2, ..., vn) is a tuple of arity n containing values v1, v2, ..., vn.
Example - the previous query can be expressed as:
select distinct
name, address
from
customer
where
(cust-num, 3) in
(select
cust-num, flyrod-stock-num
from
purchased)
Nested queries can be used with a comparison operator and keywords
some
or all
to compare an attribute against
all values appearing in some other relation.
Example - find all flyrods having maximum length:
select *
from
flyrod
where
length >= all
(select
length
from
flyrod)
This query returns all tuples whose length attribute is >= the value of the length attribute of all tuples in flyrod.
Example - find all flyrods that have length greater than the minimum length of a flyrod.
select *
from
flyrod
where
length > some
(select
length
from
flyrod)
This query returns all tuples whose length attribute is > the value of the length attribute of at least one tuple in flyrod.
The keywords some
and all
can be used with any
of the following comparison operators: < <=, > >=, =, and
<> (inequality).
The keywords exists
and not exists
are used
to test whether or not the result of a nested query is an empty relation
(a relation containing no tuples).
Example - Find all flyrod models that no one has purchased.
select *
from
flyrod
where not exists
(select *
from
purchased
where
flyrod.flyrod-stock-num = purchased.flyrod-stock-num)
The keywords unique
and not unique
can be used
to test whether or not the result of a subquery contains duplicate tuples.
Example - Find the names of all customers who have purchased more than 1 of the same make and model of flyrod (i.e. who have purchased 2 or more identical flyrods).
select
name
from
customer
where not unique
(select
flyrod-stock-num
from
purchased
where
purchased.cust-num = customer.cust-num)
If a customer has purchased multiple identical flyrods, there will be duplicate values of flyrod-stock-num in the result of the subquery.
SQL-92 permits subqueries to be used in the from
clause of
another query. The relation resulting from this subquery is referred to
as a derived relation. Such a relation must be renamed (using
the keyword as
).
Example - find the maximum number of flyrods purchased by a customer and the cust-num of that customer.
select
cust-num, max(
num-flyrods)
from
(select
cust-num,
count(
flyrod-stock-num)
from
purchased
group by
cust-num) as
result(cust-num, num-flyrods)
Example - find the maximum number of flyrods purchased by a customer and the name of that customer.
select
name, max(
num-flyrods)
from (select
cust-num,
count(
flyrod-stock-num)
from
purchased
group by
cust-num) as
result(cust-num, num-flyrods),
customer
where
result.cust-num = customer.cust-num
The expressive power of a query language refers to the set of queries that can be formulated in the language. Informally, it characterizes what questions can be asked.
All of the fundamental operations of relational algebra (select, project, cartesian product, rename, union and set difference) can be expressed in SQL. Hence, SQL has at least as much expressive power as relational algebra.
The aggregate functions of SQL can not be expressed in relational algebra (at least under the standard definition of relational algebra). Hence, SQL has strictly more expressive power than relational algebra. However, SQL still can not express all possible queries, unless it is embedded in some programming language.
Queries that can not be expressed in SQL include: