Normalization Using Functional Dependencies

Functional dependencies are useful for finding normal forms, which are criteria for good database designs. The process of putting a database design into a normal form is called normalization.

Boyce-Codd Normal Form

Boyce-Codd normal form (BCNF) is a very desirable form for relational database schemes because a relational database scheme in BCNF has minimal redundancy.

A relation scheme R is in BCNF with respect to a set of functional dependencies F if for all functional dependencies tex2html_wrap_inline42 in tex2html_wrap_inline44 where tex2html_wrap_inline46 R and tex2html_wrap_inline48 R (i.e. all functional dependencies in the restriction of tex2html_wrap_inline44 to R), at least one of the following holds:

A database design is in BCNF if each relation scheme of the design is in BCNF.

Examples:

Consider Customer-scheme, Purchased-scheme, and Flyrod-scheme from our running example. F = {cust-num tex2html_wrap_inline60 {name, address}, flyrod-stock-num tex2html_wrap_inline60 {manufacturer, length, line-weight}}. To check that this design is in BCNF:

Thus, the database design is in BCNF.

Another example:

Let Customer-scheme = {cust-num, name, house-num, street, city, state, zip}, and F = {cust-num tex2html_wrap_inline60 {name, house-num, street, city, state, zip}, {house-num, street, city, state} tex2html_wrap_inline60 zip, zip tex2html_wrap_inline60 state}.

Is this scheme in BCNF?

No, because {house-num, street, city, state} is not a superkey for Customer-scheme, and {house-num, street, city, state} tex2html_wrap_inline60 zip is not trivial. A similar situation exists for zip tex2html_wrap_inline60 state.

This design has the potential for repition, because zip will always be the same if the rest of the address is the same, and the state will always be the same if the zip is the same. One way to remove this redundancy is to decompose relation schemes that are not in BCNF.

The BCNF Decomposition Algorithm:

inputs: a relation scheme R 
        and a set of functional dependencies F 
output: a lossless-join decomposition of R
        that is in BCNF
result := {R} compute tex2html_wrap_inline44 while there is a relation scheme tex2html_wrap_inline90 in result that is not in BCNF do choose tex2html_wrap_inline92 such that tex2html_wrap_inline42 is on tex2html_wrap_inline90 and tex2html_wrap_inline98 = {} and tex2html_wrap_inline56 is not a superkey for tex2html_wrap_inline90 result := (result - tex2html_wrap_inline90) tex2html_wrap_inline102 (tex2html_wrap_inline104) tex2html_wrap_inline102 (tex2html_wrap_inline108) end

To see that result is a lossless-join decomposition, note that tex2html_wrap_inline42 and ( tex2html_wrap_inline104 ) tex2html_wrap_inline114 ( tex2html_wrap_inline108 ) is tex2html_wrap_inline56 . Clearly, tex2html_wrap_inline120 ( tex2html_wrap_inline108 ).

To use this algorithm to decompose the modified Customer-scheme:

initially, result := {cust-num, name, house-num, street, city, state, zip}
choose zip tex2html_wrap_inline60 state
The resulting decomposition is {{cust-num, name, house-num, street, city, zip}, {zip, state}}. Let tex2html_wrap_inline126 = {cust-num, name, house-num, street, city, zip} and tex2html_wrap_inline128 = {zip, state}

Hence, this decomposition is in BCNF.

If we choose to apply {house-num, street, city, state} tex2html_wrap_inline60 zip first, the first decomposition we consider is:

{{cust-num, name, house-num, street, city, state}, {house-num, street, city, state, zip}}.

Because zip tex2html_wrap_inline60 state is still on the second scheme in the decomposition, it is not in BCNF. Hence, we decompose it again to produce:

{{cust-num, name, house-num, street, city, state}, {house-num, street, city, zip}, {zip, state}}.

This scheme is in BCNF. In this case, we would probably prefer the first decomposition (or we may even decide the price of avoiding the minor redundancy present here is too high and abandon BCNF).

In both of these decompositions, the functional dependency cust-num tex2html_wrap_inline60 {name, house-num, street, city, state, zip} is not in the restriction of F to the decomposition (F'). Are these decompositions dependency preserving?

Proof: (that the first decomposition is not dependency preserving)

We need to show: tex2html_wrap_inline156 tex2html_wrap_inline158 tex2html_wrap_inline44

F' = {{zip tex2html_wrap_inline60 state}}, and so tex2html_wrap_inline166 in F' is just {cust-num}, while tex2html_wrap_inline166 in F is {cust-num, name, house-num, street, city, state, zip}. Hence, tex2html_wrap_inline156 tex2html_wrap_inline158 tex2html_wrap_inline44 .

In fact, neither decomposition is dependency preserving.

In relational database design, the usual goal is to achieve all of:

Clearly, this goal can not always be acheived, because some relation schemes can not be put into BCNF without giving up dependency preservation. In such a situation, we often prefer a weaker normal form that keeps dependency preservation.

Return to Table of Contents

Third Normal Form

A relation scheme R is in third normal form (3NF) with respect to a set of functional dependencies F if for all functional dependencies tex2html_wrap_inline42 in tex2html_wrap_inline44 that are in the restriction of tex2html_wrap_inline44 to R, at least one of the following holds:

A relational database scheme is in 3NF when all relation schemes in the database scheme are in 3NF.

Note that the first two conditions are the same as for BCNF. Hence, every relation scheme that is in BCNF is also in 3NF. The reverse is not true, because the third condition permits some schemes that are not in BCNF.

Example: consider the decomposition:

{{cust-num, name, house-num, street, city, state}, {house-num, street, city, state, zip}}

with F = {cust-num tex2html_wrap_inline60 {name, house-num, street, city, state, zip}, {house-num, street, city, state} tex2html_wrap_inline60 zip, zip tex2html_wrap_inline60 state}.

Is this decomposition in 3NF?

Characteristics of this decomposition:

The 3NF Decomposition Algorithm:

inputs: A set of functional dependencies F and a relation scheme R
output: a lossless join, dependency preserving decomposition in 3NF

  1. compute tex2html_wrap_inline240 (a canonical cover of F)
  2. for each functional dependency tex2html_wrap_inline42 in tex2html_wrap_inline240 , if tex2html_wrap_inline248 tex2html_wrap_inline250 tex2html_wrap_inline252 for some other functional dependency tex2html_wrap_inline254 in tex2html_wrap_inline240 , then tex2html_wrap_inline248 is a scheme in the decomposition
  3. if none of the schemes found in step 2 contains a candidate key for R, choose any candidate key K and add scheme K to the decomposition
  4. for any attributes A tex2html_wrap_inline192 R not contained in any scheme in the decomposition thus far computed, add {A} to the decomposition

Example: consider the scheme {cust-num, name, house-num, street, city, state, zip} with F = {cust-num tex2html_wrap_inline60 {name, house-num, street, city, state, zip}, {house-num, street, city, state} tex2html_wrap_inline60 zip, zip tex2html_wrap_inline60 state}.

One canonical cover is tex2html_wrap_inline240 = {cust-num tex2html_wrap_inline60 {name, house-num, street, city, state}, {house-num, street, city, state} tex2html_wrap_inline60 zip, zip tex2html_wrap_inline60 state}.

All attributes in zip tex2html_wrap_inline60 state occur in {house-num, street, city, state} tex2html_wrap_inline60 zip, so the decomposition found in step two is: {{cust-num, name, house-num, street, city, state}, {house-num, street, city, state, zip}}

The first scheme in the decomposition contains a candidate key (cust-num), and the schemes in the decomposition have all of the attributes of the original relation. Hence, the algorithm is done. Note that this is much easier than checking that a decomposition is in 3NF directly.

For this example, another canonical cover tex2html_wrap_inline240 = {cust-num tex2html_wrap_inline60 {name, house-num, street, city, zip}, {house-num, street, city, state} tex2html_wrap_inline60 zip, zip tex2html_wrap_inline60 state}.

Applying the algorithm with this canonical cover yields the decomposition: {{cust-num, name, house-num, street, city, zip}, {house-num, street, city, state, zip}}

Hence, decomposition into 3NF is not unique.

Summary: the preferred database design is all of:

If this can not be achieved (because BCNF decompositions do not always preserve dependencies), the usual choice is:

which can always be achieved.

Return to Table of Contents