Normalization Using Multivalued Dependencies

In some cases, a relation scheme in BCNF still seems to have unnecessary repetition. For example, consider a scheme:

Purchase2-scheme = (cust-num, address, flyrod-stock-num, date)

and suppose that the flyshop keeps both the home and work address of each customer. That is, the functional dependency cust-num tex2html_wrap_inline36 address no longer holds, so our new set of functional dependencies F = {cust-num tex2html_wrap_inline36 name, flyrod-stock-num tex2html_wrap_inline36 {manufacturer, length, line-weight}}

Purchase2-scheme is in BCNF with respect to this F, because there are no nontrivial functional dependencies in the restriction of tex2html_wrap_inline46 to Purchase2-scheme. However, there is still redundancy present, because to ensure that any queries involving addresses and flyrod-stock-nums of flyrods purchased returns the correct result, we need to store two tuples for each flyrod purchased (one with each address). Note that this relation stores unrelated (or independent) information together - there is no relationship between address and flyrod-stock-num, except through cust-num.

Multivalued Dependencies

Like a functional dependency, a multivalued dependency is a constraint on relations. However, a functional dependency specifies that certain tuples can NOT exist - if tex2html_wrap_inline48 tex2html_wrap_inline36 tex2html_wrap_inline52 , then there can not be two tuples that agree on tex2html_wrap_inline48 but disagree on tex2html_wrap_inline52 . A multivalued dependency specifies that certain tuples must exist. In the above example, if we have a tuple (2, ``Graceland'', 2, 8/8/1992) in purchase2, i.e. with Elvis's home address, then we also expect to see a tuple (2, ``Sun Records'', 2, 8/8/1992) with Elvis's work address. Hence, for Purchase2-scheme, we have a multivalued dependency cust-num tex2html_wrap_inline36tex2html_wrap_inline36 address.

More formally, let R be a relation scheme with tex2html_wrap_inline48 tex2html_wrap_inline66 R and tex2html_wrap_inline52 tex2html_wrap_inline66 R. The multivalued dependency tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline52 holds on R if for all legal relations r(R) if for any two tuples tex2html_wrap_inline88 and tex2html_wrap_inline90 tex2html_wrap_inline92 r where tex2html_wrap_inline96 = tex2html_wrap_inline98 , then there exist tuples tex2html_wrap_inline100 and tex2html_wrap_inline102 tex2html_wrap_inline92 r such that all of the following hold:

Example: Consider the multivalued dependency cust-num tex2html_wrap_inline36tex2html_wrap_inline36 address, and let tex2html_wrap_inline88 = (2, ``Graceland'', 2, 8/8/1992) and tex2html_wrap_inline90 = (2, ``Sun Records'', 2, 8/8/1992). Clearly, tex2html_wrap_inline140 = tex2html_wrap_inline142 . Let tex2html_wrap_inline100 = tex2html_wrap_inline88 and tex2html_wrap_inline102 = tex2html_wrap_inline90 . We need to check:

which are all satisfied. To check that tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline52 in general, one must find such tuples tex2html_wrap_inline100 and tex2html_wrap_inline102 for every pair of tuples tex2html_wrap_inline88 and tex2html_wrap_inline90 that agree on tex2html_wrap_inline48 .

What a multivalued dependency tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline52 really says is that attributes in tex2html_wrap_inline52 and R - tex2html_wrap_inline52 - tex2html_wrap_inline48 are independent, because all values of those attributes that occur with tex2html_wrap_inline48 must occur in every possible combination with tex2html_wrap_inline48.

As with functional dependencies, we will use multivalued dependencies as:

As with functional dependencies, a multivalued dependency tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline52 is trivial if tex2html_wrap_inline52 tex2html_wrap_inline66 tex2html_wrap_inline48 Additionally, such a multivalued dependency is trivial if tex2html_wrap_inline48 tex2html_wrap_inline220 tex2html_wrap_inline52 = R. This second kind of trivial dependency is always true because R - tex2html_wrap_inline52 tex2html_wrap_inline66 tex2html_wrap_inline48 in this case. To verify this, choose any sets of attributes tex2html_wrap_inline48 and tex2html_wrap_inline52 such that tex2html_wrap_inline48 tex2html_wrap_inline220 tex2html_wrap_inline52 = R, and any two tuples tex2html_wrap_inline88 and tex2html_wrap_inline90 that agree on tex2html_wrap_inline48 . Let tex2html_wrap_inline100 = tex2html_wrap_inline88 and tex2html_wrap_inline102 = tex2html_wrap_inline90 and verify the conditions for a multivalued dependency to hold.

To see a case where the multivalued dependency cust-num tex2html_wrap_inline36tex2html_wrap_inline36 address does not hold, consider:

purchase2
cust-num address flyrod-stock-num date
1 E258 3 9/4/1994
3 E258 3 9/3/1994
2 Graceland 2 8/8/1992
2 Sun Records 2 8/8/1992
3 E258 2 9/2/1994
3 E258 1 9/1/1994
2 Graceland 3 4/12/1997

Note that the multivalued dependency would hold if this relation contained the tuple: (2, ``Sun Records'', 3, 4/12/1997). A relation can always be made to satisfy a multivalued dependency by adding tuples.

If we have a functional dependency tex2html_wrap_inline48 tex2html_wrap_inline36 tex2html_wrap_inline52 , then the multivalued dependency tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline52 also holds. To verify this, consider any two tuples tex2html_wrap_inline88 and tex2html_wrap_inline90 such that tex2html_wrap_inline96 = tex2html_wrap_inline98 . By definition of tex2html_wrap_inline48 tex2html_wrap_inline36 tex2html_wrap_inline52 , tex2html_wrap_inline6 = tex2html_wrap_inline8 Let tex2html_wrap_inline10 = tex2html_wrap_inline12 and tex2html_wrap_inline14 = tex2html_wrap_inline16 and verify the conditions for tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline52 .

As with functional dependencies, we will want to consider the closure of sets of multivalued dependencies (and also sets consisting of both types of dependencies). Unfortunately, we do not have a simple algorithm for computing such a closure, or even for computing the closure of a set of attributes when multivalued dependencies are involved. Even the axioms and rules for multivalued dependencies are more complicated for multivalued dependencies.

Some of the more important rules for reasoning about multivalued dependencies dependencies are:

While these rules are usually sufficient for reasoning about multivalued dependencies and they are sound (they do not allow any invalid multivalued dependencies to be inferred), they are NOT complete (not every valid multivalued dependency can be inferred using only the rules listed). See the text for a sound and complete set of rules. Note that these rules are different from the rules for functional dependencies. In particular, there is no decomposition rule for multivalued dependencies.

Examples: Consider R = {A, B, C, E, G}, and consider a set of dependencies D = {A tex2html_wrap_inline36 G, A tex2html_wrap_inline36tex2html_wrap_inline36 BC, A tex2html_wrap_inline36tex2html_wrap_inline36 CE, C tex2html_wrap_inline36tex2html_wrap_inline36 EG}

Some of the elements of tex2html_wrap_inline442 are:

Return to Table of Contents

Fourth Normal Form

Fourth normal form (4NF) is very similar to BCNF, except that it uses multivalued dependencies. 4NF is useful for removing the kinds of redundancy we saw earlier in Purchase2-scheme.

A relation scheme R is in 4NF with respect to a set of functional and multivalued dependencies D if for all multivalued dependencies tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline52 in tex2html_wrap_inline442 where tex2html_wrap_inline48 tex2html_wrap_inline66 R and tex2html_wrap_inline52 tex2html_wrap_inline66 R, at least one of the following holds:

A database design is in 4NF if each relation scheme of the design is in 4NF.

Note that the definition of 4NF is identical to the definition of BCNF, except that it involves multivalued dependencies instead of functional dependencies. Additionally, every scheme that is in 4NF is also in BCNF, because every functional dependency tex2html_wrap_inline48 tex2html_wrap_inline36 tex2html_wrap_inline52 implies the multivalued dependency tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline52 by the replication rule.

Examples:

Consider Purchase2-scheme from our running example, with D = {cust-num tex2html_wrap_inline36tex2html_wrap_inline36 address}. To check that this scheme is in 4NF, we consider this multivalued dependency and ask two questions:

Since the answer to both questions is no, this scheme is not in 4NF. Recall that we showed earlier that this scheme is in BCNF.

As for BCNF, we have an algorithm for finding a decomposition into 4NF. This algorithm is very similar to the BCNF decomposition algorithm.

The 4NF Decomposition Algorithm:

inputs: a relation scheme R and a set 
        of functional and multivalued 
        dependencies D
output: a lossless-join decomposition of R that is in 4NF
result := {R} compute tex2html_wrap_inline442 while there is a relation scheme tex2html_wrap_inline592 in result that is not in 4NF do
choose tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline52 tex2html_wrap_inline92 tex2html_wrap_inline442 such that tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline52 is on tex2html_wrap_inline592 and tex2html_wrap_inline48 tex2html_wrap_inline36 tex2html_wrap_inline592 is not in tex2html_wrap_inline442 and tex2html_wrap_inline48 tex2html_wrap_inline420 tex2html_wrap_inline52 = {}
result := (result - tex2html_wrap_inline592) tex2html_wrap_inline220 (tex2html_wrap_inline634) tex2html_wrap_inline220 (tex2html_wrap_inline638)
end

The result is a lossless-join decomposition because we can define lossless-join for multivalued dependencies in exactly the same way that we did for functional dependencies. That is, if R is a relation scheme and D is a set of functional and multivalued dependencies on R, then a decomposition tex2html_wrap_inline646 of R is a lossless-join decomposition if at least one of the following is in tex2html_wrap_inline442:

Note that this is a weaker condition for lossless-join than the condition in terms of functional dependencies, because tex2html_wrap_inline652 tex2html_wrap_inline420 tex2html_wrap_inline656 tex2html_wrap_inline36 tex2html_wrap_inline652 (for example) implies that tex2html_wrap_inline652 tex2html_wrap_inline420 tex2html_wrap_inline656 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline652 , but the reverse is not true in general.

From the algorithm, we have that tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline52 and (tex2html_wrap_inline634) tex2html_wrap_inline420 (tex2html_wrap_inline638) is tex2html_wrap_inline48 . Clearly, tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 (tex2html_wrap_inline638), by tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline48 and the multivalued union rule.

To use this algorithm to decompose Purchase2-scheme:

initially, result := {{cust-num, address, flyrod-stock-num, date}}
choose cust-num tex2html_wrap_inline36tex2html_wrap_inline36 address
The resulting decomposition is R = tex2html_wrap_inline646 , where tex2html_wrap_inline652 = {cust-num, flyrod-stock-num, date} and tex2html_wrap_inline656 = {cust-num, address}.

There are no nontrivial multivalued dependencies in the restriction of tex2html_wrap_inline442 to tex2html_wrap_inline652 , so this scheme is in 4NF.

For tex2html_wrap_inline656 , note that cust-num is NOT a superkey, because the functional dependency cust-num tex2html_wrap_inline36 address is not in tex2html_wrap_inline442 . However, the only multivalued dependencies in tex2html_wrap_inline442 that are on this scheme are:

Each of these multivalued dependencies is trivial. (Recall that a multivalued dependency tex2html_wrap_inline48 tex2html_wrap_inline36tex2html_wrap_inline36 tex2html_wrap_inline52 is trivial if tex2html_wrap_inline48 tex2html_wrap_inline66 R and tex2html_wrap_inline52 tex2html_wrap_inline66 R, OR if tex2html_wrap_inline48 tex2html_wrap_inline220 tex2html_wrap_inline52 = R. Hence, this scheme is in 4NF and so the entire decomposition is in 4NF.

Note that this decomposition is also in BCNF (because we have no nontrivial functional dependencies).

Does this algorithm produce dependency preserving decompositions? To answer this question, we need to define dependency preservation in the presence of multivalued dependencies.

Let D be a set of functional and multivalued dependencies. The restriction of D to a relation scheme R is the union of the following two sets of dependencies:

A decomposition of scheme R into schemes { tex2html_wrap_inline652 , tex2html_wrap_inline656 , ..., tex2html_wrap_inline856 } is dependency preserving with respect to a set of dependencies D if, for every set of relations tex2html_wrap_inline860 , tex2html_wrap_inline862 , ..., tex2html_wrap_inline864 such that each tex2html_wrap_inline866 satisfies the restriction of D to tex2html_wrap_inline592 , then there exists a relation r(R) that satisfies D such that tex2html_wrap_inline866 = tex2html_wrap_inline878 for each tex2html_wrap_inline866 .

Clearly, testing that a decomposition is dependency preserving is difficult when multivalued dependencies are involved. Suffice it to say that the 4NF decomposition algorithm is not dependency preserving, and that some relation schemes and sets of dependencies do not have a dependency preserving decomposition into 4NF. Given the close relationship between 4NF and BCNF, this is not surprising.

Return to Table of Contents

Summary of Normalization

Lossless-join decomposition is vital. We do not even consider lossy-join decompositions.

Dependency preservation is useful, but not essential. We may accept a weaker normal form to keep dependency preservation, or we may sacrifice dependency preservation, depending on the situation.

BCNF, 3NF and 4NF are related as follows: any relation scheme that is in 4NF is in BCNF, and any relation that is in BCNF is in 3NF. We have shown separation by exhibiting schemes that are in 3NF but not BCNF, and schemes that are in BCNF but not 4NF. Hence, no level of normalization is equivalent to another.

We can always achieve a lossless join, dependency preserving decomposition into 3NF, but sometimes there is no dependency preserving decomposition into BCNF or 4NF.

The higher the level of normalization (4NF being highest), the less redundancy present in the relation scheme.

First and second normal forms have been defined. Second normal form (2NF) is of historical interest only, because schemes in 2NF are less normalized than those in 3NF, and we can always achieve a dependency preserving decomposition into 3NF.

First normal form states that each attribute of a tuple is atomic - that is, that the value of each attribute for each tuple is a single value, and not a set of values. Some commercial relational database systems (notably Pick and Picklike systems) do not require relations to be in first normal form. This has a major affect on querying, data definition and normalization.

Return to Table of Contents