How to Deal with SQL Modification Anomalies and Normal Forms
A host of problems — called modification anomalies — can plague a database if you don’t structure the SQL database correctly. To prevent these problems, you can normalize the database structure. Normalization generally entails splitting one database table into two simpler tables.
Modification anomalies are so named because they are generated by the addition of, change to, or deletion of data from a database table.
Suppose, for example, that your company sells household cleaning products, and you charge all customers the same price for each product. The SALES table keeps track of everything for you. Assume that customer 1001 moves away and no longer is a customer. You don’t care what he’s bought in the past, because he’s not going to buy anything from you again. You want to delete his row from the table.
If you do so, however, you don’t just lose the fact that customer 1001 has bought laundry detergent; you also lose the fact that laundry detergent costs $12. This situation is called a deletion anomaly. In deleting one fact (that customer 1001 bought laundry detergent), you inadvertently delete another fact (that laundry detergent costs $12).
You can use the same table to illustrate an insertion anomaly. For example, suppose you want to add stick deodorant to your product line at a price of $2. You can’t add this data to the SALES table until a customer buys stick deodorant.
The problem with the SALES table is that this table deals with more than one thing: It covers not just which products customers buy, but also what the products cost. To eliminate the anomalies, you have to split the SALES table into two tables, each dealing with only one theme or idea.
CUST_PURCH, which deals with the single idea of customer purchases.
PROD_PRICE, which deals with the single idea of product pricing.
You can now delete the row for customer 1001 from CUST_PURCH without losing the fact that laundry detergent costs $12. (The cost of laundry detergent is now stored in PROD_PRICE.) You can also add stick deodorant to PROD_PRICE whether or not anyone has bought the product. Purchase information is stored elsewhere, in the CUST_PURCH table.
The process of breaking up a table into multiple tables, each of which has a single theme, is called normalization. A normalization operation that solves one problem may not affect other problems. You may have to perform several successive normalization operations to reduce each resulting table to a single theme.
Each database table should deal with one — and only one — main theme. Sometimes (as you probably guessed) determining that a table really deals with two or more themes can be difficult.
You can classify tables according to the types of modification anomalies to which they’re subject. In a 1970 paper, E. F. Codd identified three sources of modification anomalies and defined first, second, and third normal forms (1NF, 2NF, 3NF) as remedies to those types of anomalies. In the ensuing years, Codd and others discovered additional types of anomalies and specified new normal forms to deal with them.
The Boyce-Codd normal form (BCNF), the fourth normal form (4NF), and the fifth normal form (5NF) each afforded a higher degree of protection against modification anomalies. Not until 1981, however, did a paper, written by Ronald Fagin, describe domain-key normal form or DK/NF. Using this last normal form enables you to guarantee that a table is free of modification anomalies.
The normal forms are nested in the sense that a table that’s in 2NF is automatically also in 1NF. Similarly, a table in 3NF is automatically in 2NF, and so on. For most practical applications, putting a database in 3NF is sufficient to ensure a high degree of integrity. To be absolutely sure of its integrity, you must put the database into DK/NF.
After you normalize a database as much as possible, you may want to make selected denormalizations to improve performance. If you do, be aware of the types of anomalies that may now become possible.