Potential Problem Areas in Multitable SQL Databases
Data integrity is subject to assault from a variety of quarters. Some of these problems arise only in multitable SQL databases; others can happen even in databases that contain only a single table. You want to recognize and minimize all these potential threats.
Bad input data
The source documents or data files that you use to populate your database may contain bad data. The data may be a corrupted version of the correct data, or it may not be the data you want at all. One defense is a range check, which tells you whether the data has domain integrity, meaning that each value falls within the legal domain of its column.
This type of check catches some problems, but not all. An incorrect field value that happens to fall within the acceptable range sails through without being flagged.
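A range check of this kind can be enforced by the DBMS itself with a CHECK constraint. Here is a minimal sketch using SQLite from Python; the EMPLOYEE table, its columns, and the salary range are hypothetical, chosen only for illustration:

```python
import sqlite3

# Hypothetical EMPLOYEE table with a range check on Salary.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        EmpID  INTEGER PRIMARY KEY,
        Salary NUMERIC CHECK (Salary BETWEEN 0 AND 500000)
    )
""")

conn.execute("INSERT INTO EMPLOYEE VALUES (1, 50000)")  # in range: accepted

rejected = False
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (2, 9000000)")  # out of range
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refuses the row
print("out-of-range row rejected:", rejected)
```

Note that a salary of 50000 that should have been 55000 passes the same check, which is exactly the limitation described above.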
Data entry errors
Your source data may be correct, but the data entry operator may transcribe it incorrectly. This type of error can lead to the same kinds of problems as bad input data, and some of the solutions are the same: range checks help, but they're not foolproof. Another solution is to have a second operator independently validate all the data.
This approach is costly because independent validation takes twice the number of people and twice the time. But in some cases where data integrity is critical, the extra effort and expense may prove worthwhile.
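Independent validation amounts to keying the same source documents twice and comparing the two versions. A sketch of the comparison step, with entirely hypothetical batch data:

```python
# Two operators key the same source documents independently.
# Row 102 disagrees between the batches (transposed digits).
batch_a = {101: ("Smith", 50000), 102: ("Jones", 62000)}
batch_b = {101: ("Smith", 50000), 102: ("Jones", 26000)}

# Any row that differs between the two batches must be rechecked
# against the original source document.
discrepancies = [key for key in batch_a if batch_a[key] != batch_b.get(key)]
print(discrepancies)  # → [102]
```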
Mechanical failure
If you experience a mechanical failure, such as a disk crash, the data in the table may be destroyed. Good backups are your main defense against this problem.
Malice
Consider the possibility that someone may want to corrupt your data. Your first line of defense is to deny database access to anyone who may have a malicious intent and to restrict authorized users so they can access only the data they need. Your second defense is to maintain data backups in a safe place. Periodically re-evaluate the security features of your installation. Being a little paranoid doesn’t hurt.
Data redundancy
Data redundancy — the same data items cropping up in multiple places — is a big problem with the hierarchical database model, but the problem can plague relational databases, too. Not only does such redundancy waste storage space and slow down processing, but it can also lead to serious data corruption.
If you store the same data item in two different tables in a database, the item in one of those tables may change while the corresponding item in the other table remains the same. This situation generates a discrepancy, and you may have no way of determining which version is correct. That’s a good reason to keep data redundancy to a minimum.
Although a certain amount of redundancy is necessary for the primary key of one table to serve as a foreign key in another, you should try to avoid the repetition of any data items beyond that.
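A minimal sketch of that necessary redundancy: the customer's key appears once as a primary key and once as a foreign key, while everything else about the customer is stored only once and recovered by a join when needed. Table and column names here are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE CUSTOMER (
        CustomerID INTEGER PRIMARY KEY,   -- stored once, here
        Name       TEXT,
        City       TEXT
    );
    CREATE TABLE ORDERS (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES CUSTOMER(CustomerID)  -- the one
    );                                                      -- necessary repeat
""")
conn.execute("INSERT INTO CUSTOMER VALUES (1, 'Acme', 'Springfield')")
conn.execute("INSERT INTO ORDERS VALUES (10, 1)")

# Repeating Name or City in ORDERS would be the redundancy to avoid;
# a join recovers the customer's details from the single stored copy.
row = conn.execute("""
    SELECT c.Name FROM ORDERS o
    JOIN CUSTOMER c ON c.CustomerID = o.CustomerID
    WHERE o.OrderID = 10
""").fetchone()
print(row[0])  # → Acme
```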
After you eliminate most redundancy from a database design, you may find that performance is now unacceptable. Database designers often purposely reintroduce a little redundancy to speed up processing.
A common practice is to initially design a database with little redundancy and with high degrees of normalization, and then, after finding that important applications run slowly, to selectively add redundancy and denormalize. The key word here is selectively.
The redundancy that you add back in must have a specific purpose, and because you’re acutely aware of both the redundancy and the hazard it represents, you take appropriate measures to ensure that the redundancy doesn’t cause more problems than it solves.
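One such appropriate measure, sketched here with hypothetical tables, is to let a trigger maintain the redundant value so the two copies can't drift apart:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ORDERS (
        OrderID    INTEGER PRIMARY KEY,
        OrderTotal NUMERIC DEFAULT 0   -- redundant: derivable from LINE_ITEM,
    );                                 -- kept for fast lookups
    CREATE TABLE LINE_ITEM (
        OrderID INTEGER REFERENCES ORDERS(OrderID),
        Amount  NUMERIC
    );
    -- The guard: every new line item updates the stored total,
    -- so the redundant copy can't go stale.
    CREATE TRIGGER keep_total AFTER INSERT ON LINE_ITEM
    BEGIN
        UPDATE ORDERS SET OrderTotal = OrderTotal + NEW.Amount
        WHERE OrderID = NEW.OrderID;
    END;
""")
conn.execute("INSERT INTO ORDERS (OrderID) VALUES (1)")
conn.execute("INSERT INTO LINE_ITEM VALUES (1, 20)")
conn.execute("INSERT INTO LINE_ITEM VALUES (1, 5)")
total = conn.execute(
    "SELECT OrderTotal FROM ORDERS WHERE OrderID = 1").fetchone()[0]
print(total)  # → 25
```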
When you exceed the capacity of your DBMS
A database system might work properly for years and then start experiencing intermittent errors that become progressively more serious. This may be a sign that you’re approaching one of the system’s capacity limits. There are, after all, limits to the number of rows that a table may have. There are also limits on columns, constraints, and various other database features.
Check the current size and content of your database against the specifications listed in the documentation of your DBMS. If you’re near the limit in any area, consider upgrading to a system with a higher capacity. Or you may want to archive older data that is no longer active and then delete it from your database.
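The archive-then-delete step can be sketched as a single transaction, so no row is lost between the copy and the delete. The table names, columns, and cutoff date are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ORDERS         (OrderID INTEGER PRIMARY KEY, OrderDate TEXT);
    CREATE TABLE ORDERS_ARCHIVE (OrderID INTEGER PRIMARY KEY, OrderDate TEXT);
    INSERT INTO ORDERS VALUES (1, '2015-03-01'), (2, '2024-06-15');
""")
cutoff = '2020-01-01'
with conn:  # one transaction: either both statements succeed or neither does
    conn.execute(
        "INSERT INTO ORDERS_ARCHIVE SELECT * FROM ORDERS WHERE OrderDate < ?",
        (cutoff,))
    conn.execute("DELETE FROM ORDERS WHERE OrderDate < ?", (cutoff,))

remaining = conn.execute("SELECT COUNT(*) FROM ORDERS").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM ORDERS_ARCHIVE").fetchone()[0]
print(remaining, archived)  # → 1 1
```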